Paper Reading

These notes were not made from a deep reading of the papers, so please treat them as reference only. They were written with shallow knowledge, focusing mainly on each model's core ideas, so a lot is missing. If you do use them, please leave a comment.
PAPER: On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention (arxiv.org)
"Scene text recognition (STR) is the task of recognizing character sequences in natural scenes. While there have been great advances in STR methods, current methods still fail to recognize texts in arbitrary shapes, such as heavily curved or rotated texts, …"
GITHUB: GitHub - clovaai/SATRN: Official Tensorflow Implementat..
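The SATRN excerpt above turns on one architectural idea: keep the CNN feature map two-dimensional and let every spatial position attend to every other, so curved or rotated text stays spatially coherent. A minimal sketch of that idea, assuming PyTorch; the module and parameter names are hypothetical, and this is not the official clovaai/SATRN code (the authors released theirs in TensorFlow):

```python
# Minimal 2D self-attention over a CNN feature map, in the spirit of SATRN.
# Hypothetical sketch (PyTorch); not the official clovaai/SATRN code.
import torch
import torch.nn as nn

class SelfAttention2D(nn.Module):
    def __init__(self, dim=256, heads=8, max_h=16, max_w=64):
        super().__init__()
        # Separate learned row/column encodings approximate the paper's
        # adaptive 2D positional encoding.
        self.row_pos = nn.Parameter(torch.zeros(max_h, dim))
        self.col_pos = nn.Parameter(torch.zeros(max_w, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feat):                       # feat: (B, C, H, W)
        b, c, h, w = feat.shape
        pos = self.row_pos[:h, None, :] + self.col_pos[None, :w, :]  # (H, W, C)
        x = feat.permute(0, 2, 3, 1) + pos         # add 2D position info
        x = x.reshape(b, h * w, c)                 # every pixel is a token
        out, _ = self.attn(x, x, x)                # full 2D self-attention
        return out.reshape(b, h, w, c).permute(0, 3, 1, 2)

# Usage: a (2, 256, 8, 32) feature map from a shallow CNN backbone.
y = SelfAttention2D()(torch.randn(2, 256, 8, 32))
print(y.shape)  # torch.Size([2, 256, 8, 32])
```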
PAPER: What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis (arxiv.org)
"Many new proposals for scene text recognition (STR) models have been introduced in recent years. While each claim to have pushed the boundary of the technology, a holistic and fair comparison has been largely missing in the field due to the inconsistent ch…"
BLOG: What Is Wrong With Scene Text Re..
PAPER: Chasing Ghosts: Instruction Following as Bayesian State Tracking (arxiv.org)
"A visually-grounded navigation instruction can be interpreted as a sequence of expected observations and actions an agent following the correct trajectory would encounter and perform. Based on this intuition, we formulate the problem of finding the goal lo…"
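Framing instruction following as Bayesian state tracking means maintaining a probability distribution over candidate goal locations and updating it recursively: predict with a motion model, then reweight by how well the current observation matches the instruction. A minimal discrete (histogram) filter sketch; the transition and likelihood inputs here are hypothetical stand-ins for the models the paper learns:

```python
# Minimal histogram (discrete Bayes) filter over candidate goal cells,
# illustrating the "instruction following as Bayesian state tracking" idea.
# Hypothetical sketch; the paper learns its motion/observation models.
import numpy as np

def bayes_filter_step(belief, transition, likelihood):
    """One predict-update cycle over a grid of candidate locations.

    belief:     (N,) prior probability per cell
    transition: (N, N) motion model, transition[i, j] = P(next=i | prev=j)
    likelihood: (N,) P(observation | goal at cell), e.g. instruction-image match
    """
    predicted = transition @ belief          # predict: propagate the belief
    posterior = likelihood * predicted       # update: weight by observation fit
    return posterior / posterior.sum()       # renormalize to a distribution

# Usage: 4 cells, near-identity motion (columns sum to 1),
# observation strongly favoring cell 2.
belief = np.full(4, 0.25)
transition = 0.80 * np.eye(4) + 0.05
likelihood = np.array([0.1, 0.2, 0.9, 0.1])
print(bayes_filter_step(belief, transition, likelihood))
```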
PAPER: Improving Vision-and-Language Navigation with Image-Text Pairs from the Web (arxiv.org)
"Following a navigation instruction such as 'Walk down the stairs and stop at the brown sofa' requires embodied AI agents to ground scene elements referenced via language (e.g. 'stairs') to visual content in the environment (pixels corresponding to 'stairs'…"
PAPER: BabyWalk: Going Farther in Vision-and-Language Navigation by Taking Baby Steps (arxiv.org)
"Learning to follow instructions is of fundamental importance to autonomous agents for vision-and-language navigation (VLN). In this paper, we study how an agent can navigate long paths when learning from a corpus that consists of shorter ones. We show that…"
PAPER: Vision-Dialog Navigation by Exploring Cross-modal Memory (arxiv.org)
"Vision-dialog navigation posed as a new holy-grail task in vision-language disciplinary targets at learning an agent endowed with the capability of constant conversation for help with natural language and navigating according to human responses. Besides th…"
PAPER: VALAN: Vision and Language Agent Navigation (arxiv.org)
"VALAN is a lightweight and scalable software framework for deep reinforcement learning based on the SEED RL architecture. The framework facilitates the development and evaluation of embodied agents for solving grounded language understanding tasks, such as…"
PAPER: Cross-Lingual Vision-Language Navigation (arxiv.org)
"Vision-Language Navigation (VLN) is the task where an agent is commanded to navigate in photo-realistic environments with natural language instructions. Previous research on VLN is primarily conducted on the Room-to-Room (R2R) dataset with only English ins…"
PAPER: Environment-agnostic Multitask Learning for Natural Language Grounded Navigation (arxiv.org)
"Recent research efforts enable study for natural language grounded navigation in photo-realistic environments, e.g., following natural language instructions or dialog. However, existing methods tend to overfit training data in seen environments and fail to…"
PAPER: Multi-View Learning for Vision-and-Language Navigation (arxiv.org)
"Learning to navigate in a visual environment following natural language instructions is a challenging task because natural language instructions are highly variable, ambiguous, and under-specified. In this paper, we present a novel training paradigm, Learn…"
PAPER: Help, Anna! Visual Navigation with Natural Multimodal Assistance via Retrospective Curiosity-Encouraging Imitation Learning (arxiv.org)
"Mobile agents that can leverage help from humans can potentially accomplish more complex tasks than they could entirely on their own. We develop 'Help, Anna!' (HANNA), an interactive photo-realistic simulator in which an agent fulfills object-finding tasks…"
PAPER: Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training (arxiv.org)
"Learning to navigate in a visual environment following natural-language instructions is a challenging task, because the multimodal inputs to the agent are highly variable, and the training data on a new task is often limited. In this paper, we present the…"
VLBERT
PAPER: VL-BERT: Pre-training of Generic Visual-Linguistic Representations (arxiv.org)
"We introduce a new pre-trainable generic representation for visual-linguistic tasks, called Visual-Linguistic BERT (VL-BERT for short). VL-BERT adopts the simple yet powerful Transformer model as the backbone, and extends it to take both visual and linguis…"
UNITER
PAPER: UNITER: UNiversal Image-TExt Represent..
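VL-BERT's "extends it to take both visual and linguistic" inputs amounts to a single-stream design: word tokens and detected region features are projected into one embedding space and fed through the same Transformer, so self-attention mixes the two modalities directly. A minimal sketch of that input construction, assuming PyTorch; all dimensions and names are illustrative, not the released VL-BERT code:

```python
# Single-stream visual-linguistic input, in the spirit of VL-BERT:
# word embeddings and RoI visual features share one Transformer.
# Hypothetical sketch; not the official VL-BERT implementation.
import torch
import torch.nn as nn

vocab, dim, roi_dim = 30522, 768, 2048
word_emb = nn.Embedding(vocab, dim)            # token embeddings
visual_proj = nn.Linear(roi_dim, dim)          # project RoI features to dim
segment_emb = nn.Embedding(2, dim)             # 0 = text, 1 = image region
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, nhead=12, batch_first=True), num_layers=2)

tokens = torch.randint(0, vocab, (1, 8))       # 8 word-piece ids
rois = torch.randn(1, 4, roi_dim)              # 4 detected region features
x = torch.cat([word_emb(tokens), visual_proj(rois)], dim=1)      # (1, 12, 768)
seg = torch.cat([torch.zeros(1, 8), torch.ones(1, 4)], dim=1).long()
out = encoder(x + segment_emb(seg))            # joint self-attention over both
print(out.shape)  # torch.Size([1, 12, 768])
```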
PAPER: ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks (arxiv.org)
"We present ViLBERT (short for Vision-and-Language BERT), a model for learning task-agnostic joint representations of image content and natural language. We extend the popular BERT architecture to a multi-modal two-stream model, processing both visual and…"
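ViLBERT's two-stream design is the natural contrast to VL-BERT above: vision and language keep separate Transformers and exchange information through co-attention, where each stream's queries attend to the other stream's keys and values. A minimal co-attention sketch under the same PyTorch assumption (names hypothetical, not ViLBERT's actual code):

```python
# Co-attention between a language stream and a vision stream, in the
# spirit of ViLBERT's two-stream design. Hypothetical sketch.
import torch
import torch.nn as nn

class CoAttentionBlock(nn.Module):
    def __init__(self, dim=768, heads=12):
        super().__init__()
        self.txt_attends_img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.img_attends_txt = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, txt, img):
        # Queries come from one stream; keys and values from the other.
        txt2, _ = self.txt_attends_img(txt, img, img)
        img2, _ = self.img_attends_txt(img, txt, txt)
        return txt + txt2, img + img2          # residual connections

txt = torch.randn(1, 8, 768)                   # 8 word-piece embeddings
img = torch.randn(1, 4, 768)                   # 4 projected region features
txt_out, img_out = CoAttentionBlock()(txt, img)
print(txt_out.shape, img_out.shape)  # (1, 8, 768) (1, 4, 768)
```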
PAPER: A Behavioral Approach to Visual Navigation with Graph Localization Networks (arxiv.org)
"Inspired by research in psychology, we introduce a behavioral approach for visual navigation using topological maps. Our goal is to enable a robot to navigate from one location to another, relying only on its visual input and the topological map of the env…"
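A topological map here is just a graph whose nodes are places and whose edges are traversable transitions, so the planning half of the problem reduces to graph search; the paper's actual contribution, localizing on that graph from visual input, is beyond this sketch. A minimal illustration with a hypothetical house graph:

```python
# A topological map as a plain graph: nodes are places, edges are
# traversable transitions. Planning is shortest path; graph localization
# from vision (the paper's contribution) is not reproduced here.
from collections import deque

topo_map = {                                   # hypothetical house graph
    "entry": ["hall"], "hall": ["entry", "kitchen", "stairs"],
    "kitchen": ["hall"], "stairs": ["hall", "bedroom"], "bedroom": ["stairs"],
}

def plan(graph, start, goal):
    """Breadth-first search for the shortest node sequence start -> goal."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])

print(plan(topo_map, "entry", "bedroom"))  # ['entry', 'hall', 'stairs', 'bedroom']
```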