PAPER Cross-Lingual Vision-Language Navigation
Vision-Language Navigation (VLN) is the task where an agent is commanded to navigate in photo-realistic environments with natural language instructions. Previous research on VLN is primarily conducted on the Room-to-Room (R2R) dataset with only English instructions… (arxiv.org)
Note: these notes were not written after a deep reading of each paper, so please treat them as reference only. They cover only the core of each model from a shallow understanding, so a lot of content is missing.
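To make the VLN task concrete, here is a minimal, illustrative sketch of a single decision step: the instruction is encoded once, and at each viewpoint the agent scores the visual features of the navigable directions against that encoding and moves toward the best match. The class name, dimensions, and dot-product scoring rule are assumptions made for illustration; they are not the method of any of the papers listed here.

```python
import torch
import torch.nn as nn

class ToyVLNPolicy(nn.Module):
    """Toy VLN decision step: encode the instruction, then score the
    navigable directions against it and pick the best-matching one."""
    def __init__(self, vocab_size=10000, hidden=128, img_feat_dim=2048):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, hidden)
        self.instr_enc = nn.LSTM(hidden, hidden, batch_first=True)
        self.img_proj = nn.Linear(img_feat_dim, hidden)

    def forward(self, instr_ids, candidate_feats):
        # instr_ids: (B, T) instruction token ids
        # candidate_feats: (B, K, img_feat_dim) features of K navigable directions
        _, (h, _) = self.instr_enc(self.word_emb(instr_ids))
        instr = h[-1]                                   # (B, hidden) instruction summary
        cands = self.img_proj(candidate_feats)          # (B, K, hidden)
        logits = torch.bmm(cands, instr.unsqueeze(2)).squeeze(2)  # (B, K) match scores
        return logits.argmax(dim=1)                     # index of the chosen direction

if __name__ == "__main__":
    policy = ToyVLNPolicy()
    instr = torch.randint(0, 10000, (1, 15))  # fake instruction tokens
    cands = torch.randn(1, 4, 2048)           # fake features for 4 directions
    print(policy(instr, cands))               # chosen direction index, e.g. tensor([2])
```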

All Posts
PAPER Environment-agnostic Multitask Learning for Natural Language Grounded Navigation
Recent research efforts enable the study of natural language grounded navigation in photo-realistic environments, e.g., following natural language instructions or dialog. However, existing methods tend to overfit training data in seen environments and fail to generalize to previously unseen environments… (arxiv.org)
PAPER Multi-View Learning for Vision-and-Language Navigation
Learning to navigate in a visual environment following natural language instructions is a challenging task because natural language instructions are highly variable, ambiguous, and under-specified. In this paper, we present a novel training paradigm… (arxiv.org)
PAPER Help, Anna! Visual Navigation with Natural Multimodal Assistance via Retrospective Curiosity-Encouraging Imitation Learning
Mobile agents that can leverage help from humans can potentially accomplish more complex tasks than they could entirely on their own. We develop "Help, Anna!" (HANNA), an interactive photo-realistic simulator in which an agent fulfills object-finding tasks… (arxiv.org)
PAPER Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training
Learning to navigate in a visual environment following natural-language instructions is a challenging task, because the multimodal inputs to the agent are highly variable, and the training data on a new task is often limited… (arxiv.org)
VLBERT
PAPER VL-BERT: Pre-training of Generic Visual-Linguistic Representations
We introduce a new pre-trainable generic representation for visual-linguistic tasks, called Visual-Linguistic BERT (VL-BERT for short). VL-BERT adopts the simple yet powerful Transformer model as the backbone, and extends it to take both visual and linguistic inputs… (arxiv.org)
UNITER
PAPER UNITER: UNiversal Image-TExt Representation Learning… (arxiv.org)
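As a rough illustration of the single-stream design these entries describe, the sketch below concatenates text token embeddings with projected visual region features into one sequence and runs a standard Transformer encoder over it. The class name, dimensions, and the omission of positional and segment details are simplifications for brevity, not the actual VL-BERT or UNITER implementation.

```python
import torch
import torch.nn as nn

class SingleStreamVLEncoder(nn.Module):
    """Toy single-stream visual-linguistic encoder: word embeddings and
    projected region features share one Transformer (positional details omitted)."""
    def __init__(self, vocab_size=30522, hidden=256, n_layers=2, n_heads=4,
                 region_feat_dim=2048):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, hidden)
        self.region_proj = nn.Linear(region_feat_dim, hidden)  # map RoI features to hidden size
        self.type_emb = nn.Embedding(2, hidden)                 # 0 = text token, 1 = visual region
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, token_ids, region_feats):
        # token_ids: (B, T) word-piece ids; region_feats: (B, R, region_feat_dim)
        text = self.word_emb(token_ids) + self.type_emb(torch.zeros_like(token_ids))
        regions = self.region_proj(region_feats)
        regions = regions + self.type_emb(torch.ones(region_feats.shape[:2],
                                                     dtype=torch.long,
                                                     device=region_feats.device))
        seq = torch.cat([text, regions], dim=1)  # one joint text-and-region sequence
        return self.encoder(seq)                 # (B, T + R, hidden) contextualized features

if __name__ == "__main__":
    model = SingleStreamVLEncoder()
    tokens = torch.randint(0, 30522, (2, 12))  # fake word-piece ids
    regions = torch.randn(2, 8, 2048)          # fake detector RoI features
    print(model(tokens, regions).shape)        # torch.Size([2, 20, 256])
```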
PAPER ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
We present ViLBERT (short for Vision-and-Language BERT), a model for learning task-agnostic joint representations of image content and natural language. We extend the popular BERT architecture to a multi-modal two-stream model, processing both visual and textual inputs… (arxiv.org)
PAPER A Behavioral Approach to Visual Navigation with Graph Localization Networks
Inspired by research in psychology, we introduce a behavioral approach for visual navigation using topological maps. Our goal is to enable a robot to navigate from one location to another, relying only on its visual input and the topological map of the environment… (arxiv.org)
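For intuition about what planning over a topological map looks like, here is a tiny, illustrative sketch: places are graph nodes, traversable connections are edges, and a route is a shortest path between two nodes. The place names and the use of networkx are assumptions for illustration only, not the paper's localization network.

```python
import networkx as nx

# A topological map: nodes are places the robot can recognize visually,
# edges connect places that are directly reachable from one another.
# All place names here are made up for illustration.
topo_map = nx.Graph()
topo_map.add_edges_from([
    ("hallway", "kitchen"),
    ("hallway", "living_room"),
    ("living_room", "bedroom"),
    ("kitchen", "pantry"),
])

# Planning on a topological map reduces to graph search: the route is a
# sequence of places, and perception only needs to decide which node the
# robot is currently at and which neighbor to head for next.
route = nx.shortest_path(topo_map, source="bedroom", target="pantry")
print(route)  # ['bedroom', 'living_room', 'hallway', 'kitchen', 'pantry']
```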
PAPER Perceive, Transform, and Act: Multi-Modal Attention Networks for Vision-and-Language Navigation
Vision-and-Language Navigation (VLN) is a challenging task in which an agent needs to follow a language-specified path to reach a target destination. In this paper, we strive for the creation of an agent able to tackle three key issues: multi-modality, long-term dependencies… (arxiv.org)
PAPER Stay on the Path: Instruction Fidelity in Vision-and-Language Navigation
Advances in learning and representations have reinvigorated work that connects language to other modalities. A particularly exciting direction is Vision-and-Language Navigation (VLN), in which agents interpret natural language instructions and visual scenes… (arxiv.org)