All Posts

PAPER VL-BERT: Pre-training of Generic Visual-Linguistic Representations
We introduce a new pre-trainable generic representation for visual-linguistic tasks, called Visual-Linguistic BERT (VL-BERT for short). VL-BERT adopts the simple yet powerful Transformer model as the backbone, and extends it to take both visual and linguistic... (arxiv.org)

PAPER UNITER: UNiversal Image-TExt Representation Learning (arxiv.org)

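Both VL-BERT above and UNITER belong to the single-stream family: word embeddings and detected image-region features are projected into one shared token sequence and encoded by a single Transformer. Below is a minimal sketch of that idea; the class name, layer counts, vocabulary size, and feature dimensions are illustrative assumptions, not either paper's actual configuration.

```python
# A minimal single-stream sketch: text tokens and visual region features are
# embedded into a shared space, concatenated into one sequence, and processed
# by one Transformer encoder. Sizes below are illustrative only.
import torch
import torch.nn as nn

class SingleStreamVLEncoder(nn.Module):  # hypothetical name
    def __init__(self, vocab_size=30522, visual_dim=2048, d_model=768,
                 nhead=12, num_layers=4):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, d_model)
        # Project detector region features (e.g. Faster R-CNN outputs)
        # into the same embedding space as the word tokens.
        self.visual_proj = nn.Linear(visual_dim, d_model)
        # Segment embeddings distinguish the text span from the image span.
        self.segment_emb = nn.Embedding(2, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, token_ids, region_feats):
        # token_ids: (B, T) word ids; region_feats: (B, R, visual_dim)
        text = self.word_emb(token_ids)               # (B, T, d)
        vis = self.visual_proj(region_feats)          # (B, R, d)
        seq = torch.cat([text, vis], dim=1)           # one joint stream
        seg = torch.cat([
            torch.zeros(token_ids.shape[:2], dtype=torch.long),
            torch.ones(region_feats.shape[:2], dtype=torch.long),
        ], dim=1).to(seq.device)
        return self.encoder(seq + self.segment_emb(seg))

# out = SingleStreamVLEncoder()(torch.randint(0, 30522, (2, 16)),
#                               torch.randn(2, 10, 2048))  # -> (2, 26, 768)
```
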
PAPER ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
We present ViLBERT (short for Vision-and-Language BERT), a model for learning task-agnostic joint representations of image content and natural language. We extend the popular BERT architecture to a multi-modal two-stream model, processing both visual and... (arxiv.org)

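In contrast to the single-stream sketch above, ViLBERT keeps separate visual and linguistic streams that interact through co-attention: each stream's queries attend over the other stream's keys and values. Here is a minimal sketch of one such co-attention block with hypothetical names and sizes; the paper's actual block includes further sublayers (feed-forward, etc.) not reproduced here.

```python
# Minimal two-stream co-attention sketch: queries come from one stream,
# keys and values from the other, so each stream is conditioned on the other.
import torch
import torch.nn as nn

class CoAttentionBlock(nn.Module):  # hypothetical name
    def __init__(self, d_model=768, nhead=12):
        super().__init__()
        self.txt_attends_img = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.img_attends_txt = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.norm_txt = nn.LayerNorm(d_model)
        self.norm_img = nn.LayerNorm(d_model)

    def forward(self, txt, img):
        # txt: (B, T, d) language stream; img: (B, R, d) visual stream.
        t2i, _ = self.txt_attends_img(txt, img, img)  # text queries image
        i2t, _ = self.img_attends_txt(img, txt, txt)  # image queries text
        return self.norm_txt(txt + t2i), self.norm_img(img + i2t)

txt, img = torch.randn(2, 16, 768), torch.randn(2, 10, 768)
txt, img = CoAttentionBlock()(txt, img)  # each stream keeps its own length
```
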
PAPER A Behavioral Approach to Visual Navigation with Graph Localization Networks
Inspired by research in psychology, we introduce a behavioral approach for visual navigation using topological maps. Our goal is to enable a robot to navigate from one location to another, relying only on its visual input and the topological map of the environment... (arxiv.org)

PAPER Perceive, Transform, and Act: Multi-Modal Attention Networks for Vision-and-Language Navigation
Vision-and-Language Navigation (VLN) is a challenging task in which an agent needs to follow a language-specified path to reach a target destination. In this paper, we strive for the creation of an agent able to tackle three key issues: multi-modality... (arxiv.org)

PAPER Stay on the Path: Instruction Fidelity in Vision-and-Language Navigation
Advances in learning and representations have reinvigorated work that connects language to other modalities. A particularly exciting direction is Vision-and-Language Navigation (VLN), in which agents interpret natural language instructions and visual scenes... (arxiv.org)

PAPER General Evaluation for Instruction Conditioned Navigation using Dynamic Time Warping
In instruction conditioned navigation, agents interpret natural language and their surroundings to navigate through an environment. Datasets for studying this task typically contain pairs of these instructions and reference trajectories. Yet, most... (arxiv.org)

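The metric in the entry above builds on dynamic time warping (DTW), which finds the cheapest monotone alignment between a predicted trajectory and the reference path. Below is plain textbook DTW in Python; the paper's normalized variant (nDTW) rescales this cost further and is not reproduced here.

```python
# Textbook dynamic time warping over two point sequences.
import math

def dtw(pred, ref, dist=math.dist):
    """Cost of the optimal monotone alignment between two point sequences."""
    n, m = len(pred), len(ref)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(pred[i - 1], ref[j - 1])
            # Each step may advance the prediction, the reference, or both.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]

pred = [(0, 0), (1, 0), (2, 1)]
ref = [(0, 0), (1, 1), (2, 1)]
print(dtw(pred, ref))  # 1.0: only the middle points disagree
```
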
PAPER Robust Navigation with Language Pretraining and Stochastic Sampling
Core to the vision-and-language navigation (VLN) challenge is building robust instruction representations and action decoding schemes, which can generalize well to previously unseen instructions and environments. In this paper, we report two simple but... (arxiv.org)

PAPER Vision-Language Navigation with Self-Supervised Auxiliary Reasoning Tasks
Vision-Language Navigation (VLN) is a task where agents learn to navigate following natural language instructions. The key to this task is to perceive both the visual scene and natural language sequentially. Conventional approaches exploit the vision and... (arxiv.org)

PAPER Transferable Representation Learning in Vision-and-Language Navigation
Vision-and-Language Navigation (VLN) tasks such as Room-to-Room (R2R) require machine agents to interpret natural language instructions and learn to act in visually realistic environments to achieve navigation goals. The overall task requires competence in... (arxiv.org)

PAPER Multi-modal Discriminative Model for Vision-and-Language Navigation
Vision-and-Language Navigation (VLN) is a natural language grounding task where agents have to interpret natural language instructions in the context of visual scenes in a dynamic environment to achieve prescribed navigation goals. Successful agents must... (arxiv.org)

PAPER Vision-based Navigation with Language-based Assistance via Imitation Learning with Indirect Intervention
We present Vision-based Navigation with Language-based Assistance (VNLA), a grounded vision-language task where an agent with visual perception is guided via language to find objects in photorealistic indoor environments. The task emulates a real-world... (arxiv.org)

PAPER Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout
A grand goal in AI is to build a robot that can accurately navigate based on natural language instructions, which requires the agent to perceive the scene, understand and ground language, and act in the real-world environment. One key challenge here is to... (arxiv.org)

PAPER Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation
We present the Frontier Aware Search with backTracking (FAST) Navigator, a general framework for action decoding, that achieves state-of-the-art results on the Room-to-Room (R2R) Vision-and-Language Navigation challenge of Anderson et al. (2018). Given a... (arxiv.org)

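The FAST idea above can be approximated generically as best-first search over partial paths: every expanded node stays on a scored frontier, so the decoder can jump back to an earlier branch whenever it starts to look more promising. The sketch below uses a placeholder scoring function and a toy graph; it is not the paper's actual decoder.

```python
# Best-first search with implicit backtracking via a scored frontier.
import heapq
import itertools

def frontier_search(start, neighbors, score, is_goal, max_expansions=100):
    """Expand the highest-scoring partial path first (higher = better)."""
    counter = itertools.count()  # tie-breaker so equal scores never compare paths
    frontier = [(-score([start]), next(counter), [start])]
    while frontier and max_expansions > 0:
        _, _, path = heapq.heappop(frontier)  # may jump back to an old branch
        if is_goal(path[-1]):
            return path
        max_expansions -= 1
        for nxt in neighbors(path[-1]):
            if nxt not in path:  # avoid trivial loops
                new_path = path + [nxt]
                heapq.heappush(frontier,
                               (-score(new_path), next(counter), new_path))
    return None

# Toy usage: a stand-in score where shorter partial paths look better.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
path = frontier_search("A", lambda n: graph[n],
                       score=lambda p: -len(p),  # placeholder for a learned scorer
                       is_goal=lambda n: n == "D")
print(path)  # e.g. ['A', 'B', 'D']
```
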
PAPER The Regretful Agent: Heuristic-Aided Navigation through Progress Estimation
As deep learning continues to make progress for challenging perception tasks, there is increased interest in combining vision, language, and decision-making. Specifically, the Vision and Language Navigation (VLN) task involves navigating to a goal purely... (arxiv.org)

Note: These write-ups were not made after reading the papers in depth, so please treat them as reference only. They were put together with shallow knowledge, focusing mainly on the core of each model, so a lot of content is missing. If you do use them, please leave a comment.
Js.Y