Paper Reading

These notes were not made from a deep reading of the papers, so please treat them as reference only. They were written with shallow knowledge, focusing mainly on each model's core ideas, so a lot is missing. If you do use them, please leave a comment.
PAPER: On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention (arxiv.org)
"Scene text recognition (STR) is the task of recognizing character sequences in natural scenes. While there have been great advances in STR methods, current methods still fail to recognize texts in arbitrary shapes, such as heavily curved or rotated texts, …"
GITHUB: GitHub - clovaai/SATRN: Official Tensorflow Implementat..
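The SATRN excerpt above turns on one architectural idea: keep the CNN feature map two-dimensional and let every spatial position attend to every other, so curved or rotated text stays spatially coherent. A minimal sketch of that idea, assuming PyTorch; the module and parameter names are hypothetical, and this is not the official clovaai/SATRN code (the authors released theirs in TensorFlow):

```python
# Minimal 2D self-attention over a CNN feature map, in the spirit of SATRN.
# Hypothetical sketch (PyTorch); not the official clovaai/SATRN code.
import torch
import torch.nn as nn

class SelfAttention2D(nn.Module):
    def __init__(self, dim=256, heads=8, max_h=16, max_w=64):
        super().__init__()
        # Separate learned row/column encodings approximate the paper's
        # adaptive 2D positional encoding.
        self.row_pos = nn.Parameter(torch.zeros(max_h, dim))
        self.col_pos = nn.Parameter(torch.zeros(max_w, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feat):                       # feat: (B, C, H, W)
        b, c, h, w = feat.shape
        pos = self.row_pos[:h, None, :] + self.col_pos[None, :w, :]  # (H, W, C)
        x = feat.permute(0, 2, 3, 1) + pos         # add 2D position info
        x = x.reshape(b, h * w, c)                 # every pixel is a token
        out, _ = self.attn(x, x, x)                # full 2D self-attention
        return out.reshape(b, h, w, c).permute(0, 3, 1, 2)

# Usage: a (2, 256, 8, 32) feature map from a shallow CNN backbone.
y = SelfAttention2D()(torch.randn(2, 256, 8, 32))
print(y.shape)  # torch.Size([2, 256, 8, 32])
```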
PAPER: What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis (arxiv.org)
"Many new proposals for scene text recognition (STR) models have been introduced in recent years. While each claim to have pushed the boundary of the technology, a holistic and fair comparison has been largely missing in the field due to the inconsistent ch…"
BLOG: What Is Wrong With Scene Text Re..
PAPER: Chasing Ghosts: Instruction Following as Bayesian State Tracking (arxiv.org)
"A visually-grounded navigation instruction can be interpreted as a sequence of expected observations and actions an agent following the correct trajectory would encounter and perform. Based on this intuition, we formulate the problem of finding the goal lo…"
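Framing instruction following as Bayesian state tracking means maintaining a probability distribution over candidate goal locations and updating it recursively: predict with a motion model, then reweight by how well the current observation matches the instruction. A minimal discrete (histogram) filter sketch; the transition and likelihood inputs here are hypothetical stand-ins for the models the paper learns:

```python
# Minimal histogram (discrete Bayes) filter over candidate goal cells,
# illustrating the "instruction following as Bayesian state tracking" idea.
# Hypothetical sketch; the paper learns its motion/observation models.
import numpy as np

def bayes_filter_step(belief, transition, likelihood):
    """One predict-update cycle over a grid of candidate locations.

    belief:     (N,) prior probability per cell
    transition: (N, N) motion model, transition[i, j] = P(next=i | prev=j)
    likelihood: (N,) P(observation | goal at cell), e.g. instruction-image match
    """
    predicted = transition @ belief          # predict: propagate the belief
    posterior = likelihood * predicted       # update: weight by observation fit
    return posterior / posterior.sum()       # renormalize to a distribution

# Usage: 4 cells, near-identity motion (columns sum to 1),
# observation strongly favoring cell 2.
belief = np.full(4, 0.25)
transition = 0.80 * np.eye(4) + 0.05
likelihood = np.array([0.1, 0.2, 0.9, 0.1])
print(bayes_filter_step(belief, transition, likelihood))
```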
PAPER: Improving Vision-and-Language Navigation with Image-Text Pairs from the Web (arxiv.org)
"Following a navigation instruction such as 'Walk down the stairs and stop at the brown sofa' requires embodied AI agents to ground scene elements referenced via language (e.g. 'stairs') to visual content in the environment (pixels corresponding to 'stairs'…"
PAPER: BabyWalk: Going Farther in Vision-and-Language Navigation by Taking Baby Steps (arxiv.org)
"Learning to follow instructions is of fundamental importance to autonomous agents for vision-and-language navigation (VLN). In this paper, we study how an agent can navigate long paths when learning from a corpus that consists of shorter ones. We show that…"
PAPER: Vision-Dialog Navigation by Exploring Cross-modal Memory (arxiv.org)
"Vision-dialog navigation posed as a new holy-grail task in vision-language disciplinary targets at learning an agent endowed with the capability of constant conversation for help with natural language and navigating according to human responses. Besides th…"
PAPER: VALAN: Vision and Language Agent Navigation (arxiv.org)
"VALAN is a lightweight and scalable software framework for deep reinforcement learning based on the SEED RL architecture. The framework facilitates the development and evaluation of embodied agents for solving grounded language understanding tasks, such as…"
PAPER: Cross-Lingual Vision-Language Navigation (arxiv.org)
"Vision-Language Navigation (VLN) is the task where an agent is commanded to navigate in photo-realistic environments with natural language instructions. Previous research on VLN is primarily conducted on the Room-to-Room (R2R) dataset with only English ins…"
PAPER: Environment-agnostic Multitask Learning for Natural Language Grounded Navigation (arxiv.org)
"Recent research efforts enable study for natural language grounded navigation in photo-realistic environments, e.g., following natural language instructions or dialog. However, existing methods tend to overfit training data in seen environments and fail to…"
PAPER: Multi-View Learning for Vision-and-Language Navigation (arxiv.org)
"Learning to navigate in a visual environment following natural language instructions is a challenging task because natural language instructions are highly variable, ambiguous, and under-specified. In this paper, we present a novel training paradigm, Learn…"
PAPER: Help, Anna! Visual Navigation with Natural Multimodal Assistance via Retrospective Curiosity-Encouraging Imitation Learning (arxiv.org)
"Mobile agents that can leverage help from humans can potentially accomplish more complex tasks than they could entirely on their own. We develop 'Help, Anna!' (HANNA), an interactive photo-realistic simulator in which an agent fulfills object-finding tasks…"
PAPER: Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training (arxiv.org)
"Learning to navigate in a visual environment following natural-language instructions is a challenging task, because the multimodal inputs to the agent are highly variable, and the training data on a new task is often limited. In this paper, we present the…"
VLBERT
PAPER: VL-BERT: Pre-training of Generic Visual-Linguistic Representations (arxiv.org)
"We introduce a new pre-trainable generic representation for visual-linguistic tasks, called Visual-Linguistic BERT (VL-BERT for short). VL-BERT adopts the simple yet powerful Transformer model as the backbone, and extends it to take both visual and linguis…"
UNITER
PAPER: UNITER: UNiversal Image-TExt Represent..
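VL-BERT's "extends it to take both visual and linguistic" inputs amounts to a single-stream design: word tokens and detected region features are projected into one embedding space and fed through the same Transformer, so self-attention mixes the two modalities directly. A minimal sketch of that input construction, assuming PyTorch; all dimensions and names are illustrative, not the released VL-BERT code:

```python
# Single-stream visual-linguistic input, in the spirit of VL-BERT:
# word embeddings and RoI visual features share one Transformer.
# Hypothetical sketch; not the official VL-BERT implementation.
import torch
import torch.nn as nn

vocab, dim, roi_dim = 30522, 768, 2048
word_emb = nn.Embedding(vocab, dim)            # token embeddings
visual_proj = nn.Linear(roi_dim, dim)          # project RoI features to dim
segment_emb = nn.Embedding(2, dim)             # 0 = text, 1 = image region
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, nhead=12, batch_first=True), num_layers=2)

tokens = torch.randint(0, vocab, (1, 8))       # 8 word-piece ids
rois = torch.randn(1, 4, roi_dim)              # 4 detected region features
x = torch.cat([word_emb(tokens), visual_proj(rois)], dim=1)      # (1, 12, 768)
seg = torch.cat([torch.zeros(1, 8), torch.ones(1, 4)], dim=1).long()
out = encoder(x + segment_emb(seg))            # joint self-attention over both
print(out.shape)  # torch.Size([1, 12, 768])
```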
PAPER: ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks (arxiv.org)
"We present ViLBERT (short for Vision-and-Language BERT), a model for learning task-agnostic joint representations of image content and natural language. We extend the popular BERT architecture to a multi-modal two-stream model, processing both visual and…"
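ViLBERT's two-stream design is the natural contrast to VL-BERT above: vision and language keep separate Transformers and exchange information through co-attention, where each stream's queries attend to the other stream's keys and values. A minimal co-attention sketch under the same PyTorch assumption (names hypothetical, not ViLBERT's actual code):

```python
# Co-attention between a language stream and a vision stream, in the
# spirit of ViLBERT's two-stream design. Hypothetical sketch.
import torch
import torch.nn as nn

class CoAttentionBlock(nn.Module):
    def __init__(self, dim=768, heads=12):
        super().__init__()
        self.txt_attends_img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.img_attends_txt = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, txt, img):
        # Queries come from one stream; keys and values from the other.
        txt2, _ = self.txt_attends_img(txt, img, img)
        img2, _ = self.img_attends_txt(img, txt, txt)
        return txt + txt2, img + img2          # residual connections

txt = torch.randn(1, 8, 768)                   # 8 word-piece embeddings
img = torch.randn(1, 4, 768)                   # 4 projected region features
txt_out, img_out = CoAttentionBlock()(txt, img)
print(txt_out.shape, img_out.shape)  # (1, 8, 768) (1, 4, 768)
```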
PAPER: A Behavioral Approach to Visual Navigation with Graph Localization Networks (arxiv.org)
"Inspired by research in psychology, we introduce a behavioral approach for visual navigation using topological maps. Our goal is to enable a robot to navigate from one location to another, relying only on its visual input and the topological map of the env…"
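A topological map here is just a graph whose nodes are places and whose edges are traversable transitions, so the planning half of the problem reduces to graph search; the paper's actual contribution, localizing on that graph from visual input, is beyond this sketch. A minimal illustration with a hypothetical house graph:

```python
# A topological map as a plain graph: nodes are places, edges are
# traversable transitions. Planning is shortest path; graph localization
# from vision (the paper's contribution) is not reproduced here.
from collections import deque

topo_map = {                                   # hypothetical house graph
    "entry": ["hall"], "hall": ["entry", "kitchen", "stairs"],
    "kitchen": ["hall"], "stairs": ["hall", "bedroom"], "bedroom": ["stairs"],
}

def plan(graph, start, goal):
    """Breadth-first search for the shortest node sequence start -> goal."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])

print(plan(topo_map, "entry", "bedroom"))  # ['entry', 'hall', 'stairs', 'bedroom']
```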