Sequence-based approach (LayoutLM, LAMBERT): BERT-based models that add a 2D positional embedding to each input token. Drawback: high dependence on large datasets and demanding compute requirements. A minimal sketch of the 2D positional-embedding idea follows below.

Graph-based approach (BERTGrid, Chargrid): each document is converted into a graph, with text tokens or text lines as nodes; the associated RoI visual features and positional information can also be encoded into the nodes, and a GCN or attention network then learns the relations between neighboring nodes. Drawbacks: performance is somewhat lower than sequence-based approaches, and a powerful token embedding such as BERT cannot be used (BERTGri…
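To make the sequence-based idea concrete, here is a minimal sketch of adding LayoutLM-style 2D positional embeddings to token embeddings. The vocabulary size, hidden size, coordinate range, and all class/variable names are illustrative assumptions, not the official implementation.

```python
import torch
import torch.nn as nn

class TokenWith2DPosition(nn.Module):
    """LayoutLM-style input embedding: token embedding + 2D box embeddings.

    Illustrative sketch; the dimensions and integer coordinate bucketing
    are assumptions, not the official code.
    """
    def __init__(self, vocab_size=30522, hidden=768, max_coord=1024):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, hidden)
        # Separate lookup tables for the box coordinates (x0, y0, x1, y1).
        self.x_emb = nn.Embedding(max_coord, hidden)
        self.y_emb = nn.Embedding(max_coord, hidden)

    def forward(self, token_ids, boxes):
        # boxes: (batch, seq_len, 4) integer coords normalized to [0, max_coord)
        x0, y0, x1, y1 = boxes.unbind(-1)
        pos = (self.x_emb(x0) + self.y_emb(y0)
               + self.x_emb(x1) + self.y_emb(y1))
        return self.tok(token_ids) + pos  # fed into a standard BERT encoder

emb = TokenWith2DPosition()
ids = torch.randint(0, 30522, (1, 8))
boxes = torch.randint(0, 1024, (1, 8, 4))
print(emb(ids, boxes).shape)  # torch.Size([1, 8, 768])
```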
LayoutLMv3 pre-trains multimodal Transformers for Document AI with unified text and image masking. It is additionally pre-trained with a word-patch alignment objective to learn cross-modal alignment by predicting whether the corresponding image patch of a text word is masked. Contribution: LayoutLMv3 is the first multimodal model in Document AI that doesn't rely on a pre-trained CNN or Faster R-CNN backbone to extract visual features → saves parameters and eliminates region annotations…
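A rough sketch of how the word-patch alignment (WPA) objective described above could be set up: each text token is classified by whether the image patch covering it was masked. The head, the `token2patch` mapping, and all names are simplifying assumptions, not LayoutLMv3's actual code.

```python
import torch
import torch.nn as nn

class WordPatchAlignmentHead(nn.Module):
    """Sketch of a WPA objective: for each text token, predict whether
    the image patch corresponding to it was masked. Simplified assumption:
    every token maps to exactly one patch via `token2patch`."""
    def __init__(self, hidden=768):
        super().__init__()
        self.classifier = nn.Linear(hidden, 2)  # 0 = patch kept, 1 = patch masked

    def forward(self, text_hidden, patch_masked, token2patch):
        # text_hidden:  (batch, n_tokens, hidden) encoder outputs for text tokens
        # patch_masked: (batch, n_patches) bool, True where the patch was masked
        # token2patch:  (batch, n_tokens) index of the patch covering each token
        logits = self.classifier(text_hidden)                # (batch, n_tokens, 2)
        labels = patch_masked.gather(1, token2patch).long()  # (batch, n_tokens)
        return nn.functional.cross_entropy(logits.view(-1, 2), labels.view(-1))

head = WordPatchAlignmentHead()
loss = head(torch.randn(2, 10, 768),
            torch.rand(2, 49) > 0.6,          # random patch mask
            torch.randint(0, 49, (2, 10)))    # random token-to-patch map
```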
PAPER: DocFormer: End-to-End Transformer for Document Understanding (arxiv.org) — We present DocFormer, a multi-modal transformer-based architecture for the task of Visual Document Understanding (VDU). VDU is a challenging problem which aims to understand documents in their varied formats (forms, receipts, etc.) and layouts. In addition…
GitHub: shabie/docformer — Implementation of DocFormer: End-to-End Transformer for Document Understanding
VL-BERT
PAPER: VL-BERT: Pre-training of Generic Visual-Linguistic Representations (arxiv.org) — We introduce a new pre-trainable generic representation for visual-linguistic tasks, called Visual-Linguistic BERT (VL-BERT for short). VL-BERT adopts the simple yet powerful Transformer model as the backbone, and extends it to take both visual and linguistic embedded features as input…
UNITER
PAPER: UNITER: UNiversal Image-TExt Representation Learning…
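As a rough illustration of the single-stream design that VL-BERT and UNITER share — text tokens and RoI visual features projected into one sequence for a single shared Transformer — here is a simplified sketch. The dimensions, the segment embedding, and the plain concatenation are assumptions rather than either paper's exact input scheme.

```python
import torch
import torch.nn as nn

class SingleStreamVLInput(nn.Module):
    """Simplified single-stream input: project text tokens and RoI visual
    features into one shared sequence for a single Transformer encoder."""
    def __init__(self, vocab_size=30522, visual_dim=2048, hidden=768):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, hidden)
        self.vis_proj = nn.Linear(visual_dim, hidden)
        self.seg = nn.Embedding(2, hidden)  # 0 = text segment, 1 = visual segment

    def forward(self, token_ids, roi_feats):
        t = self.tok(token_ids) + self.seg.weight[0]       # (B, T, H)
        v = self.vis_proj(roi_feats) + self.seg.weight[1]  # (B, R, H)
        return torch.cat([t, v], dim=1)  # one sequence, one shared encoder

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True),
    num_layers=2)
inp = SingleStreamVLInput()
seq = inp(torch.randint(0, 30522, (1, 6)), torch.randn(1, 4, 2048))
out = encoder(seq)  # (1, 10, 768): text and visual tokens attend jointly
```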
PAPER: ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks (arxiv.org) — We present ViLBERT (short for Vision-and-Language BERT), a model for learning task-agnostic joint representations of image content and natural language. We extend the popular BERT architecture to a multi-modal two-stream model, processing both visual and textual inputs in separate streams that interact through co-attentional transformer layers…
Note: this material was not made after a deep reading of the papers, so please treat it only as a reference.
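With that caveat, here is a minimal sketch of the co-attentional layer ViLBERT describes, where each stream's queries attend to the other stream's keys and values. The sizes, residual wiring, and all names are assumptions, not ViLBERT's actual code.

```python
import torch
import torch.nn as nn

class CoAttentionBlock(nn.Module):
    """Sketch of ViLBERT-style co-attention: each stream's queries attend
    to the *other* stream's keys/values, so the two streams interact."""
    def __init__(self, hidden=768, heads=12):
        super().__init__()
        self.txt_attends_img = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.img_attends_txt = nn.MultiheadAttention(hidden, heads, batch_first=True)

    def forward(self, txt, img):
        # txt: (B, T, H) linguistic stream; img: (B, R, H) visual stream
        txt_out, _ = self.txt_attends_img(query=txt, key=img, value=img)
        img_out, _ = self.img_attends_txt(query=img, key=txt, value=txt)
        return txt + txt_out, img + img_out  # residual connection per stream

block = CoAttentionBlock()
t, v = block(torch.randn(1, 6, 768), torch.randn(1, 4, 768))
print(t.shape, v.shape)  # torch.Size([1, 6, 768]) torch.Size([1, 4, 768])
```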