Js.Y
VL-BERT: Pre-training of Generic Visual-Linguistic Representations / UNITER: UNiversal Image-TExt Representation Learning