Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training