Perceive, Transform, and Act: Multi-Modal Attention Networks for Vision-and-Language Navigation