Scaling Vision-and-Language Navigation With Offline RL
Authors: Valay Bundele, Mahesh Bhupati, Biplab Banerjee, Aditya Grover
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that the proposed reward-conditioned approach leads to significant performance improvements, even in complex and intricate environments. |
| Researcher Affiliation | Academia | Valay Bundele (EMAIL), University of Tübingen; Mahesh Bhupati (EMAIL), Indian Institute of Technology Bombay; Biplab Banerjee (EMAIL), Indian Institute of Technology Bombay; Aditya Grover (EMAIL), University of California, Los Angeles |
| Pseudocode | Yes | Algorithm 1 describes reward-token conditioning in detail. Algorithm 1: Reward token conditioning. Training Phase. Input: instruction I, visual features V_t, state token q_{t-1}, ground-truth action a_t, current state s_t, next state s_{t+1}, goal location G. Output: trained policy model M parameters |
| Open Source Code | Yes | Code and datasets available at https://github.com/Valaybundele/RewardC-VLN-ORL |
| Open Datasets | Yes | We will open source our datasets for wider use by the community. We have created two versions of each dataset D, which we generated by rolling out HAMT: 1) D-R2R, generated using the train set of R2R, and 2) D-RxR, generated using the train set of RxR. |
| Dataset Splits | Yes | The R2R dataset has 14,025 instructions in the train set and 4,173 instructions in the test set. The validation set is further divided into val-seen and val-unseen, having 1,020 and 2,349 instructions respectively. We use the English subset of RxR, which includes 26,464 path-instruction pairs in the train set, 2,939 pairs in the val-seen set and 4,551 pairs in the val-unseen set. |
| Hardware Specification | Yes | The experiments were performed on a NVIDIA A100 GPU. |
| Software Dependencies | No | The paper mentions using Adam optimizer and ResNet-152, but does not specify software versions for programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We used Adam optimizer with a learning rate of 1e-5 to train the models. The batch size was kept as 64 and the models were trained for 500K iterations. We trained all the models from scratch in the offline RL setup. |
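To make the reported setup concrete, the following is a minimal sketch of a reward-conditioned training step in the spirit of Algorithm 1, using the documented hyperparameters (Adam, learning rate 1e-5, batch size 64). The `RewardConditionedPolicy` module, its feature dimensions, and the action space size are illustrative assumptions, not the paper's actual HAMT-based architecture.

```python
import torch
import torch.nn as nn

class RewardConditionedPolicy(nn.Module):
    """Hypothetical stand-in for policy model M: conditions action prediction
    on instruction, visual, and state features plus a scalar reward token."""
    def __init__(self, feat_dim=64, num_actions=6):
        super().__init__()
        self.net = nn.Sequential(
            # instruction + visual + state features, plus a 1-d reward token
            nn.Linear(feat_dim * 3 + 1, 128),
            nn.ReLU(),
            nn.Linear(128, num_actions),
        )

    def forward(self, instr, vis, state, reward_token):
        x = torch.cat([instr, vis, state, reward_token], dim=-1)
        return self.net(x)

def train_step(model, opt, instr, vis, state, reward_token, action):
    # Supervised update on the ground-truth action, conditioned on the
    # reward token -- the core of a reward-conditioned offline RL setup.
    logits = model(instr, vis, state, reward_token)
    loss = nn.functional.cross_entropy(logits, action)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

model = RewardConditionedPolicy()
# Adam with lr 1e-5 and batch size 64, matching the reported setup.
opt = torch.optim.Adam(model.parameters(), lr=1e-5)
batch = 64
instr = torch.randn(batch, 64)
vis = torch.randn(batch, 64)
state = torch.randn(batch, 64)
reward_token = torch.ones(batch, 1)  # e.g., a "high-reward" conditioning token
action = torch.randint(0, 6, (batch,))
loss = train_step(model, opt, instr, vis, state, reward_token, action)
```

In the full setup this step would run for the reported 500K iterations over the offline rollout datasets (D-R2R / D-RxR) rather than on random tensors.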