Scaling Vision-and-Language Navigation With Offline RL

Authors: Valay Bundele, Mahesh Bhupati, Biplab Banerjee, Aditya Grover

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments demonstrate that the proposed reward-conditioned approach leads to significant performance improvements, even in complex environments.
Researcher Affiliation | Academia | Valay Bundele (University of Tübingen), Mahesh Bhupati (Indian Institute of Technology Bombay), Biplab Banerjee (Indian Institute of Technology Bombay), Aditya Grover (University of California, Los Angeles)
Pseudocode | Yes | Algorithm 1 describes reward-token conditioning in detail. Algorithm 1 (Reward token conditioning), Training Phase. Input: instruction I, visual features V_t, state token q_{t-1}, ground-truth action a_t, current state s_t, next state s_{t+1}, goal location G. Output: trained policy model parameters M.
Open Source Code | Yes | Code and datasets available at https://github.com/Valaybundele/RewardC-VLN-ORL
Open Datasets | Yes | We will open-source our datasets for wider use by the community. We have created two versions of each dataset D, generated by rolling out HAMT: 1) D-R2R, generated using the train set of R2R, and 2) D-RxR, generated using the train set of RxR.
Dataset Splits | Yes | The R2R dataset has 14,025 instructions in the train set and 4,173 instructions in the test set. The validation set is divided into val-seen and val-unseen, with 1,020 and 2,349 instructions respectively. We use the English subset of RxR, which includes 26,464 path-instruction pairs in the train set, 2,939 pairs in val-seen and 4,551 pairs in val-unseen.
Hardware Specification | Yes | The experiments were performed on an NVIDIA A100 GPU.
Software Dependencies | No | The paper mentions the Adam optimizer and ResNet-152, but does not specify software versions for programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | We used the Adam optimizer with a learning rate of 1e-5 to train the models. The batch size was set to 64 and the models were trained for 500K iterations. We trained all models from scratch in the offline RL setup.
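The reward-token conditioning summarized in Algorithm 1 can be sketched as follows. This is an illustrative reconstruction, not the paper's exact scheme: the binning threshold, the token vocabulary, and the `reward_token` helper are all assumptions; the paper conditions the policy on a token derived from each transition's reward, and at inference feeds the high-reward token.

```python
import math

def reward_token(s_t, s_next, goal, threshold=0.25):
    """Hypothetical discretization: token 1 if the step from s_t to s_next
    reduces the Euclidean distance to the goal by at least `threshold`
    meters, else token 0. The paper's exact binning may differ."""
    progress = math.dist(s_t, goal) - math.dist(s_next, goal)
    return 1 if progress >= threshold else 0

# Training: label each logged transition with its reward token and feed
# (instruction, visual features, token) to the policy.
# Inference: always condition on token 1 to elicit goal-directed actions.
print(reward_token((0.0, 0.0), (1.0, 0.0), (3.0, 0.0)))  # 1 (moved 1 m closer)
print(reward_token((0.0, 0.0), (0.0, 0.0), (3.0, 0.0)))  # 0 (no progress)
```

Conditioning on the token at training time lets a single policy model both good and bad trajectories in the offline dataset, while the inference-time token selects the goal-directed behavior mode.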
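The split counts reported in the Dataset Splits row can be sanity-checked with a small script; the dictionary names are my own, and the totals follow only from the figures quoted above.

```python
# Instruction counts per split, as reported in the paper.
r2r_splits = {"train": 14025, "val_seen": 1020, "val_unseen": 2349, "test": 4173}
rxr_en_splits = {"train": 26464, "val_seen": 2939, "val_unseen": 4551}

def total(splits):
    """Sum instruction (or path-instruction pair) counts across splits."""
    return sum(splits.values())

print(total(r2r_splits))      # 21567
print(total(rxr_en_splits))   # 33954
```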
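The Experiment Setup row names standard Adam with a learning rate of 1e-5. As a reference for what that setting implies per update, here is a textbook scalar Adam step (standard formulation with default betas; only the learning rate comes from the paper, the rest is the usual Adam recipe):

```python
def adam_step(theta, grad, m, v, t, lr=1e-5, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter theta at step t (1-indexed)."""
    m = b1 * m + (1 - b1) * grad            # first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2       # second-moment estimate
    m_hat = m / (1 - b1 ** t)               # bias correction
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (v_hat ** 0.5 + eps)
    return theta, m, v

# On the first step the bias-corrected update has magnitude ~lr, so with
# lr = 1e-5 each parameter moves by at most about 1e-5 per iteration.
theta, m, v = adam_step(theta=0.0, grad=1.0, m=0.0, v=0.0, t=1)
```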