Test-Time Adaptation for Online Vision-Language Navigation with Feedback-based Reinforcement Learning

Authors: Sungjune Kim, Gyeongrok Oh, Heeju Ko, Daehyun Ji, Dongwook Lee, Byung-Jun Lee, Sujin Jang, Sangpil Kim

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive experiments on challenging VLN benchmarks demonstrate the superior adaptability of FEEDTTA, even outperforming the state-of-the-art offline training methods on the REVERIE benchmark with a single stream of learning.
Researcher Affiliation | Collaboration | 1) Department of AI, Korea University, Seoul, S. Korea; 2) Samsung AI Center, DS Division, Suwon, S. Korea.
Pseudocode | Yes | Algorithm 1: Online Learning Process of FEEDTTA
Open Source Code | No | The paper mentions re-implementing a baseline method (FSTTA) due to issues with its official code, and provides a link to that third-party code's issue tracker. However, there is no explicit statement or link for the source code of the authors' own proposed method, FEEDTTA.
Open Datasets | Yes | We empirically demonstrate the effectiveness of the proposed method through extensive experiments on the REVERIE (Qi et al., 2020), R2R (Anderson et al., 2018), and R2R-CE (Krantz et al., 2020) benchmarks.
Dataset Splits | Yes | We empirically demonstrate the effectiveness of the proposed method through extensive experiments on the REVERIE (Qi et al., 2020), R2R (Anderson et al., 2018), and R2R-CE (Krantz et al., 2020) benchmarks. Specifically, for the REVERIE dataset, the results in the paper are obtained with p = 0.01 and α = 0.2 for the validation seen split, and p = 0.05 and α = 0.2 for the validation unseen split. For R2R and R2R-CE, we use p = 0.05 and α = 0.1 for both splits.
Hardware Specification | Yes | Lastly, all experiments are conducted on a single NVIDIA Tesla A100 GPU.
Software Dependencies | No | The paper mentions using the 'GPT-4 model' as an LLM oracle, but does not specify any software libraries with version numbers (e.g., PyTorch, TensorFlow, Python, CUDA versions).
Experiment Setup | Yes | We use a batch size of 1 to properly simulate the online environment. Then, we search the best-performing values for the reversion rate p and the reversion magnitude α within {0.01, 0.05, 0.1, 0.2, 0.3} and {-0.01, -0.025, -0.05, -0.075, -0.1, -0.2, 0.3}, respectively. For the REVERIE dataset, the results in the paper are obtained with p = 0.01 and α = 0.2 for the validation seen split, and p = 0.05 and α = 0.2 for the validation unseen split. For R2R and R2R-CE, we use p = 0.05 and α = 0.1 for both splits. The learning rate η is set as 5e-6. All other hyperparameters adhere to the default configuration of the target policy.
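The search procedure quoted in the Experiment Setup row can be sketched as a plain grid search over the stated (p, α) values. This is a minimal illustration, not the paper's code: `run_online_adaptation` is a hypothetical stand-in for a full single-stream online TTA run and here returns a deterministic toy score instead of real benchmark results.

```python
import itertools

# Grids quoted in the Experiment Setup row.
P_GRID = [0.01, 0.05, 0.1, 0.2, 0.3]                          # reversion rate p
ALPHA_GRID = [-0.01, -0.025, -0.05, -0.075, -0.1, -0.2, 0.3]  # reversion magnitude α (as listed)
LEARNING_RATE = 5e-6  # learning rate η from the paper
BATCH_SIZE = 1        # batch size 1 to simulate the online setting

def run_online_adaptation(p, alpha, lr=LEARNING_RATE, batch_size=BATCH_SIZE):
    """Placeholder for one online adaptation run; returns a toy score.

    A real implementation would stream validation episodes one at a time,
    adapt the policy with feedback-based RL, and report e.g. success rate.
    """
    return 1.0 / (1.0 + abs(p - 0.05) + abs(alpha - 0.2))

def grid_search():
    """Return the (p, alpha) pair with the highest (toy) score."""
    return max(itertools.product(P_GRID, ALPHA_GRID),
               key=lambda pa: run_online_adaptation(*pa))

print(grid_search())
```

With the toy score above, the search simply picks the grid point closest to (0.05, 0.2); swapping in a real evaluation function recovers the per-split settings reported in the table.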