Test-Time Adaptation for Online Vision-Language Navigation with Feedback-based Reinforcement Learning
Authors: Sungjune Kim, Gyeongrok Oh, Heeju Ko, Daehyun Ji, Dongwook Lee, Byung-Jun Lee, Sujin Jang, Sangpil Kim
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments on challenging VLN benchmarks demonstrate the superior adaptability of FEEDTTA, even outperforming the state-of-the-art offline training methods on the REVERIE benchmark with a single stream of learning. |
| Researcher Affiliation | Collaboration | 1Department of AI, Korea University, Seoul, S.Korea 2Samsung AI Center, DS Division, Suwon, S.Korea. |
| Pseudocode | Yes | Algorithm 1 Online Learning Process of FEEDTTA |
| Open Source Code | No | The paper mentions re-implementing a baseline method (FSTTA) due to issues with its official code, and provides a link to the issue tracker for that third-party code. However, there is no explicit statement or link provided for the source code of the authors' own proposed method, FEEDTTA. |
| Open Datasets | Yes | We empirically demonstrate the effectiveness of the proposed method through extensive experiments on the REVERIE (Qi et al., 2020), R2R (Anderson et al., 2018), and R2R-CE (Krantz et al., 2020) benchmarks. |
| Dataset Splits | Yes | Specifically, for the REVERIE dataset, the results in the paper are obtained with p = 0.01 and α = 0.2 for the validation seen split, and p = 0.05 and α = 0.2 for the validation unseen split. For R2R and R2R-CE, we use p = 0.05 and α = 0.1 for both splits. |
| Hardware Specification | Yes | Lastly, all experiments are conducted on a single NVIDIA Tesla A100 GPU. |
| Software Dependencies | No | The paper mentions using 'GPT-4 model' as an LLM oracle, but does not specify any software libraries with version numbers (e.g., PyTorch, TensorFlow, Python, CUDA versions). |
| Experiment Setup | Yes | We use a batch size of 1 to properly simulate the online environment. Then, we search the best-performing values for the reversion rate p and the reversion magnitude α within {0.01, 0.05, 0.1, 0.2, 0.3} and {0.01, 0.025, 0.05, 0.075, 0.1, 0.2, 0.3}, respectively. For the REVERIE dataset, the results in the paper are obtained with p = 0.01 and α = 0.2 for the validation seen split, and p = 0.05 and α = 0.2 for the validation unseen split. For R2R and R2R-CE, we use p = 0.05 and α = 0.1 for both splits. The learning rate η is set as 5e-6. All other hyperparameters adhere to the default configuration of the target policy. |
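The hyperparameter search described in the setup row can be sketched as a simple grid search. This is a minimal illustration, not the authors' code: `evaluate` is a hypothetical callable standing in for one full validation run of FEEDTTA at a given (p, α) pair.

```python
from itertools import product

# Search spaces reported in the paper.
REVERSION_RATES = [0.01, 0.05, 0.1, 0.2, 0.3]                       # p
REVERSION_MAGNITUDES = [0.01, 0.025, 0.05, 0.075, 0.1, 0.2, 0.3]   # alpha
LEARNING_RATE = 5e-6                                                # eta (fixed)

def best_setting(evaluate):
    """Return the (p, alpha) pair with the highest validation score.

    `evaluate(p, alpha)` is a placeholder for running test-time
    adaptation on a validation split and returning its metric.
    """
    return max(
        product(REVERSION_RATES, REVERSION_MAGNITUDES),
        key=lambda pair: evaluate(*pair),
    )
```

For example, on the REVERIE validation unseen split such a search would land on p = 0.05 and α = 0.2, the values reported in the paper.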