CAREL: Instruction-guided reinforcement learning with cross-modal auxiliary objectives
Authors: Armin Saghafian, Amirmohammad Izadi, Negin Hashemi Dijujin, Mahdieh Soleymani Baghshah
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The results of our experiments suggest superior sample efficiency and systematic generalization for this framework in multi-modal reinforcement learning problems. Our code base is available here. Our experiments on the Mini Grid and Baby AI environments Chevalier-Boisvert et al. (2018) showcase the idea's effectiveness in improving the systematic generalization and sample efficiency of instruction-following agents. |
| Researcher Affiliation | Academia | Armin Saghafian, Sharif University of Technology EMAIL Amirmohammad Izadi, Sharif University of Technology EMAIL Negin Hashemi Dijujin, Sharif University of Technology EMAIL Mahdieh Soleymani Baghshah, Sharif University of Technology EMAIL |
| Pseudocode | Yes | Algorithm 1 CAREL framework; Algorithm 2 Instruction Tracking (IT) framework |
| Open Source Code | Yes | The results of our experiments suggest superior sample efficiency and systematic generalization for this framework in multi-modal reinforcement learning problems. Our code base is available here. |
| Open Datasets | Yes | Our experiments on the Mini Grid and Baby AI environments Chevalier-Boisvert et al. (2018) showcase the idea's effectiveness in improving the systematic generalization and sample efficiency of instruction-following agents. We employ the Baby AI environment Chevalier-Boisvert et al. (2018), a lightweight but logically complex benchmark with procedurally generated difficulty levels, which enables in-depth exploration of grounded language learning in the goal-conditioned RL context. |
| Dataset Splits | No | The paper mentions evaluating on "unseen tasks" and reporting success rates, but does not provide specific percentages or counts for training/test/validation splits, nor does it explicitly reference predefined standard splits with detailed methodology. It states, "We report the agent's success rate (SR) over a set of unseen tasks at each Baby AI level, separated by pairs of color and type of target objects or specific orders of objects in the instruction." |
| Hardware Specification | Yes | For the experiments reported in this paper, we have used one NVIDIA 3090 GPU and one TITAN RTX GPU over two weeks. |
| Software Dependencies | No | The paper mentions using the PPO algorithm and Adam optimizer, as well as BERT's tokenizer, but does not specify version numbers for any software libraries or frameworks like Python, PyTorch, or TensorFlow. For example: "Its base model is trained using the PPO algorithm Schulman et al. (2017) and Adam optimizer with parameters β1 = 0.9 and β2 = 0.999." |
| Experiment Setup | Yes | The learning rate is 7e-4, and the batch size is 256. We set λC = 0.01 and the temperature τ = 1 as CAREL-specific hyperparameters. The actor-critic model from the SHELM model was also used as a baseline. We train the learnable parts of the model using the PPO algorithm and Adam optimizer with the same hyperparameters. The learning rate is 1e-4, and the batch size is set to 16. |
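The hyperparameters quoted across the Software Dependencies and Experiment Setup rows can be collected into a single configuration sketch. This is illustrative only: the class name `CarelHyperparams` and field names such as `lambda_c` are assumptions, not identifiers from the paper's code base, and the values are simply the ones reported above.

```python
from dataclasses import dataclass

@dataclass
class CarelHyperparams:
    # Adam optimizer settings quoted in the paper
    adam_beta1: float = 0.9
    adam_beta2: float = 0.999
    # CAREL-specific auxiliary-loss settings
    lambda_c: float = 0.01    # weight of the cross-modal auxiliary objective
    temperature: float = 1.0  # contrastive temperature tau
    # Training settings for the first reported configuration
    learning_rate: float = 7e-4
    batch_size: int = 256

# The paper reports a second configuration (the SHELM-based baseline)
# with a smaller learning rate and batch size.
shelm_config = CarelHyperparams(learning_rate=1e-4, batch_size=16)
```

A dataclass like this makes it easy to see at a glance which settings differ between the two reported configurations (only the learning rate and batch size) and which are shared.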