CAREL: Instruction-guided reinforcement learning with cross-modal auxiliary objectives

Authors: Armin Saghafian, Amirmohammad Izadi, Negin Hashemi Dijujin, Mahdieh Soleymani Baghshah

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The results of our experiments suggest superior sample efficiency and systematic generalization for this framework in multi-modal reinforcement learning problems. Our code base is available here. Our experiments on the MiniGrid and BabyAI environments (Chevalier-Boisvert et al., 2018) showcase the idea's effectiveness in improving the systematic generalization and sample efficiency of instruction-following agents.
Researcher Affiliation | Academia | Armin Saghafian, Sharif University of Technology, EMAIL; Amirmohammad Izadi, Sharif University of Technology, EMAIL; Negin Hashemi Dijujin, Sharif University of Technology, EMAIL; Mahdieh Soleymani Baghshah, Sharif University of Technology, EMAIL
Pseudocode | Yes | Algorithm 1: CAREL framework; Algorithm 2: Instruction Tracking (IT) framework
Open Source Code | Yes | The results of our experiments suggest superior sample efficiency and systematic generalization for this framework in multi-modal reinforcement learning problems. Our code base is available here.
Open Datasets | Yes | Our experiments on the MiniGrid and BabyAI environments (Chevalier-Boisvert et al., 2018) showcase the idea's effectiveness in improving the systematic generalization and sample efficiency of instruction-following agents. We employ the BabyAI environment (Chevalier-Boisvert et al., 2018), a lightweight but logically complex benchmark with procedurally generated difficulty levels, which enables in-depth exploration of grounded language learning in the goal-conditioned RL context.
Dataset Splits | No | The paper mentions evaluating on "unseen tasks" and reporting success rates, but does not provide specific percentages or counts for training/test/validation splits, nor does it explicitly reference predefined standard splits with detailed methodology. It states, "We report the agent's success rate (SR) over a set of unseen tasks at each BabyAI level, separated by pairs of color and type of target objects or specific orders of objects in the instruction."
Hardware Specification | Yes | For the experiments reported in this paper, we have used one NVIDIA 3090 GPU and one TITAN RTX GPU over two weeks.
Software Dependencies | No | The paper mentions using the PPO algorithm and Adam optimizer, as well as BERT's tokenizer, but does not specify version numbers for any software libraries or frameworks such as Python, PyTorch, or TensorFlow. For example: "Its base model is trained using the PPO algorithm (Schulman et al., 2017) and Adam optimizer with parameters β1 = 0.9 and β2 = 0.999."
Experiment Setup | Yes | The learning rate is 7e-4, and the batch size is 256. We set λC = 0.01 and the temperature τ = 1 as CAREL-specific hyperparameters. The actor-critic model from the SHELM model was also used as a baseline. We train the learnable parts of the model using the PPO algorithm and Adam optimizer with the same hyperparameters. The learning rate is 1e-4, and the batch size is set to 16.
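For a reproduction attempt, the hyperparameters quoted above can be gathered in one place. The sketch below is a minimal Python summary: the variable names and the split into two configurations are my own reading of the quoted text (assuming the 7e-4/256 setting belongs to the main CAREL model and the 1e-4/16 setting to the SHELM-based baseline), not something the paper lays out as code.

```python
# Hyperparameters as quoted in the Experiment Setup row.
# Grouping and key names are assumptions for illustration only.

CAREL_CONFIG = {
    "algorithm": "PPO",          # Schulman et al. (2017)
    "optimizer": "Adam",
    "adam_betas": (0.9, 0.999),  # beta1, beta2 from the Software Dependencies quote
    "learning_rate": 7e-4,
    "batch_size": 256,
    "lambda_c": 0.01,            # weight of the CAREL auxiliary objective (lambda_C)
    "temperature": 1.0,          # contrastive temperature tau
}

SHELM_BASELINE_CONFIG = {
    "algorithm": "PPO",          # same algorithm and optimizer as above
    "optimizer": "Adam",
    "adam_betas": (0.9, 0.999),
    "learning_rate": 1e-4,
    "batch_size": 16,
}
```

Collecting the values this way makes it easy to spot that the two settings differ only in learning rate and batch size, which is what the quoted setup implies.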