Learning on One Mode: Addressing Multi-modality in Offline Reinforcement Learning

Authors: Mianchu Wang, Yue Jin, Giovanni Montana

ICLR 2025

Reproducibility assessment (variable, result, and supporting excerpt from the paper):
Research Type: Experimental. "Empirically, LOM outperforms existing methods on standard D4RL benchmarks and demonstrates its effectiveness in complex, multi-modal scenarios." (Section 6, Experimental Results)
Researcher Affiliation: Academia. "Mianchu Wang, Yue Jin, Giovanni Montana; University of Warwick; The Alan Turing Institute"
Pseudocode: Yes. "Algorithm 1: Weighted imitation learning on one mode (LOM)."
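Algorithm 1 itself is not reproduced in this report. As a rough illustration of the idea the title describes (weighted imitation learning restricted to a single behavioural mode), the following NumPy sketch assumes per-mode value estimates and mode assignments are already available; the mode-selection rule, the exponentiated-advantage weight form, and all names here are assumptions for illustration, not the paper's exact procedure:

```python
import numpy as np

def lom_weights(q_modes, mode_ids, q_actions, beta=1.0):
    """Hypothetical sketch: pick the behaviour mode with the highest
    estimated value, then weight that mode's samples by an
    exponentiated advantage for weighted imitation learning."""
    best_mode = int(np.argmax(q_modes))        # "learning on one mode"
    mask = (mode_ids == best_mode)             # keep only that mode's data
    adv = q_actions - q_modes[best_mode]       # advantage w.r.t. the chosen mode
    return best_mode, np.exp(adv / beta) * mask

# Toy usage: two modes, three dataset transitions
q_modes = np.array([0.2, 1.5])                 # per-mode value estimates
mode_ids = np.array([0, 1, 1])                 # mode each action belongs to
q_actions = np.array([0.1, 1.7, 1.4])          # Q-values of the dataset actions
best, w = lom_weights(q_modes, mode_ids, q_actions)
# best is 1; samples from the other mode receive zero weight
```

Samples outside the selected mode are masked to zero, so the imitation loss only ever sees one mode of the behaviour data.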
Open Source Code: Yes. "The code has been open sourced." (GitHub repository: https://github.com/MianchuWang/LOM)
Open Datasets: Yes. "We evaluate LOM on three MuJoCo locomotion tasks from the D4RL benchmark (Fu et al., 2020): halfcheetah, hopper, and walker2d."
Dataset Splits: Yes. "Each environment contains five dataset types: (i) medium: 1M samples from a policy trained to approximately one-third of expert performance; (ii) medium-replay: the replay buffer of a policy trained to match the performance of the medium agent (0.2M for halfcheetah, 0.4M for hopper, 0.3M for walker2d); (iii) medium-expert: a 50-50 split of medium and expert data (just under 2M samples); (iv) expert: 1M samples from a fully trained SAC policy (Haarnoja et al., 2018); and (v) full-replay: 1M samples from the final replay buffer of an expert policy."
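For instance, the medium-expert dataset described above is simply a 50-50 mixture of the two source buffers. A minimal sketch (buffer sizes scaled down from 1M, and the array shapes are illustrative stand-ins, not taken from the released code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the medium and expert buffers (each 1M samples in the paper)
medium = rng.normal(size=(1_000, 17))
expert = rng.normal(size=(1_000, 17))

# medium-expert: concatenate the two buffers and shuffle the rows together
medium_expert = np.concatenate([medium, expert], axis=0)
rng.shuffle(medium_expert, axis=0)
```

The resulting buffer has exactly twice the size of either source, matching the "just under 2M samples" figure when the full 1M buffers are used.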
Hardware Specification: No. No hardware details (e.g., GPU/CPU models, memory) are given; the paper only names the task type (MuJoCo locomotion) and general terms such as "robot arm".
Software Dependencies: No. No software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) are listed in the paper.
Experiment Setup: Yes. "Table 3: Hyperparameters used in the experiments." The table lists M, β, C, ρ, update_delay, and network architectures with layer dimensions for π_ρ, π_θ, Q_ψ, and Q_ϕ.