Learning on One Mode: Addressing Multi-modality in Offline Reinforcement Learning
Authors: Mianchu Wang, Yue Jin, Giovanni Montana
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, LOM outperforms existing methods on standard D4RL benchmarks and demonstrates its effectiveness in complex, multi-modal scenarios. (...) 6 EXPERIMENTAL RESULTS |
| Researcher Affiliation | Academia | Mianchu Wang, Yue Jin, Giovanni Montana. University of Warwick; The Alan Turing Institute. |
| Pseudocode | Yes | Algorithm 1: Weighted imitation learning on one mode (LOM). |
| Open Source Code | Yes | The code has been open-sourced. GitHub repository: https://github.com/MianchuWang/LOM |
| Open Datasets | Yes | We evaluate LOM on three MuJoCo locomotion tasks from the D4RL benchmark (Fu et al., 2020): halfcheetah, hopper, and walker2d. |
| Dataset Splits | Yes | Each environment contains five dataset types: (i) medium: 1M samples from a policy trained to approximately one-third of expert performance; (ii) medium-replay: the replay buffer of a policy trained to match the performance of the medium agent (0.2M for halfcheetah, 0.4M for hopper, 0.3M for walker2d); (iii) medium-expert: a 50-50 split of medium and expert data (just under 2M samples); (iv) expert: 1M samples from a fully trained SAC policy (Haarnoja et al., 2018); and (v) full-replay: 1M samples from the final replay buffer of an expert policy. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) are mentioned in the paper, only the type of tasks (MuJoCo locomotion) and general terms like "robot arm". |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) are provided in the paper. |
| Experiment Setup | Yes | Table 3: Hyperparameters used in the experiments. (includes M, β, C, ρ, update_delay, network architectures with dimensions for πρ, πθ, Qψ, Qϕ) |
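
The dataset rows above describe three environments crossed with five dataset types, giving 15 evaluation datasets. As a minimal sketch, the combinations can be enumerated as D4RL-style identifiers. The helper name `d4rl_dataset_names` is hypothetical, and the `v2` version suffix is an assumption: the table does not state which dataset version was used.

```python
from itertools import product

# Environment and dataset-type names as listed in the table above.
ENVIRONMENTS = ("halfcheetah", "hopper", "walker2d")
DATASET_TYPES = ("medium", "medium-replay", "medium-expert", "expert", "full-replay")


def d4rl_dataset_names(version="v2"):
    """Return every environment/dataset-type pair as a D4RL-style ID.

    The version suffix is an assumption; the source does not specify it.
    """
    return [
        f"{env}-{kind}-{version}"
        for env, kind in product(ENVIRONMENTS, DATASET_TYPES)
    ]


if __name__ == "__main__":
    # Print one identifier per line, grouped by environment.
    for name in d4rl_dataset_names():
        print(name)
```

Passing a different `version` string adjusts the suffix if the experiments turn out to use another dataset release.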