Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Learning Fused State Representations for Control from Multi-View Observations
Authors: Zeyu Wang, Yao-Hui Li, Xin Li, Hongyu Zang, Romain Laroche, Riashat Islam
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results demonstrate that our method outperforms existing approaches in MVRL tasks. Even in more realistic scenarios with interference or missing views, MFSC consistently maintains high performance. The project code is available at https://github.com/zpwdev/MFSC. |
| Researcher Affiliation | Collaboration | 1Beijing Institute of Technology, Beijing, China 2Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, China 3Meituan, Beijing, China 4Wayve, London, UK 5Mila Quebec AI Institute, Montreal, Canada. |
| Pseudocode | Yes | We present our pseudocode in Algorithm 1 and Algorithm 2, where Algorithm 1 is based on Proximal Policy Optimization (PPO) and Algorithm 2 is based on Soft Actor-Critic (SAC). Furthermore, we illustrate the representation learning component of MFSC in Algorithm 3 using a PyTorch-like style for enhanced clarity and comprehension. |
| Open Source Code | Yes | The project code is available at https://github.com/zpwdev/MFSC. |
| Open Datasets | Yes | We evaluate our method on a set of 3D manipulation environments, Meta-World (Yu et al., 2020), and a high-degree-of-freedom 3D locomotion environment, PyBullet's Ant (Coumans & Bai, 2022). Furthermore, we also evaluate MFSC on the widely used single-view continuous control benchmark DMControl (Tassa et al., 2018), comparing it with recent visual RL baselines, as well as in the more realistic multi-view highway driving scenario, CARLA (Dosovitskiy et al., 2017). |
| Dataset Splits | No | The paper describes randomized configurations for tasks and references external papers for experimental setups, but does not explicitly provide specific dataset split information (e.g., exact percentages or sample counts for training/validation/test sets) within its main text for reproduction. For Meta-World, it states: "Each task involves 50 randomized configurations, such as the initial pose of the robot, object locations, and target positions." For CARLA, it describes the environment and duration: "In this experiment, we select the official Town04 map...The agent's goal is to drive as safely as possible for a limited 1,000-time steps." |
| Hardware Specification | Yes | The experiments were conducted on a server equipped with two NVIDIA A800 GPUs (Ampere architecture) with 80GB of VRAM each, running on CUDA 12.3 and NVIDIA driver version 545.23.06. The server is powered by dual Intel Xeon Platinum 8370C CPUs, each featuring 32 cores and 64 threads, with a base clock speed of 2.80GHz and a maximum clock speed of 3.5GHz. The system has a total of 128 logical processors and a NUMA configuration with two nodes. |
| Software Dependencies | No | The paper mentions "CUDA 12.3" as part of the hardware setup, but it does not provide version numbers for other key software components such as Python or PyTorch, which are implied by the "PyTorch-like" pseudocode. |
| Experiment Setup | Yes | Table 2 and 3 provide detailed information regarding the experimental setup and hyperparameter configurations. In the Meta-World and PyBullet environments, we utilize a latent representation size of 128... All networks in both the policy and the representation models are optimized using the Adam optimizer (Kingma, 2014)... |