Learning Dynamics under Environmental Constraints via Measurement-Induced Bundle Structures
Authors: Dongzhe Zheng, Wenjie Mei
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive simulations demonstrate significant improvements in both learning efficiency and constraint satisfaction over traditional methods, especially under limited and uncertain sensing conditions. We design a comprehensive experimental framework to evaluate our proposed method against state-of-the-art approaches, focusing on three interconnected research directions: learning-based safety control, geometric structure learning, and safe control under uncertainty. The experiments are constructed to highlight key methodological differences while ensuring fair comparison through standardized implementations and evaluation protocols. Table 1 presents the quantitative results. |
| Researcher Affiliation | Academia | ¹Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China; ²School of Automation and Key Laboratory of MCCSE of Ministry of Education, Southeast University, Nanjing, China. Correspondence to: Wenjie Mei <EMAIL; EMAIL>. |
| Pseudocode | No | The paper describes the learning framework and algorithms using mathematical equations and descriptive text, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our implementation is publicly available at https://github.com/ContinuumCoder/Measurement-Induced-Bundle-for-Learning-Dynamics/. |
| Open Datasets | No | The experimental tasks are conducted in a simulation environment built on the Genesis physics engine. While Appendix D mentions studying |
| Dataset Splits | No | The experiments involve generating scenarios randomly for each trial within a simulation environment, rather than using a fixed, pre-split dataset. For example: "For all three tasks, the workspace is configured as a 2 m × 2 m × 2 m arena with randomly placed obstacles. The obstacles' positions are sampled uniformly within the workspace..." and "We test the worm robot (500 trials), Franka arm (400 trials), and quadrotor (300 trials) under various task scenarios." While Appendix C.7 mentions a "20% validation split for monitoring training progress" for the neural networks, this pertains to the internal training of the models within the RL framework, not the partitioning of a fixed dataset for the primary experimental evaluation. |
| Hardware Specification | Yes | All networks are trained with Adam optimizer using mixed precision training on an NVIDIA RTX 3090 GPU. Our experiments are conducted on a workstation equipped with an Intel Xeon CPU, NVIDIA RTX 3090 GPU (24GB GDDR6X), and 64GB DDR4 RAM. |
| Software Dependencies | Yes | All experiments are implemented in Python using PyTorch... The software stack consists of Python 3.9 and PyTorch 1.12.0, supported by CUDA 11.7 and cuDNN 8.5 for GPU acceleration. |
| Experiment Setup | Yes | Our SAC implementation follows the standard architecture with carefully tuned hyperparameters. The framework uses a discount factor γ of 0.99 and a soft update coefficient τ of 0.005. The target entropy is set to the negative dimension of the action space, following common practice. All policy components (actor, critic, and entropy networks) use a learning rate of 3×10⁻⁴. The replay buffer maintains 1×10⁶ transitions... The neural network architecture consists of three hidden layers (128-64-32 units) with ReLU activations throughout. Layer normalization is applied after each hidden layer to stabilize training. For barrier functions, we add a tanh activation in the output layer to ensure the boundedness of safety certificates... The training process employs the Adam optimizer with β1 = 0.9 and β2 = 0.999, coupled with a cosine annealing learning rate schedule starting at 5×10⁻⁴. We use a batch size of 256 to fully utilize the GPU memory while maintaining stable gradients. Gradient clipping with a maximum norm of 1.0 prevents extreme parameter updates. Early stopping with a patience of 20 epochs prevents overfitting, and we maintain a 20% validation split for monitoring training progress. To ensure reproducibility, we fix random seeds to 42 across all experiments... |
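Two of the scalar update rules quoted in the Experiment Setup row — the soft target update (τ = 0.005) and the cosine-annealed learning rate starting at 5×10⁻⁴ — can be sketched in plain Python. This is a minimal illustration of the standard formulas; the function names and default values below follow the quoted hyperparameters but are not taken from the authors' released code.

```python
import math

def soft_update(target: float, online: float, tau: float = 0.005) -> float:
    """SAC-style Polyak (soft) target update: theta' <- tau*theta + (1-tau)*theta'.

    With tau = 0.005 as quoted, the target network tracks the online
    network slowly, which stabilizes the critic's bootstrap targets.
    """
    return tau * online + (1.0 - tau) * target

def cosine_annealed_lr(step: int, total_steps: int,
                       lr_max: float = 5e-4, lr_min: float = 0.0) -> float:
    """Cosine annealing schedule: starts at lr_max, decays to lr_min.

    lr(t) = lr_min + (lr_max - lr_min) * (1 + cos(pi * t / T)) / 2
    """
    cos_term = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return lr_min + (lr_max - lr_min) * cos_term

# The schedule opens at 5e-4 and reaches lr_min at the final step.
print(cosine_annealed_lr(0, 100))    # 0.0005 at step 0
print(cosine_annealed_lr(100, 100))  # lr_min at the end
print(soft_update(target=1.0, online=0.0))  # 0.995 after one soft update
```

In a full PyTorch setup, the same behavior is usually obtained via `torch.optim.lr_scheduler.CosineAnnealingLR` and a per-parameter loop applying the soft update to the target critic.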