Hierarchical Decision Making Based on Structural Information Principles
Authors: Xianghua Zeng, Hao Peng, Dingli Su, Angsheng Li
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluations on challenging benchmarks demonstrate that our framework significantly and consistently outperforms state-of-the-art baselines, improving the effectiveness, efficiency, and stability of policy learning by up to 32.70%, 64.86%, and 88.26%, respectively, as measured by average rewards, convergence timesteps, and standard deviations. |
| Researcher Affiliation | Academia | Xianghua Zeng EMAIL School of Computer Science and Engineering Beihang University Beijing, 100191, China; Hao Peng EMAIL School of Cyber Science and Technology Beihang University Beijing, 100191, China; Dingli Su EMAIL School of Computer Science and Engineering Beihang University Beijing, 100191, China; Angsheng Li EMAIL School of Computer Science and Engineering Beihang University Beijing, 100191, China |
| Pseudocode | Yes | Algorithm 1: The Edge Filtration Algorithm; Algorithm 2: The Undirected Optimization Algorithm; Algorithm 3: The Directed Graph Adjustion Algorithm; Algorithm 4: The Directed Optimization Algorithm; Algorithm 5: The Single-Agent Skill-Based Learning Method; Algorithm 6: The Multi-Agent Role-Based Learning Method |
| Open Source Code | No | A video demonstration of SISL across multiple episodes and tasks is available on GitHub (https://selgroup.github.io/SIDM/). This refers to a video demonstration, not the source code for the methodology described in the paper. |
| Open Datasets | Yes | For offline state abstraction, we select several baseline algorithms from the visual Gridworld environment... for online state abstraction, we adopt state representation and data augmentation algorithms that have demonstrated superior performance in the DMControl Suite (Tunyasuvunakool et al., 2020)... We evaluate the skill-based learning method, SISL, in robotic control environments using the MuJoCo physics simulator (Todorov et al., 2012)... For multi-agent role-based learning, we evaluate the role-based method, SIRD, using the standard Centralized Training with Decentralized Execution (CTDE) benchmark in complex, high-control environments: the StarCraft II micromanagement (SMAC) suite (Samvelyan et al., 2019). |
| Dataset Splits | No | The paper does not explicitly provide specific training/test/validation dataset splits with percentages, sample counts, or clear references to predefined splits for all experiments. While it mentions environments and benchmarks, the detailed split information is not present in the main text. |
| Hardware Specification | Yes | All experiments are conducted on five Linux servers, each equipped with an NVIDIA RTX A6000 GPU and an Intel i9-10980XE CPU clocked at 3.00 GHz. |
| Software Dependencies | Yes | We implement the SISA mechanism using Python 3.8.15 and PyTorch 1.13.0, the SISL method using Python 3.9.1 and PyTorch 1.9.0, and the SIRD method using Python 3.5.2 and PyTorch 1.5.1. |
| Experiment Setup | Yes | In our proposed SIDM framework, the maximum heights of encoding trees are set to K = 2 for undirected optimization in state/action abstraction and K = 5 for directed optimization in skill discovery. For the SISA mechanism, we set a latent dimension of 2, a batch size of 2048, a learning rate of 0.003, and the Adam optimizer for training the DQN. We set a maximum episodic step of 1000, a batch size of 16, and a discount factor γ for the offline abstraction. In the online abstraction, we set a latent dimension of 50, a replay buffer size of 1e5, a batch size of 128, a discount factor of 0.99, and the Adam optimizer. We use the Soft Actor-Critic (SAC) algorithm (Haarnoja et al., 2018) as the underlying single-agent RL method, which is integrated with various state abstraction approaches. For the state graph, we set the number of vertices to twice the batch size and determine the number of edges according to Algorithm 1. For the SISL method, we adopt the standardized SAC algorithm within the corresponding state subspace to train the low-level option policy for each discovered option. For the high-level policy, we extract the abstract state with the highest probability within each state subspace to construct the set of termination states. We extend the SAC algorithm to the discrete termination set by inputting its continuous output into a Softmax layer. The resulting output is a probability distribution over the termination states. For all experiments, we use neural networks with 4 hidden layers, skip connections, and ReLU activations. During the training process, we use the Adam optimizer with a replay buffer size of 1e6, a mini-batch size of 256, a learning rate of 0.001, and a discount factor of 0.99. Similarly, for the state graph, we set the number of vertices to twice the batch size and determine the number of edges according to Algorithm 1. For the SIRD method, we share a trajectory encoding network with two fully connected layers and a GRU layer for each agent, followed by a linear network without hidden layers or activation functions, which serves as the role policy. The outputs of the role policies are fed into separate QMIX-style mixing networks (Rashid et al., 2020), each containing a 32-dimensional hidden layer with ReLU activation, to estimate global action values. For all SMAC experiments, the dimension of action representations is set to 20, the optimizer is set to RMSprop with a learning rate of 0.0005, and the discount factor is set to 0.99. For the action graph, we set the number of vertices to the number of enemies plus six general discrete actions, and the number of edges is automatically determined according to Algorithm 1. |
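The Experiment Setup row describes extending SAC to a discrete termination set by feeding its continuous head output through a Softmax layer, yielding a probability distribution over termination states. The following is a minimal, stdlib-only sketch of that final step; the function names (`softmax`, `termination_distribution`) and the example logits are illustrative assumptions, not names from the paper's codebase.

```python
import math

def softmax(logits):
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def termination_distribution(continuous_head_output):
    # Map the continuous outputs of the SAC head to a probability
    # distribution over the discrete termination states, as described
    # in the paper's experiment setup.
    return softmax(continuous_head_output)

# Hypothetical head outputs for four termination states.
probs = termination_distribution([1.2, -0.4, 0.7, 0.1])
print(probs)  # a valid distribution: entries sum to 1, largest logit gets the largest mass
```

In practice this layer would sit on top of the 4-hidden-layer ReLU network the paper describes; the high-level policy can then sample a termination state from `probs`.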