Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
An Attentive Approach for Building Partial Reasoning Agents from Pixels
Authors: Safa Alver, Doina Precup
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our quantitative analyses show that the proposed approach allows for effective generalization in high-dimensional domains with raw observational inputs. We also perform ablation analyses to validate our design choices. Finally, we demonstrate through qualitative analyses that our approach actually allows for building agents that focus their reasoning on the relevant aspects of the environment. ... Performance curves in Fig. 3 consistently show that the PR agent displays a better generalization performance than the MZ agent... |
| Researcher Affiliation | Collaboration | Safa Alver EMAIL Mila, McGill University. Doina Precup EMAIL Mila, McGill University and Google DeepMind |
| Pseudocode | Yes | Algorithm 1: The pseudocode of the slot attention module (Locatello et al., 2020). Algorithm 2: The pseudocode of the PI soft attention mechanism. Algorithm 3: The pseudocode of the PI top-k semi-hard attention mechanism (Ke et al., 2018; Zhao et al., 2021). |
| Open Source Code | Yes | The PR agent is a partial reasoning agent that was built by our proposed approach in Sec. 3. This agent is implemented by using the MCTS simulation framework of Niu et al. (2024). More specifically, we have built upon the available MuZero implementation by replacing the internals of the RepresentationNetwork class in lzero/model/common.py with the internals of the aspect identification and PI filtration modules (see App. A.1 & A.2). See https://github.com/opendilab/LightZero for the publicly available code. |
| Open Datasets | Yes | We perform our experiments on the (i) MiniGrid (Chevalier-Boisvert et al., 2018) and (ii) Procgen domains (Cobbe et al., 2020). |
| Dataset Splits | Yes | In MiniGrid and Procgen experiments, the training sets consist of 16 and 500 randomly-sampled game levels, respectively. In both experiments, the test sets consist of all of the possible game levels. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using the 'MCTS simulation framework of Niu et al. (2024)' and refers to a 'MuZero implementation' and a specific file 'lzero/model/common.py', but it does not specify version numbers for any software components, programming languages, or libraries. |
| Experiment Setup | Yes | Table 1 (hyperparameters of the slot attention module): Batch size = 256; Resolution = 64; Number of slots (n) = 8; Number of iterations (T) = 3; Warmup steps = 1e4. Table 2 (hyperparameters of the PR agent): Weight of policy loss = 1; Weight of value loss = 1; Weight of reward loss = 1; Number of MCTS simulations = 10 (MiniGrid), 25 (Procgen); Reanalyze ratio = 0; Number of frames stacked = 1; Number of frames skip = 1 (MiniGrid), 4 (Procgen); Length of game segment = 400; Replay buffer size (in transitions) = 1e6; TD steps = 5; Number of unroll steps = 5; Batch size = 256; Model update ratio = 0.25; Reward clipping = True; Optimizer type = Adam; Learning rate = 1e-4; Discount factor = 0.99; Frequency of target network update = 100; Weight decay coefficient = 1e-4; Max gradient norm = 10; Discrete action encoding type = True; Priority exponent coefficient = 0.6; Priority correction coefficient = 0.4; Dirichlet noise weight = 0.25. |
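The table above notes that the paper's Algorithm 1 gives pseudocode for the slot attention module of Locatello et al. (2020), configured with n = 8 slots and T = 3 iterations (Table 1). As a rough illustration of what that module computes, here is a minimal NumPy sketch of the core iterative update only: attention logits between slots (queries) and input features (keys) are normalized over the slot axis, so slots compete for inputs, and each slot is then set to a weighted mean of its inputs. The full algorithm additionally uses learned query/key/value projections, LayerNorm, a GRU update, and an MLP, all omitted here; function and variable names are illustrative, not from the paper's code.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def slot_attention_sketch(inputs, n_slots=8, n_iters=3, seed=0):
    """Stripped-down slot attention update (no learned projections,
    LayerNorm, GRU, or MLP). inputs: (num_features, dim) array of
    flattened feature vectors; returns (n_slots, dim) slots."""
    rng = np.random.default_rng(seed)
    n, d = inputs.shape
    slots = rng.normal(size=(n_slots, d))  # randomly initialized slots
    for _ in range(n_iters):
        # Normalizing over the slot axis makes slots compete for inputs.
        attn = softmax(inputs @ slots.T / np.sqrt(d), axis=1)    # (n, n_slots)
        # Weighted mean of the inputs softly assigned to each slot.
        weights = attn / (attn.sum(axis=0, keepdims=True) + 1e-8)
        slots = weights.T @ inputs                               # (n_slots, d)
    return slots
```

With the Table 1 settings (n = 8, T = 3), `slot_attention_sketch(feats, n_slots=8, n_iters=3)` maps a set of feature vectors to 8 slot vectors, each summarizing one soft group of inputs.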