Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
On Shallow Planning Under Partial Observability
Authors: Randy Lefebvre, Audrey Durand
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now conduct experiments to highlight the relationships between the planning horizon, the partial observability, and the structural parameters of the underlying MDP. ... We explore the interaction of shallow planning and partial observability in the Cartpole-v1 environment (Towers et al. 2024). ... Fig. 4 displays the average reward obtained by running the 10 models associated with each (γ, σ) configuration on the 100 environment seeds. |
| Researcher Affiliation | Academia | Randy Lefebvre1, Audrey Durand1,2 1Université Laval 2Canada CIFAR AI Chair EMAIL, EMAIL |
| Pseudocode | No | The paper describes theoretical bounds and then presents numerical experiments and their results. It does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Finally, we provide the open-source code1 for all our experiments to ensure reproducibility and offer a framework that practitioners can modify to better understand the impact of partial observability on their specific applications. 1https://github.com/GRAAL-Research/shallow-planning-partial-observability |
| Open Datasets | Yes | We explore the interaction of shallow planning and partial observability in the Cartpole-v1 environment (Towers et al. 2024). |
| Dataset Splits | No | The paper describes generating 'Random MDPs' for one set of experiments and using the 'Cartpole-v1 environment' for another. For Cartpole, it states 'We train 10 agents for each combination of (γ, σ), resulting in 150 models, and evaluate each of these models on 100 unseen seeds.' This describes evaluation seeds, but not traditional training/validation/test splits of a dataset. |
| Hardware Specification | Yes | Experiments are conducted on an AMD Ryzen 5 3600 CPU and a GTX 1660 Ti GPU. |
| Software Dependencies | No | The paper mentions using PPO (Schulman et al. 2017) and refers to recommended hyperparameters from Raffin et al. (2021) which is associated with Stable-Baselines3, but specific software versions are not provided in the text. |
| Experiment Setup | Yes | We consider the widely used PPO (Schulman et al. 2017) agent policy with the recommended hyperparameters for this task (Raffin et al. 2021). We then consider different discount factors γ ∈ {0, 0.24, 0.49, 0.73, 0.98}, with the largest γ = 0.98 from the baseline (Raffin et al. 2021). For the partially observable component, we simulate noisy sensors, which are common in real life. These are simulated by injecting noise into the state. The noise is sampled from a multivariate normal distribution N(0, Iσ²) parameterized by a diagonal covariance matrix with value σ² on the diagonal. We consider σ ∈ {0, 0.1, 1}. |
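The noise-injection scheme quoted above can be sketched as an observation wrapper around the environment. This is a minimal illustration of adding N(0, Iσ²) sensor noise to each state; the class name and interface are hypothetical and not taken from the paper's released code, which lives at the linked GitHub repository.

```python
import numpy as np

class NoisyObservation:
    """Inject i.i.d. Gaussian sensor noise N(0, sigma^2 I) into each observation.

    A sketch of the partial-observability setup described in the paper,
    assuming a Gymnasium-style env with reset()/step(); names are illustrative.
    """

    def __init__(self, env, sigma, seed=None):
        self.env = env
        self.sigma = sigma  # std dev; sigma=0 recovers full observability
        self.rng = np.random.default_rng(seed)

    def _noisy(self, obs):
        # Diagonal covariance sigma^2 * I: independent noise per state dimension.
        return obs + self.rng.normal(0.0, self.sigma, size=np.shape(obs))

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        return self._noisy(obs), info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        return self._noisy(obs), reward, terminated, truncated, info
```

Sweeping σ ∈ {0, 0.1, 1} over the 5 discount factors, with 10 agents per combination, yields the 5 × 3 × 10 = 150 models mentioned in the Dataset Splits row.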