Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

A Direct Approximation of AIXI Using Logical State Abstractions

Authors: Samuel Yang-Zhao, Tianyu Wang, Kee Siong Ng

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on controlling epidemics on large-scale contact networks validate the agent's performance.
Researcher Affiliation | Academia | Samuel Yang-Zhao, Australian National University, Canberra ACT 2601, EMAIL; Tianyu Wang, Australian National University, Canberra ACT 2601, EMAIL; Kee Siong Ng, Australian National University, Canberra ACT 2601, EMAIL
Pseudocode | No | The paper describes algorithms and processes but does not include any pseudocode or algorithm blocks.
Open Source Code | No | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No]
Open Datasets | Yes | We use an email network dataset as the underlying contact network, licensed under a Creative Commons Attribution-Share Alike License, containing 1133 nodes and 5451 edges [44, 45].
Dataset Splits | No | The paper does not explicitly specify dataset splits (e.g., training, validation, or test percentages or counts) or cross-validation methods.
Hardware Specification | Yes | All experiments were performed on a 12-core AMD Ryzen Threadripper 1920x processor with 32 gigabytes of memory.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, or frameworks).
Experiment Setup | Yes | The transition model, observation model, and Action_Cost(a_t) are parametrised the same way across all experiments (see Table 1 in Appendix B). A Quarantine(i) action imparts a cost of 1 per node that is quarantined at the given time step. A Vaccinate(i, j) action imparts a lower cost of 0.5 per node. The parameters λ, 1, 2 are varied across experiments. We generate a set of 1489 predicate functions... The Φ-AIXI-CTW agent is trained in an online fashion. The agent explores with probability ε_t at each step t until ε_t < 0.03, after which it acts ε-greedily with exploration rate 0.03. RF-BDD was performed with a threshold value of 0.9 across all rewards.
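The cost and exploration rules quoted in the Experiment Setup row can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' code (the Open Source Code row notes none was released). All function names are hypothetical. The cost is assumed to depend only on how many nodes are quarantined or vaccinated in a step. The planner's chosen action is abstracted as an integer index. The decay schedule for ε_t is a hypothetical placeholder because the row does not specify one.

```python
import random
from typing import Callable

# Per-node costs quoted in the Experiment Setup row.
QUARANTINE_COST_PER_NODE = 1.0
VACCINATE_COST_PER_NODE = 0.5
# Exploration floor quoted in the Experiment Setup row.
EPS_FLOOR = 0.03


def action_cost(n_quarantined: int, n_vaccinated: int) -> float:
    """Hypothetical Action_Cost(a_t): 1 per quarantined node plus 0.5 per vaccinated node."""
    return QUARANTINE_COST_PER_NODE * n_quarantined + VACCINATE_COST_PER_NODE * n_vaccinated


def exploration_rate(t: int, schedule: Callable[[int], float]) -> float:
    """Use the decaying rate schedule(t) until it falls below the floor, then stay at 0.03.

    Assumes `schedule` is non-increasing in t, so clamping at the floor matches
    "explore with eps_t until eps_t < 0.03, then use 0.03".
    """
    return max(schedule(t), EPS_FLOOR)


def epsilon_greedy(greedy_action: int, n_actions: int, eps: float, rng: random.Random) -> int:
    """With probability eps pick a uniformly random action, otherwise the planner's choice."""
    if rng.random() < eps:
        return rng.randrange(n_actions)
    return greedy_action


if __name__ == "__main__":
    def decay(t: int) -> float:
        return 0.999 ** t  # hypothetical schedule; the source does not give one

    rng = random.Random(0)
    for t in (0, 1000, 10000):
        eps = exploration_rate(t, decay)
        action = epsilon_greedy(greedy_action=2, n_actions=5, eps=eps, rng=rng)
        print(t, round(eps, 4), action)
    print(action_cost(n_quarantined=3, n_vaccinated=4))  # 5.0
```

Clamping with `max()` reproduces the "until ε_t < 0.03" rule only when the schedule never increases; a schedule that could rise again would need an explicit flag recording that the floor has been reached.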