Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Approximate Value Iteration with Temporally Extended Actions
Authors: Timothy A. Mann, Shie Mannor, Doina Precup
JAIR 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results in three different domains demonstrate the key properties from the analysis. Our theoretical and experimental results demonstrate that options can play an important role in AVI by decreasing approximation error and inducing fast convergence. |
| Researcher Affiliation | Academia | Timothy A. Mann EMAIL Shie Mannor EMAIL Electrical Engineering, The Technion, Israel Institute of Technology, Haifa, Israel; Doina Precup EMAIL School of Computer Science, McGill University, Montreal, QC, H3A2A7, Canada |
| Pseudocode | Yes | Algorithm 1: Options Fitted Value Iteration (OFVI); Algorithm 2: Landmark-based AVI; Algorithm 3: DREX (Deterministic RElaXation) |
| Open Source Code | No | The paper links to an RDDL environment for one of the experimental tasks (Cyclic Inventory Management): "Mann, T. A. (2014). Cyclic Inventory Management (CIM). https://code.google.com/p/rddlsim/source/browse/trunk/files/rddl2/examples/cim.rddl2. Accessed: 2015-06-29." However, this link is for the domain description, not the source code of the methods (OFVI, LAVI) described in the paper. The paper gives no explicit statement or link for a release of the authors' own implementation of their proposed algorithms. |
| Open Datasets | Yes | We created an inventory management problem where the agent restocks a warehouse with n = 8 different commodities (Mann, 2014). The Cyclic Inventory Management (CIM) problem is referenced with a specific URL: "Mann, T. A. (2014). Cyclic Inventory Management (CIM). https://code.google.com/p/rddlsim/source/browse/trunk/files/rddl2/examples/cim.rddl2. Accessed: 2015-06-29." This provides concrete access to the domain definition. |
| Dataset Splits | No | The paper describes sampling states during each iteration of the algorithms (e.g., "At each iteration k = 1, 2, ..., K, states x_i ∼ µ for i = 1, 2, ..., n are sampled"). However, it does not specify predefined training, test, or validation splits for any of the datasets or environments used in the experiments. The experiments focus on online sampling and performance over iterations rather than fixed dataset splits for model evaluation. |
| Hardware Specification | Yes | All experiments were implemented in Java and executed using OpenJDK 1.7 on a desktop computer running Ubuntu 12.04 64-bit with an 8-core Intel Core i7-3370 CPU 3.40GHz and 8 gigabytes of memory. |
| Software Dependencies | Yes | All experiments were implemented in Java and executed using OpenJDK 1.7 on a desktop computer running Ubuntu 12.04 64-bit with an 8-core Intel Core i7-3370 CPU 3.40GHz and 8 gigabytes of memory. |
| Experiment Setup | Yes | Optimal Replacement Task: We used parameter values γ = 0.6, β = 0.5, C = 30 and c(x) = 4x... we used polynomials to approximate the value function. All results presented here used fourth-degree polynomials. ...single option that keeps the product up to a point x = x + and terminates once the state equals or exceeds x. Pinball: The discount factor was γ = 0.95. ...we added zero-mean Gaussian noise to the velocities with standard deviation 0.03. ...We chose α = 0.01 through experimentation. ...nearest neighbor approximation was both fast and able to capture the complexity of the value function. For LOFVI, we used one-nearest neighbor approximation and N = 1,000 states were sampled at each iteration. For PFVI, we averaged the value of states within a 0.1 radius of the queried state and N = 30,000 states were sampled at each iteration. Both PFVI and LOFVI used L = 5 samples for each state-option pair. ...grid sizes of 10x10, 12x12, and 14x14... The radius of the hypercube around landmarks was set to η = 0.03. Inventory Management Task: The discount factor was γ = 0.9. ...Radial Basis Function networks (RBFs) with a grid of 1-dimensional radial bases. ...value function approximation was implemented by 24 RBFs. The number of bases per dimension was 25 and the basis widths were controlled by σ = 0.1. ...sampled n = 1000 states each iteration and sampled each option m = 20 times. ...20 temporally extended actions for each commodity. ...We set η = 0.05 * 500... and d+ = ∞. |
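
The Pinball setup above names two value-function approximators: one-nearest neighbor (used for LOFVI) and averaging over states within a 0.1 radius of the query (used for PFVI). The following is a minimal Python sketch of those two lookups, not the authors' Java implementation. The function names, the Euclidean distance metric, and the `default` fallback for an empty neighborhood are assumptions made for illustration; the paper excerpt does not specify them.

```python
import math

def one_nearest_neighbor(query, states, values):
    """Return the value stored at the sampled state closest to `query`.

    Assumption: closeness is measured with Euclidean distance.
    """
    best = min(range(len(states)), key=lambda i: math.dist(query, states[i]))
    return values[best]

def radius_average(query, states, values, radius=0.1, default=0.0):
    """Average the values of all sampled states within `radius` of `query`.

    `default` is a hypothetical fallback returned when no sampled state
    lies inside the radius; the source does not say how this case is handled.
    """
    near = [v for s, v in zip(states, values) if math.dist(query, s) <= radius]
    return sum(near) / len(near) if near else default

# Toy data: three sampled states in a 2-D state space with made-up values.
states = [(0.0, 0.0), (0.05, 0.0), (1.0, 1.0)]
values = [1.0, 3.0, 10.0]

nn_value = one_nearest_neighbor((0.04, 0.0), states, values)
avg_value = radius_average((0.0, 0.0), states, values, radius=0.1)
```

The one-nearest-neighbor lookup returns a single stored value, whereas the radius average smooths over every sample in the neighborhood, which may be one reason the reported PFVI configuration samples many more states per iteration (N = 30,000 versus N = 1,000).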