Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
C-Learning: Horizon-Aware Cumulative Accessibility Estimation
Authors: Panteha Naderian, Gabriel Loaiza-Ganem, Harry J. Braviner, Anthony L. Caterini, Jesse C. Cresswell, Tong Li, Animesh Garg
ICLR 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on a set of multi-goal discrete and continuous control tasks. We show that our method outperforms state-of-the-art goal-reaching algorithms in success rate, sample complexity, and path optimality. |
| Researcher Affiliation | Collaboration | Panteha Naderian, Gabriel Loaiza-Ganem, Harry J. Braviner, Anthony L. Caterini, Jesse C. Cresswell & Tong Li Layer 6 AI EMAIL Animesh Garg University of Toronto, Vector Institute, Nvidia EMAIL |
| Pseudocode | Yes | Algorithm 1: Training C-learning Network |
| Open Source Code | Yes | Our code is available at https://github.com/layer6ai-labs/CAE |
| Open Datasets | Yes | 3. Fetch Pick And Place-v1 (Brockman et al., 2016) is a complex, higher-dimensional environment in which a robotic arm needs to pick up a block and move it to the goal location... 4. Hand Manipulate Pen Full-v0 (Brockman et al., 2016) is a realistic environment known to be a difficult goal-reaching problem... |
| Dataset Splits | No | The paper mentions training and testing but does not explicitly provide details about a validation dataset split or percentages. |
| Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers, such as Python versions or library versions. |
| Experiment Setup | Yes | For all methods, we train for 300 episodes, each one of maximal length 50 steps; we use a learning rate of 10⁻³, a batch size of 256, and train for 64 gradient steps per episode. We use a 0.1-greedy behavior policy. We use a neural network with two hidden layers of respective sizes 60 and 40 with ReLU activations. We use 15 fully random exploration episodes before we start training. We take p(s0) as uniform among non-hole states during training, and set it as a point mass at (1, 0) for testing. We set p(g) as uniform among states during training, and we evaluate at every goal during testing. For C-learning, we use κ = 3, and copy the target network every 10 steps. |
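The hyperparameters quoted in the Experiment Setup row can be collected into a minimal sketch. This is our illustration, not the authors' code (their implementation is at the linked GitHub repository): the config key names and the NumPy MLP below are assumptions, chosen only to mirror the quoted values (two hidden layers of sizes 60 and 40 with ReLU, batch size 256, etc.).

```python
import numpy as np

# Hyperparameters as quoted in the paper's experiment setup.
# Key names are ours, not the authors'.
CONFIG = {
    "episodes": 300,            # training episodes
    "max_episode_len": 50,      # maximal episode length in steps
    "learning_rate": 1e-3,
    "batch_size": 256,
    "grad_steps_per_episode": 64,
    "epsilon": 0.1,             # 0.1-greedy behavior policy
    "warmup_episodes": 15,      # fully random exploration before training
    "kappa": 3,                 # C-learning horizon parameter kappa
    "target_copy_every": 10,    # target-network copy interval (steps)
}


def relu(x):
    return np.maximum(x, 0.0)


def init_mlp(in_dim, out_dim, hidden=(60, 40), seed=0):
    """Two hidden layers of sizes 60 and 40, as described in the paper."""
    rng = np.random.default_rng(seed)
    dims = (in_dim, *hidden, out_dim)
    return [
        (rng.standard_normal((a, b)) * np.sqrt(2.0 / a), np.zeros(b))
        for a, b in zip(dims[:-1], dims[1:])
    ]


def forward(params, x):
    """ReLU activations on hidden layers, linear output layer."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = relu(x)
    return x
```

For example, a forward pass on one batch of the quoted size would be `forward(init_mlp(4, 2), np.zeros((CONFIG["batch_size"], 4)))`, yielding an array of shape `(256, 2)`.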