Actor Prioritized Experience Replay

Authors: Baturay Saglam, Furkan B. Mutlu, Dogan C. Cicek, Suleyman S. Kozat

JAIR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | An extensive set of experiments verifies our theoretical findings, showing that our method outperforms competing approaches and achieves state-of-the-art results over the standard off-policy actor-critic algorithms. We assess the performance of LA3P on challenging continuous control benchmarks from OpenAI Gym (Brockman et al., 2016). Our results indicate that the introduced framework significantly outperforms the competing PER-correction algorithms by a wide margin, and achieves noteworthy gains over both PER and state-of-the-art methods in most of the domains tested. Furthermore, an extensive set of ablation studies verifies that each proposed modification to PER is essential to maintain the overall performance for actor-critic algorithms. All of our code and results are open-sourced and provided in the GitHub repository.
Researcher Affiliation | Academia | Baturay Saglam (EMAIL), Department of Electrical Engineering, Yale University, New Haven, CT, USA; Furkan B. Mutlu (EMAIL); Dogan C. Cicek (EMAIL); Suleyman S. Kozat (EMAIL), Department of Electrical and Electronics Engineering, Bilkent University, Ankara, Turkey
Pseudocode | Yes | Algorithm 1: Actor-Critic with Loss-Adjusted Approximate Actor Prioritized Experience Replay (LA3P)
Open Source Code | Yes | All of our code and results are open-sourced and provided in the GitHub repository: https://github.com/baturaysaglam/LA3P
Open Datasets | Yes | We assess the performance of LA3P on challenging continuous control benchmarks from OpenAI Gym (Brockman et al., 2016). Our results indicate that the introduced framework significantly outperforms the competing PER-correction algorithms by a wide margin, and achieves noteworthy gains over both PER and state-of-the-art methods in most of the domains tested. ... standard suite of MuJoCo (Todorov et al., 2012) and Box2D (Parberry, 2013) continuous control tasks interfaced by OpenAI Gym.
Dataset Splits | Yes | Moreover, we add 25000 exploration time steps before the training to increase the data efficiency. ... Every 1000 steps, each method is evaluated in a distinct evaluation environment (training seed + constant) for ten episodes, where no exploration and learning are performed.
Hardware Specification | Yes | All experiments are conducted on a single GeForce RTX 2070 SUPER GPU and an AMD Ryzen 7 3700X 8-Core Processor
Software Dependencies | Yes | All agents are assessed on continuous control benchmarks of the MuJoCo and Box2D physics engines, which are interfaced by OpenAI Gym using v2 environments.
Experiment Setup | Yes | All networks feature two hidden layers having 256 hidden units, with ReLU activation functions after each. ... The Adam optimizer (Kingma & Ba, 2015) is used to train the networks, with a learning rate of 3 × 10⁻⁴ and a mini-batch size of 256. After each update step, the target networks are updated using polyak averaging with ζ = 0.005. ... we use α = 0.6 and β = 0.4 for PER. As LAP and PAL functions are employed in our algorithm, we directly use α = 0.4 and β = 0.4. ... LA3P uniform fraction λ = 0.5.
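The hyperparameters quoted in the Experiment Setup row can be collected into a short sketch. This is not the authors' LA3P implementation: the polyak target update and the proportional-PER sampling/importance-weight formulas below are the generic textbook versions those hyperparameters plug into, and all variable names are my own.

```python
import numpy as np

# Hyperparameters quoted in the setup row above (names are mine, not the paper's):
LEARNING_RATE = 3e-4             # Adam learning rate
BATCH_SIZE = 256
ZETA = 0.005                     # polyak averaging coefficient
PER_ALPHA, PER_BETA = 0.6, 0.4   # standard PER exponents
LAP_ALPHA = 0.4                  # exponent when LAP/PAL loss functions are used
UNIFORM_FRACTION = 0.5           # LA3P uniform-sampling fraction lambda

def polyak_update(target_params, online_params, zeta=ZETA):
    """Soft target-network update: theta_target <- (1 - zeta)*theta_target + zeta*theta."""
    return [(1.0 - zeta) * t + zeta * o
            for t, o in zip(target_params, online_params)]

def per_sample_probs(td_errors, alpha=PER_ALPHA, eps=1e-6):
    """Proportional PER sampling distribution: P(i) ∝ (|delta_i| + eps)^alpha."""
    p = (np.abs(td_errors) + eps) ** alpha
    return p / p.sum()

def importance_weights(probs, beta=PER_BETA):
    """PER bias-correction weights w_i = (N * P(i))^(-beta), normalized by the max."""
    n = len(probs)
    w = (n * np.asarray(probs)) ** (-beta)
    return w / w.max()
```

With ζ = 0.005, each update moves the target network only 0.5% of the way toward the online network, which is the slow tracking behavior the quoted setup describes.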