Actor Prioritized Experience Replay

Authors: Baturay Saglam, Furkan B. Mutlu, Dogan C. Cicek, Suleyman S. Kozat

JAIR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | An extensive set of experiments verifies our theoretical findings, showing that our method outperforms competing approaches and achieves state-of-the-art results over the standard off-policy actor-critic algorithms. We assess the performance of LA3P on challenging continuous control benchmarks from OpenAI Gym (Brockman et al., 2016). Our results indicate that the introduced framework significantly outperforms the competing PER-correction algorithms by a wide margin, and achieves noteworthy gains over both PER and state-of-the-art methods in most of the domains tested. Furthermore, an extensive set of ablation studies verifies that each proposed modification to PER is essential to maintain the overall performance for actor-critic algorithms. All of our code and results are open-sourced and provided in the GitHub repository.
Researcher Affiliation | Academia | Baturay Saglam (EMAIL), Department of Electrical Engineering, Yale University, New Haven, CT, USA; Furkan B. Mutlu (EMAIL); Dogan C. Cicek (EMAIL); Suleyman S. Kozat (EMAIL), Department of Electrical and Electronics Engineering, Bilkent University, Ankara, Turkey
Pseudocode | Yes | Algorithm 1: Actor-Critic with Loss-Adjusted Approximate Actor Prioritized Experience Replay (LA3P)
Open Source Code | Yes | All of our code and results are open-sourced and provided in the GitHub repository: https://github.com/baturaysaglam/LA3P
Open Datasets | Yes | We assess the performance of LA3P on challenging continuous control benchmarks from OpenAI Gym (Brockman et al., 2016). Our results indicate that the introduced framework significantly outperforms the competing PER-correction algorithms by a wide margin, and achieves noteworthy gains over both PER and state-of-the-art methods in most of the domains tested. ... standard suite of MuJoCo (Todorov et al., 2012) and Box2D (Parberry, 2013) continuous control tasks interfaced by OpenAI Gym.
Dataset Splits | Yes | Moreover, we add 25000 exploration time steps before the training to increase the data efficiency. ... Every 1000 steps, each method is evaluated in a distinct evaluation environment (training seed + constant) for ten episodes, where no exploration and learning are performed.
Hardware Specification | Yes | All experiments are conducted on a single GeForce RTX 2070 SUPER GPU and an AMD Ryzen 7 3700X 8-Core Processor
Software Dependencies | Yes | All agents are assessed on continuous control benchmarks of the MuJoCo and Box2D physics engines, which are interfaced by OpenAI Gym using v2 environments.
Experiment Setup | Yes | All networks feature two hidden layers having 256 hidden units, with ReLU activation functions after each. ... The Adam optimizer (Kingma & Ba, 2015) is used to train the networks, with a learning rate of 3 × 10⁻⁴ and a mini-batch size of 256. After each update step, the target networks are updated using polyak averaging with ζ = 0.005. ... we use α = 0.6 and β = 0.4 for PER. As LAP and PAL functions are employed in our algorithm, we directly use α = 0.4 and β = 0.4. ... LA3P uniform fraction λ = 0.5.
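The hyperparameters quoted in the Experiment Setup row can be collected into a short sketch. This is not the authors' LA3P implementation: the polyak target update and the proportional-PER sampling/importance-weight formulas below are the generic textbook versions those hyperparameters plug into, and all variable names are my own.

```python
import numpy as np

# Hyperparameters quoted in the setup row above (names are mine, not the paper's):
LEARNING_RATE = 3e-4             # Adam learning rate
BATCH_SIZE = 256
ZETA = 0.005                     # polyak averaging coefficient
PER_ALPHA, PER_BETA = 0.6, 0.4   # standard PER exponents
LAP_ALPHA = 0.4                  # exponent when LAP/PAL loss functions are used
UNIFORM_FRACTION = 0.5           # LA3P uniform-sampling fraction lambda

def polyak_update(target_params, online_params, zeta=ZETA):
    """Soft target-network update: theta_target <- (1 - zeta)*theta_target + zeta*theta."""
    return [(1.0 - zeta) * t + zeta * o
            for t, o in zip(target_params, online_params)]

def per_sample_probs(td_errors, alpha=PER_ALPHA, eps=1e-6):
    """Proportional PER sampling distribution: P(i) ∝ (|delta_i| + eps)^alpha."""
    p = (np.abs(td_errors) + eps) ** alpha
    return p / p.sum()

def importance_weights(probs, beta=PER_BETA):
    """PER bias-correction weights w_i = (N * P(i))^(-beta), normalized by the max."""
    n = len(probs)
    w = (n * np.asarray(probs)) ** (-beta)
    return w / w.max()
```

With ζ = 0.005, each update moves the target network only 0.5% of the way toward the online network, which is the slow tracking behavior the quoted setup describes.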