Actor Prioritized Experience Replay
Authors: Baturay Saglam, Furkan B. Mutlu, Dogan C. Cicek, Suleyman S. Kozat
JAIR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | An extensive set of experiments verifies our theoretical findings, showing that our method outperforms competing approaches and achieves state-of-the-art results over the standard off-policy actor-critic algorithms. We assess the performance of LA3P on challenging continuous control benchmarks from OpenAI Gym (Brockman et al., 2016). Our results indicate that the introduced framework significantly outperforms the competing PER-correction algorithms by a wide margin, and achieves noteworthy gains over both PER and state-of-the-art methods in most of the domains tested. Furthermore, an extensive set of ablation studies verifies that each proposed modification to PER is essential to maintain the overall performance for actor-critic algorithms. All of our code and results are open-sourced and provided in the GitHub repository. |
| Researcher Affiliation | Academia | Baturay Saglam, Department of Electrical Engineering, Yale University, New Haven, CT, USA; Furkan B. Mutlu, Dogan C. Cicek, and Suleyman S. Kozat, Department of Electrical and Electronics Engineering, Bilkent University, Ankara, Turkey |
| Pseudocode | Yes | Algorithm 1 Actor-Critic with Loss-Adjusted Approximate Actor Prioritized Experience Replay (LA3P) |
| Open Source Code | Yes | All of our code and results are open-sourced and provided in the GitHub repository. ... https://github.com/baturaysaglam/LA3P |
| Open Datasets | Yes | We assess the performance of LA3P on challenging continuous control benchmarks from OpenAI Gym (Brockman et al., 2016). Our results indicate that the introduced framework significantly outperforms the competing PER-correction algorithms by a wide margin, and achieves noteworthy gains over both PER and state-of-the-art methods in most of the domains tested. ... standard suite of MuJoCo (Todorov et al., 2012) and Box2D (Parberry, 2013) continuous control tasks interfaced by OpenAI Gym. |
| Dataset Splits | Yes | Moreover, we add 25000 exploration time steps before the training to increase the data efficiency. ... Every 1000 steps, each method is evaluated in a distinct evaluation environment (training seed + constant) for ten episodes, where no exploration and learning are performed. |
| Hardware Specification | Yes | All experiments are conducted on a single GeForce RTX 2070 SUPER GPU and an AMD Ryzen 7 3700X 8-Core Processor |
| Software Dependencies | Yes | All agents are assessed on continuous control benchmarks of the MuJoCo and Box2D physics engines, which are interfaced by OpenAI Gym using v2 environments. |
| Experiment Setup | Yes | All networks feature two hidden layers having 256 hidden units, with ReLU activation functions after each. ... The Adam optimizer (Kingma & Ba, 2015) is used to train the networks, with a learning rate of 3 × 10⁻⁴ and a mini-batch size of 256. After each update step, the target networks are updated using polyak averaging with ζ = 0.005. ... we use α = 0.6 and β = 0.4 for PER. As LAP and PAL functions are employed in our algorithm, we directly use α = 0.4 and β = 0.4. ... LA3P uniform fraction λ = 0.5. |
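The PER hyperparameters quoted in the setup row (α = 0.6, β = 0.4) follow the standard proportional prioritization scheme of Schaul et al. (2016): transitions are sampled with probability proportional to their priority raised to α, and an importance-sampling weight with exponent β corrects the induced bias. A minimal NumPy sketch of that scheme, assuming |TD error|-based priorities (function name and the 1e-6 priority floor are illustrative, not from the authors' code):

```python
import numpy as np

def per_sample(td_errors, alpha=0.6, beta=0.4, batch_size=4, seed=None):
    """Proportional PER sketch: P(i) = p_i^alpha / sum_j p_j^alpha,
    importance weights w_i = (N * P(i))^(-beta), normalized by max w.
    alpha=0.6, beta=0.4 match the PER settings quoted above."""
    rng = np.random.default_rng(seed)
    p = np.abs(td_errors) + 1e-6          # priorities from |TD error|, small floor
    probs = p ** alpha
    probs /= probs.sum()                  # sampling distribution
    idx = rng.choice(len(p), size=batch_size, p=probs)
    weights = (len(p) * probs[idx]) ** (-beta)
    weights /= weights.max()              # normalize so max weight is 1
    return idx, weights
```

A real implementation would back this with a sum-tree so sampling is O(log N) rather than O(N), but the probabilities and weights are the same.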
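The quoted "LA3P uniform fraction λ = 0.5" suggests that each mini-batch mixes uniformly drawn transitions with prioritized ones. A hypothetical sketch of such λ-mixed sampling, assuming λ of the batch is uniform and the remainder is proportional to priorities (the function and its signature are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def mixed_minibatch(priorities, lam=0.5, batch_size=256, seed=None):
    """Draw lam * batch_size indices uniformly from the buffer and the
    remaining (1 - lam) * batch_size proportionally to priorities.
    lam = 0.5 and batch_size = 256 match the quoted settings."""
    rng = np.random.default_rng(seed)
    n = len(priorities)
    n_uniform = int(lam * batch_size)
    uniform_idx = rng.integers(0, n, size=n_uniform)       # uniform half
    probs = priorities / priorities.sum()
    per_idx = rng.choice(n, size=batch_size - n_uniform, p=probs)  # prioritized half
    return np.concatenate([uniform_idx, per_idx])
```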
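The target-network update quoted above ("polyak averaging with ζ = 0.005") is the usual soft update θ_target ← ζ·θ + (1 − ζ)·θ_target applied after each gradient step. A minimal sketch with plain arrays standing in for network parameters:

```python
import numpy as np

def polyak_update(target_params, online_params, zeta=0.005):
    """In-place soft update: target <- zeta * online + (1 - zeta) * target,
    with zeta = 0.005 as quoted in the experiment setup."""
    for t, o in zip(target_params, online_params):
        t *= (1.0 - zeta)
        t += zeta * o
    return target_params
```

With ζ this small, the target networks trail the online networks slowly, which is what stabilizes the bootstrapped critic targets.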