Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Privacy Preserving Reinforcement Learning for Population Processes
Authors: Samuel Yang-Zhao, Kee Siong Ng
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our theoretical findings are validated by experiments performed on a simulated epidemic control problem over large population sizes. We present empirical results that corroborate our theoretical findings on the SEIRS Epidemic Control problem detailed in Section 3. |
| Researcher Affiliation | Academia | Samuel Yang-Zhao EMAIL Australian National University Kee Siong Ng EMAIL Australian National University |
| Pseudocode | Yes | Algorithm 1: Differentially Private Reinforcement Learning; Algorithm 2: Projected Laplace Mechanism; Algorithm 3: ℓ2-Projected Laplace Mechanism; Algorithm 4: Differentially Private DQN (DP-DQN) |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide a link to a code repository. |
| Open Datasets | Yes | In each experiment the Epidemic Control problem is simulated over four large social networks from the Stanford Large Graph Network Dataset (Leskovec & Krevl, 2014). Table 2: Summary of network datasets used in experiments. Slashdot (Leskovec et al., 2009), Twitch (Rozemberczki & Sarkar, 2021), Gowalla (Cho et al., 2011), Youtube (Yang & Leskovec, 2012) |
| Dataset Splits | No | The paper describes a simulated environment and online interaction for the RL agent, rather than using predefined training/test/validation splits of a static dataset. It mentions that 'the sample size taken to be 90% of the population size' which refers to sampling individuals, not dataset splits for training and evaluation. |
| Hardware Specification | Yes | All experiments were performed on a shared server with a 32-core Intel(R) Xeon(R) Gold 5218 CPU and 192 gigabytes of RAM. A single NVIDIA GeForce RTX 3090 GPU was also used. |
| Software Dependencies | No | The paper mentions the 'RMSProp optimizer in PyTorch' but does not specify a version number for PyTorch or any other software dependency. |
| Experiment Setup | Yes | Table 3: DP-DQN parameters used for every experiment. γ=0.999, T=2e5, B=128, D=800, ϵstart=0.9999, κ=10^-5. The neural network used in all experiments was a 6-layer, fully connected MLP. The learning rate (α) is not listed, as the default settings of the RMSProp optimizer in PyTorch are used to optimize the neural network. Exploration is performed using an epsilon-greedy policy. |
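The pseudocode row above names an ℓ2-projected Laplace mechanism (Algorithms 2/3). A minimal sketch of the general idea, assuming the standard recipe of adding per-coordinate Laplace noise and then projecting the result back onto an ℓ2 ball (the function name, parameters, and projection radius here are illustrative, not taken from the paper):

```python
import numpy as np

def l2_projected_laplace(v, epsilon, sensitivity, radius, seed=None):
    """Hypothetical sketch: privatize vector v with Laplace noise,
    then project the noisy vector back onto the l2 ball of `radius`."""
    rng = np.random.default_rng(seed)
    scale = sensitivity / epsilon          # Laplace scale for epsilon-DP
    noisy = v + rng.laplace(0.0, scale, size=v.shape)
    norm = np.linalg.norm(noisy)
    if norm > radius:                      # rescale onto the l2 ball
        noisy = noisy * (radius / norm)
    return noisy
```

The projection step keeps the privatized output inside a known feasible set without consuming extra privacy budget, since it is post-processing of the noisy vector.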
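The experiment-setup row lists ϵstart=0.9999 and κ=10^-5 for epsilon-greedy exploration in DP-DQN. A hedged sketch of one common way such a schedule is implemented, assuming exponential decay ϵ_t = ϵstart · exp(−κ·t); the paper lists the constants but the exact decay form used here is an assumption:

```python
import math
import random

def epsilon_greedy_action(q_values, step, eps_start=0.9999, kappa=1e-5, rng=random):
    """Pick an action epsilon-greedily with an exponentially decayed epsilon.
    The schedule eps_start * exp(-kappa * step) is an assumed form."""
    eps = eps_start * math.exp(-kappa * step)
    if rng.random() < eps:
        return rng.randrange(len(q_values))                         # explore
    return max(range(len(q_values)), key=q_values.__getitem__)      # exploit
```

With these constants, exploration starts nearly uniform (ϵ ≈ 1) and decays slowly over the T = 2e5 training steps.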