Risk‑Seeking Reinforcement Learning via Multi‑Timescale EVaR Optimization
Authors: Deep Kumar Ganguly, Ajin George Joseph, Sarthak Girotra, Sirish Sekhar
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We analyze the asymptotic behavior of our proposed algorithm and rigorously evaluate it across various discrete and continuous benchmark environments. The results highlight that the EVaR policy achieves higher cumulative returns and corroborate that EVaR is indeed a competitive risk-seeking objective for RL. We evaluate our method on both discrete and continuous-control benchmarks. For each environment, we report environment-specific indicators including mean return, tail-risk metrics, dispersion across random seeds, and learning-curve behaviour. We also conduct selective ablation studies on stepsize and perturbation schedules to isolate their effects. |
| Researcher Affiliation | Academia | Deep Ganguly, Sarthak Girotra, Sirish Sekhar, and Ajin George Joseph, all with the Department of Computer Science and Engineering, Indian Institute of Technology Tirupati |
| Pseudocode | Yes | Algorithm 1: Multi-timescale EVaR optimization; Algorithm 2: EVaR Optimization using Disciplined Convex Cone |
| Open Source Code | No | The paper mentions third-party open-source libraries that were used, including Simglucose, OpenAI Gym, Riskfolio-Lib, and Stable-Baselines3. It also states: "Complete implementation details, hyperparameters, and reproducibility artefacts are provided in D and C." and Appendix D states: "The supplementary material provided includes all the experiments with their obtained values, which are reported here in a visual format." However, the authors neither explicitly state that they release the source code for the methodology described in this paper nor provide a direct link to a code repository. |
| Open Datasets | Yes | We consider the OpenAI Gym environments Inverted-Double-Pendulum/v4 and Swimmer/v4 from the MuJoCo framework (Tassa et al., 2018) and Mountain-Car-Continuous/v0 from the Box2D Gym framework (Towers et al., 2023). We demonstrate our algorithm's ability to manage high-risk insulin administration for Type-1 Diabetes Mellitus (T1DM) using the Simglucose simulator (Xie, 2018). The portfolio optimization problem seeks an optimal allocation among N assets by maximizing the EVaR of the portfolio returns R, which captures the upside tail of the return distribution. Here, the policy represents the action chosen, which can be sell, buy, or hold. Constraints ensure that the portfolio weights w_i are nonnegative and sum to one, representing a fully invested portfolio. Our portfolio consists of the top 10 stocks of the DJIA. |
| Dataset Splits | No | Across 5,000 evaluation episodes and 200 distinct obstacle layouts we observe a markedly heavy-tailed distribution. All methods operate in a fully controlled tabular setting with identical finite-horizon MDPs, tabular state-action value tables initialized to zero, ϵ-greedy exploration (ϵ = 0.1), discount factor γ = 0.99, and a fixed learning rate of 0.1. By limiting all algorithms to 500 episodes per seed (truncated at 200 steps) and averaging over eight independent random seeds, we ensure that any performance differential arises exclusively from the choice of risk criterion and its estimator, rather than from architectural capacity or extensive hyperparameter tuning. |
| Hardware Specification | Yes | The experiments were conducted on an NVIDIA DGX A100 with an AMD EPYC 7742 64-core processor operating at 1.5 GHz (up to 3.39 GHz boost), 32 GB of GDDR5 RAM, and an NVIDIA A100-SXM4-40GB GPU at 1.41 GHz with memory clocked at 1.21 GHz. |
| Software Dependencies | Yes | PyTorch 1.13.1 with CUDA 11.6, running on Python 3.10.13. |
| Experiment Setup | Yes | Complete implementation details, hyperparameters, and reproducibility artefacts are provided in D and C. Table 5: Hyperparameters for Finite Difference Gradient Estimation (lists Timeout, Iterations, Learning Rate Decay, Learning Rate Power, Perturbation Size, Perturbation Decay, Perturbation Power, Momentum, Beta (Adam Parameter), Epsilon (Adam Parameter)). Table 6: Hyperparameters used in experiments (lists Learning rate, Constant A, Constant c, Random noise parameter δ, Action sampling, Step size δ, Step size ξ). Table 9: Hyperparameters used for training (lists Learning rate, Buffer size, Batch size, Gamma, Train frequency, Gradient steps, Entropy coefficient (ent coef), Target entropy, Tau, Policy kwargs). |
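The risk-seeking objective quoted above maximizes the EVaR of returns, which upper-bounds VaR and CVaR and tracks the upside tail. As a minimal illustration of the quantity involved (not the paper's implementation), the textbook sample-based EVaR of the upper tail can be estimated with a one-dimensional search over the dual variable z; the grid search and function name here are purely illustrative:

```python
import math

def evar_upper(samples, alpha=0.1, z_grid=None):
    """Sample-based EVaR of the upper tail:
        EVaR_{1-alpha}(X) = inf_{z > 0} (1/z) * [log E[exp(z X)] - log(alpha)]
    estimated by a coarse grid search over z (illustrative only)."""
    n = len(samples)
    if z_grid is None:
        z_grid = [10 ** (k / 10) for k in range(-30, 21)]  # z from 1e-3 to 1e2
    best = float("inf")
    for z in z_grid:
        # log-sum-exp trick keeps exp(z * x) from overflowing
        m = max(z * x for x in samples)
        lse = m + math.log(sum(math.exp(z * x - m) for x in samples) / n)
        best = min(best, (lse - math.log(alpha)) / z)
    return best
```

By Jensen's inequality the estimate never falls below the sample mean, and as z grows it approaches the sample maximum, which is why maximizing it rewards policies with heavy upper tails.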
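The perturbation-size, perturbation-decay, and learning-rate-decay hyperparameters in Table 5 are the knobs of a finite-difference (simultaneous-perturbation) gradient estimator. A generic SPSA ascent step of that family, with illustrative constants and function names that are not the paper's, can be sketched as:

```python
import random

def spsa_step(theta, objective, a=0.1, c=0.1, k=1,
              alpha=0.602, gamma=0.101, rng=random):
    """One SPSA update that ascends `objective` using only two evaluations.
    a_k and c_k follow the standard decaying schedules; all constants are
    illustrative defaults, not the paper's tuned values."""
    a_k = a / (k + 1) ** alpha   # decaying learning rate
    c_k = c / (k + 1) ** gamma   # decaying perturbation size
    # Rademacher perturbation directions (+1 or -1 per coordinate)
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + c_k * d for t, d in zip(theta, delta)]
    minus = [t - c_k * d for t, d in zip(theta, delta)]
    g_hat = (objective(plus) - objective(minus)) / (2.0 * c_k)
    # ascend along the estimated gradient (1/delta_i == delta_i for +/-1)
    return [t + a_k * g_hat * d for t, d in zip(theta, delta)]
```

The appeal in this setting is that the objective (e.g. an EVaR estimate of returns) is only queried as a black box, so no analytic gradient of the risk measure is needed.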