An Optimistic Algorithm for Online CMDPs with Anytime Adversarial Constraints
Authors: Jiahui Zhu, Kihyun Yu, Dabeen Lee, Xin Liu, Honghao Wei
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our algorithm in a synthetic, finite-horizon CMDP environment constructed to assess performance under both stochastic and adversarial cost settings. We plot the cumulative constraint violation across learning episodes (K = 3000), where both the stochastic and adversarial curves clearly demonstrate the algorithm's ability to ensure sublinear violation growth. |
| Researcher Affiliation | Academia | 1) School of Electrical Engineering & Computer Science, Pullman, USA; 2) Department of Industrial & Systems Engineering, KAIST, Daejeon, South Korea; 3) School of Information Science & Technology, ShanghaiTech University, Shanghai, China. |
| Pseudocode | Yes | Algorithm 1 OMDPD |
| Open Source Code | No | The paper does not provide any explicit statement or link regarding the availability of source code for the described methodology. |
| Open Datasets | No | The paper uses a "synthetic and finite-horizon CMDP environment" for simulation. It does not mention using or providing access to any publicly available dataset. |
| Dataset Splits | No | The paper describes the parameters of a synthetic CMDP environment used for simulation (e.g., state space, action space, horizon, reward/cost sampling), but it does not specify any training/test/validation dataset splits, as it's a simulated environment rather than a fixed dataset. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, memory specifications) used for running the simulations or experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies, libraries, or solvers with version numbers. |
| Experiment Setup | Yes | We evaluate our algorithm in a synthetic, finite-horizon CMDP environment constructed to assess performance under both stochastic and adversarial cost settings. The CMDP consists of a state space S = {0, 1, 2, 3, 4} with five discrete states and an action space A = {0, 1, 2} with three available actions. The decision process unfolds over a fixed horizon of H = 5 steps. At each time step, the agent receives a reward r ∈ [0, 1]^(H×S×A) sampled uniformly from the unit interval. In the stochastic setting, the cost c ∈ [-1, 1]^(H×S×A) is also drawn uniformly and held fixed across episodes. In contrast, the adversarial setting introduces a discrete cost perturbation mechanism: in each episode, the cost is independently sampled from the finite set {-1.0, -0.6, -0.2, 0.0, 0.2, 0.6, 1.0}, simulating abrupt shifts in constraint feedback. The transition dynamics are time-dependent, with each transition distribution P_{h,s,a} independently sampled from a Dirichlet distribution with concentration parameter α = 0.5. The initial state is sampled uniformly, ensuring that each trajectory starts from a randomly selected state. Throughout all experiments, the cumulative cost constraint threshold is 0. We plot the cumulative constraint violation across learning episodes (K = 3000). |
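The reported setup can be sketched as a short NumPy snippet. This is a minimal illustration following only the stated parameters (|S| = 5, |A| = 3, H = 5, Dirichlet α = 0.5, K = 3000, the discrete adversarial cost set, and the uniform reward/cost/initial-state sampling); the function names and random seed are our own and not from the paper.

```python
import numpy as np

# Parameters as reported in the paper's experiment setup.
S, A, H = 5, 3, 5          # 5 states, 3 actions, horizon H = 5
K = 3000                   # number of learning episodes
ADV_COSTS = np.array([-1.0, -0.6, -0.2, 0.0, 0.2, 0.6, 1.0])

rng = np.random.default_rng(0)  # seed is illustrative, not from the paper

def make_cmdp(rng):
    """Sample the fixed components of the synthetic CMDP (hypothetical helper)."""
    # Rewards r in [0, 1]^(H x S x A), drawn uniformly and held fixed.
    r = rng.uniform(0.0, 1.0, size=(H, S, A))
    # Time-dependent transitions: each P_{h,s,a} ~ Dirichlet(alpha = 0.5).
    P = rng.dirichlet(np.full(S, 0.5), size=(H, S, A))  # shape (H, S, A, S)
    return r, P

def stochastic_costs(rng):
    # Stochastic setting: c in [-1, 1]^(H x S x A), fixed across episodes.
    return rng.uniform(-1.0, 1.0, size=(H, S, A))

def adversarial_costs(rng):
    # Adversarial setting: costs drawn i.i.d. per episode from the finite set.
    return rng.choice(ADV_COSTS, size=(H, S, A))

r, P = make_cmdp(rng)
c_stoch = stochastic_costs(rng)
c_adv = adversarial_costs(rng)  # resampled at each episode k = 1, ..., K
s0 = rng.integers(S)            # uniformly random initial state
```

Note the transition tensor has shape (H, S, A, S), so `P[h, s, a]` is the next-state distribution at step h, with rows summing to 1 by construction of the Dirichlet draw.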