Robust Reward Design for Markov Decision Processes

Authors: Shuo Wu, Haoxiang Ma, Jie Fu, Shuo Han

JAIR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments on multiple test cases demonstrate that our solution improves robustness compared to the standard approach without incurring significant additional computing costs.
Researcher Affiliation | Academia | Shuo Wu, University of Illinois Chicago, USA; Haoxiang Ma, University of Florida, USA; Jie Fu, University of Florida, USA; Shuo Han, University of Illinois Chicago, USA
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. It refers to MILP formulations and discusses algorithms conceptually.
Open Source Code | Yes | Code for the numerical experiments is provided at the URL in Footnote 1: https://github.com/fribuilder/robust-reward-design
Open Datasets | No | The paper uses custom-designed environments: a 6x6 grid world, a 10x10 grid world, and a probabilistic attack graph. It does not provide access information (links, DOIs, or citations to public repositories) for these or any other datasets.
Dataset Splits | No | The paper describes the simulation environments (grid worlds, attack graph) and their initial conditions, but it does not specify training, validation, or test splits, as the experiments do not involve traditional datasets.
Hardware Specification | Yes | All numerical experiments were performed on a MacBook Air laptop with an Apple M2 processor and 8 GB RAM running macOS Sonoma 14.3.1.
Software Dependencies | Yes | The interior-point solutions and MILP solutions in different environments are computed using the Python MIP package with Gurobi 11.0.0.
Experiment Setup | Yes | The allocated reward at each state must be nonnegative, and the total allocation budget is 4. ... The parameter τ in (12) reflects the level of rationality of the attacker. ... We then computed vε(xMILP) and vε(xIP) under different values of ε by solving problem (23). ... The bisection was initialized with lower and upper bounds of 0 and C, respectively, where C is the total reward budget.
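The bisection described in the experiment setup can be sketched generically. The feasibility predicate below is a hypothetical stand-in for solving the paper's subproblem at a candidate value; only the initialization with bounds 0 and C (the total reward budget) is taken from the source.

```python
def bisect_threshold(is_feasible, lo, hi, tol=1e-6):
    """Bisection on [lo, hi] for the largest value at which is_feasible
    holds, assuming is_feasible is monotone (True below some threshold)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if is_feasible(mid):
            lo = mid   # threshold lies at or above mid
        else:
            hi = mid   # threshold lies below mid
    return lo

# Hypothetical monotone predicate standing in for the paper's subproblem;
# bounds 0 and C = 4 mirror the total reward budget in the experiments.
C = 4
threshold = bisect_threshold(lambda x: x <= 2.5, 0.0, C)
```

In the paper's setting, the predicate would be replaced by solving the corresponding optimization subproblem at the candidate value; the interval halves each iteration, so convergence to tolerance `tol` takes O(log(C/tol)) subproblem solves.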