Distributionally Robust Policy Learning under Concept Drifts

Authors: Jingyuan Wang, Zhimei Ren, Ruohan Zhan, Zhengyuan Zhou

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "The proposed methods are implemented and evaluated in numerical studies, demonstrating substantial improvement compared with existing benchmarks. ... We evaluated our learning algorithm in two settings: a simulated and a real-world dataset, against the benchmark algorithm SNLN in Si et al. (2023, Algorithm 2)."
Researcher Affiliation | Collaboration | "1 Stern School of Business, New York University; 2 Department of Statistics and Data Science, University of Pennsylvania; 3 IEDA, Hong Kong University of Science and Technology; 4 Arena Technologies. Correspondence to: Zhengyuan Zhou <EMAIL>."
Pseudocode | Yes | "Algorithm 1: Policy estimation under concept drift ... Algorithm 2: Policy learning under concept drift"
Open Source Code | Yes | "A working example on the real-world dataset is given in https://github.com/off-policy-learning/concept-drift-robust-learning."
Open Datasets | Yes | "Real-world Dataset. We consider the dataset of a large-scale randomized experiment comparing assistance programs offered to French unemployed individuals, provided in Behaghel et al. (2014). The decision maker is trying to learn a personalized policy that decides whether to provide: (i) an intensive counseling program run by a public agency (A = 0); or (ii) a similar program run by private agencies (A = 1), to an unemployed individual. The reward Y is binary and indicates reemployment within six months. The processed dataset is provided in Kallus (2023)."
Dataset Splits | Yes | "We similarly generate 100 testing datasets D_test of size 10,000. ... We first split D into K equally sized disjoint folds: D^(k) for k ∈ [K]. ... In our implementation, the number of splits is taken to be K = 3."
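The K = 3 cross-fitting split described above can be sketched with scikit-learn's `KFold`; the data array below is a hypothetical placeholder, not the paper's dataset:

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical placeholder for the logged dataset D (n observations).
rng = np.random.default_rng(0)
D = rng.normal(size=(9, 2))

K = 3  # number of splits used in the paper's implementation
kf = KFold(n_splits=K, shuffle=True, random_state=0)
folds = [test_idx for _, test_idx in kf.split(D)]

# The folds D^(1), ..., D^(K) are equally sized, disjoint, and cover D.
assert all(len(fold) == len(D) // K for fold in folds)
```

Nuisance components are then fit on the folds excluding D^(k) and evaluated on D^(k), the standard cross-fitting pattern.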
Hardware Specification | Yes | "The experiments were run on the following cloud servers: (i) an Intel Xeon Platinum 8160 @ 2.1 GHz with 766 GB RAM and 96 CPUs x 2.1 GHz; (ii) an Intel Xeon Platinum 8160 @ 2.1 GHz with 1.5 TB RAM and 96 CPUs x 2.1 GHz; (iii) an Intel Xeon Gold 6132 @ 2.59 GHz with 768 GB RAM and 56 CPUs x 2.59 GHz; and (iv) an Intel Xeon CPU E5-2697A v4 @ 2.59 GHz with 384 GB RAM and 64 CPUs x 2.59 GHz."
Software Dependencies | No | "We use the Random Forest regressor from the scikit-learn Python library to estimate π̂_0 and ĝ. For estimating θ, we adopt the cubic spline method and employ the Nelder-Mead optimization method in the SciPy Python library (Virtanen et al., 2020) to optimize the coefficients in the spline approximation, where the obtained estimator is thresholded at 0.001 to guarantee Proposition 2.5. Finally, we optimize and find π̂_LN with policytree (Sverdrup et al., 2020)." The specific versions of scikit-learn, SciPy, and policytree are not mentioned.
Experiment Setup | Yes | "In our implementation, the number of splits is taken to be K = 3. We use the Random Forest regressor from the scikit-learn Python library to estimate π̂_0 and ĝ. For estimating θ, we adopt the cubic spline method and employ the Nelder-Mead optimization method in the SciPy Python library (Virtanen et al., 2020) to optimize the coefficients in the spline approximation, where the obtained estimator is thresholded at 0.001 to guarantee Proposition 2.5. Finally, we optimize and find π̂_LN with policytree (Sverdrup et al., 2020)."