Distributionally Robust Policy Learning under Concept Drifts

Authors: Jingyuan Wang, Zhimei Ren, Ruohan Zhan, Zhengyuan Zhou

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "The proposed methods are implemented and evaluated in numerical studies, demonstrating substantial improvement compared with existing benchmarks. ... We evaluated our learning algorithm in two settings: a simulated and a real-world dataset, against the benchmark algorithm SNLN in Si et al. (2023, Algorithm 2)."
Researcher Affiliation | Collaboration | "1 Stern School of Business, New York University; 2 Department of Statistics and Data Science, University of Pennsylvania; 3 IEDA, Hong Kong University of Science and Technology; 4 Arena Technologies. Correspondence to: Zhengyuan Zhou <EMAIL>."
Pseudocode | Yes | "Algorithm 1: Policy estimation under concept drift ... Algorithm 2: Policy learning under concept drift"
Open Source Code | Yes | "A working example on the real-world dataset is given in https://github.com/off-policy-learning/concept-drift-robust-learning."
Open Datasets | Yes | "Real-world Dataset. We consider the dataset of a large-scale randomized experiment comparing assistance programs offered to French unemployed individuals, provided in Behaghel et al. (2014). The decision maker is trying to learn a personalized policy that decides whether to provide: (i) an intensive counseling program run by a public agency (A = 0); or (ii) a similar program run by private agencies (A = 1), to an unemployed individual. The reward Y is binary and indicates reemployment within six months. The processed dataset is provided in Kallus (2023)."
Dataset Splits | Yes | "We similarly generate 100 testing datasets D_test of size 10,000. ... We first split D into K equally sized disjoint folds: D^(k) for k ∈ [K]. ... In our implementation, the number of splits is taken to be K = 3."
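The K = 3 cross-fitting split described above can be sketched with scikit-learn's `KFold`; the data array below is a hypothetical placeholder, not the paper's dataset:

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical placeholder for the logged dataset D (n observations).
rng = np.random.default_rng(0)
D = rng.normal(size=(9, 2))

K = 3  # number of splits used in the paper's implementation
kf = KFold(n_splits=K, shuffle=True, random_state=0)
folds = [test_idx for _, test_idx in kf.split(D)]

# The folds D^(1), ..., D^(K) are equally sized, disjoint, and cover D.
assert all(len(fold) == len(D) // K for fold in folds)
```

Nuisance components are then fit on the folds excluding D^(k) and evaluated on D^(k), the standard cross-fitting pattern.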
Hardware Specification | Yes | "The experiments were run on the following cloud servers: (i) an Intel Xeon Platinum 8160 @ 2.1 GHz with 766 GB RAM and 96 CPUs x 2.1 GHz; (ii) an Intel Xeon Platinum 8160 @ 2.1 GHz with 1.5 TB RAM and 96 CPUs x 2.1 GHz; (iii) an Intel Xeon Gold 6132 @ 2.59 GHz with 768 GB RAM and 56 CPUs x 2.59 GHz; and (iv) an Intel Xeon CPU E5-2697A v4 @ 2.59 GHz with 384 GB RAM and 64 CPUs x 2.59 GHz."
Software Dependencies | No | "We use the Random Forest regressor from the scikit-learn Python library to estimate π̂_0 and ĝ. For estimating θ, we adopt the cubic spline method and employ the Nelder-Mead optimization method in the SciPy Python library (Virtanen et al., 2020) to optimize the coefficients in the spline approximation, where the obtained estimator is thresholded at 0.001 to guarantee Proposition 2.5. Finally, we optimize and find π̂_LN with policytree (Sverdrup et al., 2020)." The specific versions of scikit-learn, SciPy, and policytree are not mentioned.
Experiment Setup | Yes | "In our implementation, the number of splits is taken to be K = 3. We use the Random Forest regressor from the scikit-learn Python library to estimate π̂_0 and ĝ. For estimating θ, we adopt the cubic spline method and employ the Nelder-Mead optimization method in the SciPy Python library (Virtanen et al., 2020) to optimize the coefficients in the spline approximation, where the obtained estimator is thresholded at 0.001 to guarantee Proposition 2.5. Finally, we optimize and find π̂_LN with policytree (Sverdrup et al., 2020)."