reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Adaptive Estimation and Learning under Temporal Distribution Shift

Authors: Dheeraj Baby, Yifei Tang, Hieu Duy Nguyen, Yu-Xiang Wang, Rohit Pyati

ICML 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we validate our findings on synthetic and real-world data. Section 6.1, Experiments on Synthetic Data: In this section, we report the results obtained from simulation studies. Section 6.2, Experiments on Real Data: As an application of our proposed methods, we conduct a model selection experiment using real-world data. We evaluate our method on data from the Dubai Land Department (Land Sales) following a setup identical to that of [2].
Researcher Affiliation	Collaboration	¹Amazon, ²University of California San Diego. Correspondence to: Dheeraj Baby <EMAIL>.
Pseudocode	Yes	Algorithm 1 (Wavelet-Denoising). 1: Input: data $y_n, \dots, y_1 \in \mathbb{R}^d$, wavelet transform matrix $W$, soft threshold $\lambda$, failure probability $\delta$. 2: Initialize $\tilde{y} \leftarrow [y_n, y_{n-1}, \dots, y_1]^T \in \mathbb{R}^n$. 3: Compute the empirical wavelet coefficients $\tilde{\beta} \leftarrow W\tilde{y}$. 4: Compute the denoised coefficients $\hat{\beta} \leftarrow T_\lambda(\tilde{\beta})$, where for $x \in \mathbb{R}$, $T_\lambda(x) := \mathrm{sign}(x)\max\{|x| - \lambda, 0\}$ is the soft-thresholding operator. When applied to a vector, soft-thresholding acts coordinate-wise. 5: Reconstruct the signal (i.e., apply the inverse wavelet transform): $\hat{\theta} \leftarrow W^T\hat{\beta}$. 6: Return the last coordinate of $\hat{\theta}$.
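The wavelet-denoising steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code. It assumes an orthonormal Haar transform for $W$ (the wavelet family is not stated in this excerpt), scalar observations, and a length $n$ that is a power of two. The helper names `haar_matrix`, `soft_threshold`, and `wavelet_denoise_last` are hypothetical. As in step 4, soft-thresholding is applied to every coefficient.

```python
import numpy as np


def haar_matrix(n: int) -> np.ndarray:
    """Orthonormal Haar transform matrix W of size n x n (assumes n is a power of 2)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    w = np.array([[1.0]])
    while w.shape[0] < n:
        m = w.shape[0]
        coarse = np.kron(w, np.array([[1.0, 1.0]]))         # coarser-scale rows
        fine = np.kron(np.eye(m), np.array([[1.0, -1.0]]))  # finest-scale detail rows
        w = np.vstack([coarse, fine]) / np.sqrt(2.0)
    return w


def soft_threshold(x: np.ndarray, lam: float) -> np.ndarray:
    """T_lambda(x) = sign(x) * max(|x| - lambda, 0), applied coordinate-wise."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def wavelet_denoise_last(y_ordered, lam: float) -> float:
    """Steps 2-6 of Algorithm 1; y_ordered holds the data in the order y_n, ..., y_1."""
    y_tilde = np.asarray(y_ordered, dtype=float)   # step 2
    w = haar_matrix(y_tilde.size)
    beta_tilde = w @ y_tilde                       # step 3: empirical coefficients
    beta_hat = soft_threshold(beta_tilde, lam)     # step 4: denoised coefficients
    theta_hat = w.T @ beta_hat                     # step 5: inverse transform (W orthonormal)
    return float(theta_hat[-1])                    # step 6: last coordinate
```

With `lam=0.0` nothing is shrunk, so the function returns the last input value (up to floating-point error), which makes a quick sanity check. This sketch takes $\lambda$ as given and does not use $\delta$.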
Open Source Code	No	The paper does not explicitly state that code is provided, nor does it include a link to a code repository or mention code in supplementary materials.
Open Datasets	Yes	Section 6.2, Experiments on Real Data: As an application of our proposed methods, we conduct a model selection experiment using real-world data. We evaluate our method on data from the Dubai Land Department (Land Sales) following a setup identical to that of [2]. The dataset includes apartment sales from January 2008 to December 2023 (192 months). [2] Land Sales. DLD Transactions Open Data. https://www.dubaipulse.gov.ae/data/dld-transactions/dld_transactions-open. Accessed: 2025-05-19.
Dataset Splits	Yes	Data is randomly split so that 20% is held out for testing, with two train/validation splits of the remainder: (a) 79%/1% and (b) 75%/5% (train/validation, as percentages of the full dataset).
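One way to realize these splits is sketched below. It assumes, as the percentages imply, that all fractions are of the full dataset. It also assumes that both train/validation splits share the same 20% test set. The function name `split_indices` and the fixed seed are illustrative assumptions, not details from the paper.

```python
import numpy as np


def split_indices(n: int, val_frac: float, test_frac: float = 0.20, seed: int = 0):
    """Randomly partition range(n) into train/validation/test index arrays.

    Fractions are relative to the full dataset; reusing the same seed keeps
    the test set identical across different validation fractions.
    """
    perm = np.random.default_rng(seed).permutation(n)
    n_test = round(test_frac * n)
    n_val = round(val_frac * n)
    test = perm[:n_test]
    val = perm[n_test:n_test + n_val]
    train = perm[n_test + n_val:]
    return train, val, test


# Split (a): 79% / 1% / 20%; split (b): 75% / 5% / 20%.
train_a, val_a, test_a = split_indices(1000, val_frac=0.01)
train_b, val_b, test_b = split_indices(1000, val_frac=0.05)
```

With 1,000 rows this gives 790/10/200 and 750/50/200 indices for train/validation/test.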
Hardware Specification	No	The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. It only mentions that experiments were repeated across multiple trials.
Software Dependencies	No	The paper mentions "Random Forest (Breiman, 2001) and XGBoost (Chen and Guestrin, 2016)" but does not specify software versions for these or any other libraries or programming languages.
Experiment Setup	Yes	Experimental methodology. First, a ground-truth signal is generated. We considered two types of ground-truth signals, shown in Fig. 4. The failure probability parameter for all algorithms is set to 0.1. For the wavelet-based algorithms, an estimate of the standard deviation is formed from the Median Absolute Deviation (MAD) of the wavelet coefficients at the highest resolution, similar to Donoho et al. (1998). For each month t, we train Random Forest (Breiman, 2001) and XGBoost (Chen and Guestrin, 2016) models using a window of past data, with window sizes $w \in \{1, 4, 16, 62, 256\}$, yielding 10 models per month (2 model types × 5 window sizes).
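The MAD-based noise estimate can be sketched as follows. The sketch assumes a Haar transform, whose highest-resolution coefficients are scaled differences of consecutive pairs of observations. It uses the common convention $\hat\sigma = \mathrm{median}(|d|)/0.6745$, where $0.6745$ is approximately the 0.75 quantile of the standard normal. The exact variant used in the paper is not given in this excerpt, and the function names are hypothetical.

```python
import numpy as np


def finest_haar_details(y) -> np.ndarray:
    """Highest-resolution orthonormal Haar coefficients: (y[2k] - y[2k+1]) / sqrt(2)."""
    y = np.asarray(y, dtype=float)
    if y.size % 2:
        raise ValueError("length must be even")
    return (y[0::2] - y[1::2]) / np.sqrt(2.0)


def mad_sigma(y) -> float:
    """Noise standard deviation estimate from the finest-scale coefficients.

    For Gaussian noise, median(|d|) is about 0.6745 * sigma, so dividing by
    0.6745 gives a scale estimate that is robust to a few large coefficients
    caused by jumps in the underlying signal.
    """
    d = finest_haar_details(y)
    return float(np.median(np.abs(d)) / 0.6745)


rng = np.random.default_rng(0)
y = 5.0 + 2.0 * rng.standard_normal(2**14)  # constant signal plus N(0, 2^2) noise
sigma_hat = mad_sigma(y)                    # should be close to 2.0
```

The excerpt does not state how the paper combines $\hat\sigma$ with the failure probability (set to 0.1) to choose the soft threshold $\lambda$, so the sketch stops at the noise estimate.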