Adaptive Estimation and Learning under Temporal Distribution Shift
Authors: Dheeraj Baby, Yifei Tang, Hieu Duy Nguyen, Yu-Xiang Wang, Rohit Pyati
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we validate our findings on synthetic and real-world data. 6.1. Experiments on Synthetic Data: In this section, we report the results obtained from simulation studies. 6.2. Experiments on Real Data: As an application of our proposed methods, we conduct a model selection experiment using real-world data. We evaluate our method on data from the Dubai Land Department (Land Sales), following a setup identical to that of [2]. |
| Researcher Affiliation | Collaboration | ¹Amazon, ²University of California San Diego. Correspondence to: Dheeraj Baby <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Wavelet-Denoising Algorithm. 1: Input: data $y_n, \ldots, y_1 \in \mathbb{R}^d$, wavelet transform matrix $W$, soft-threshold $\lambda$, failure probability $\delta$. 2: Initialize $y \leftarrow [y_n, \ldots, y_1]^T \in \mathbb{R}^n$. 3: Compute empirical wavelet coefficients $\tilde{\beta} \leftarrow W y$. 4: Compute denoised coefficients $\hat{\beta} \leftarrow T_\lambda(\tilde{\beta})$, where for $x \in \mathbb{R}$, $T_\lambda(x) := \operatorname{sign}(x) \max\{\lvert x \rvert - \lambda, 0\}$ is the soft-thresholding operator. When applied to a vector, the soft-thresholding is performed coordinate-wise. 5: Reconstruct the signal (i.e., apply the inverse wavelet transform) by $\hat{\theta} \leftarrow W^T \hat{\beta}$. 6: Return the last coordinate of $\hat{\theta}$. |
| Open Source Code | No | The paper does not explicitly state that code is provided, nor does it include a link to a code repository or mention code in supplementary materials. |
| Open Datasets | Yes | 6.2. Experiments on Real Data: As an application of our proposed methods, we conduct a model selection experiment using real-world data. We evaluate our method on data from the Dubai Land Department (Land Sales), following a setup identical to that of [2]. The dataset includes apartment sales from January 2008 to December 2023 (192 months). [2] Land Sales. Dld transactions open data. https://www.dubaipulse.gov.ae/data/dld-transactions/dld_transactions-open. Accessed: 2025-05-19. |
| Dataset Splits | Yes | Data is randomly split into 20% test, with two train-validation splits: (a) 79%/1% and (b) 75%/5%. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. It only mentions experiments were conducted across trials. |
| Software Dependencies | No | The paper mentions "Random Forest (Breiman, 2001) and XGBoost (Chen and Guestrin, 2016)" but does not specify software versions for these or any other libraries or programming languages. |
| Experiment Setup | Yes | Experimental methodology. First, a ground-truth signal is generated. We considered two types of ground-truth signal, as shown in Fig. 4. The failure probability parameter for all algorithms is set to 0.1. For the wavelet-based algorithms, an estimate of the standard deviation is formed from the Median Absolute Deviation (MAD) of the wavelet coefficients at the highest resolution, similar to Donoho et al. (1998). For each month t, we train Random Forest (Breiman, 2001) and XGBoost (Chen and Guestrin, 2016) models using a window of past data, where we consider window sizes $w \in [1, 4, 16, 62, 256]$, yielding 10 models per month. |
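
The steps of Algorithm 1 and the MAD-based noise estimate quoted in the table can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper excerpt does not say which wavelet basis is used, so an orthonormal Haar matrix is assumed here; the input length is assumed to be a power of two; and the function names (`haar_matrix`, `soft_threshold`, `mad_sigma`, `wavelet_denoise_last`) are hypothetical. The universal threshold in the usage lines is a common textbook choice and may differ from the threshold the paper derives from the failure probability.

```python
import numpy as np


def haar_matrix(n):
    """Orthonormal Haar transform matrix of size n x n (assumed basis; n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    W = np.array([[1.0]])
    while W.shape[0] < n:
        m = W.shape[0]
        coarse = np.kron(W, [1.0, 1.0])           # coarser-scale rows
        detail = np.kron(np.eye(m), [1.0, -1.0])  # finest-scale detail rows
        W = np.vstack([coarse, detail]) / np.sqrt(2.0)
    return W


def soft_threshold(x, lam):
    """Coordinate-wise T_lam(x) = sign(x) * max(|x| - lam, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def mad_sigma(beta):
    """Noise level estimate from the finest-resolution coefficients (median |d| / 0.6745)."""
    finest = beta[len(beta) // 2:]  # last n/2 rows of haar_matrix are the finest details
    return np.median(np.abs(finest)) / 0.6745


def wavelet_denoise_last(y, lam):
    """Steps 2-6 of Algorithm 1: transform, threshold, invert, return last coordinate."""
    y = np.asarray(y, dtype=float)   # ordered [y_n, ..., y_1]
    W = haar_matrix(len(y))
    beta = W @ y                     # empirical wavelet coefficients
    beta_hat = soft_threshold(beta, lam)
    theta_hat = W.T @ beta_hat       # inverse transform (W is orthonormal)
    return theta_hat[-1]


# Usage sketch on synthetic data (threshold choice is an assumption).
rng = np.random.default_rng(0)
signal = np.concatenate([np.zeros(32), np.ones(32)])
y = signal + 0.1 * rng.standard_normal(64)
sigma = mad_sigma(haar_matrix(64) @ y)
lam = sigma * np.sqrt(2.0 * np.log(64))  # universal threshold, assumed for illustration
estimate = wavelet_denoise_last(y, lam)
```

Following the algorithm as quoted, the threshold is applied to every coefficient, including the coarsest one; many practical implementations leave the coarsest coefficient unthresholded.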