Gradient-Based Nonlinear Rehearsal Learning with Multivariate Alterations
Authors: Tian Qin, Tian-Zuo Wang, Zhi-Hua Zhou
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on both linear and nonlinear datasets show that Grad-Rh is effective and efficient: it performs comparably to exact baselines on linear data and significantly outperforms them on nonlinear data in both decision quality and running time, with improved scalability. |
| Researcher Affiliation | Academia | National Key Laboratory for Novel Software Technology, Nanjing University, China; School of Artificial Intelligence, Nanjing University, China |
| Pseudocode | Yes | Algorithm 1 Learning the SRM Algorithm 2 Grad-Rh |
| Open Source Code | No | The paper does not contain any explicit statement about making the source code available or provide a link to a code repository. |
| Open Datasets | Yes | We adopted the two datasets, Bermuda and Ride-hailing, used in Qin, Wang, and Zhou (2023b) and two additional synthetic datasets for the case with linear SRMs. The Bermuda dataset (Courtney et al. 2017; Andersson and Bates 2018) records some environmental variables |
| Dataset Splits | Yes | When fitting a structural equation, 70% of the observational data are used as the training set, and the remaining data are used as a validation set for early stopping. |
| Hardware Specification | Yes | All experiments were run on a Nvidia Tesla A100 GPU and two Intel Xeon Platinum 8358 CPUs. |
| Software Dependencies | No | The paper mentions "Adam optimizer (Kingma and Ba 2015)" but does not provide specific version numbers for any key software components or libraries used for implementation. |
| Experiment Setup | Yes | Specifically, the network contains 16 blocks, each of which combines affine coupling, permutation, and global affine transformation (Ardizzone et al. 2018), to form an invertible flow. Each model is trained for a maximum of 1,000 epochs with the Adam optimizer (Kingma and Ba 2015), with a learning rate of 0.001 and a batch size of 128. For Algorithm 2, all four surrogate losses described in Section 4 are evaluated, and (7) is solved with at most 1,000 rounds of the Adam optimizer at a 0.001 learning rate. For each decision round and each method, the number of rehearsal samples n is 1,000. |
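The dataset-splits row above can be made concrete with a minimal sketch. The 70/30 ratio is stated in the paper; the shuffling procedure and the random seed are our assumptions, since the paper does not specify them.

```python
import numpy as np

def split_train_val(data: np.ndarray, train_frac: float = 0.7, seed: int = 0):
    """Split observational data into train (70%) and validation (30%) sets.

    The validation split is used for early stopping when fitting each
    structural equation. The shuffle and `seed` are assumptions; the
    paper only states the 70/30 ratio.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    cut = int(train_frac * len(data))
    return data[idx[:cut]], data[idx[cut:]]

# Example: 1,000 observations with 5 variables each.
X = np.arange(5000, dtype=float).reshape(1000, 5)
train, val = split_train_val(X)
print(train.shape, val.shape)  # (700, 5) (300, 5)
```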
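The experiment-setup row describes an invertible flow built from affine-coupling blocks (plus permutations and a global affine transform). A minimal NumPy sketch of a single coupling block illustrates why such a stack is exactly invertible; the toy "networks" below are fixed linear maps standing in for the learned subnetworks, which is our simplification, not the paper's architecture.

```python
import numpy as np

def affine_coupling_forward(x, scale_net, shift_net):
    """One affine coupling block: the first half of x passes through
    unchanged and conditions an affine transform of the second half."""
    d = x.shape[-1] // 2
    x1, x2 = x[..., :d], x[..., d:]
    y2 = x2 * np.exp(scale_net(x1)) + shift_net(x1)
    return np.concatenate([x1, y2], axis=-1)

def affine_coupling_inverse(y, scale_net, shift_net):
    """Exact inverse: undo the affine transform using the unchanged half."""
    d = y.shape[-1] // 2
    y1, y2 = y[..., :d], y[..., d:]
    x2 = (y2 - shift_net(y1)) * np.exp(-scale_net(y1))
    return np.concatenate([y1, x2], axis=-1)

# Toy stand-ins for the learned scale/shift subnetworks (assumptions).
scale = lambda h: 0.1 * h
shift = lambda h: h + 1.0

x = np.random.default_rng(0).normal(size=(4, 6))
y = affine_coupling_forward(x, scale, shift)
x_rec = affine_coupling_inverse(y, scale, shift)
assert np.allclose(x, x_rec)  # the block inverts exactly
```

The paper's network stacks 16 such blocks, interleaving permutations so every coordinate is eventually transformed, and adds a global affine transformation (Ardizzone et al. 2018).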