Last Iterate Convergence of Incremental Methods as a Model of Forgetting
Authors: Xufeng Cai, Jelena Diakonikolas
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We further provide illustrative numerical results in Fig. 2 to facilitate our discussion. In particular, we choose L = 2, T ∈ {100, 150, 200}, δ_t = 1/t (t ∈ [T−1]) and δ_T = T for the example f(x) = (L/2T) Σ_{t=1}^{T} (x − δ_t)² used in the proof of Theorem 3. In Fig. 2(a), we plot the optimality gap at the last iterate, i.e., the excess forgetting, against the step sizes after K = 10⁴ epochs. |
| Researcher Affiliation | Academia | Xufeng Cai, Jelena Diakonikolas; Department of Computer Sciences, University of Wisconsin-Madison; EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 Incremental Gradient Descent (IGD) [...] Algorithm 2 Incremental Proximal Method (IPM) |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide links to a code repository. |
| Open Datasets | No | The numerical results use a synthetically generated function: "we choose L = 2, T ∈ {100, 150, 200}, δ_t = 1/t (t ∈ [T−1]) and δ_T = T for the example f(x) = (L/2T) Σ_{t=1}^{T} (x − δ_t)² used in the proof of Theorem 3." This does not refer to a publicly available dataset. |
| Dataset Splits | No | The paper uses a synthetically generated function for numerical illustration rather than a traditional dataset; therefore, the concept of training/test/validation splits does not apply in the conventional sense. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the numerical experiments. |
| Software Dependencies | No | The paper does not mention any specific software names or version numbers (e.g., programming languages, libraries, or solvers) used for implementation or experimentation. |
| Experiment Setup | Yes | In particular, we choose L = 2, T ∈ {100, 150, 200}, δ_t = 1/t (t ∈ [T−1]) and δ_T = T for the example f(x) = (L/2T) Σ_{t=1}^{T} (x − δ_t)² used in the proof of Theorem 3. In Fig. 2(a), we plot the optimality gap at the last iterate, i.e., the excess forgetting, against the step sizes after K = 10⁴ epochs. |
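
Since the paper releases no code, the quoted setup can only be approximated. The sketch below is our own hypothetical reconstruction, not the authors' implementation: it runs incremental gradient descent (Algorithm 1 in the paper) in cyclic order on the quadratic f(x) = (L/2T) Σ_{t=1}^{T} (x − δ_t)², using the component update x ← x − η ∇f_t(x) with f_t(x) = (L/2)(x − δ_t)². The function name `igd_last_iterate_gap`, the zero initialization, and the specific step sizes are our assumptions.

```python
import numpy as np

def igd_last_iterate_gap(T=100, L=2.0, eta=1e-3, K=10_000):
    """Last-iterate optimality gap ("excess forgetting") of incremental
    gradient descent on f(x) = (L/2T) * sum_t (x - delta_t)^2,
    with delta_t = 1/t for t in [T-1] and delta_T = T, as quoted above.
    Initialization at x = 0 and cyclic component order are assumptions."""
    delta = np.array([1.0 / t for t in range(1, T)] + [float(T)])
    x_star = delta.mean()  # exact minimizer of the quadratic objective
    f = lambda x: (L / (2 * T)) * np.sum((x - delta) ** 2)
    x = 0.0
    for _ in range(K):              # K epochs, one pass over components each
        for d in delta:
            x -= eta * L * (x - d)  # grad of f_t(x) = (L/2)(x - delta_t)^2
    return f(x) - f(x_star)

# Example usage with a reduced epoch count for speed (the quoted run uses K = 10^4):
print(igd_last_iterate_gap(T=100, eta=1e-2, K=1000))
```

A plot of this gap against a grid of step sizes η, for T in {100, 150, 200}, would mimic the structure of Fig. 2(a), though exact values depend on unreported details such as the initialization.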