Last Iterate Convergence of Incremental Methods as a Model of Forgetting

Authors: Xufeng Cai, Jelena Diakonikolas

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We further provide illustrative numerical results in Fig. 2 to facilitate our discussion. In particular, we choose L = 2, T ∈ {100, 150, 200}, δ_t = 1/t (t ∈ [T-1]), and δ_T = T for the example f(x) = (L/(2T)) Σ_{t=1}^{T} (x - δ_t)^2 used in the proof of Theorem 3. In Fig. 2(a), we plot the optimality gap at the last iterate, i.e., the excess forgetting, against the step sizes after K = 10^4 epochs.
Researcher Affiliation | Academia | Xufeng Cai, Jelena Diakonikolas; Department of Computer Sciences, University of Wisconsin-Madison; EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1 Incremental Gradient Descent (IGD) [...] Algorithm 2 Incremental Proximal Method (IPM)
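To make the two algorithms concrete, here is a minimal sketch (not the paper's code, which is unreleased) of one update of each method on a scalar quadratic component f_t(x) = (L/2)(x - d)^2, the component form used in the paper's synthetic example; the function names and the closed-form proximal solution are our own illustration.

```python
L = 2.0  # smoothness constant, matching the paper's example


def igd_step(x, d, eta):
    """Algorithm 1 (IGD): explicit gradient step on the current component.

    Gradient of (L/2)(x - d)^2 is L*(x - d).
    """
    return x - eta * L * (x - d)


def ipm_step(x, d, eta):
    """Algorithm 2 (IPM): implicit (proximal) step on the current component.

    Solves argmin_z (L/2)(z - d)^2 + (1/(2*eta))(z - x)^2, which for a
    quadratic component has the closed form below.
    """
    return (x + eta * L * d) / (1 + eta * L)
```

Both steps move the iterate toward the component minimizer d; a standard contrast is that the proximal step is stable for any eta > 0, whereas the explicit gradient step requires eta < 2/L.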
Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide links to a code repository.
Open Datasets | No | The numerical results use a synthetically generated function: "we choose L = 2, T ∈ {100, 150, 200}, δ_t = 1/t (t ∈ [T-1]), and δ_T = T for the example f(x) = (L/(2T)) Σ_{t=1}^{T} (x - δ_t)^2 used in the proof of Theorem 3." This does not refer to a publicly available dataset.
Dataset Splits | No | The paper uses a synthetically generated function for numerical illustration rather than a traditional dataset; therefore, the concept of training/test/validation splits does not apply in the conventional sense.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the numerical experiments.
Software Dependencies | No | The paper does not mention any specific software names or version numbers (e.g., programming languages, libraries, or solvers) used for implementation or experimentation.
Experiment Setup | Yes | In particular, we choose L = 2, T ∈ {100, 150, 200}, δ_t = 1/t (t ∈ [T-1]), and δ_T = T for the example f(x) = (L/(2T)) Σ_{t=1}^{T} (x - δ_t)^2 used in the proof of Theorem 3. In Fig. 2(a), we plot the optimality gap at the last iterate, i.e., the excess forgetting, against the step sizes after K = 10^4 epochs.
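Since no code is released, the quoted setup is simple enough to sketch from the description alone. The following is a hedged reproduction sketch for the single choice T = 100 (all variable names are our own): run IGD cyclically over the quadratic components and report the last-iterate optimality gap, i.e., the excess forgetting.

```python
# Hypothetical reproduction of the quoted synthetic setup; one T shown.
L, T, K = 2.0, 100, 10**4
delta = [1.0 / t for t in range(1, T)] + [float(T)]  # delta_t = 1/t, delta_T = T


def f(x):
    # f(x) = (L/(2T)) * sum_t (x - delta_t)^2, as in the proof of Theorem 3
    return L / (2 * T) * sum((x - d) ** 2 for d in delta)


x_star = sum(delta) / T  # a quadratic is minimized at the mean of the deltas
f_star = f(x_star)


def excess_forgetting(eta, epochs=K):
    """Run IGD for `epochs` cyclic passes; return the last-iterate gap."""
    x = 0.0
    for _ in range(epochs):
        for d in delta:                # fixed cyclic order over components
            x -= eta * L * (x - d)     # gradient of (L/2)(x - d)^2
    return f(x) - f_star


gap = excess_forgetting(eta=1e-3)
```

Sweeping `eta` over a grid and plotting `excess_forgetting(eta)` would mimic Fig. 2(a); the gap stays strictly positive at any constant step size because the last iterate sits on the limit cycle of the incremental pass rather than at the minimizer.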