Convergence and Implicit Bias of Gradient Descent on Continual Linear Classification

Authors: Hyunji Jung, Hanseul Cho, Chulhee Yun

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we theoretically study continual linear classification via sequentially running gradient descent (GD) on the unregularized logistic loss for a fixed budget of iterations at every stage. When all tasks are jointly separable and revealed in the cyclic order (as studied by Evron et al. (2023)), we show that sequential GD converges in the direction of the offline max-margin solution, unlike SMM. ... Lastly, in Section 5 we consider the case where the tasks are no longer jointly separable... Experiments on a Real-World Dataset. For those interested, we also provide an experiment on a real-world dataset, CIFAR-10 (Krizhevsky, 2009), which is not guaranteed to be linearly separable: see Appendix C.5.
Researcher Affiliation | Academia | Hyunji Jung (Graduate School of Artificial Intelligence, POSTECH); Hanseul Cho and Chulhee Yun (Kim Jaechul Graduate School of AI, KAIST)
Pseudocode | No | The paper describes the sequential gradient descent algorithm using mathematical equations and prose (Section 2.2, "ALGORITHM: SEQUENTIAL GRADIENT DESCENT") but does not present it in a structured pseudocode or algorithm block.
Open Source Code | Yes | We ran numerical experiments running the SMM iterations, which are done by solving the constrained minimization problems using fmincon in MATLAB Optimization Toolbox. The code is provided in our supplementary material.
Open Datasets | Yes | For those interested, we also provide an experiment on a real-world dataset, CIFAR-10 (Krizhevsky, 2009), which is not guaranteed to be linearly separable: see Appendix C.5.
Dataset Splits | No | The paper mentions generating synthetic datasets (Appendix C.2.1) and using CIFAR-10 (Appendix C.5), stating, "We choose two classes from the CIFAR-10 dataset and design 3 tasks that have 512 data points from the two classes (airplane and automobile)." However, it does not explicitly specify train/test/validation splits (e.g., percentages, sample counts, or references to standard splits) for these datasets in the main text or appendices.
Hardware Specification | No | The paper does not describe the hardware used to run the experiments; no details on GPU models, CPU types, or memory are provided.
Software Dependencies | Yes | We ran numerical experiments running the SMM iterations, which are done by solving the constrained minimization problems using fmincon in MATLAB Optimization Toolbox.
Experiment Setup | Yes | Optimization. We run sequential GD for 300 stages in total. Since there are three tasks, for the cyclic ordering case, it is equivalent to J = 100. The step size we used is η = 0.1. Also, we allow and conduct K = 1,000 updates per stage. For the joint training case, we run full-batch GD on the union of all datasets for MJK = 300,000 steps.
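The reported setup (M = 3 tasks visited cyclically, J = 100 cycles, K = 1,000 GD updates per stage, step size η = 0.1) can be sketched as a short script. The paper presents the algorithm only in equations and prose, so this is an illustrative reconstruction, not the authors' code; the tiny separable tasks below are placeholder data.

```python
import numpy as np

def sequential_gd(tasks, eta=0.1, K=1000, cycles=100):
    """Sequential GD: in each cycle, visit the tasks in order and run K
    full-batch GD steps on that task's unregularized logistic loss,
    warm-starting each stage from the previous stage's iterate.
    With M = 3 tasks, cycles = J = 100, and K = 1,000, the total number
    of updates is M*J*K = 300,000, matching the joint-training budget."""
    d = tasks[0][0].shape[1]
    w = np.zeros(d)
    for _ in range(cycles):            # J cycles over the task sequence
        for X, y in tasks:             # M tasks -> one stage per task
            for _ in range(K):         # fixed per-stage iteration budget
                margins = y * (X @ w)
                # gradient of (1/n) * sum_i log(1 + exp(-y_i <w, x_i>))
                grad = -(X.T @ (y / (1.0 + np.exp(margins)))) / len(y)
                w -= eta * grad
    return w

# Placeholder tasks: two linearly separable toy problems in R^2.
task1 = (np.array([[2.0, 0.0], [-2.0, 0.0]]), np.array([1.0, -1.0]))
task2 = (np.array([[0.0, 2.0], [0.0, -2.0]]), np.array([1.0, -1.0]))
w = sequential_gd([task1, task2], eta=0.1, K=200, cycles=5)
```

After training, the final iterate should classify every point of every task correctly (all margins positive), consistent with the paper's jointly separable setting.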
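The SMM baseline is computed with fmincon from the MATLAB Optimization Toolbox. For readers without MATLAB, a rough open-source analogue can use scipy.optimize.minimize; the sketch below assumes the Sequential Max-Margin update of Evron et al. (2023), in which each stage projects the previous iterate onto the margin-feasible set of the current task. This is our reading of the baseline, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def smm_stage(w_prev, X, y):
    """One assumed SMM stage: solve min_w ||w - w_prev||^2 subject to
    y_i * <w, x_i> >= 1 for every point (x_i, y_i) of the current task,
    i.e. the kind of constrained problem fmincon would be given."""
    cons = [{"type": "ineq",
             "fun": lambda w, xi=xi, yi=yi: yi * (xi @ w) - 1.0}
            for xi, yi in zip(X, y)]
    res = minimize(lambda w: np.sum((w - w_prev) ** 2), w_prev,
                   constraints=cons, method="SLSQP")
    return res.x

# From w = 0, projecting onto a single task's feasible set recovers
# that task's max-margin direction (here w = [0.5, 0]).
X = np.array([[2.0, 0.0], [-2.0, 0.0]])
y = np.array([1.0, -1.0])
w_next = smm_stage(np.zeros(2), X, y)
```

Running smm_stage cyclically over the tasks would then mimic the SMM iterations whose limit the paper contrasts with sequential GD.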