Does SGD really happen in tiny subspaces?
Authors: Minhak Song, Kwangjun Ahn, Chulhee Yun
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we rigorously examine the question (Q1) through systematic experiments. Quite surprisingly, our results reveal that the answer to the question is negative, as summarized below. In Section 3, we demonstrate that the observed alignment is spurious in the sense that the aligned component of the gradient is not beneficial for training, even though it constitutes the majority of the gradient. Specifically, we run a critical experiment where we modify SGD by projecting each update onto the dominant subspace; we call this Dom-SGD. Unexpectedly, Dom-SGD does not further decrease the training loss. |
| Researcher Affiliation | Collaboration | Minhak Song (KAIST Math), Kwangjun Ahn (Microsoft Research), Chulhee Yun (KAIST AI) |
| Pseudocode | No | The paper describes the modified SGD updates (Dom-SGD, Bulk-SGD) in mathematical notation within the main text (e.g., "θ_{t+1} = θ_t − η P_k(θ_t) g_t (Dom-SGD)") but does not present them within a clearly labeled pseudocode block or algorithm figure. |
| Open Source Code | Yes | Furthermore, to facilitate replication and verification, the source code for the experiments is included in the attached supplementary material. This code contains scripts for reproducing the main results discussed in the paper, along with instructions for running the experiments. |
| Open Datasets | Yes | MNIST-5k: We use the first 5000 samples of MNIST dataset (LeCun et al., 1998) for multi-class classification. CIFAR10-5k: We use the first 5000 samples of CIFAR10 dataset (Krizhevsky, 2009) for multi-class classification. SST2-1k: We use the first 1000 samples of SST2 dataset (Socher et al., 2013) for binary classification. |
| Dataset Splits | No | The paper specifies using subsets like "first 5000 samples of MNIST dataset" or "first 1000 samples of SST2 dataset." While these define the data used, it does not explicitly provide information on how these selected samples are further split into training, validation, or test sets for the experiments described in the main sections (3, 4, 5, 6). Appendix H mentions test accuracy for the *full* MNIST dataset, but this specific split information is not provided for the main experimental datasets. |
| Hardware Specification | Yes | All experiments were performed on a single server equipped with 4 NVIDIA RTX 3090 GPUs. |
| Software Dependencies | No | Our experiments were conducted using PyTorch (Paszke et al., 2019). While PyTorch is mentioned, a specific version number for the software is not provided. |
| Experiment Setup | Yes | Throughout this paper, all experiments are conducted using a constant learning rate. For experiments using SGD, we use a batch size of 50. Learning rates: MLP on MNIST-5k: 0.01; CNN on CIFAR10-5k: 0.001; Transformer on SST2-1k: 0.001. Specifically, we track the exponential moving average (EMA) of χk(∇L(θt)) values (EMA factor set to 0.9). |
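The Dom-SGD/Bulk-SGD updates quoted above amount to projecting each stochastic gradient onto (or away from) the dominant subspace before the SGD step. Below is a minimal NumPy sketch of that projection step only. It assumes the parameters and gradient are flattened into vectors and that an orthonormal `basis` for the dominant subspace (in the paper, the top-k Hessian eigenvectors) has already been computed; the function names `dom_sgd_step` and `bulk_sgd_step` are illustrative, not from the paper, and the eigenvector computation itself is omitted.

```python
import numpy as np

def project_onto_subspace(grad, basis):
    """Return P_k g: the component of `grad` inside span(basis).

    basis: (k, d) array whose rows are orthonormal vectors spanning
    the dominant subspace (e.g., top-k Hessian eigenvectors).
    grad:  (d,) flattened gradient vector.
    """
    return basis.T @ (basis @ grad)

def dom_sgd_step(theta, grad, basis, lr):
    # Dom-SGD: update using only the dominant-subspace component of the gradient.
    return theta - lr * project_onto_subspace(grad, basis)

def bulk_sgd_step(theta, grad, basis, lr):
    # Bulk-SGD: update using only the complementary (bulk) component.
    return theta - lr * (grad - project_onto_subspace(grad, basis))
```

With `basis` as the first two standard basis vectors of R^3, Dom-SGD moves only in the first two coordinates and Bulk-SGD only in the third, illustrating how the two updates partition each gradient.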
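The setup row also mentions tracking an EMA of χk(∇L(θt)) with factor 0.9. A one-line sketch of that tracking, assuming the common convention where the factor weights the running average (the paper does not spell out which side the factor weights):

```python
def ema_update(ema, value, factor=0.9):
    """One EMA step: new_ema = factor * ema + (1 - factor) * value.

    Assumed convention; `ema=None` seeds the average with the first value.
    Used here to mirror the paper's tracking of chi_k values (factor 0.9).
    """
    return value if ema is None else factor * ema + (1.0 - factor) * value
```

Typical usage: initialize `ema = None` and fold in one measurement per training step, e.g. `ema = ema_update(ema, chi_k)`.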