Capturing the Temporal Dependence of Training Data Influence

Authors: Jiachen (Tianhao) Wang, Dawn Song, James Y. Zou, Prateek Mittal, Ruoxi Jia

ICLR 2025

Reproducibility assessment (each variable, its result, and the supporting LLM response):
Research Type: Experimental. "In this section, we evaluate the effectiveness of our proposed data value embedding method. First, we assess its fidelity in accurately reflecting data importance using small-scale experimental setups (Section 5.1), as well as its computational efficiency (Section 5.2). We then apply data value embedding to analyze the training dynamics during foundation model pretraining (Section 5.3 and Appendix E.4). The implementation details and additional results are deferred to Appendix E."
Researcher Affiliation: Academia. "Jiachen T. Wang (Princeton University), Dawn Song (UC Berkeley), James Zou (Stanford University), Prateek Mittal (Princeton University), Ruoxi Jia (Virginia Tech). Correspondence to Jiachen T. Wang and Ruoxi Jia (EMAIL, EMAIL)."
Pseudocode: Yes. "Algorithm 1: Backpropagation for computing data value embedding from the final checkpoint. Algorithm 2: Parallel Influence Checkpointing for Data Value Embedding."
Open Source Code: No. "The paper does not contain any explicit statements about the release of source code or links to a code repository for its methodology."
Open Datasets: Yes. "We conduct our experiments on MNIST (LeCun et al., 1989)... For Pythia-410M trained on 1% of the Pile dataset... with Pythia-410M trained on 1% of the Pile dataset as an example... For both settings, the sequence length is set to 1024. The learning rate is set at a maximum of 3 × 10^-4. We use AdamW as the optimizer with a weight decay of 0.1, and beta values set to 0.9 and 0.95. Gradients are clipped at a maximum value of 1.0 to maintain stability during training. The batch size is set to 16, with a learning rate warmup of 2000 iterations followed by cosine decay."
Dataset Splits: No. "The paper mentions training Pythia-410M on "1% of Pile" and using "a subset of 1,000 samples from CIFAR-10 dataset" with "10% random label noise". It also discusses using a "validation batch sampled from Pile". However, it does not provide specific, reproducible training, validation, and test splits (e.g., percentages, exact counts, or specific predefined splits) for the datasets used in its experiments."
Hardware Specification: Yes. "The experiment is conducted on one A100 GPU with 80GB VRAM."
Software Dependencies: No. "The paper mentions using "standard SGD" and "AdamW as the optimizer" but does not specify software versions for programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries with version numbers."
Experiment Setup: Yes. "To validate the effectiveness of our proposed data value embedding algorithm, we assess its accuracy in approximating TSLOO scores... We conduct our experiments on MNIST (LeCun et al., 1989) using a small MLP trained with standard SGD. We consider two settings: (1) single-epoch removal, where a data point is excluded from training during a single epoch but remains in the other training epochs. Here, we remove the data point from the last epoch. (2) All-epoch removal, where a data point is excluded in all epochs. In this case, the approximation provided by data value embedding is obtained by summing the data value embeddings of the data point from all epochs, as discussed in Appendix C.10."
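As a concrete reading of the optimizer settings quoted under Open Datasets, the sketch below reproduces the learning-rate schedule (2000-iteration linear warmup to a peak of 3 × 10^-4, then cosine decay). The total iteration count is an illustrative assumption, not a value reported in the excerpt.

```python
import math

# Reported hyperparameters from the quoted setup: peak LR 3e-4,
# 2000-iteration linear warmup, then cosine decay.
PEAK_LR = 3e-4
WARMUP_ITERS = 2000

def lr_at(step, total_iters=10000):
    """Learning rate at a given training step (linear warmup + cosine decay).

    total_iters is a placeholder: the excerpt does not state the total
    number of training iterations.
    """
    if step < WARMUP_ITERS:
        return PEAK_LR * step / WARMUP_ITERS
    progress = (step - WARMUP_ITERS) / max(1, total_iters - WARMUP_ITERS)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))

print(lr_at(1000))   # halfway through warmup, ≈ 1.5e-4
print(lr_at(2000))   # peak, ≈ 3e-4
print(lr_at(10000))  # end of cosine decay, ≈ 0.0
```

In a PyTorch training loop this curve would typically be applied via a LambdaLR-style scheduler on an AdamW optimizer (weight decay 0.1, betas 0.9/0.95), with gradients clipped at norm 1.0 before each step, matching the quoted configuration.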
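The two removal settings described in the Experiment Setup row can be sketched with hypothetical per-epoch influence scores; the data layout and function names here are illustrative assumptions, not the paper's actual interface.

```python
# epoch_influence[e][i]: influence score that data point i accrues from
# epoch e (hypothetical values, illustrating the aggregation only).

def single_epoch_influence(epoch_influence, i, epoch=-1):
    """Setting (1): the point is excluded from one epoch only (here, the last)."""
    return epoch_influence[epoch][i]

def all_epoch_influence(epoch_influence, i):
    """Setting (2): all-epoch removal is approximated by summing the point's
    per-epoch scores (cf. Appendix C.10 of the paper)."""
    return sum(scores[i] for scores in epoch_influence)

# Three epochs, two data points:
epoch_influence = [[0.10, -0.02], [0.05, 0.01], [0.02, 0.00]]
print(single_epoch_influence(epoch_influence, 0))  # 0.02 (last epoch only)
print(all_epoch_influence(epoch_influence, 0))     # ≈ 0.17
```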