Simulating Training Dynamics to Reconstruct Training Data from Deep Neural Networks

Authors: Hanling Tian, Yuhang Liu, Mingzhen He, Zhengbao He, Zhehao Huang, Ruikai Yang, Xiaolin Huang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate that SimuDy significantly outperforms previous approaches when handling non-linear training dynamics, and, for the first time, most training samples can be reconstructed from a trained ResNet's parameters.
Researcher Affiliation | Academia | Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University EMAIL
Pseudocode | Yes | Algorithm 1: Reconstructing training data using SimuDy. Input: network function fθ, initial parameters θ0, final parameters θf, dataset size n, training learning rate η, training steps T, batch size |B|, dissimilarity function d(·, ·), optimizer Optim; Output: reconstructed images via SimuDy.
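The quoted pseudocode can be sketched on a toy problem: treat the dummy inputs as optimization variables, unroll the training run from θ0, and minimize a dissimilarity d between the simulated endpoint and the released final parameters θf. In the sketch below the model (a linear map), the training loss, the sizes, and the choice d = squared L2 distance are all illustrative assumptions, not the paper's exact configuration.

```python
import torch

torch.manual_seed(0)
d_in, n, T, eta = 5, 4, 8, 0.1

x_true = torch.randn(n, d_in)          # the data we later try to recover
theta0 = torch.randn(d_in)             # initial parameters of f(x) = x @ theta

def simulate(theta, x, steps, lr):
    """Unroll full-batch gradient descent, keeping the graph so that
    gradients can flow back into the inputs x."""
    for _ in range(steps):
        loss = ((x @ theta) ** 2).mean()               # toy training objective
        (grad,) = torch.autograd.grad(loss, theta, create_graph=True)
        theta = theta - lr * grad
    return theta

theta_f = simulate(theta0.clone().requires_grad_(True), x_true, T, eta).detach()

# Reconstruction: optimize dummy inputs so the simulated run lands on theta_f.
x_rec = torch.randn(n, d_in, requires_grad=True)
opt = torch.optim.Adam([x_rec], lr=0.05)               # the "Optim" of Algorithm 1
hist = []
for _ in range(200):
    opt.zero_grad()
    theta_sim = simulate(theta0.clone().requires_grad_(True), x_rec, T, eta)
    dissim = ((theta_sim - theta_f) ** 2).sum()        # d(theta_sim, theta_f)
    dissim.backward()
    opt.step()
    hist.append(float(dissim))
```

In the paper this unrolling is applied to full networks and mini-batch trajectories rather than a linear model, but the structure is the same: `create_graph=True` keeps the simulated trajectory differentiable with respect to the candidate inputs.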
Open Source Code | Yes | Our code is available at https://github.com/BlueBlood6/SimuDy.
Open Datasets | Yes | We train the model on a subset of CIFAR-10 (Krizhevsky et al., 2009) from pre-trained initial parameters θ0 to the final parameters θf. C.2 RECONSTRUCTIONS ON OTHER DATASET AND ARCHITECTURE: To further demonstrate that SimuDy maintains its effectiveness across various datasets and diverse network architectures, we extend our experiments to SVHN with ResNet-18 and CIFAR-10 with ResNet-50. ...Additionally, we conduct experiments on ImageNet with ViT and an NLP task, as shown in Appendix C.6 and Appendix C.7. ...we choose TinyBERT (Jiao et al., 2020) as the model and CoLA (Warstadt et al., 2018) as the dataset, with sentence classification as the training task.
Dataset Splits | No | The paper mentions training on a 'subset of CIFAR-10' and evaluating reconstruction on the trained models, but does not specify the exact percentages or methodology for splitting data into training, validation, and test sets, either for model training or for the datasets used in reconstruction.
Hardware Specification | Yes | All training and reconstructing run on one RTX 4090 GPU. Table 3: GPU memory usage and reconstruction time for training CIFAR-10 datasets with different sizes on one RTX 4090 GPU.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | The model is trained with a batch size of 20 and a learning rate of 7e-3. ...We train MLPs, which comprise three fully-connected layers with dimensions d-1000-1000-1 (d is the dimension of the input), from scratch with SGD. ...The class distribution is balanced and the training algorithm is mini-batch gradient descent with shuffle on. We set the coefficients to 0, 5e-4, 2e-3, and 5e-2, respectively. ...Input: network function fθ, initial parameters θ0, final parameters θf, dataset size n, training learning rate η, training steps T, batch size |B|, dissimilarity function d(·, ·), optimizer Optim;
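The quoted setup can be instantiated as a minimal sketch: a three-layer d-1000-1000-1 MLP trained from scratch with mini-batch SGD at batch size 20 and learning rate 7e-3. The activation, loss function, input dimension d, and the synthetic batch are assumptions for illustration; the excerpt does not specify them.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d = 3 * 32 * 32                        # e.g. a flattened CIFAR-10 image (assumption)
mlp = nn.Sequential(
    nn.Linear(d, 1000), nn.ReLU(),
    nn.Linear(1000, 1000), nn.ReLU(),
    nn.Linear(1000, 1),                # scalar output, assuming binary classification
)
opt = torch.optim.SGD(mlp.parameters(), lr=7e-3)

# One step of mini-batch gradient descent on a shuffled, balanced batch of 20.
x = torch.randn(20, d)
y = torch.cat([torch.zeros(10, 1), torch.ones(10, 1)])   # balanced class labels
loss = nn.functional.binary_cross_entropy_with_logits(mlp(x), y)
opt.zero_grad()
loss.backward()
opt.step()
```

A scalar output head matches the stated d-1000-1000-1 dimensions; for the ResNet experiments quoted elsewhere in this report the head and objective would differ.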