Generalization Analysis for Deep Contrastive Representation Learning

Authors: Nong Minh Hieu, Antoine Ledent, Yunwen Lei, Cheng Yeaw Ku

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type: Experimental
"To compare our results with previous works, we conducted experiments by training fully-connected deep neural networks with the MNIST digits dataset (LeCun, Cortes, and Burges 2010) with a train-test ratio of 75%/25%. We ran two ablation studies to test how our bounds vary with network depth and hidden layer dimension compared to the bounds proposed by Arora et al. (2019) and Lei et al. (2023). ... A summary of our experiment results is presented in Figure 1: the y-axis shows the main factor in our and competing bounds, ignoring constants and logarithmic terms in all cases. The results demonstrate that our generalization bounds outperform the competing ones, especially for larger widths and depths."
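The paper does not release code, so the 75%/25% split quoted above can only be sketched. The function below (`split_indices`, a hypothetical name) shows one minimal way to produce such a split by shuffling indices; it is illustrative, not the authors' implementation.

```python
# Hypothetical sketch of a 75%/25% train-test split on MNIST
# (70,000 images total). Not from the paper's (unreleased) code.
import random

def split_indices(n, train_frac=0.75, seed=0):
    """Shuffle [0, n) and cut at train_frac."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(train_frac * n)
    return idx[:cut], idx[cut:]

# MNIST has 70,000 images in total (60k train + 10k test in the
# original distribution; the paper re-splits at 75%/25%).
train_idx, test_idx = split_indices(70000)
```

With `n = 70000` this yields 52,500 training and 17,500 test indices, matching the quoted ratio.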
Researcher Affiliation: Academia
Nong Minh Hieu (1, 2), Antoine Ledent (2), Yunwen Lei (3), Cheng Yeaw Ku (1). (1) School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 639798; (2) School of Computing and Information Systems, Singapore Management University, Singapore 188065; (3) Department of Mathematics, University of Hong Kong, Pok Fu Lam, Hong Kong. EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode: No
The paper describes theoretical frameworks and mathematical derivations but does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code: No
The paper does not provide any explicit statements about releasing code, a link to a code repository, or mentions of code in supplementary materials.
Open Datasets: Yes
From the Experiments section: "To compare our results with previous works, we conducted experiments by training fully-connected deep neural networks with the MNIST digits dataset (LeCun, Cortes, and Burges 2010) with a train-test ratio of 75%/25%."
Dataset Splits: Yes
From the Experiments section: "To compare our results with previous works, we conducted experiments by training fully-connected deep neural networks with the MNIST digits dataset (LeCun, Cortes, and Burges 2010) with a train-test ratio of 75%/25%."
Hardware Specification: No
The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models or memory specifications.
Software Dependencies: No
The paper does not specify any software names with version numbers used for the experiments.
Experiment Setup: Yes
"For the first experiment, we fixed the hidden layer dimensions to 64 and trained deep neural networks at different depths in the [2, 10] range. For the second experiment, we fixed the depth to L = 3 and trained deep neural networks at different hidden layer dimensions of 32, 64, 128, ... (in multiples of 32). In both experiments, we fixed the output dimension to d = 64 and the number of negative samples to k = 10 (furthermore, additional experiments with k = 64 are provided in Appendix J of the full arXiv version). For all the neural networks trained in both experiments, we set the maximum number of training iterations to 1000 and stopped once the empirical unsupervised loss reached 1e-4, to ensure that all networks roughly converge to the empirical risk minimizers."
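The quoted setup pins down two ablation grids and a stopping rule. A minimal sketch of that configuration is below; all names (`depths`, `widths`, `should_stop`) are our own, and the exact width grid is assumed from the quote (the paper lists "32, 64, 128, ... (in multiples of 32)" without stating the endpoint).

```python
# Hypothetical reconstruction of the two ablation grids and the
# stopping criterion described in the quoted setup.

# Ablation 1: vary depth in [2, 10], hidden layer dimension fixed at 64.
depths = list(range(2, 11))
width_fixed = 64

# Ablation 2: fix depth L = 3, vary hidden layer dimension
# (values listed in the paper; further multiples of 32 may follow).
widths = [32, 64, 128]
depth_fixed = 3

# Shared settings quoted from the paper.
output_dim = 64       # d
num_negatives = 10    # k (k = 64 in the arXiv appendix experiments)
max_iters = 1000
loss_tol = 1e-4

def should_stop(iteration, empirical_loss):
    """Stop at 1000 iterations, or earlier once the empirical
    unsupervised loss reaches 1e-4."""
    return iteration >= max_iters or empirical_loss <= loss_tol
```

The stopping rule is what the quote calls "stopped once the empirical unsupervised loss reached 1e-4", with the 1000-iteration cap as a fallback so every network roughly reaches an empirical risk minimizer.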