Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Instance-Dependent Generalization Bounds via Optimal Transport
Authors: Songyan Hou, Parnian Kassraie, Anastasis Kratsios, Andreas Krause, Jonas Rothfuss
JMLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training. While our bounds generically hold for any machine learning model, we focus our exposition on neural network generalization and empirically verify the mentioned properties through experiments. When applied to fully-connected ReLU networks, trained on simple regression and classification tasks, we observe that our result provides meaningful bound values in the same order of magnitude as the empirical risk, even for small sample sizes. |
| Researcher Affiliation | Academia | Songyan Hou EMAIL Department of Mathematics, ETH Zurich; Parnian Kassraie EMAIL Department of Computer Science, ETH Zurich; Anastasis Kratsios EMAIL Department of Mathematics, McMaster University and the Vector Institute; Andreas Krause EMAIL Department of Computer Science, ETH Zurich; Jonas Rothfuss EMAIL Department of Computer Science, ETH Zurich |
| Pseudocode | No | The paper describes steps and methodologies in narrative text, particularly in Section 6.2 'Proof outline for Theorem 10', but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code for the described methodology, nor does it include links to code repositories. |
| Open Datasets | No | For our empirical evaluations of neural network regression, we use the simplistic problem of regressing on noisy observation of a modified logistic function. We generate random datasets for two toy regression and classification tasks. The paper describes the generation process for these synthetic datasets but does not indicate their public availability or provide access details. |
| Dataset Splits | No | The paper describes generating synthetic datasets for regression and classification tasks, but it does not explicitly provide details about training, test, or validation dataset splits (e.g., percentages, sample counts, or specific split methodologies). |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for conducting the experiments. |
| Software Dependencies | No | The paper mentions using 'stochastic gradient descent with the AdamW (Loshchilov and Hutter, 2019) optimizer' and 'fully-connected neural networks with leaky ReLU activation functions' but does not specify version numbers for any software, libraries, or programming languages used in the experiments. |
| Experiment Setup | Yes | In our empirical evaluations in Section 4, we use fully-connected neural networks with leaky ReLU activation functions... We train the neural network by stochastic gradient descent with the AdamW (Loshchilov and Hutter, 2019) optimizer which combines the adaptive learning rate method Adam with weight decay. Unless stated otherwise, we set the weight decay parameter to 0 (i.e., no weight decay), use an initial learning rate of 0.05 and decay the learning rate every 1000 iterations by 0.85. By default, we train for 20000 iterations with a mini-batch size of 8 in case of regression and 16 in case of classification. In the experiments where we do not vary the neural network size, we use l = 3 hidden layers with w = 64 neurons each. |
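The paper does not release code, but the Experiment Setup row pins down the training configuration precisely enough to sketch its numeric pieces. The snippet below is an illustrative reconstruction, not the authors' implementation: it encodes the stated step-decay learning-rate schedule (initial rate 0.05, multiplied by 0.85 every 1000 iterations) and the parameter count of the default architecture (l = 3 hidden layers, w = 64 neurons each). Function names and the input/output dimensions are assumptions for demonstration.

```python
def learning_rate(iteration, base_lr=0.05, decay=0.85, step=1000):
    """Step decay as described in the paper: multiply the learning
    rate by `decay` once every `step` iterations."""
    return base_lr * decay ** (iteration // step)


def num_parameters(d_in, d_out, hidden_layers=3, width=64):
    """Weight and bias count of a fully-connected network with the
    paper's default size (3 hidden layers of 64 neurons each)."""
    dims = [d_in] + [width] * hidden_layers + [d_out]
    return sum(m * n + n for m, n in zip(dims, dims[1:]))


# Example: schedule over the default 20000 training iterations,
# and network size for a scalar regression task (1 input, 1 output).
print(learning_rate(0))        # initial learning rate, 0.05
print(learning_rate(1000))     # after the first decay step, 0.0425
print(num_parameters(1, 1))    # parameters in the default network
```

This kind of schedule is what AdamW would be driven with via a per-step multiplicative decay; the actual optimizer state (moment estimates, weight decay handling) is omitted here.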