Deep Out-of-Distribution Uncertainty Quantification via Weight Entropy Maximization

Authors: Antoine de Mathelin, François Deheeger, Mathilde Mougeot, Nicolas Vayatis

JMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide both theoretical and numerical results to assess the efficiency of the approach. Numerical experiments conducted on several regression and classification datasets demonstrate the strong benefit of this approach in OOD detection compared to state-of-the-art methods dedicated to this task (e.g., Figure 1). In particular, the proposed algorithm appears in the top three best methods in all configurations of an extensive out-of-distribution detection benchmark including more than thirty competitors.
Researcher Affiliation | Collaboration | Antoine de Mathelin (1, 2), François Deheeger (1), Mathilde Mougeot (2), Nicolas Vayatis (2). 1: Manufacture Française des pneumatiques Michelin, Clermont-Ferrand, 63000, France. 2: Centre Borelli, Université Paris-Saclay, CNRS, ENS Paris-Saclay, Gif-sur-Yvette, 91190, France.
Pseudocode | Yes | Algorithm 1: MaxWEnt Training (...) Algorithm 2: MaxWEnt Inference
Open Source Code | Yes | The source code of the experiments is available on GitHub (https://github.com/antoinedemathelin/maxwent-expe). (...) The source code for the MaxWEnt experiments, conducted within the OpenOOD benchmark, is available on GitHub.
Open Datasets | Yes | We consider the two-moons classification dataset from scikit-learn (...) We reproduce the synthetic univariate regression experiment from Jain et al. (2020) (...) We consider the most common UCI regression datasets (...) This section is dedicated to uncertainty quantification on the real-world dataset CityCam (Zhang et al., 2017). (...) MNIST (Deng, 2012), CIFAR10, CIFAR100 (Krizhevsky et al., 2009) and Tiny ImageNet (Torralba et al., 2008).
Dataset Splits | Yes | The training set is composed of 200 data points generated from the two-moons generator; 50 additional instances are generated to form a validation dataset. (...) We reproduce the synthetic univariate regression experiment from Jain et al. (2020) with 100 training and 20 validation instances. (...) We split the dataset along the first component of the input PCA: the internal domain is defined as the data between the 25% and 75% percentiles of the first component of the input PCA, while the rest of the data forms the external domain. (...) 10% of the in-distribution data are selected to form the test set and 5% of the remaining data to form the validation set. (...) The dataset is split into two subsets: images recorded before 2 pm are considered in-distribution, while the others are out-of-distribution.
Hardware Specification | No | The paper describes various experimental setups, datasets, and methods, including the use of ResNet50 models for feature extraction, but it does not state any specific hardware details such as GPU models, CPU types, or memory specifications used to run the experiments. Mentions of 'deep learning models' or 'neural networks' imply computational resources but lack the required specificity.
Software Dependencies | No | The paper mentions the use of scikit-learn for a dataset, the Adam optimizer as a method, and GitHub as a code repository, but it does not provide version numbers for any software libraries, frameworks (such as PyTorch or TensorFlow, which are implicitly used), or programming languages, which are necessary for reproducible software dependencies.
Experiment Setup | Yes | We use the Adam optimizer (Kingma and Ba, 2015) with learning rate 0.001 and batch size 32. 10k iterations are used to train the Vanilla Network and 20k iterations for the other methods, as stochastic variational inference requires more iterations to converge. (...) The end-layer is composed of two neurons, which respectively predict the conditional mean and standard deviation µw(x), σw(x) (cf. Section 7.1.1). (...) We then consider 10k iterations for ensemble methods and 50k iterations for Bayesian and Bayesian ensemble methods, as stochastic variational inference converges more slowly than stochastic gradient descent. A callback process monitors the validation NLL of the model every 100 iterations; the network weights corresponding to the iteration with the best validation NLL are restored at the end of training. For MaxWEnt, the scale parameters are saved if the validation NLL is below the threshold defined in Section 7.5. (...) For our experiments, we trained MaxWEnt with the Adam optimizer (Kingma and Ba, 2015) with learning rate 5×10⁻⁴ for 20 epochs. We also consider an ensemble of five MaxWEnt networks. For inference, we use P = 10 predictions. (...) The φ parameters are initialized with a small constant value C ≪ 1. (...) In all our experiments, we choose a fixed trade-off λ = 10 for simplicity.
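As an illustration of the two-moons setup quoted in the dataset rows above (200 training points and 50 validation points from the two-moons generator), here is a minimal sketch using scikit-learn's `make_moons`; the noise level and random seeds are assumptions, not values reported in the paper:

```python
# Sketch of the two-moons data generation described above.
# noise=0.1 and the random seeds are illustrative assumptions.
from sklearn.datasets import make_moons

X_train, y_train = make_moons(n_samples=200, noise=0.1, random_state=0)
X_val, y_val = make_moons(n_samples=50, noise=0.1, random_state=1)

print(X_train.shape, y_train.shape)  # (200, 2) (200,)
print(X_val.shape, y_val.shape)      # (50, 2) (50,)
```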
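The PCA-based in/out-of-distribution split described in the dataset-splits row (internal domain between the 25% and 75% percentiles of the first principal component, external domain elsewhere) can be sketched with NumPy as follows; the function name and interface are illustrative, not taken from the authors' code:

```python
import numpy as np

def pca_split(X, low=25, high=75):
    """Split data into internal/external domains along the first PCA component.

    Instances whose first principal component lies between the given
    percentiles form the in-distribution (internal) domain; the rest
    form the out-of-distribution (external) domain, as described above.
    """
    Xc = X - X.mean(axis=0)
    # First principal component via SVD of the centered data matrix.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    pc1 = Xc @ Vt[0]
    lo, hi = np.percentile(pc1, [low, high])
    internal = (pc1 >= lo) & (pc1 <= hi)
    return X[internal], X[~internal]

# Usage on synthetic data: roughly half the points land in the internal domain.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
X_in, X_out = pca_split(X)
```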
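The experiment-setup row above mentions a Gaussian end-layer predicting µw(x), σw(x) and a callback that monitors the validation NLL every 100 iterations and restores the best weights. A minimal sketch of both pieces follows; `DummyModel` and `train_with_callback` are hypothetical stand-ins for the authors' training code, not their actual implementation:

```python
import numpy as np

def gaussian_nll(y, mu, sigma):
    """Mean negative log-likelihood for a heteroscedastic Gaussian head,
    where the network predicts mu_w(x) and sigma_w(x) as described above."""
    return np.mean(0.5 * np.log(2 * np.pi * sigma**2)
                   + (y - mu)**2 / (2 * sigma**2))

class DummyModel:
    """Hypothetical stand-in exposing only what the callback needs:
    a training step, weight get/set, and a validation-NLL evaluation."""
    def __init__(self, rng):
        self.rng = rng
        self.weights = np.zeros(1)

    def train_step(self):
        self.weights = self.rng.normal(size=1)  # pretend optimizer update

    def get_weights(self):
        return self.weights.copy()

    def set_weights(self, w):
        self.weights = w.copy()

    def validation_nll(self):
        # Validation NLL of a N(weights, 1) fit to targets y = 0.
        return gaussian_nll(np.zeros(1), self.weights, np.ones(1))

def train_with_callback(model, n_iters=1000, eval_every=100):
    """Sketch of the monitoring callback: every `eval_every` iterations,
    evaluate the validation NLL and keep the best weights; restore them
    at the end of training."""
    best_nll, best_w = np.inf, model.get_weights()
    for it in range(1, n_iters + 1):
        model.train_step()
        if it % eval_every == 0:
            nll = model.validation_nll()
            if nll < best_nll:
                best_nll, best_w = nll, model.get_weights()
    model.set_weights(best_w)
    return best_nll

model = DummyModel(np.random.default_rng(0))
best = train_with_callback(model)
```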