Deep Out-of-Distribution Uncertainty Quantification via Weight Entropy Maximization

Authors: Antoine de Mathelin, François Deheeger, Mathilde Mougeot, Nicolas Vayatis

JMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide both theoretical and numerical results to assess the efficiency of the approach. Numerical experiments conducted on several regression and classification datasets demonstrate the strong benefit of this approach in OOD detection compared to state-of-the-art methods dedicated to this task (e.g., Figure 1). In particular, the proposed algorithm appears in the top three best methods in all configurations of an extensive out-of-distribution detection benchmark including more than thirty competitors.
Researcher Affiliation | Collaboration | Antoine de Mathelin (1, 2), François Deheeger (1), Mathilde Mougeot (2), Nicolas Vayatis (2). 1: Manufacture Française des pneumatiques Michelin, Clermont-Ferrand, 63000, France. 2: Centre Borelli, Université Paris-Saclay, CNRS, ENS Paris-Saclay, Gif-sur-Yvette, 91190, France.
Pseudocode | Yes | Algorithm 1: MaxWEnt Training (...) Algorithm 2: MaxWEnt Inference
Open Source Code | Yes | The source code of the experiments is available on GitHub (https://github.com/antoinedemathelin/maxwent-expe). (...) The source code for the MaxWEnt experiments, conducted within the OpenOOD benchmark, is available on GitHub.
Open Datasets | Yes | We consider the two-moons classification dataset from scikit-learn (...) We reproduce the synthetic univariate regression experiment from Jain et al. (2020) (...) We consider the most common UCI regression datasets (...) This section is dedicated to uncertainty quantification on the real-world dataset CityCam (Zhang et al., 2017). (...) MNIST (Deng, 2012), CIFAR10, CIFAR100 (Krizhevsky et al., 2009) and Tiny ImageNet (Torralba et al., 2008).
Dataset Splits | Yes | The training set is composed of 200 data points generated from the two-moons generator; 50 additional instances are generated to form a validation dataset. (...) We reproduce the synthetic univariate regression experiment from Jain et al. (2020) with 100 training and 20 validation instances. (...) We split the dataset along the first component of the input PCA: the internal domain is defined as the data between the 25% and 75% percentiles of the first component of the input PCA, while the rest of the data forms the external domain. (...) 10% of the in-distribution data are selected to form the test set and 5% of the remaining data to form the validation set. (...) The dataset is split into two subsets: images recorded before 2 pm are considered in-distribution, while the others are out-of-distribution.
Hardware Specification | No | The paper describes various experimental setups, datasets, and methods, including the use of ResNet50 models for feature extraction, but it does not state any specific hardware details such as GPU models, CPU types, or memory specifications used to run the experiments. Mentions of 'deep learning models' or 'neural networks' imply computational resources but lack the required specificity.
Software Dependencies | No | The paper mentions the use of scikit-learn for a dataset, the Adam optimizer as a method, and GitHub as a code repository, but it does not provide version numbers for any software libraries, frameworks (such as PyTorch or TensorFlow, which are implicitly used), or programming languages, which are necessary for reproducible software dependencies.
Experiment Setup | Yes | We use the Adam optimizer (Kingma and Ba, 2015) with learning rate 0.001 and batch size 32. 10k iterations are used to train the Vanilla Network and 20k iterations for the other methods, as stochastic variational inference requires more iterations to converge. (...) The end-layer is composed of two neurons, which respectively predict the conditional mean and standard deviation µw(x), σw(x) (cf. Section 7.1.1). (...) We then consider 10k iterations for ensemble methods and 50k iterations for Bayesian and Bayesian ensemble methods, as stochastic variational inference converges more slowly than stochastic gradient descent. A callback process monitors the validation NLL of the model every 100 iterations; the network weights corresponding to the iteration with the best validation NLL are restored at the end of training. For MaxWEnt, the scale parameters are saved if the validation NLL is below the threshold defined in Section 7.5. (...) For our experiments, we trained MaxWEnt with the Adam optimizer (Kingma and Ba, 2015) with learning rate 5×10⁻⁴ for 20 epochs. We also consider an ensemble of five MaxWEnt networks. For inference, we use P = 10 predictions. (...) The φ parameters are initialized with a small constant value C ≪ 1. (...) In all our experiments, we choose a fixed trade-off λ = 10 for simplicity.
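As an illustration of the two-moons setup quoted in the dataset rows above (200 training points and 50 validation points from the two-moons generator), here is a minimal sketch using scikit-learn's `make_moons`; the noise level and random seeds are assumptions, not values reported in the paper:

```python
# Sketch of the two-moons data generation described above.
# noise=0.1 and the random seeds are illustrative assumptions.
from sklearn.datasets import make_moons

X_train, y_train = make_moons(n_samples=200, noise=0.1, random_state=0)
X_val, y_val = make_moons(n_samples=50, noise=0.1, random_state=1)

print(X_train.shape, y_train.shape)  # (200, 2) (200,)
print(X_val.shape, y_val.shape)      # (50, 2) (50,)
```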
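The PCA-based in/out-of-distribution split described in the dataset-splits row (internal domain between the 25% and 75% percentiles of the first principal component, external domain elsewhere) can be sketched with NumPy as follows; the function name and interface are illustrative, not taken from the authors' code:

```python
import numpy as np

def pca_split(X, low=25, high=75):
    """Split data into internal/external domains along the first PCA component.

    Instances whose first principal component lies between the given
    percentiles form the in-distribution (internal) domain; the rest
    form the out-of-distribution (external) domain, as described above.
    """
    Xc = X - X.mean(axis=0)
    # First principal component via SVD of the centered data matrix.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    pc1 = Xc @ Vt[0]
    lo, hi = np.percentile(pc1, [low, high])
    internal = (pc1 >= lo) & (pc1 <= hi)
    return X[internal], X[~internal]

# Usage on synthetic data: roughly half the points land in the internal domain.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
X_in, X_out = pca_split(X)
```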
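The experiment-setup row above mentions a Gaussian end-layer predicting µw(x), σw(x) and a callback that monitors the validation NLL every 100 iterations and restores the best weights. A minimal sketch of both pieces follows; `DummyModel` and `train_with_callback` are hypothetical stand-ins for the authors' training code, not their actual implementation:

```python
import numpy as np

def gaussian_nll(y, mu, sigma):
    """Mean negative log-likelihood for a heteroscedastic Gaussian head,
    where the network predicts mu_w(x) and sigma_w(x) as described above."""
    return np.mean(0.5 * np.log(2 * np.pi * sigma**2)
                   + (y - mu)**2 / (2 * sigma**2))

class DummyModel:
    """Hypothetical stand-in exposing only what the callback needs:
    a training step, weight get/set, and a validation-NLL evaluation."""
    def __init__(self, rng):
        self.rng = rng
        self.weights = np.zeros(1)

    def train_step(self):
        self.weights = self.rng.normal(size=1)  # pretend optimizer update

    def get_weights(self):
        return self.weights.copy()

    def set_weights(self, w):
        self.weights = w.copy()

    def validation_nll(self):
        # Validation NLL of a N(weights, 1) fit to targets y = 0.
        return gaussian_nll(np.zeros(1), self.weights, np.ones(1))

def train_with_callback(model, n_iters=1000, eval_every=100):
    """Sketch of the monitoring callback: every `eval_every` iterations,
    evaluate the validation NLL and keep the best weights; restore them
    at the end of training."""
    best_nll, best_w = np.inf, model.get_weights()
    for it in range(1, n_iters + 1):
        model.train_step()
        if it % eval_every == 0:
            nll = model.validation_nll()
            if nll < best_nll:
                best_nll, best_w = nll, model.get_weights()
    model.set_weights(best_w)
    return best_nll

model = DummyModel(np.random.default_rng(0))
best = train_with_callback(model)
```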