Can multi-label classification networks know what they don’t know?
Authors: Haoran Wang, Weitang Liu, Alex Bocchieri, Yixuan Li
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results show consistent improvement over previous methods that are based on the maximum-valued scores, which fail to capture joint information from multiple labels. We demonstrate the effectiveness of our method on three common multi-label classification benchmarks, including MS-COCO, PASCAL-VOC, and NUS-WIDE. We show that JointEnergy can reduce the FPR95 by up to 10.05% compared to the previous best baseline, establishing state-of-the-art performance. From Section 4 (Experiments): In this section, we describe our experimental setup (Section 4.1) and demonstrate the effectiveness of our method on several OOD evaluation tasks (Section 4.2). (See the JointEnergy and FPR95 sketches below the table.) |
| Researcher Affiliation | Academia | Haoran Wang, Information Networking Institute, Carnegie Mellon University; Weitang Liu, Department of Computer Science and Engineering, University of California, San Diego; Alex Bocchieri, Department of Computer Sciences, University of Wisconsin-Madison; Yixuan Li, Department of Computer Sciences, University of Wisconsin-Madison |
| Pseudocode | No | The paper describes the proposed method using mathematical formulations and textual explanations but does not include a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | Our code and dataset is released for reproducible research (footnote 2: "Code and data is available: https://github.com/deeplearning-wisc/multi-label-ood"). |
| Open Datasets | Yes | We consider three multi-label datasets: MS-COCO [29], PASCAL-VOC [11], and NUS-WIDE [6]. |
| Dataset Splits | Yes | MS-COCO consists of 82,783 training, 40,504 validation, and 40,775 testing images with 80 common object categories. |
| Hardware Specification | Yes | All experiments are conducted on NVIDIA GeForce RTX 2080Ti. |
| Software Dependencies | No | The paper mentions the Adam optimizer but does not specify versions for key software components or libraries (e.g., Python, PyTorch, TensorFlow, etc.). |
| Experiment Setup | Yes | We use the Adam optimizer [23] with standard parameters (β1 = 0.9, β2 = 0.999). The initial learning rate is 10^-4 for the fully connected layers and 10^-5 for convolutional layers. We also augmented the data with random crops and random flips to obtain color images of size 256x256. (See the training-setup sketch below the table.) |
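For context on the JointEnergy score quoted in the Research Type row: the paper aggregates label-wise free energies over all labels instead of taking a maximum-valued score. Below is a minimal PyTorch sketch of that scoring rule, assuming a multi-label classifier that emits one logit per label; the function and variable names are illustrative, not taken from the released code.

```python
import torch
import torch.nn.functional as F

def joint_energy_score(logits: torch.Tensor) -> torch.Tensor:
    """JointEnergy OOD score for multi-label classification.

    For label-wise logits f_i(x), the label-wise free energy is
    E_i(x) = -log(1 + exp(f_i(x))), and JointEnergy sums the negated
    energies over all labels. Higher scores indicate in-distribution
    inputs. softplus(z) = log(1 + exp(z)) is the numerically stable form.

    Args:
        logits: tensor of shape (batch_size, num_labels).
    Returns:
        tensor of shape (batch_size,) with one score per input.
    """
    return F.softplus(logits).sum(dim=1)

# Illustrative usage with random logits standing in for model outputs.
logits = torch.randn(4, 80)  # e.g., 80 MS-COCO categories
scores = joint_energy_score(logits)
print(scores.shape)  # torch.Size([4])
```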
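The FPR95 figure cited in the same row is the false positive rate on OOD inputs at the threshold that retains 95% of in-distribution inputs. A hedged NumPy sketch follows; the percentile-based threshold convention is an assumption, not taken from the released evaluation code.

```python
import numpy as np

def fpr_at_95_tpr(id_scores: np.ndarray, ood_scores: np.ndarray) -> float:
    """Fraction of OOD samples scored above the threshold that keeps
    95% of in-distribution samples (higher score = more in-distribution).
    """
    # Threshold at the 5th percentile of ID scores: 95% of ID inputs
    # score at or above it (true positive rate = 0.95).
    threshold = np.percentile(id_scores, 5)
    # OOD samples at or above the threshold are false positives.
    return float(np.mean(ood_scores >= threshold))

# Illustrative usage: well-separated synthetic scores give a low FPR95.
rng = np.random.default_rng(0)
id_scores = rng.normal(loc=10.0, scale=1.0, size=1000)
ood_scores = rng.normal(loc=5.0, scale=1.0, size=1000)
print(fpr_at_95_tpr(id_scores, ood_scores))
```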
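The Experiment Setup row quotes per-layer learning rates and data augmentation. A minimal PyTorch sketch of that configuration is below; the torchvision backbone and the resize-then-crop ordering are assumptions (the paper's released code specifies the actual architecture and transforms).

```python
import torch
from torchvision import models, transforms

# Backbone choice is illustrative, not the paper's architecture.
model = models.resnet50(weights=None)

# Split parameters into convolutional backbone vs. fully connected head
# so each group gets its own learning rate, as quoted in the setup row.
fc_params = list(model.fc.parameters())
fc_param_ids = {id(p) for p in fc_params}
conv_params = [p for p in model.parameters() if id(p) not in fc_param_ids]

optimizer = torch.optim.Adam(
    [
        {"params": conv_params, "lr": 1e-5},  # convolutional layers
        {"params": fc_params, "lr": 1e-4},    # fully connected layers
    ],
    betas=(0.9, 0.999),  # standard Adam parameters, as in the paper
)

# Augmentation quoted in the paper: random crops and random flips
# yielding 256x256 color images. The resize size is an assumption.
train_transform = transforms.Compose([
    transforms.Resize(288),
    transforms.RandomCrop(256),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
```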