reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

3DB: A Framework for Debugging Computer Vision Models

Authors: Guillaume Leclerc, Hadi Salman, Andrew Ilyas, Sai Vemprala, Logan Engstrom, Vibhav Vineet, Kai Xiao, Pengchuan Zhang, Shibani Santurkar, Greg Yang, Ashish Kapoor, Aleksander Madry

NeurIPS 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate, through a wide range of use cases, that 3DB allows users to discover vulnerabilities in computer vision systems and gain insights into how models make decisions. In all our experiments, we analyze a ResNet-18 [30] trained on the ImageNet [53] classification task.
Researcher Affiliation	Collaboration	Guillaume Leclerc EMAIL Hadi Salman EMAIL Andrew Ilyas EMAIL Sai Vemprala EMAIL Microsoft Research Logan Engstrom EMAIL Vibhav Vineet EMAIL Microsoft Research Kai Xiao EMAIL Pengchuan Zhang EMAIL Microsoft Research Shibani Santurkar EMAIL Greg Yang EMAIL Microsoft Research Ashish Kapoor EMAIL Microsoft Research Aleksander Madry EMAIL
Pseudocode	No	The paper describes the 3DB workflow and its components but does not include any structured pseudocode or algorithm blocks.
Open Source Code	Yes	We are releasing 3DB as a library (https://github.com/3db/3db) alongside a set of examples, guides, and documentation.
Open Datasets	Yes	In all our experiments, we analyze a ResNet-18 [30] trained on the ImageNet [53] classification task. [53] Olga Russakovsky et al. ImageNet Large Scale Visual Recognition Challenge. In: International Journal of Computer Vision (IJCV). 2015.
Dataset Splits	Yes	In all our experiments, we analyze a ResNet-18 [30] trained on the ImageNet [53] classification task (its validation set accuracy is 69.8%).
Hardware Specification	No	The paper states 'Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See Appendix B.' However, Appendix B is not provided in the given text, thus specific hardware details are not available.
Software Dependencies	No	The paper mentions 'PyTorch classification module' and 'Blender' but does not specify any version numbers for these or other software dependencies.
Experiment Setup	No	The paper specifies the model (ResNet-18) and the dataset (ImageNet) used, and describes how 3DB generates scenes with various transformations (e.g., 'random poses, orientations, and scales'), but it does not provide specific training hyperparameters such as learning rate, batch size, number of epochs, or optimizer details for the model being analyzed.