Two Is Better Than One: Aligned Representation Pairs for Anomaly Detection
Authors: Alain Ryser, Thomas M. Sutter, Alexander Marx, Julia E. Vogt
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the benefit of this approach in extensive experiments on specialized medical datasets, outperforming competitive baselines based on self-supervised learning and pretrained models and presenting competitive performance on natural imaging benchmarks. |
| Researcher Affiliation | Academia | Alain Ryser, Department of Computer Science, ETH Zurich; Thomas M. Sutter, Department of Computer Science, ETH Zurich; Alexander Marx, Research Center Trustworthy Data Science and Security of the University Alliance Ruhr & Department of Statistics, TU Dortmund University; Julia E. Vogt, Department of Computer Science, ETH Zurich |
| Pseudocode | No | The paper describes the Con2 objective and its components (Context Contrasting and Content Alignment) using mathematical equations and descriptive text, but no distinct pseudocode block or algorithm is presented. |
| Open Source Code | Yes | We provide our code on https://github.com/alain-ryser/CON2. |
| Open Datasets | Yes | We train Con2 on the healthy population of the training datasets of BreastMNIST (Al-Dhabyani et al., 2020) and OctMNIST (Kermany et al., 2018b) of the MedMNIST collection (Yang et al., 2021; 2023), containing breast ultrasound and retinal optical coherence tomography images, respectively, the KVASIR dataset (Pogorelov et al., 2017) which contains endoscopic images of the gastrointestinal tract, the BR35H brain MRI dataset (Hamada, 2020), a chest x-ray dataset for Pneumonia detection (Kermany et al., 2018a) and a Melanoma detection dataset (Javid, 2022). ... Additionally, vertical flipping often satisfies distinctiveness, as natural images are usually not taken from a birds-eye view and adhere to gravity, e.g., a plane of CIFAR10 will typically not fly upside down. Vertical flipping also satisfies alignment since it neither adds nor removes any information from the image, but instead reorders pixel positions. ... In Figure 6, we compare the performance of Con2 and our baselines across the different classes of CIFAR10. We further provide results on one-class CIFAR100, ImageNet30, Dogs vs. Cats, and Muffin vs. Chihuahua in Figure 7. |
| Dataset Splits | No | The paper mentions training on 'normal training samples' and evaluating on 'a held-out test set with normal and anomalous samples' for medical datasets, and using a 'one-class classification setting' for natural imaging benchmarks where one class is normal and others are anomalies. However, it does not specify explicit percentages or counts for training, validation, or test splits for any of the datasets used, nor does it cite specific predefined splits with numerical details. |
| Hardware Specification | Yes | We run all our experiments on single GPUs on a compute cluster using either an RTX2080Ti, RTX3090, or RTX4090 GPU for training. |
| Software Dependencies | No | The paper mentions using 'PyTorch (Ansel et al., 2024) with Lightning (Falcon & The PyTorch Lightning team, 2019)' and 'NumPy (Harris et al., 2020), scikit-learn (Pedregosa et al., 2011), Pandas (McKinney, 2010; team, 2020), or SciPy (Virtanen et al., 2020)'. However, specific version numbers for these software components or libraries are not provided in the text. |
| Experiment Setup | Yes | We choose hyperparameters for Con2 based on their performance on the CIFAR10 dataset and keep them constant across all experiments to ensure we have no exposure to the anomaly class of the medical datasets. We linearly anneal the hyperparameter α in L_Con2 from 0 to 1 over the course of training to encourage the model to first learn the context-specific cluster structure while gradually aligning representations over the course of training. We optimize our loss using the AdamW optimizer (Loshchilov & Hutter, 2019) with β1 = 0.9, β2 = 0.999, weight decay λ = 0.001, and using a learning rate of 10⁻³ with a cosine annealing (Loshchilov & Hutter, 2017) schedule. We run all experiments for 2048 epochs. ... We ran all our experiments on BreastMNIST with a batch size of 64. ... We ran all our experiments on OctMNIST with a batch size of 128. ... We ran all our experiments on Kvasir with a batch size of 128. ... We ran all our experiments on BR35H with a batch size of 128. ... We ran all our experiments on the Pneumonia dataset with a batch size of 128. ... We resize all images to 128×128 before passing them to the model with a batch size of 128. ... We ran all our experiments on CIFAR10 and CIFAR100 with a batch size of 512. ... We ran all our experiments on ImageNet with a batch size of 128. ... We ran all our experiments on Dogs vs. Cats with a batch size of 256. ... We ran all our experiments on Chihuahua vs. Muffin with a batch size of 256. |
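The optimizer and annealing settings quoted above can be sketched in PyTorch. This is a minimal illustration under stated assumptions, not the authors' implementation: the `torch.nn.Linear` model is a stand-in for the actual Con2 encoder, and `alpha_at` is a hypothetical helper implementing the linear 0-to-1 annealing of α described in the paper.

```python
import torch

EPOCHS = 2048  # training length reported in the paper

model = torch.nn.Linear(8, 2)  # placeholder for the Con2 encoder

# AdamW with beta1=0.9, beta2=0.999, weight decay 0.001, lr 1e-3,
# matching the hyperparameters quoted in the Experiment Setup row.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    weight_decay=1e-3,
)

# Cosine annealing of the learning rate over the full training run.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)

def alpha_at(epoch: int, total_epochs: int = EPOCHS) -> float:
    """Linearly anneal alpha in the Con2 loss from 0 to 1 over training."""
    return min(1.0, epoch / max(1, total_epochs - 1))
```

A training loop would then weight the Content Alignment term by `alpha_at(epoch)` each epoch and call `scheduler.step()` after the optimizer update.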