Test-Time Canonicalization by Foundation Models for Robust Perception

Authors: Utkarsh Singhal, Ryan Feng, Stella X. Yu, Atul Prakash

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate FOCAL across a range of challenging transformations, including 3D-viewpoint, illumination, day-night changes, and 2D rotations. We find that FOCAL improves out-of-distribution performance of foundation models such as CLIP (Radford et al., 2021) on ImageNet (Deng et al., 2009) scale datasets. ... We demonstrate the effectiveness of FOCAL through evaluations on modern models such as CLIP, OV-Seg, and SAM, across diverse datasets including ImageNet, COCO, Objaverse-LVIS, and CO3D.
Researcher Affiliation Academia ¹UC Berkeley, ²University of Michigan. Correspondence to: Utkarsh Singhal <EMAIL>.
Pseudocode No The paper describes the method using text and mathematical equations, but does not include any explicit pseudocode or algorithm blocks.
Open Source Code Yes Our code is available at: https://github.com/sutkarsh/focal.
Open Datasets Yes We demonstrate the effectiveness of FOCAL through evaluations on modern models such as CLIP, OV-Seg, and SAM, across diverse datasets including ImageNet, COCO, Objaverse-LVIS, and CO3D. ... We evaluate chrominance (color) and contrast transformations on CIFAR100 (Krizhevsky et al., 2010) and ImageNet (Deng et al., 2009) with CLIP (Radford et al., 2021). ... We compare against PRLC (Mondal et al., 2023) on their 2D rotation settings (C8) using PRLC-trained ViT (Dosovitskiy et al., 2021) and PRLC-trained ResNet50 (He et al., 2016) models across CIFAR10 (Krizhevsky et al., 2010), CIFAR100 (Krizhevsky et al., 2010), and STL10 (Coates et al., 2011).
Dataset Splits No The paper discusses filtering processes for Objaverse-LVIS and CO3D, and refers to existing settings for 2D rotation experiments (e.g., 'their 2D rotation settings (C8)'). However, it does not explicitly provide specific train/test/validation splits with percentages, sample counts, or detailed methodology for all its experiments within the main text.
Hardware Specification Yes All experiments were done on an RTX 2080Ti GPU except 3D viewpoint, which was done on an RTX 6000 Ada Generation GPU.
Software Dependencies Yes For Objaverse-LVIS, we noticed cases of misleading and overlapping labels and thus filtered out such objects. ... We then pass the crop and the cropped segmentation to gpt-4o-mini-2025-04-16 (OpenAI, 2025) with the following prompt: ... Rendering: For Objaverse-LVIS (Deitke et al., 2023), we generate our base input renders at viewpoints in the upper viewing hemisphere. We sample at an interval of 30 degrees and a radius of 2.2. ... Blender Foundation. Blender: A 3D Modeling and Rendering Package. Blender Foundation, 2022. URL https://www.blender.org. Version 3.2.2.
Experiment Setup Yes Combining energy functions: We minimize the combined energy E_FOCAL(t(x)) over all transformations t ∈ T to find the canonical version of the input image x. This is done by solving the following optimization problem: E_FOCAL(t(x)) = γ1·E_CLIP(t(x)) + γ2·E_diff(t(x)) (Eq. 5), where α, β, γ1, γ2 ∈ ℝ are hyperparameters. ... Bayesian Optimization for Efficient Search: ... We utilize Bayesian Optimization (BO) with a Gaussian Process (GP) using an RBF kernel and the Expected Improvement (EI) acquisition function (Jones et al., 1998) to balance exploration and exploitation. ... (Appendix B.1) For both Objaverse (Deitke et al., 2023) and CO3D (Reizenstein et al., 2021), we use α = 1, β = 0.5 following the 2D experiments (B.3). We also used the diffusion energy (steps 500 to 1000 with stride 100) with a factor of 5. ... (Appendix B.2) We define the color shift transformation using the popular von Kries model ... For initialization, we use random as well as a grid of initial samples. Color uses a uniform 3x3 grid, 6 random points, and 20 iterations. Contrast uses 3 grid points, 4 random points, and 5 iterations. ... (Appendix B.3) For experiments on ImageNet, CIFAR10, CIFAR100, and STL10, we only used the classification energy for computational efficiency. We used α = 1, β = 0.5 for all these settings. ... For segmentation, we used the diffusion energy (steps 50 to 150 with stride 10) with a factor of 0.67 along with a CLIP energy factor of 0.54 and β = 0.2.
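The search procedure quoted above (minimize a combined energy over transformation parameters via a GP with an RBF kernel and Expected Improvement, initialized from a grid plus random points) can be sketched as follows. This is a minimal, numpy-only illustration, not the paper's implementation: the CLIP and diffusion energies are replaced by toy placeholder functions over a single hypothetical transformation parameter theta, and all function names and constants below are illustrative.

```python
import numpy as np
from math import erf

_erf = np.vectorize(erf)

def combined_energy(theta, gamma1=1.0, gamma2=0.5):
    """Toy stand-in for E_FOCAL = gamma1*E_CLIP + gamma2*E_diff.
    The real energies require CLIP and a diffusion model; these
    placeholders just give the search a non-trivial landscape."""
    e_clip = (theta - 0.3) ** 2           # placeholder for E_CLIP(t(x))
    e_diff = np.abs(np.sin(3.0 * theta))  # placeholder for E_diff(t(x))
    return gamma1 * e_clip + gamma2 * e_diff

def rbf_kernel(a, b, length_scale=0.3):
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(X, y, Xs, jitter=1e-6):
    """Standard zero-mean GP regression posterior via Cholesky solves."""
    y_mean = y.mean()
    K = rbf_kernel(X, X) + jitter * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - y_mean))
    Ks = rbf_kernel(X, Xs)
    mu = Ks.T @ alpha + y_mean
    v = np.linalg.solve(L, Ks)
    var = 1.0 - np.sum(v ** 2, axis=0)    # k(x, x) = 1 for this RBF
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    """EI for minimization: expected value of max(best - f, 0)."""
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + _erf(z / np.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return (best - mu) * cdf + sigma * pdf

def canonicalize_1d(n_random=6, n_iter=20, seed=0):
    """BO over a 1D transformation parameter in [-1, 1], initialized
    with a small grid plus random points (mirroring the quoted setup)."""
    rng = np.random.default_rng(seed)
    X = np.concatenate([np.linspace(-1.0, 1.0, 3),
                        rng.uniform(-1.0, 1.0, n_random)])
    y = combined_energy(X)
    candidates = np.linspace(-1.0, 1.0, 201)
    for _ in range(n_iter):
        mu, sigma = gp_posterior(X, y, candidates)
        x_next = candidates[np.argmax(expected_improvement(mu, sigma, y.min()))]
        X = np.append(X, x_next)
        y = np.append(y, combined_energy(x_next))
    i = int(np.argmin(y))
    return X[i], y[i]

theta_star, e_star = canonicalize_1d()
```

In the actual method the argmin over t ∈ T yields the canonicalized image t(x), which is then passed to the downstream model; here the loop simply returns the lowest-energy parameter found.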