FeatSharp: Your Vision Model Features, Sharper

Authors: Mike Ranzinger, Greg Heinrich, Pavlo Molchanov, Bryan Catanzaro, Andrew Tao

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental
LLM Response: "We demonstrate the effectiveness of this approach on core perception tasks as well as within agglomerative model training using RADIO as a way of providing richer targets for distillation. Code available at https://github.com/NVlabs/FeatSharp. ... We study our method in relation to FeatUp from a core multi-view consistency standpoint in this section, from a semantic segmentation linear probe standpoint, and also for training a new RADIO-like model with hi-res teacher targets. ... We show these results in figure 6, where we observe that FeatSharp consistently achieves the highest fidelities... ... Results are in figure 1. ... Semantic segmentation has the potential to benefit from increased resolution... We show these results in figure 7. ... We integrate our method, FeatUp-JBU, the baselines, as well as SAPA (Lu et al., 2022) and the preprint ReSFU (Zhou et al., 2025) into Detectron2 using the Edge2 codebase, and probe on COCO 2017 (Lin et al., 2014). ... We show the results in table 1, where FeatSharp is clearly best able to improve object detection results over baseline and comparison methods, particularly for small objects, benefitting from the additional tile guidance. ... We show the MTL Gain results in table 2. ... In this section, we provide detailed benchmark results used to compute the MTL aggregate metrics in table 2. We show these results in tables 3, 4, 5, and 6."
Researcher Affiliation: Industry
LLM Response: "Mike Ranzinger 1* Greg Heinrich 1* Pavlo Molchanov 1 Bryan Catanzaro 1 Andrew Tao 1 ... 1NVIDIA. Correspondence to: Mike Ranzinger <EMAIL>."
Pseudocode: No
LLM Response: The paper does not contain structured pseudocode or algorithm blocks. It describes the methodology using architectural diagrams (Figures 2, 3) and mathematical equations (Equations 1-6), but not in a pseudocode format.
Open Source Code: Yes
LLM Response: "Code available at https://github.com/NVlabs/FeatSharp."
Open Datasets: Yes
LLM Response: "ADE20k (Zhou et al., 2017) Semantic segmentation results ... COCO 2017 (Lin et al., 2014) ... Dataset COCO SA-1B ... Probe3d (El Banani et al., 2024) ... NYUDv2 (Nathan Silberman & Fergus, 2012) ... Classification Zero Shot Retrieval ImageNet-1k COCO Flickr30k"
Dataset Splits: No
LLM Response: The paper uses well-known datasets such as ADE20k, COCO 2017, ImageNet-1k, Flickr30k, SA-1B, Probe3d, and NYUDv2. While these datasets often come with predefined splits, the paper does not explicitly state the specific split percentages, sample counts, or refer to predefined splits with citations for its own experimental setup. For example, it does not state "we used the standard train/validation/test splits of COCO 2017" with specific proportions.
Hardware Specification: Yes
LLM Response: "Throughput of a ViT-H/14 model (e.g. DFN CLIP) achieved with an A100 GPU, BS=1."
Software Dependencies: No
LLM Response: The paper mentions "Optimizer NAdam" in Table 11 and "Detectron2" in Section 4.4, but does not provide specific version numbers for these or any other software components, libraries, or frameworks used.
Experiment Setup: Yes
LLM Response: "Table 11. Training hyperparameters. FeatUp JBU refers to the settings in the official https://github.com/mhamilton723/FeatUp. Unless otherwise specified, we report numbers based on the Long schedule, which includes FeatUp reproduction values, to maintain fairness."
  Num GPUs: 1 / 8 / 8
  Batch Size (per GPU): 4 / 4 / 4
  Batch Size (total): 4 / 32 / 32
  Num Steps: 2,000 / 3,000 / 9,000
  Optimizer: NAdam / NAdam / NAdam
  Learning Rate: 0.001 / 1e-4 / 1e-4
  Downsampler: Attention (k=7) / Attention (k=7) / Attention (k=7)
  Num Jitters: 5 / 5 / 5
  CRF Weight: 0.001 / 0 / 0
  TV Weight: 0 / 0 / 0
  Feature Normalization: LayerNorm / PHI-S / PHI-S
  Dataset: COCO / SA-1B / SA-1B
  Multi-view Augs: Scale, Shift / Scale, Shift, HFlip, Rotate, Perspective
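The per-column settings quoted from Table 11 can be sanity-checked with a small sketch: encode each configuration as a plain dict and verify that the reported total batch size equals GPUs × per-GPU batch. This is an illustrative sketch, not code from the paper; the column names `featup_jbu`, `short`, and `long` are our assumption (the paper only names "FeatUp JBU" and the "Long schedule" explicitly).

```python
# Hedged sketch of the three training configurations quoted from Table 11.
# Column labels "featup_jbu"/"short"/"long" are assumed, not from the paper.
configs = {
    "featup_jbu": {"num_gpus": 1, "batch_per_gpu": 4, "num_steps": 2_000,
                   "optimizer": "NAdam", "lr": 1e-3, "dataset": "COCO"},
    "short":      {"num_gpus": 8, "batch_per_gpu": 4, "num_steps": 3_000,
                   "optimizer": "NAdam", "lr": 1e-4, "dataset": "SA-1B"},
    "long":       {"num_gpus": 8, "batch_per_gpu": 4, "num_steps": 9_000,
                   "optimizer": "NAdam", "lr": 1e-4, "dataset": "SA-1B"},
}

def total_batch(cfg):
    # Total batch size = number of GPUs * per-GPU batch size.
    return cfg["num_gpus"] * cfg["batch_per_gpu"]

def samples_seen(cfg):
    # Rough count of samples processed over the whole schedule.
    return total_batch(cfg) * cfg["num_steps"]

for name, cfg in configs.items():
    print(f"{name}: total batch {total_batch(cfg)}, "
          f"~{samples_seen(cfg):,} samples over {cfg['num_steps']:,} steps")
```

The derived total batch sizes (4, 32, 32) match the "Batch Size (total)" row quoted above, which is one quick internal-consistency check a reproducer can make from the table alone.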