FeatSharp: Your Vision Model Features, Sharper

Authors: Mike Ranzinger, Greg Heinrich, Pavlo Molchanov, Bryan Catanzaro, Andrew Tao

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental
LLM Response: "We demonstrate the effectiveness of this approach on core perception tasks as well as within agglomerative model training using RADIO as a way of providing richer targets for distillation. Code available at https://github.com/NVlabs/FeatSharp. ... We study our method in relation to FeatUp from a core multi-view consistency standpoint in this section, from a semantic segmentation linear probe standpoint, and also for training a new RADIO-like model with hi-res teacher targets. ... We show these results in figure 6, where we observe that FeatSharp consistently achieves the highest fidelities... ... Results are in figure 1. ... Semantic segmentation has the potential to benefit from increased resolution... We show these results in figure 7. ... We integrate our method, FeatUp-JBU, the baselines, as well as SAPA (Lu et al., 2022) and the preprint ReSFU (Zhou et al., 2025) into Detectron2 using the Edge2 codebase, and probe on COCO 2017 (Lin et al., 2014). ... We show the results in table 1, where FeatSharp is clearly best able to improve object detection results over baseline and comparison methods, particularly for small objects, benefitting from the additional tile guidance. ... We show the MTL Gain results in table 2. ... In this section, we provide detailed benchmark results used to compute the MTL aggregate metrics in table 2. We show these results in tables 3, 4, 5, and 6."
Researcher Affiliation: Industry
LLM Response: "Mike Ranzinger 1* Greg Heinrich 1* Pavlo Molchanov 1 Bryan Catanzaro 1 Andrew Tao 1 ... 1NVIDIA. Correspondence to: Mike Ranzinger <EMAIL>."
Pseudocode: No
LLM Response: The paper does not contain structured pseudocode or algorithm blocks. It describes the methodology using architectural diagrams (Figures 2, 3) and mathematical equations (Equations 1-6), but not in a pseudocode format.
Open Source Code: Yes
LLM Response: "Code available at https://github.com/NVlabs/FeatSharp."
Open Datasets: Yes
LLM Response: "ADE20k (Zhou et al., 2017) Semantic segmentation results ... COCO 2017 (Lin et al., 2014) ... Dataset COCO SA-1B ... Probe3d (El Banani et al., 2024) ... NYUDv2 (Nathan Silberman & Fergus, 2012) ... Classification Zero Shot Retrieval ImageNet-1k COCO Flickr30k"
Dataset Splits: No
LLM Response: The paper uses well-known datasets such as ADE20k, COCO 2017, ImageNet-1k, Flickr30k, SA-1B, Probe3d, and NYUDv2. While these datasets often come with predefined splits, the paper does not explicitly state the specific split percentages, sample counts, or refer to predefined splits with citations for its own experimental setup. For example, it does not state "we used the standard train/validation/test splits of COCO 2017" with specific proportions.
Hardware Specification: Yes
LLM Response: "Throughput of a ViT-H/14 model (e.g. DFN CLIP) achieved with an A100 GPU, BS=1."
Software Dependencies: No
LLM Response: The paper mentions "Optimizer NAdam" in Table 11 and "Detectron2" in Section 4.4, but does not provide specific version numbers for these or any other software components, libraries, or frameworks used.
Experiment Setup: Yes
LLM Response: "Table 11. Training hyperparameters. FeatUp JBU refers to the settings in the official https://github.com/mhamilton723/FeatUp. Unless otherwise specified, we report numbers based on the Long schedule, which includes FeatUp reproduction values, to maintain fairness."
  Num GPUs: 1 / 8 / 8
  Batch Size (per GPU): 4 / 4 / 4
  Batch Size (total): 4 / 32 / 32
  Num Steps: 2,000 / 3,000 / 9,000
  Optimizer: NAdam / NAdam / NAdam
  Learning Rate: 0.001 / 1e-4 / 1e-4
  Downsampler: Attention (k=7) / Attention (k=7) / Attention (k=7)
  Num Jitters: 5 / 5 / 5
  CRF Weight: 0.001 / 0 / 0
  TV Weight: 0 / 0 / 0
  Feature Normalization: LayerNorm / PHI-S / PHI-S
  Dataset: COCO / SA-1B / SA-1B
  Multi-view Augs: Scale, Shift / Scale, Shift, HFlip, Rotate, Perspective
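The per-column settings quoted from Table 11 can be sanity-checked with a small sketch: encode each configuration as a plain dict and verify that the reported total batch size equals GPUs × per-GPU batch. This is an illustrative sketch, not code from the paper; the column names `featup_jbu`, `short`, and `long` are our assumption (the paper only names "FeatUp JBU" and the "Long schedule" explicitly).

```python
# Hedged sketch of the three training configurations quoted from Table 11.
# Column labels "featup_jbu"/"short"/"long" are assumed, not from the paper.
configs = {
    "featup_jbu": {"num_gpus": 1, "batch_per_gpu": 4, "num_steps": 2_000,
                   "optimizer": "NAdam", "lr": 1e-3, "dataset": "COCO"},
    "short":      {"num_gpus": 8, "batch_per_gpu": 4, "num_steps": 3_000,
                   "optimizer": "NAdam", "lr": 1e-4, "dataset": "SA-1B"},
    "long":       {"num_gpus": 8, "batch_per_gpu": 4, "num_steps": 9_000,
                   "optimizer": "NAdam", "lr": 1e-4, "dataset": "SA-1B"},
}

def total_batch(cfg):
    # Total batch size = number of GPUs * per-GPU batch size.
    return cfg["num_gpus"] * cfg["batch_per_gpu"]

def samples_seen(cfg):
    # Rough count of samples processed over the whole schedule.
    return total_batch(cfg) * cfg["num_steps"]

for name, cfg in configs.items():
    print(f"{name}: total batch {total_batch(cfg)}, "
          f"~{samples_seen(cfg):,} samples over {cfg['num_steps']:,} steps")
```

The derived total batch sizes (4, 32, 32) match the "Batch Size (total)" row quoted above, which is one quick internal-consistency check a reproducer can make from the table alone.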