Visually Consistent Hierarchical Image Classification

Authors: Seulki Park, Youren Zhang, Stella Yu, Sara Beery, Jonathan Huang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "4 EXPERIMENTS: We first show that hierarchical classification remains challenging even for vision foundation models, which often yield inconsistent predictions. Our method outperforms existing approaches and flat baselines on benchmark datasets. We further validate our design through ablations and demonstrate that hierarchical supervision also benefits semantic segmentation."
Researcher Affiliation | Collaboration | Seulki Park (1), Youren Zhang (1), Stella X. Yu (1,2), Sara Beery (3), Jonathan Huang (4); 1 University of Michigan, 2 UC Berkeley, 3 MIT, 4 Scaled Foundations. EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper describes the methodology in prose (e.g., Section 3.1, "H-CAST for Visual Consistency," and Section 3.2, "Tree-Path KL Divergence Loss for Semantic Consistency") and in mathematical formulas (Equations 1–3), but does not include structured pseudocode or an algorithm block.
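Since the paper gives its Tree-Path KL Divergence (TK) loss only in prose and equations not quoted here, the following is a rough illustrative sketch of a consistency term of that flavor, not the paper's exact formulation: the function name, the fine-to-coarse index mapping, and the probability-aggregation scheme are all assumptions.

```python
import numpy as np

def _softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def tree_path_kl(fine_logits, coarse_logits, fine_to_coarse):
    """Hypothetical tree-path consistency term: KL between the fine-level
    prediction aggregated up to its coarse parents and the coarse-level
    prediction itself. `fine_to_coarse[f]` gives the coarse parent of
    fine class `f` (an assumed encoding of the label hierarchy)."""
    fine_prob = _softmax(fine_logits)                 # (B, num_fine)
    agg = np.zeros_like(coarse_logits, dtype=float)   # (B, num_coarse)
    for f, c in enumerate(fine_to_coarse):            # sum sibling classes
        agg[..., c] += fine_prob[..., f]
    coarse_prob = _softmax(coarse_logits)
    eps = 1e-12                                       # avoid log(0)
    kl = np.sum(agg * (np.log(agg + eps) - np.log(coarse_prob + eps)), axis=-1)
    return float(kl.mean())
```

The term is zero when the fine-level distribution, summed over each coarse parent's children, matches the coarse head's distribution, and grows as the two levels disagree.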
Open Source Code | Yes | "Our code is available at https://github.com/pseulki/hcast."
Open Datasets | Yes | "Datasets. We use widely used benchmarks in hierarchical recognition: BREEDS (Santurkar et al., 2021), CUB-200-2011 (Welinder et al., 2010), FGVC-Aircraft (Maji et al., 2013), and iNat21-Mini (Van Horn et al., 2021)."
Dataset Splits | Yes | "For BREEDS, we conduct training and validation using their source splits. BREEDS provides a wider class variety and larger sample size than CUB-200-2011 and FGVC-Aircraft, making it better suited for evaluating generalization performance. CUB-200-2011 comprises a 3-level hierarchy with order, family, and species; FGVC-Aircraft consists of a 3-level hierarchy of maker, family, and model (e.g., Boeing → Boeing 707 → 707-320). For experiments on a larger dataset, we used the 3-level iNat21-Mini. Details of iNat21-Mini are provided in Sec. E.4. Table 4 in the Appendix summarizes the datasets. (...) iNaturalist21-mini contains 10,000 classes, 500,000 training samples, and 100,000 test samples... iNaturalist-2018 includes two-level hierarchical annotations with 14 super-categories and 8,142 species, comprising 437,513 training images and 24,426 validation images."
Hardware Specification | No | The paper acknowledges "partial compute support from NAIRR Pilot (CIS240431, CIS250430)" but does not specify hardware details such as GPU models, CPU types, or memory amounts used for the experiments.
Software Dependencies | No | The paper mentions using the DeiT framework, the Adam optimizer, and several augmentation techniques (RandAug, label smoothing, mixup, cutmix), but does not give version numbers for these components or for other libraries such as PyTorch, TensorFlow, or Python.
Experiment Setup | Yes | "Table 5: Hyper-parameters for training H-CAST and ViT on FGVC-Aircraft, CUB-200-2011, BREEDS, and iNaturalist datasets. We follow mostly the same setup as CAST (Ke et al., 2024)." The table lists batch size (256), crop size (224), learning rate (1e-3 or 5e-4), weight decay (0.05), momentum (0.9), total epochs (100), warmup epochs (5), warmup learning rate (1e-4 or 1e-6), optimizer (Adam), learning-rate policy (cosine decay), augmentation (RandAug(9, 0.5)), label smoothing (0.1), mixup (0.8), cutmix (1.0), and α, the weight for the TK loss (0.5).
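For quick reference, the Table 5 values quoted above can be collected into a single configuration mapping. The key names below are illustrative only and are not the schema used by the released repository; where Table 5 reports two values (learning rate, warmup learning rate), the first is used and the alternative is noted in a comment.

```python
# Hyper-parameters as reported in Table 5 of the paper (H-CAST / ViT
# training on FGVC-Aircraft, CUB-200-2011, BREEDS, and iNaturalist).
# Key names are illustrative, not the repository's config format.
H_CAST_CONFIG = {
    "batch_size": 256,
    "crop_size": 224,
    "learning_rate": 1e-3,      # 5e-4 reported for some datasets
    "weight_decay": 0.05,
    "momentum": 0.9,
    "epochs": 100,
    "warmup_epochs": 5,
    "warmup_lr": 1e-4,          # 1e-6 reported for some datasets
    "optimizer": "adam",
    "lr_policy": "cosine_decay",
    "randaug": (9, 0.5),        # RandAug(magnitude, probability)
    "label_smoothing": 0.1,
    "mixup": 0.8,
    "cutmix": 1.0,
    "tk_loss_alpha": 0.5,       # weight for the TK (tree-path KL) loss
}
```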