Visually Consistent Hierarchical Image Classification
Authors: Seulki Park, Youren Zhang, Stella Yu, Sara Beery, Jonathan Huang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4 (Experiments): We first show that hierarchical classification remains challenging even for vision foundation models, which often yield inconsistent predictions. Our method outperforms existing approaches and flat baselines on benchmark datasets. We further validate our design through ablations and demonstrate that hierarchical supervision also benefits semantic segmentation. |
| Researcher Affiliation | Collaboration | Seulki Park (University of Michigan), Youren Zhang (University of Michigan), Stella X. Yu (University of Michigan, UC Berkeley), Sara Beery (MIT), Jonathan Huang (Scaled Foundations) |
| Pseudocode | No | The paper describes the methodology in prose (e.g., Section 3.1 H-CAST FOR VISUAL CONSISTENCY and 3.2 TREE-PATH KL DIVERGENCE LOSS FOR SEMANTIC CONSISTENCY) and mathematical formulas (e.g., equations 1, 2, 3) but does not include structured pseudocode or an algorithm block. |
| Open Source Code | Yes | Our code is available at https://github.com/pseulki/hcast. |
| Open Datasets | Yes | Datasets. We use widely used benchmarks in hierarchical recognition: BREEDS (Santurkar et al., 2021), CUB-200-2011 (Welinder et al., 2010), FGVC-Aircraft (Maji et al., 2013), and iNat21-Mini (Van Horn et al., 2021). |
| Dataset Splits | Yes | For BREEDS, we conduct training and validation using their source splits. BREEDS provides a wider class variety and larger sample size than CUB-200-2011 and FGVC-Aircraft, making it better suited for evaluating generalization performance. CUB-200-2011 comprises a 3-level hierarchy with order, family, and species; FGVC-Aircraft consists of a 3-level hierarchy of maker, family, and model (e.g., Boeing → Boeing 707 → 707-320). For experiments on a larger dataset, we used the 3-level iNat21-Mini. Details of iNat21-Mini are provided in Sec. E.4. Table 4 in the Appendix summarizes the datasets. (...) iNaturalist21-mini contains 10,000 classes, 500,000 training samples, and 100,000 test samples... iNaturalist-2018 includes two-level hierarchical annotations with 14 super-categories and 8,142 species, comprising 437,513 training images and 24,426 validation images. |
| Hardware Specification | No | The paper acknowledges 'partial compute support from NAIRR Pilot (CIS240431, CIS250430)' but does not specify any particular hardware details such as GPU models, CPU types, or memory amounts used for the experiments. |
| Software Dependencies | No | The paper mentions using the DeiT framework, the Adam optimizer, and various augmentation techniques (RandAug, label smoothing, mixup, cutmix), but does not specify version numbers for these software components or any other libraries like PyTorch, TensorFlow, or Python. |
| Experiment Setup | Yes | Table 5: Hyper-parameters for training H-CAST and ViT on FGVC-Aircraft, CUB-200-2011, BREEDS, and iNaturalist datasets. We follow mostly the same setup as CAST (Ke et al., 2024). This table lists specific parameters such as batch size (256), crop size (224), learning rate (1e-3, 5e-4), weight decay (0.05), momentum (0.9), total epochs (100), warmup epochs (5), warmup learning rate (1e-4, 1e-6), optimizer (Adam), learning rate policy (Cosine decay), augmentation (RandAug(9, 0.5)), label smoothing (0.1), mixup (0.8), cutmix (1.0), and α (weight for TK loss) (0.5). |
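The Tree-Path KL Divergence (TK) loss noted above is defined precisely in the paper's Sec. 3.2 (equations 1–3, not quoted here). As a rough illustration of the underlying idea only — assuming the loss penalizes divergence between the coarse-level prediction and the fine-level prediction aggregated up the label tree, with all function and variable names below being illustrative rather than from the paper — a minimal sketch:

```python
import math

def aggregate_fine_to_coarse(fine_probs, child_to_parent, num_coarse):
    # Sum fine-class probabilities into their parent coarse classes.
    coarse = [0.0] * num_coarse
    for i, p in enumerate(fine_probs):
        coarse[child_to_parent[i]] += p
    return coarse

def tree_path_kl(coarse_probs, fine_probs, child_to_parent, eps=1e-8):
    # KL(aggregated-fine || predicted-coarse): zero when both levels
    # agree on the same path through the label tree, positive otherwise.
    agg = aggregate_fine_to_coarse(fine_probs, child_to_parent, len(coarse_probs))
    return sum(q * math.log((q + eps) / (p + eps))
               for q, p in zip(agg, coarse_probs) if q > 0)

# Example: 4 fine classes mapped onto 2 coarse parents.
child_to_parent = [0, 0, 1, 1]
fine = [0.4, 0.3, 0.2, 0.1]            # aggregates to [0.7, 0.3]
loss_consistent = tree_path_kl([0.7, 0.3], fine, child_to_parent)    # ≈ 0
loss_inconsistent = tree_path_kl([0.2, 0.8], fine, child_to_parent)  # > 0
```

The sketch captures only the consistency intuition (predictions along one root-to-leaf path should agree); the paper's actual loss and its weighting (α = 0.5) should be taken from the equations and released code.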
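For quick reference, the Table 5 hyper-parameters quoted above can be collected into a single config. This is only a transcription sketch: the values are copied from the table, but which of the two learning rates (and warmup rates) applies to which dataset is not specified in this excerpt, so they are kept as unresolved alternatives.

```python
# Hyper-parameters reported in Table 5 for H-CAST / ViT training.
# Paired values are dataset-dependent alternatives; the per-dataset
# mapping is not given in the quoted excerpt.
TRAIN_CONFIG = {
    "batch_size": 256,
    "crop_size": 224,
    "learning_rate": (1e-3, 5e-4),   # dataset-dependent
    "weight_decay": 0.05,
    "momentum": 0.9,
    "epochs": 100,
    "warmup_epochs": 5,
    "warmup_lr": (1e-4, 1e-6),       # dataset-dependent
    "optimizer": "Adam",
    "lr_policy": "cosine_decay",
    "augmentation": {
        "randaug": (9, 0.5),
        "label_smoothing": 0.1,
        "mixup": 0.8,
        "cutmix": 1.0,
    },
    "tk_loss_alpha": 0.5,            # α, weight for the TK loss
}
```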