Learning Structured Representations by Embedding Class Hierarchy with Fast Optimal Transport

Authors: Siqi Zeng, Sixian Du, Makoto Yamada, Han Zhao

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically analyze the advantage of OT-CPCC over ℓ2-CPCC across a wide range of real-world datasets and tasks. We conducted experiments across 7 diverse datasets: CIFAR10, CIFAR100 (Krizhevsky & Hinton, 2009), the BREEDS benchmarks (Santurkar et al., 2021) in four settings (LIVING17 (L17), ENTITY13 (E13), ENTITY30 (E30), NONLIVING26 (N26)), and iNaturalist-mini (Van Horn et al., 2018) (INAT), to thoroughly evaluate the efficacy and influence of OT-CPCC methods.
Researcher Affiliation | Academia | 1University of Illinois, Urbana-Champaign; 2Stanford University; 3Okinawa Institute of Science and Technology. EMAIL; EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1 Greedy Flow Matching
Input: number of features m, n; weights a = (a1, ..., am), b = (b1, ..., bn)
1: P = Array[m × n]
2: i = 0
3: j = 0
4: while i < m and j < n do
5:   P[i, j] = min(a[i], b[j])
6:   a[i] -= P[i, j]
7:   b[j] -= P[i, j]
8:   if a[i] == 0 then i += 1
9:   if b[j] == 0 then j += 1
Output: P
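Algorithm 1 follows the classic northwest-corner construction for a transport plan. A minimal runnable sketch (the function name and list-based layout are ours; it assumes a and b are nonnegative weight vectors with equal total mass):

```python
def greedy_flow_matching(a, b):
    """Greedy (northwest-corner) flow matching between weight vectors a and b.

    Returns an m-by-n plan P whose row sums equal a and column sums equal b,
    assuming sum(a) == sum(b) and all weights are nonnegative.
    """
    a, b = list(a), list(b)          # copy so the caller's weights survive
    m, n = len(a), len(b)
    P = [[0.0] * n for _ in range(m)]
    i = j = 0
    while i < m and j < n:
        flow = min(a[i], b[j])       # push as much mass as both sides allow
        P[i][j] = flow
        a[i] -= flow
        b[j] -= flow
        if a[i] == 0:                # row i exhausted, move down
            i += 1
        if b[j] == 0:                # column j exhausted, move right
            j += 1
    return P
```

Each iteration exhausts at least one row or column, so the loop runs in O(m + n), which is what makes this greedy plan fast compared to a general OT solver.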
Open Source Code | Yes | The code is available at https://github.com/uiuctml/OTCPCC.
Open Datasets | Yes | We conducted experiments across 7 diverse datasets: CIFAR10, CIFAR100 (Krizhevsky & Hinton, 2009), the BREEDS benchmarks (Santurkar et al., 2021) in four settings (LIVING17 (L17), ENTITY13 (E13), ENTITY30 (E30), NONLIVING26 (N26)), and iNaturalist-mini (Van Horn et al., 2018) (INAT), to thoroughly evaluate the efficacy and influence of OT-CPCC methods. The original class labels for CIFAR10 and CIFAR100 are from https://www.cs.toronto.edu/~kriz/cifar.html, for iNaturalist from https://github.com/visipedia/inat_comp/tree/master/2021, and for BREEDS from WordNet (Miller, 1995).
Dataset Splits | Yes | For downstream classification and retrieval, we split each dataset into source and target subsets, each further divided into train and test sets, as illustrated in Fig. 4. The source and target datasets share the same coarse labels but have different fine labels, where the fine level refers to the more granular class labels exactly one level below the coarse labels. For example, in CIFAR10 the source set includes the fine labels (deer, dog, frog, horse) and (ship, truck), while the target set includes (bird, cat) and (airplane, automobile). Both sets share the same coarse labels: animal and transportation. We include details of the hierarchy construction and splits for the other datasets in App. E.
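The CIFAR10 example above can be mimicked with a small sketch. The function, the per-coarse target count of 2, and the alphabetical ordering of fine labels are illustrative assumptions, not the paper's code:

```python
# Illustrative source/target split: both subsets keep every coarse label,
# but each coarse label's fine classes are divided between the two subsets.
def split_source_target(coarse_to_fine, n_target=2):
    source, target = {}, {}
    for coarse, fines in coarse_to_fine.items():
        target[coarse] = fines[:n_target]   # e.g. (bird, cat)
        source[coarse] = fines[n_target:]   # e.g. (deer, dog, frog, horse)
    return source, target

# CIFAR10 fine labels grouped under their two coarse labels.
cifar10 = {
    "animal": ["bird", "cat", "deer", "dog", "frog", "horse"],
    "transportation": ["airplane", "automobile", "ship", "truck"],
}
source, target = split_source_target(cifar10)
```

With these assumptions the result matches the CIFAR10 split quoted above: source gets (deer, dog, frog, horse) and (ship, truck), target gets (bird, cat) and (airplane, automobile).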
Hardware Specification | Yes | We conducted all experiments with an NVIDIA RTX A6000 GPU.
Software Dependencies | No | The paper mentions PyTorch and torchvision.models without specific version numbers. It refers to torchvision.models.resnet18(weights=IMAGENET1K_V1) and provides a URL for the model, but does not list specific versions for PyTorch or other libraries used in the implementation.
Experiment Setup | Yes | F TRAINING HYPERPARAMETERS. We conduct training from scratch using ResNet18, following the ResNet architecture design except that the first convolution layer employs a 3x3 kernel and the max pooling layer is omitted. For CIFAR10, the pretraining process spans 100 epochs with a batch size of 64. The learning rate begins at 0.01 and is decreased by a factor of 10 every 60 steps, with momentum set to 0.9 and weight decay set to 5×10⁻⁴ in the SGD optimizer.
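The reported CIFAR10 recipe can be sketched as a PyTorch configuration. This is a hedged sketch, not the authors' code: the exact torchvision version is unknown, and "every 60 steps" is interpreted here as every 60 epochs via StepLR.

```python
import torch
from torchvision.models import resnet18

# CIFAR-style ResNet18: 3x3 first conv and no max pooling, as described above.
model = resnet18(num_classes=10)
model.conv1 = torch.nn.Conv2d(3, 64, kernel_size=3, stride=1,
                              padding=1, bias=False)
model.maxpool = torch.nn.Identity()

# SGD with lr 0.01, momentum 0.9, weight decay 5e-4; lr divided by 10
# every 60 epochs. Training runs 100 epochs with batch size 64.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=60, gamma=0.1)
```

The training loop itself (100 epochs, batch size 64, `scheduler.step()` once per epoch) is omitted, since the report only documents the hyperparameters.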