Label Embedding via Low-Coherence Matrices

Authors: Jianxin Zhang, Clayton Scott

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present an experimental evaluation of our proposed method, LOCOLE (LOw COherence Label Embedding), for extreme multiclass classification. [...] The experimental results in Table 3 highlight the superior performance of our proposed method across various datasets.
Researcher Affiliation | Academia | Jianxin Zhang, Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, EMAIL; Clayton Scott, Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, EMAIL
Pseudocode | Yes | Meta-Algorithm 1 (Label Embedding). 1: Input: dataset D = {(x_i, y_i)}_{i=1}^N, embedding matrix G, multi-output regression algorithm A, and the decoder function β_G from the embedding space to the label space. 2: Form the regression dataset D_r = {(x_i, g_{y_i})}_{i=1}^N. 3: Train a regression model f with A on D_r. 4: Return: β_G ∘ f.
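Meta-Algorithm 1 can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's code: it assumes multi-output ridge regression as the regression algorithm A and maximum-inner-product (nearest-embedding) decoding as the decoder β_G; the function names and dimensions are chosen for the example.

```python
import numpy as np

def label_embed_fit(X, y, G, reg=1e-3):
    """Meta-Algorithm 1, steps 2-3: regress inputs onto label embeddings.
    X: (N, D) features; y: (N,) integer labels in [0, C);
    G: (C, d) matrix whose row g_c is the embedding of label c."""
    T = G[y]  # regression targets g_{y_i}, shape (N, d)
    # Multi-output ridge regression as a simple stand-in for algorithm A
    W = np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ T)
    return W

def label_embed_predict(X, W, G):
    """Decoder beta_G: map each predicted embedding to the closest label
    by maximum inner product with the rows of G."""
    F = X @ W  # predicted embeddings f(x_i), shape (N, d)
    return np.argmax(F @ G.T, axis=1)

# Tiny synthetic usage example: features are a noisy linear map of the
# (unit-norm) label embeddings, so the regression is nearly exact.
rng = np.random.default_rng(0)
C, d, D, N = 4, 3, 5, 200
G = rng.standard_normal((C, d))
G /= np.linalg.norm(G, axis=1, keepdims=True)
y = rng.integers(0, C, size=N)
M = rng.standard_normal((d, D))
X = G[y] @ M + 0.01 * rng.standard_normal((N, D))

W = label_embed_fit(X, y, G)
preds = label_embed_predict(X, W, G)
```

On this easy synthetic problem the decoded labels match the true labels almost exactly, which is the behavior the meta-algorithm relies on when the embedding matrix has low coherence.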
Open Source Code | No | We list the existing code used in our experiments. PD-Sparse (Yen et al., 2016): https://github.com/a061105/ExtremeMulticlass (BSD-3-Clause license). PPD-Sparse (Yen et al., 2017a): https://github.com/a061105/AsyncPDSparse. Parabel (Prabhu et al., 2018): http://manikvarma.org/code/Parabel/download.html. AnnexML (Tagami, 2017): https://github.com/yahoojapan/AnnexML (Apache-2.0 license). W-LTLS (Evron et al., 2018): https://github.com/ievron/wltls/ (MIT license). The paper lists code for the comparison methods, but does not state that code for its proposed method (LOCOLE) is publicly available.
Open Datasets | Yes | We conduct experiments on three large-scale datasets, DMOZ (Partalas et al., 2015), LSHTC1 (Partalas et al., 2015), and ODP (Bennett and Nguyen, 2009), which are extensively used for benchmarking extreme classification algorithms. The details of these datasets are provided in Table 2, with DMOZ and LSHTC1 available from Yen et al. (2016), and ODP from Medini et al. (2019).
Dataset Splits | Yes | Table 2: Summary of the datasets used in the experiments. N_train is the number of training data points, N_test the number of test data points, D the number of features, and C the number of classes. [...] LSHTC1: N_train = 83,805, N_test = 5,000. [...] DMOZ: N_train = 335,068, N_test = 38,340. [...] ODP: N_train = 975,936, N_test = 493,014.
Hardware Specification | Yes | All neural network training is performed on a single NVIDIA A40 GPU with 48 GB of memory. We train the PD-Sparse method and single-node LOCOLE on Intel Xeon Gold 6154 processors with 36 cores and 180 GB of memory. The distributed LOCOLE and the PPD-Sparse method, which is also implemented in a distributed fashion, are trained across 10 CPU nodes, for a total of 360 cores and 1.8 TB of memory.
Software Dependencies | No | This is implemented using PyTorch, with a 2-layer fully connected neural network used for the LSHTC1 and DMOZ datasets and a 4-layer fully connected neural network for the ODP dataset. An Adamax optimizer with a learning rate of 0.001 is utilized... The paper names PyTorch and the Adamax optimizer, but does not specify version numbers for these or for other critical libraries.
Experiment Setup | Yes | The proposed embedding strategy adopts a 2-layer neural network architecture, employing a hidden layer of 4096 neurons with ReLU activation. The output of the neural network is normalized to have a Euclidean norm of 1. An Adamax optimizer with a learning rate of 0.001 is used together with a batch size of 128 for training. The model is trained for a total of 5 epochs, with a scheduler that scales the learning rate down by a factor of 0.1 at the second epoch.
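The forward pass described above (hidden layer of 4096 ReLU units, output normalized to unit Euclidean norm) can be sketched framework-agnostically in NumPy. The paper's actual model is a PyTorch network; the input and embedding dimensions below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """2-layer embedding network: ReLU hidden layer, then a linear
    output layer whose result is normalized to Euclidean norm 1."""
    h = np.maximum(x @ W1 + b1, 0.0)  # hidden layer, ReLU activation
    z = h @ W2 + b2                   # raw embedding output
    return z / np.linalg.norm(z, axis=-1, keepdims=True)  # unit-norm output

# Illustrative dimensions: input 1000, hidden 4096 (as in the paper),
# embedding dimension 64 (an assumption for this sketch).
rng = np.random.default_rng(0)
D_in, H, d = 1000, 4096, 64
W1 = rng.standard_normal((D_in, H)) * 0.01
b1 = np.zeros(H)
W2 = rng.standard_normal((H, d)) * 0.01
b2 = np.zeros(d)

out = forward(rng.standard_normal((8, D_in)), W1, b1, W2, b2)
```

The final normalization matches the low-coherence embedding setup, where label embeddings live on the unit sphere and decoding compares directions rather than magnitudes.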