Harmonic Loss Trains Interpretable AI Models
Authors: David D. Baek, Ziming Liu, Riya Tyagi, Max Tegmark
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We first validate the performance of harmonic models across algorithmic, vision, and language datasets. Through extensive experiments, we demonstrate that models trained with harmonic loss perform better than standard models by: (a) enhancing interpretability (i.e. geometry of representations), (b) requiring less data for generalization, and (c) reducing grokking. |
| Researcher Affiliation | Academia | David D. Baek EMAIL Massachusetts Institute of Technology Ziming Liu EMAIL Massachusetts Institute of Technology Riya Tyagi EMAIL Massachusetts Institute of Technology Max Tegmark EMAIL Massachusetts Institute of Technology |
| Pseudocode | No | The paper describes methods using mathematical formulas and textual descriptions, for example, in Section 3 titled 'Harmonic Loss', but does not contain explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about the release of its source code or provide a link to a code repository for its methodology. |
| Open Datasets | Yes | We demonstrate the performance of harmonic models on the vision task of MNIST digit classification. We pre-train a GPT-2 small model (128M, based on nanoGPT) on OpenWebText. Harmonic loss slightly outperforms cross-entropy loss on the ImageNet benchmark. We evaluate two tasks, CoLA (linguistic acceptability) (Warstadt et al., 2018) and SST-2 (sentence sentiment classification) (Socher et al., 2013). |
| Dataset Splits | No | The paper implies the existence of splits: 'test accuracy as a function of Train Fraction' for the algorithmic datasets, 'validation losses' for GPT-2, a 'validation dataset' for SST-2 and CoLA, and 'Val Acc' for ImageNet. However, it does not state split percentages, sample counts, or split methodology for any experiment, so reproducing the splits requires external knowledge of the standard benchmark splits or assumptions for the custom ones. |
| Hardware Specification | Yes | We use 8 V100 GPUs, choose block size 1024, batch size 480 blocks. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer and that the GPT-2 model is 'based on nanoGPT', but does not provide version numbers for any key software components or libraries (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | We trained the MLP models for 7000 epochs and the transformers for 10000 epochs. For all four models, we used the AdamW optimizer with a learning rate of 2×10⁻³, a weight decay of 10⁻², and an L2 regularization on the embeddings with strength 0.01. For MNIST: the models were trained with a batch size of 64, a learning rate of 0.001, for 10 epochs. For GPT-2: we use 8 V100 GPUs, block size 1024, and a batch size of 480 blocks. We use the Adam optimizer with β₁ = 0.9, β₂ = 0.95. For the harmonic loss, we choose n = √768 ≈ 28. We use a linear warmup learning-rate schedule for 2k (1k) steps up to a maximum learning rate of 6×10⁻⁴ (6×10⁻³), and a cosine decay schedule from 2k to 10k steps, ending at a learning rate of 3×10⁻⁵ (3×10⁻⁴). |
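The Pseudocode row notes that the paper defines harmonic loss only through formulas in Section 3. As a rough aid, here is a minimal NumPy sketch of the idea described there: class logits are Euclidean distances from the input representation to each class weight vector, and a "harmonic max" (inverse distance raised to the power n, normalized) replaces softmax. The function name and exact numerical conventions here are assumptions, not the authors' released code.

```python
import numpy as np

def harmonic_loss(x, W, target, n=1.0, eps=1e-12):
    """Sketch of harmonic loss for a single example.

    x      : (d,) input representation
    W      : (C, d) class weight vectors
    target : index of the true class
    n      : harmonic exponent (the paper scales it with model width)
    """
    # Logit for class i is the Euclidean distance d_i = ||w_i - x||,
    # so smaller distance means a more likely class.
    d = np.linalg.norm(W - x, axis=1)
    # "Harmonic max": p_i = d_i^{-n} / sum_j d_j^{-n}, in place of softmax.
    inv = (d + eps) ** (-n)
    p = inv / inv.sum()
    # Negative log-likelihood of the true class.
    return -np.log(p[target] + eps)
```

Classes whose weight vectors sit closer to the input get higher probability, which is what makes the learned weight vectors directly interpretable as class prototypes.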
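The GPT-2 learning-rate schedule quoted in the Experiment Setup row (linear warmup for 2k steps to a peak rate, then cosine decay from 2k to 10k steps down to a floor) can be sketched as follows. The function name and endpoint conventions (e.g., the exact value at step 0) are assumptions; only the warmup length, decay window, and peak/floor rates come from the paper.

```python
import math

def lr_at_step(step, warmup_steps=2000, total_steps=10000,
               max_lr=6e-4, min_lr=3e-5):
    """Linear warmup to max_lr, then cosine decay to min_lr.

    Defaults follow the cross-entropy run quoted in the paper; the
    harmonic-loss run reportedly used 1k warmup steps and 10x the rates.
    """
    if step < warmup_steps:
        # Linear ramp from ~0 up to max_lr over the warmup window.
        return max_lr * (step + 1) / warmup_steps
    # Cosine decay: t goes 0 -> 1 over [warmup_steps, total_steps].
    t = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * t))
```

The cosine term starts at max_lr (cos 0 = 1) exactly where the warmup ends, so the schedule is continuous at the transition.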