Deep Networks Learn Features From Local Discontinuities in the Label Function
Authors: Prithaj Banerjee, Harish G. Ramaswamy, Mahesh Yadav, Chandrashekar Lakshminarayanan
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To test this hypothesis, we perform experiments on classification data where the true label function is given by an oblique decision tree. This setup allows easy enumeration of label function discontinuities, while still remaining intractable for static kernel/linear methods. We then design/construct a novel deep architecture called a Deep Linearly Gated Network (DLGN), whose discontinuities in the input space can be easily enumerated. In this setup, we provide supporting evidence demonstrating the movement of model function discontinuities towards the label function discontinuities during training. The easy enumerability of discontinuities in the DLGN also enables greater mechanistic interpretability. We demonstrate this by extracting the parameters of a high-accuracy decision tree from the parameters of a DLGN. We also show that the DLGN is competitive with ReLU networks and other tree-learning algorithms on several real-world tabular datasets. |
| Researcher Affiliation | Academia | Prithaj Banerjee, Harish G. Ramaswamy, Yadav Mahesh Lorik, Chandrashekar Lakshminarayanan Indian Institute of Technology Madras, India {{prithaj,maheshyadav}@cse, {hariguru,chandarashekar}@dsai}.iitm.ac.in |
| Pseudocode | Yes | Algorithm 1 Building a decision tree from trained DLGN Algorithm 2 Finding Discontinuous Hyperplane Algorithm 3 Return gates of a trained DLGN model |
| Open Source Code | No | The paper does not contain any explicit statement about releasing the source code for the methodology described, nor does it provide a link to a code repository. It only mentions the experimental setup and hyperparameters for the algorithms used. |
| Open Datasets | Yes | Most datasets are available in the UCI repository (https://archive.ics.uci.edu/datasets). Some are taken from the OpenML benchmark (Grinsztajn et al., 2022): https://www.openml.org/search?type=benchmark&study_type=task&sort=tasks_included&id=298 Table 8: Tabular datasets |
| Dataset Splits | Yes | For the synthetic datasets SDI, SDII, and SDIII, the dataset is split into 50% train, 25% test, and 25% validation set. Models are trained on the training data and validated on the validation set, and then the test score is reported against the test data with the best hyperparameters. Similarly, for the tabular datasets, the dataset is split into 60% train, 20% test, and 20% validation set. |
| Hardware Specification | Yes | All the experiments are performed on Kaggle; all neural network-based experiments use GPUs, whereas traditional ML algorithms run on CPU. Kaggle provides GPUs such as the T4 x2 and the P100. |
| Software Dependencies | No | The paper mentions several algorithms and tools (e.g., DBScan, Adam, SGD) but does not provide specific version numbers for any software libraries or dependencies used in their implementation. |
| Experiment Setup | Yes | A.10.4 HYPERPARAMETERS TUNING: Each algorithm used in this paper has a different set of hyperparameters, and hyperparameter tuning is one of the most important aspects for getting the best accuracy. Here, for each algorithm, we extensively searched from a collection of hyperparameters and validated their result on the validation set to obtain the best-performing hyperparameters. Test results are reported based on the best hyperparameters. The below tables give a list of all hyperparameters for each algorithm used. (Tables 9-21 detail specific hyperparameters for each model, including learning rates, epochs, batch sizes, optimizers, etc.) |
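To make the reviewed architecture concrete, here is a minimal forward-pass sketch of the gating idea behind a DLGN, based only on the paper's description quoted above (a gating path whose decision boundaries are hyperplanes, multiplying a value path). The class name, widths, and the soft-gate temperature `beta` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

class DLGNSketch:
    """Illustrative sketch of a Deep Linearly Gated Network forward pass.

    Two parallel stacks of equal depth: a *gating* stack that is purely
    linear (so each gate switches on/off across a hyperplane in input
    space, which is what makes the model's discontinuities easy to
    enumerate), and a *value* stack whose hidden units are multiplied
    elementwise by the gates.
    """

    def __init__(self, dim_in, width, depth, beta=10.0):
        self.beta = beta  # assumed soft-gate sharpness, not from the paper
        self.gate_W = [rng.standard_normal((width, dim_in if i == 0 else width))
                       for i in range(depth)]
        self.val_W = [rng.standard_normal((width, dim_in if i == 0 else width))
                      for i in range(depth)]
        self.head = rng.standard_normal(width)

    def forward(self, x):
        g, v = x, x
        for Wg, Wv in zip(self.gate_W, self.val_W):
            g = Wg @ g                                    # linear gating path
            gate = 1.0 / (1.0 + np.exp(-self.beta * g))   # soft on/off gate
            v = gate * (Wv @ v)                           # gated value path
        return self.head @ v
```

With a hard gate (`g > 0` instead of the sigmoid), the model is piecewise linear and every discontinuity in its gradient lies on one of the gating hyperplanes, which is the enumerability property the paper exploits.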
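The split and tuning protocol quoted in the Dataset Splits and Experiment Setup rows (50/25/25 or 60/20/20 splits; pick hyperparameters on the validation set, report the test score for the winner) can be sketched as follows. Function names and the toy `settings` list are illustrative, not from the paper.

```python
import numpy as np

def split_indices(n, fracs=(0.5, 0.25, 0.25), seed=0):
    """Shuffle indices and split into train/val/test by the given fractions.

    fracs=(0.5, 0.25, 0.25) matches the synthetic datasets (SDI-SDIII);
    use (0.6, 0.2, 0.2) for the tabular datasets.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    a = int(fracs[0] * n)
    b = a + int(fracs[1] * n)
    return idx[:a], idx[a:b], idx[b:]  # train, val, test

def tune(settings, val_score, test_score):
    """Select the setting with the best validation score, then report
    the test score for that setting only (never tune on test)."""
    best = max(settings, key=val_score)
    return best, test_score(best)
```

The point of the protocol is that the test split is touched exactly once, after the validation set has fixed the hyperparameters.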