Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline and validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
MotherNet: Fast Training and Inference via Hyper-Network Transformers
Authors: Andreas Mueller, Carlo Curino, Raghu Ramakrishnan
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate MotherNet on two tabular benchmarks, the small datasets in OpenML CC-18, as used by Hollmann et al. (2022), and a version of the TabZilla benchmark (McElfresh et al., 2024). Quantitative results are shown in Figure 2 and Table 4, where errors are given over the five paired splits of the data. We can see that TabPFN outperforms all other methods, though not statistically significantly so, even at 60 minutes of tuning time for reference methods. |
| Researcher Affiliation | Industry | Andreas C. Müller, Carlo Curino & Raghu Ramakrishnan, Gray Systems Lab, Microsoft |
| Pseudocode | No | The paper describes the methodology in Section 3 and illustrates the architecture in Figure 1. There is no explicit section or figure labeled 'Pseudocode' or 'Algorithm' presenting structured algorithmic steps. |
| Open Source Code | Yes | Training and inference code and pre-trained model weights are made publicly available at https://github.com/microsoft/ticl |
| Open Datasets | Yes | Using a fixed model structure, we are able to produce neural networks that work well on small numeric tabular datasets from the OpenML CC-18 benchmark suite (Bischl et al., 2017), and show that our approach also provides a good trade-off of speed and accuracy on the TabZilla dataset collection (McElfresh et al., 2024). |
| Dataset Splits | Yes | As in Hollmann et al. (2022), we split each dataset 50/50 into training (or in-context learning) and test set, and repeat this split five times. For this evaluation, we follow McElfresh et al. (2024) in their setup for TabPFN, and subsample 3000 data points for MotherNet, as the full datasets are too large for the transformer architectures. |
| Hardware Specification | Yes | We train MotherNet on a single A100 GPU with 80GB of GPU memory, which takes approximately four weeks. We were able to process up to 30,000 data points on an A100 GPU with 80GB of memory, and 100,000 samples on CPU. All our experiments were done on an A100 GPU with 80GB of RAM on cloud infrastructure. |
| Software Dependencies | No | The paper mentions using 'scikit-learn (Pedregosa et al., 2011)' and 'HyperOpt (Bergstra et al., 2011)' for baseline hyperparameter tuning, but does not provide specific version numbers for the software dependencies used in their own experimental setup. |
| Experiment Setup | Yes | We are using increasing batch sizes of 8, 16 and 32 and a learning rate of 0.00003, with cosine annealing (Loshchilov & Hutter, 2016). |
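The split protocol quoted in the Dataset Splits row (five paired 50/50 splits, with MotherNet's training portion subsampled to 3000 points) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name `make_splits`, the seed handling, and the subsampling order are assumptions.

```python
# Sketch of the evaluation split protocol: five repeated 50/50
# train/test splits, subsampling the training side for the transformer.
# Seeds and exact subsampling details are assumptions, not from the paper.
import numpy as np
from sklearn.model_selection import train_test_split

def make_splits(X, y, n_repeats=5, subsample=3000, seed=0):
    """Yield five paired 50/50 train/test splits; the training portion
    is subsampled to at most `subsample` rows."""
    rng = np.random.RandomState(seed)
    for _ in range(n_repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.5, random_state=rng.randint(2**31 - 1))
        if len(X_tr) > subsample:
            idx = rng.choice(len(X_tr), subsample, replace=False)
            X_tr, y_tr = X_tr[idx], y_tr[idx]
        yield X_tr, y_tr, X_te, y_te
```

Errors over these five paired splits are what Figure 2 and Table 4 of the paper report.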
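The Experiment Setup row quotes a learning rate of 0.00003 with cosine annealing (Loshchilov & Hutter, 2016). A minimal sketch of that schedule is below; the total step count and the floor value `eta_min` are assumptions, since the paper excerpt does not report them.

```python
# Minimal cosine-annealing learning-rate schedule (Loshchilov & Hutter,
# 2016), decaying from the quoted base_lr of 3e-5. total_steps and
# eta_min are illustrative assumptions.
import math

def cosine_lr(step, total_steps, base_lr=3e-5, eta_min=0.0):
    """Anneal the learning rate from base_lr to eta_min over total_steps."""
    return eta_min + 0.5 * (base_lr - eta_min) * (
        1 + math.cos(math.pi * step / total_steps))
```

The quoted setup also increases batch sizes through 8, 16, and 32 during training; the switch points are not given in the excerpt, so they are omitted here.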