Deep Linear Probe Generators for Weight Space Learning

Authors: Jonathan Kahana, Eliahu Horwitz, Imri Shuval, Yedid Hoshen

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our evaluation follows the standard protocol for weight space learning. We evaluate on two tasks: (i) CNN generalization error prediction and (ii) detecting the training classes of images based on INR networks trained on them. We include experiments on small-scale established benchmarks as well as a new larger-scale Model Zoo which we present, using ResNet18 (He et al., 2016) models. ... Table 3: Results for Small-Scale Benchmarks. Comparison of ProbeGen to graph-based, mechanistic approaches and latent-optimized probes. We average the results over 5 different seeds.
Researcher Affiliation | Academia | Jonathan Kahana, Eliahu Horwitz, Imri Shuval, Yedid Hoshen, School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel
Pseudocode | No | The paper describes the methodology in prose and through figures, but it does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Project page: https://vision.huji.ac.il/probegen/ ... 11 REPRODUCIBILITY. In this work, we presented a new and lightweight framework for weight space learning. Our method is simple to implement and can be easily reproduced. To encourage future work in this direction, we provide a short implementation of our method in the supplementary materials.
Open Datasets | Yes | We evaluate on 4 established datasets. For training data prediction we choose the MNIST and FMNIST implicit neural representation (INR) benchmarks (Navon et al., 2023a). ... For generalization error prediction, we used the CIFAR10-GS (Unterthiner et al., 2020) and CIFAR10 Wild Park (Kofinas et al., 2024) tasks. ... Tiny ImageNet (Le & Yang, 2015; Deng et al., 2009). ... we use the Neural-Field-Arena (Papa et al., 2024) to evaluate ProbeGen's ability to classify INRs trained on point clouds from the ShapeNet (Chang et al., 2015) dataset.
Dataset Splits | Yes | We evaluate on 4 established datasets. For training data prediction we choose the MNIST and FMNIST implicit neural representation (INR) benchmarks (Navon et al., 2023a). ... For generalization error prediction, we used the CIFAR10-GS (Unterthiner et al., 2020) and CIFAR10 Wild Park (Kofinas et al., 2024) tasks. ... Each ResNet model was trained on a randomly selected subset of Tiny ImageNet (Le & Yang, 2015; Deng et al., 2009). We sampled the subset out of a closed list of 10 subsets that we created in advance.
Hardware Specification | No | The paper mentions computational costs in terms of FLOPs and states that inferring about a model would require "computational resources equivalent to training such a model." However, it does not specify any particular hardware such as GPU models, CPU types, or memory used for the experiments.
Software Dependencies | No | The paper states that an implementation of the method is provided in the supplementary materials, but it does not specify any software libraries or frameworks with their version numbers that would be required to reproduce the experiments.
Experiment Setup | Yes | Hyper-parameters: We use a learning rate of 3×10⁻⁴ and a batch size of 32 in all our experiments. Our MLP classifier C uses 6 layers with a hidden size of 256. The latent vectors of each probe are of size 32. We trained all probing algorithms on the INR and CIFAR10 Wild Park experiments for 30 epochs, all experiments on the CIFAR10-GS dataset for 150 epochs, and all experiments on our ResNet18 Model Zoo dataset for 100 epochs.
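The reported hyper-parameters can be collected into a single configuration for a reproduction attempt. Below is a minimal sketch; the dictionary keys, the `PROBEGEN_CONFIG` name, and the `epochs_for` helper are our own naming for illustration, not identifiers from the authors' code.

```python
# Hypothetical configuration assembled from the hyper-parameters quoted above.
# Names are illustrative; only the numeric values come from the paper.
PROBEGEN_CONFIG = {
    "learning_rate": 3e-4,          # used in all experiments
    "batch_size": 32,               # used in all experiments
    "classifier_layers": 6,         # MLP classifier C
    "classifier_hidden_size": 256,
    "probe_latent_size": 32,        # latent vector size per probe
    "epochs": {
        "inr_benchmarks": 30,       # MNIST/FMNIST INRs and CIFAR10 Wild Park
        "cifar10_gs": 150,
        "resnet18_model_zoo": 100,
    },
}

def epochs_for(benchmark: str) -> int:
    """Look up the training-epoch budget reported for a given benchmark."""
    return PROBEGEN_CONFIG["epochs"][benchmark]
```

Keeping the per-benchmark epoch budgets in one table makes it easy to verify a reproduction run against the paper's stated setup.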