SFESS: Score Function Estimators for $k$-Subset Sampling

Authors: Klas Wijk, Ricardo Vinuesa, Hossein Azizpour

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we validate our proposed estimator in three main experimental settings: feature selection, variational autoencoders (VAE), and stochastic k-nearest neighbors (k-NN). In these problems, the k-subset distribution is used in different ways: as the first operation in feature selection, as the mid-point bottleneck in a VAE, and in computing the final loss in stochastic k-NN (see Figure 3a). We use MNIST (LeCun et al., 1998) and Fashion-MNIST (Xiao et al., 2017) with the canonical train and test splits. We withhold 10,000 samples from the train set for validation. For all training, we use a batch size of 128 and train for 50,000 steps using the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 1e-4 and parameters β1 = 0.9 and β2 = 0.999. We compare our proposed method with variance reduction (SFESS + VR), using 32 variance reduction samples, to relaxed subset sampling (GS) and its straight-through variant (STGS) (Xie & Ermon, 2019), implicit maximum likelihood estimation (I-MLE) (Niepert et al., 2021), SIMPLE (Ahmed et al., 2023), and SFESS without variance reduction.
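The SFESS + VR estimator quoted above averages the score function gradient over multiple Monte Carlo samples (32 in the paper). As a rough illustration of how such an estimator can be variance-reduced, here is a minimal NumPy sketch using a leave-one-out baseline; the baseline choice, function name, and array shapes are our own assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def sfe_with_baseline(f_values, score_grads):
    """Score function (REINFORCE) gradient estimate with a
    leave-one-out baseline over n Monte Carlo samples.

    f_values:    shape (n,)   losses f(S_i) of the sampled subsets
    score_grads: shape (n, d) gradients of log p(S_i) w.r.t. the logits
    """
    n = f_values.shape[0]
    # Leave-one-out baseline: mean loss of the *other* n-1 samples,
    # so the baseline for sample i is independent of f(S_i).
    baselines = (f_values.sum() - f_values) / (n - 1)
    advantages = f_values - baselines
    # Average of advantage-weighted score gradients.
    return (advantages[:, None] * score_grads).mean(axis=0)
```

With a constant loss across samples the advantages vanish, so the estimate is exactly zero — the property that makes the baseline reduce variance without adding bias.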
Researcher Affiliation | Academia | Klas Wijk (1,2), Ricardo Vinuesa (1,2), Hossein Azizpour (1,2,3); 1: KTH Royal Institute of Technology, 2: Swedish e-Science Research Centre, 3: Science for Life Laboratory
Pseudocode | Yes | Algorithm 1: Subset sampling using Gumbel top-k; Algorithm 2: SFESS + VR: Score function estimator for k-subset sampling with variance reduction
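The Gumbel top-k sampler named in Algorithm 1 is a standard trick: perturb each logit with independent Gumbel noise and keep the k largest. A minimal NumPy sketch (our own illustration, not the authors' code):

```python
import numpy as np

def gumbel_top_k(logits, k, rng):
    """Sample a k-subset without replacement: perturb each logit with
    independent standard Gumbel noise and return the top-k indices."""
    u = rng.uniform(size=logits.shape)
    gumbel = -np.log(-np.log(u))        # standard Gumbel(0, 1) noise
    return np.argsort(logits + gumbel)[-k:]

# Example: sample 2 of 4 items in proportion to unnormalized weights.
rng = np.random.default_rng(0)
subset = gumbel_top_k(np.log([0.1, 0.2, 0.3, 0.4]), k=2, rng=rng)
```

For k = 1 this reduces to the classic Gumbel-max trick; for k > 1 it yields distinct indices drawn without replacement.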
Open Source Code | Yes | Code available at https://github.com/klaswijk/sfess.
Open Datasets | Yes | We use MNIST (LeCun et al., 1998) and Fashion-MNIST (Xiao et al., 2017) with the canonical train and test splits.
Dataset Splits | Yes | We use MNIST (LeCun et al., 1998) and Fashion-MNIST (Xiao et al., 2017) with the canonical train and test splits. We withhold 10,000 samples from the train set for validation.
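The split described above (10,000 of MNIST's 60,000 canonical training samples withheld for validation) can be sketched as follows. The paper does not state whether the withheld samples are chosen at random, so a seeded random split is assumed here purely for illustration:

```python
import numpy as np

# MNIST's canonical train split has 60,000 samples; withhold 10,000
# for validation. A fixed seed keeps the split reproducible.
rng = np.random.default_rng(0)
perm = rng.permutation(60_000)
val_idx, train_idx = perm[:10_000], perm[10_000:]
```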
Hardware Specification | No | The computations were enabled by resources provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS), partially funded by the Swedish Research Council through grant agreement no. 2022-06725, as well as the Berzelius resource provided by the Knut and Alice Wallenberg Foundation at the National Supercomputer Centre.
Software Dependencies | No | We use the NVIDIA cuFFT implementation in PyTorch.
Experiment Setup | Yes | For all training, we use a batch size of 128 and train for 50,000 steps using the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 1e-4 and parameters β1 = 0.9 and β2 = 0.999.
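For reference, the Adam update with the quoted hyperparameters (lr = 1e-4, β1 = 0.9, β2 = 0.999) can be written out explicitly. This is a generic NumPy sketch of Kingma & Ba's update rule, not the paper's training loop; in practice one would use `torch.optim.Adam` with these settings.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with the paper's hyperparameters (t is 1-indexed)."""
    m = beta1 * m + (1 - beta1) * grad       # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```

Because the update is normalized by the second-moment estimate, each step has magnitude close to the learning rate of 1e-4, independent of the raw gradient scale.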