SFESS: Score Function Estimators for $k$-Subset Sampling

Authors: Klas Wijk, Ricardo Vinuesa, Hossein Azizpour

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we validate our proposed estimator in three main experimental settings: feature selection, variational autoencoders (VAE), and stochastic k-nearest neighbors (k-NN). In these problems, the k-subset distribution is used in different ways: as the first operation in feature selection, as the mid-point bottleneck in a VAE, and in computing the final loss in stochastic k-NN (see Figure 3a). We use MNIST (LeCun et al., 1998) and Fashion-MNIST (Xiao et al., 2017) with the canonical train and test splits. We withhold 10,000 samples from the train set for validation. For all training, we use a batch size of 128 and train for 50,000 steps using the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 1e-4 and parameters β1 = 0.9 and β2 = 0.999. We compare our proposed method with variance reduction (SFESS + VR), using 32 variance reduction samples, to relaxed subset sampling (GS) and its straight-through variant (STGS) (Xie & Ermon, 2019), implicit maximum likelihood estimation (I-MLE) (Niepert et al., 2021), SIMPLE (Ahmed et al., 2023), and SFESS without variance reduction.
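The SFESS + VR estimator quoted above averages the score function gradient over multiple Monte Carlo samples (32 in the paper). As a rough illustration of how such an estimator can be variance-reduced, here is a minimal NumPy sketch using a leave-one-out baseline; the baseline choice, function name, and array shapes are our own assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def sfe_with_baseline(f_values, score_grads):
    """Score function (REINFORCE) gradient estimate with a
    leave-one-out baseline over n Monte Carlo samples.

    f_values:    shape (n,)   losses f(S_i) of the sampled subsets
    score_grads: shape (n, d) gradients of log p(S_i) w.r.t. the logits
    """
    n = f_values.shape[0]
    # Leave-one-out baseline: mean loss of the *other* n-1 samples,
    # so the baseline for sample i is independent of f(S_i).
    baselines = (f_values.sum() - f_values) / (n - 1)
    advantages = f_values - baselines
    # Average of advantage-weighted score gradients.
    return (advantages[:, None] * score_grads).mean(axis=0)
```

With a constant loss across samples the advantages vanish, so the estimate is exactly zero — the property that makes the baseline reduce variance without adding bias.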
Researcher Affiliation | Academia | Klas Wijk (1,2), Ricardo Vinuesa (1,2), Hossein Azizpour (1,2,3); 1: KTH Royal Institute of Technology, 2: Swedish e-Science Research Centre, 3: Science for Life Laboratory
Pseudocode | Yes | Algorithm 1: Subset sampling using Gumbel top-k; Algorithm 2: SFESS + VR: Score function estimator for k-subset sampling with variance reduction
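The Gumbel top-k sampler named in Algorithm 1 is a standard trick: perturb each logit with independent Gumbel noise and keep the k largest. A minimal NumPy sketch (our own illustration, not the authors' code):

```python
import numpy as np

def gumbel_top_k(logits, k, rng):
    """Sample a k-subset without replacement: perturb each logit with
    independent standard Gumbel noise and return the top-k indices."""
    u = rng.uniform(size=logits.shape)
    gumbel = -np.log(-np.log(u))        # standard Gumbel(0, 1) noise
    return np.argsort(logits + gumbel)[-k:]

# Example: sample 2 of 4 items in proportion to unnormalized weights.
rng = np.random.default_rng(0)
subset = gumbel_top_k(np.log([0.1, 0.2, 0.3, 0.4]), k=2, rng=rng)
```

For k = 1 this reduces to the classic Gumbel-max trick; for k > 1 it yields distinct indices drawn without replacement.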
Open Source Code | Yes | Code available at https://github.com/klaswijk/sfess.
Open Datasets | Yes | We use MNIST (LeCun et al., 1998) and Fashion-MNIST (Xiao et al., 2017) with the canonical train and test splits.
Dataset Splits | Yes | We use MNIST (LeCun et al., 1998) and Fashion-MNIST (Xiao et al., 2017) with the canonical train and test splits. We withhold 10,000 samples from the train set for validation.
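The split described above (10,000 of MNIST's 60,000 canonical training samples withheld for validation) can be sketched as follows. The paper does not state whether the withheld samples are chosen at random, so a seeded random split is assumed here purely for illustration:

```python
import numpy as np

# MNIST's canonical train split has 60,000 samples; withhold 10,000
# for validation. A fixed seed keeps the split reproducible.
rng = np.random.default_rng(0)
perm = rng.permutation(60_000)
val_idx, train_idx = perm[:10_000], perm[10_000:]
```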
Hardware Specification | No | The computations were enabled by resources provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS), partially funded by the Swedish Research Council through grant agreement no. 2022-06725, as well as the Berzelius resource provided by the Knut and Alice Wallenberg Foundation at the National Supercomputer Centre.
Software Dependencies | No | We use the NVIDIA cuFFT implementation in PyTorch.
Experiment Setup | Yes | For all training, we use a batch size of 128 and train for 50,000 steps using the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 1e-4 and parameters β1 = 0.9 and β2 = 0.999.
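For reference, the Adam update with the quoted hyperparameters (lr = 1e-4, β1 = 0.9, β2 = 0.999) can be written out explicitly. This is a generic NumPy sketch of Kingma & Ba's update rule, not the paper's training loop; in practice one would use `torch.optim.Adam` with these settings.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with the paper's hyperparameters (t is 1-indexed)."""
    m = beta1 * m + (1 - beta1) * grad       # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```

Because the update is normalized by the second-moment estimate, each step has magnitude close to the learning rate of 1e-4, independent of the raw gradient scale.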