Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Run Like a Neural Network, Explain Like k-Nearest Neighbor
Authors: Xiaomeng Ye, David Leake, Yu Wang, David Crandall
IJCAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | An evaluation of the revised architecture for image and language classification tasks illustrates its promise as a flexible and interpretable method. Computational experiments are carried out to test two main hypotheses: 1. NN-kNN using glocal feature weighting retains accuracy comparable to that of NN-kNN using either global or individual feature weighting. 2. NN-kNN using sampling and feature extraction achieves comparable accuracy to neural counterparts on larger datasets, while providing instance-based explanations. |
| Researcher Affiliation | Academia | Xiaomeng Ye (Berry College); David Leake, Yu Wang, and David Crandall (Indiana University Bloomington) |
| Pseudocode | No | The paper describes the design of the extended NN-kNN in Section 3 with numbered steps, but these are descriptive text and a workflow diagram (Figure 1), not structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | All code can be found on https://github.com/Heuzi/NN-kNN |
| Open Datasets | Yes | All data sets (see Table 1) are from UCI repository [Dua and Graff, 2017] except Zebra (a) and Zebra (b), which are explained later. For image tasks, we carried out experiments on CIFAR-10 [Krizhevsky, 2009] and SVHN [Netzer et al., 2011]. For a language task, we conducted experiments using the Stanford Sentiment Treebank (SST) dataset. |
| Dataset Splits | Yes | For datasets with predefined train-test splits, we use the provided partitions. For datasets without such splits, we create a 90-10 train-test split. For smaller datasets, we use 10-fold cross-validation; for larger datasets, we conduct 10 iterations on the given train-test split. CIFAR-10 consists of 60,000 images...The dataset has a preset partition of 50,000 training samples and 10,000 testing samples. Similarly, SVHN...is divided into 73,257 images for training and 26,032 images for testing. The SST dataset contains 8,544 samples in the training set and 2,210 in the test set. SST-2 contains 6290 training samples and 1821 testing samples. |
| Hardware Specification | No | The paper mentions general concepts like "tensor computations and parallel computing" but does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9). |
| Experiment Setup | Yes | We trained baseline models (e.g., convolutional image classifiers), with a learning rate of 1e-4. For NN-kNN without a feature extractor on small datasets (Section 5.1), we set the learning rate to 1e-2... For NN-kNN on larger datasets, we used varying learning rates tailored to different components of the model: 1e-4 for the feature extractor, 1e-3 for the glocal weighting parameters, and 1e-4 for case-related parameters... All models, including NN-kNN and the baseline models, are trained until testing accuracy stabilizes, defined as no improvement for 40 epochs (referred to as the training patience). For NN-kNN, we experimented with the setting w = 1, k = 20 and sampling 500 and 2000 cases for each query. In all NN-kNN experiments, we set w = 4, k = 5, and sample 500 cases for each query. |
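The split policy quoted under Dataset Splits can be sketched in plain Python. The function name and the `small_threshold` cutoff are assumptions for illustration; the paper does not state a numeric boundary between "smaller" and "larger" datasets, and preset partitions (CIFAR-10, SVHN, SST) would be handled upstream:

```python
import random

def make_splits(n_samples, small_threshold=1000, seed=0):
    """Sketch of the paper's split policy: small datasets get 10-fold
    cross-validation; larger datasets without a preset partition get a
    single 90-10 train-test split. `small_threshold` is an assumed cutoff."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    if n_samples <= small_threshold:
        # 10-fold CV: partition into 10 folds; each fold is the test set once.
        folds = [indices[i::10] for i in range(10)]
        return [([x for j, fold in enumerate(folds) if j != i for x in fold],
                 folds[i])
                for i in range(10)]
    # Single 90-10 split (the paper runs 10 iterations on such splits).
    cut = int(0.9 * n_samples)
    return [(indices[:cut], indices[cut:])]
```

Each returned pair is `(train_indices, test_indices)`, so the cross-validation and hold-out cases can be iterated uniformly.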
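The stopping rule quoted under Experiment Setup ("no improvement for 40 epochs") can be sketched as a small tracker; the class and method names here are ours, not the paper's. The per-component learning rates (1e-4 / 1e-3 / 1e-4) would typically map onto optimizer parameter groups (e.g., in PyTorch) and are not shown:

```python
class Patience:
    """Early-stopping tracker: signal a stop once test accuracy fails to
    improve for `patience` consecutive epochs (40 in the paper)."""

    def __init__(self, patience=40):
        self.patience = patience
        self.best = float("-inf")
        self.stale = 0

    def step(self, test_acc):
        """Record one epoch's test accuracy; return True when training should stop."""
        if test_acc > self.best:
            self.best = test_acc
            self.stale = 0
        else:
            self.stale += 1
        return self.stale >= self.patience
```

A training loop would call `step` once per epoch and break when it returns True; any improvement resets the counter.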