Support-Set Context Matters for Bongard Problems

Authors: Nikhil Raghuraman, Adam W Harley, Leonidas Guibas

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our Transformer-based approach sets a new state-of-the-art on Bongard-LOGO (75.3%) and Bongard-HOI (76.4%) compared to approaches with equivalent backbone architectures, and also performs fairly well on the original Bongard problem set (60.8%).
Researcher Affiliation | Academia | Nikhil Raghuraman, Department of Computer Science, Stanford University; Adam W. Harley, Department of Computer Science, Stanford University; Leonidas Guibas, Department of Computer Science, Stanford University
Pseudocode | No | The paper describes its method only in regular paragraph text, without structured formatting or dedicated pseudocode blocks.
Open Source Code | Yes | Code is available at https://github.com/nraghuraman/bongard-context.
Open Datasets | Yes | We evaluate on the Bongard-LOGO dataset (Nie et al., 2021)... Bongard-HOI (Jiang et al., 2023)... Mikhail Bongard's original dataset (Foundalis, 2006; Yun et al., 2020)... We publicly release this cleaned training dataset to aid future works.
Dataset Splits | Yes | For Bongard-LOGO... The train, validation, and test sets consist of 9300, 900, and 1800 problems, respectively. The test set is further subdivided into four splits... For Bongard-HOI... The dataset authors defined train, validation, and test sets consisting of 23041, 17187, and 13941 problems, respectively. We arbitrarily select the final positive and negative images as queries... For Bongard-Classic... we select one positive and one negative example as queries.
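The query-selection convention described for Bongard-HOI (the final positive and final negative image of each problem serve as queries, the rest as support) can be sketched as follows. This is a minimal illustration under assumptions: the function name and list-based representation are ours, not from the paper or its code.

```python
def split_support_and_queries(positives, negatives):
    """Hold out the final positive and negative examples of a Bongard
    problem as queries; all remaining examples form the support set.
    (Illustrative sketch; names and data layout are assumptions.)"""
    pos_support, pos_query = positives[:-1], positives[-1]
    neg_support, neg_query = negatives[:-1], negatives[-1]
    return pos_support + neg_support, (pos_query, neg_query)
```

For example, with three positives and three negatives, the support set contains the first two of each class and the queries are the third of each.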
Hardware Specification | Yes | We ran all experiments on NVIDIA Titan RTX, Titan Xp, GeForce, or A5000 GPUs. Each experiment used at most 1 GPU and 30 GB of GPU memory.
Software Dependencies | No | The paper mentions various models and tools used (e.g., CLIP, ResNet, Transformer, GPT-4o, DINO) and references the OpenAI API, but does not provide specific version numbers for any key software libraries or dependencies such as PyTorch, TensorFlow, or CUDA.
Experiment Setup | Yes | To train our models, we used the AdamW optimizer (Loshchilov & Hutter, 2017), a maximum learning rate of 5e-5, and a 1-cycle learning rate policy (Smith & Topin, 2019)... We train models for 500,000 iterations with a batch size of 2. We resize images to 512×512 pixels and apply random cropping and horizontal flipping to augment... We train for 10,000 iterations with a batch size of 16 and weight decay of 0.01. We use augmentations at training time, including horizontal flips, random grayscale, color jitter, and random rescaling and cropping (to 224×224), and apply support dropout and label noise.
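The 1-cycle learning rate policy cited above (Smith & Topin, 2019) ramps the learning rate up to a maximum and then anneals it back down over training. A simplified pure-Python sketch is below; the warmup fraction, division factor, and linear (rather than cosine) shape are illustrative assumptions, not values reported in the paper.

```python
def one_cycle_lr(step, total_steps, max_lr=5e-5, div_factor=25.0, pct_warmup=0.3):
    """Simplified 1-cycle schedule: linear warmup from max_lr/div_factor
    to max_lr over the first pct_warmup of training, then linear decay
    back to the base rate. (Parameter defaults are assumptions.)"""
    base_lr = max_lr / div_factor
    warmup_steps = int(total_steps * pct_warmup)
    if step < warmup_steps:
        frac = step / warmup_steps          # 0 -> 1 during warmup
        return base_lr + frac * (max_lr - base_lr)
    frac = (step - warmup_steps) / (total_steps - warmup_steps)  # 0 -> 1 during decay
    return max_lr - frac * (max_lr - base_lr)
```

With `total_steps=10000` the rate peaks at `max_lr` at step 3000 and returns to the base rate by the final step; in practice one would typically use a library scheduler rather than this hand-rolled version.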