Neural Slot Interpreters: Grounding Object Semantics in Emergent Slot Representations

Authors: Bhishma Dedhia, Niraj Jha

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments with a bi-modal object-property and scene retrieval task demonstrate the grounding efficacy and interpretability of correspondences learned by NSI. From a scene representation standpoint, we find that emergent NSI slots that move beyond the image grid by binding to spatial objects facilitate improved visual grounding compared to conventional bounding-box-based approaches. From a data efficiency standpoint, we empirically validate that NSI learns more generalizable representations from a fixed amount of annotation data than the traditional approach.
Researcher Affiliation | Academia | Bhishma Dedhia and Niraj K. Jha, Department of Electrical and Computer Engineering, Princeton University
Pseudocode | Yes | Algorithm 1: Neural Slot Interpreter Contrastive Learning Pseudocode
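The paper's Algorithm 1 is not reproduced here, but the general shape of contrastive alignment between two modalities (e.g., pooled slot embeddings and pooled program embeddings) can be sketched as a symmetric InfoNCE loss. All shapes, variable names, and the pooling scheme below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical batch: B scenes, each summarized by a D-dimensional
# pooled slot embedding and a pooled program embedding.
B, D = 4, 8
slot_emb = rng.normal(size=(B, D))
prog_emb = rng.normal(size=(B, D))

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE: matched (slot, program) pairs lie on the diagonal."""
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature  # (B, B) cosine-similarity matrix
    labels = np.arange(len(a))
    # slots -> programs: cross-entropy against the diagonal
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    loss_a2b = -log_probs[labels, labels].mean()
    # programs -> slots: the symmetric term
    log_probs_t = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    loss_b2a = -log_probs_t[labels, labels].mean()
    return (loss_a2b + loss_b2a) / 2

loss = info_nce(slot_emb, prog_emb)
print(loss > 0)  # True: the cross-entropy loss is nonnegative
```

In practice such a loss would be minimized by gradient descent over the encoders producing `slot_emb` and `prog_emb`; this sketch only evaluates the objective once on random inputs.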
Open Source Code | No | The paper does not contain an explicit statement about releasing their source code, nor does it provide a direct link to a code repository.
Open Datasets | Yes | Our experiments encompass different tasks on scenes ranging from synthetic renderings to in-the-wild scenes, viz. (1) CLEVr Hans (Stammer et al., 2020): objects scattered on a plane, (2) CLEVr Tex (Karazija et al., 2021): textured objects placed on textured backgrounds, (3) MOVi-C (Greff et al., 2022): photorealistic objects on real-world surfaces, and (4) MS-COCO 2017 (Lin et al., 2015): a large-scale object detection dataset containing real-world images.
Dataset Splits | Yes | The dataset splits used in this work are detailed in Table 3 ("Dataset splits used in experiments"):
- CLEVr Hans 3: 9,000 train / 2,250 validation / 2,250 test
- CLEVr Hans 7: 21,000 train / 5,250 validation / 5,250 test
- CLEVr Tex: 37,500 train / 2,500 validation / 10,000 test
- MOVi-C: 198,635 train / 35,053 validation / 6,000 test
- MS COCO 2017: 99,676 train / 17,590 validation / 4,952 test
Hardware Specification | Yes | We list the hyperparameters for NSI and other methods used in our experiments, which were all performed on Nvidia A100 GPUs.
Software Dependencies | No | The paper mentions several software components like DINO ViT, PyTorch, MLP, Transformer, Gated Recurrent Unit, Slot Attention, and the Hungarian Algorithm, but it does not specify version numbers for any of these.
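The Hungarian Algorithm mentioned in this row is commonly used in slot-based models to match predicted slots to annotated objects one-to-one at minimum cost. A minimal sketch of that matching problem, using exhaustive search over permutations in place of the Hungarian algorithm (the 3x3 cost matrix is invented for illustration):

```python
from itertools import permutations

# Hypothetical cost matrix: rows are predicted slots, columns are
# annotated objects; cost[i][j] is the mismatch between slot i and
# object j (e.g., negative similarity). Lower cost = better match.
cost = [
    [0.1, 0.9, 0.8],
    [0.8, 0.2, 0.9],
    [0.7, 0.9, 0.1],
]

def min_cost_matching(cost):
    """Brute-force minimum-cost one-to-one assignment (fine for small n)."""
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return best_perm, best_cost

assignment, total = min_cost_matching(cost)
print(assignment)        # (0, 1, 2)
print(round(total, 3))   # 0.4
```

For realistic slot counts, `scipy.optimize.linear_sum_assignment` solves the same problem in polynomial time; the brute-force version above is only for clarity.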
Experiment Setup | Yes | The hyperparameters for ungrounded and HMC matching backbones are given in Table 4. The hyperparameters for the NSI alignment model are listed in Table 5.