Identifiable Object Representations under Spatial Ambiguities

Authors: Avinash Kori, Francesca Toni, Ben Glocker

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on standard benchmarks and novel complex datasets validate our method's robustness and scalability.
Researcher Affiliation | Academia | Department of Computing, Imperial College London, London, UK. Correspondence to: Avinash Kori <EMAIL>.
Pseudocode | Yes | Algorithm 1: View Invariant Slot Attention (VISA)
Open Source Code | No | All experimental scripts will be made available on GitHub at a later stage.
Open Datasets | Yes | We first evaluate the framework on standard benchmarks, specifically CLEVR-MV, CLEVR-AUG, and GQN (Li et al., 2020); we additionally demonstrate the framework's scalability to highly diverse settings with GSO (Downs et al., 2022) and the proposed datasets MV-MOVIC and MV-MOVID, multi-view versions of the MOVi-C dataset with fixed and varying scene-specific cameras (Greff et al., 2022).
Dataset Splits | No | The paper describes in-domain and out-of-domain evaluations based on viewpoint groups, but does not provide specific train/validation/test splits (percentages or counts) for the benchmarks or generated datasets.
Hardware Specification | Yes | We run all our experiments on a cluster with NVIDIA L40 48 GB GPU cards.
Software Dependencies | No | Table 5 mentions optimizers such as ADAM and ADAMW, but does not specify version numbers for any software libraries (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | In Table 5, we detail all the hyper-parameters used in our experiments. For the benchmark experiments we use a trainable CNN encoder as in (Locatello et al., 2020b; Kori et al., 2023), while for the proposed MVMOVI datasets we use a DINO (Caron et al., 2021) encoder to extract image features and change our objective to reconstruct these features rather than the original image, as proposed in (Seitzer et al., 2022). For most hyperparameters we use the values suggested by (Locatello et al., 2020b; Seitzer et al., 2022), based on their ablation results.
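The feature-reconstruction objective mentioned in the setup (reconstructing frozen DINO features instead of raw pixels, as in Seitzer et al., 2022) can be sketched as below. This is a minimal illustration, not the paper's implementation: the `SimpleSpatialBroadcastDecoder` class and its hidden width are hypothetical stand-ins for whatever decoder maps slots to patch features.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleSpatialBroadcastDecoder(nn.Module):
    """Hypothetical minimal decoder: each slot is broadcast to all patch
    positions, offset by a learned positional embedding, and mapped by an
    MLP to a patch feature plus an alpha logit for per-patch slot mixing."""
    def __init__(self, slot_dim, num_patches, feat_dim):
        super().__init__()
        self.pos = nn.Parameter(torch.randn(1, 1, num_patches, slot_dim))
        self.mlp = nn.Sequential(
            nn.Linear(slot_dim, 128), nn.ReLU(),
            nn.Linear(128, feat_dim + 1),  # feature channels + alpha logit
        )

    def forward(self, slots):                    # slots: (B, K, slot_dim)
        x = slots.unsqueeze(2) + self.pos        # (B, K, N, slot_dim)
        out = self.mlp(x)                        # (B, K, N, feat_dim + 1)
        feats, alpha = out[..., :-1], out[..., -1:]
        weights = alpha.softmax(dim=1)           # normalize over the K slots
        return (weights * feats).sum(dim=1)      # (B, N, feat_dim)

def feature_reconstruction_loss(decoder, slots, dino_features):
    """MSE between decoded slot features and frozen-encoder patch features;
    targets are detached so no gradient flows into the DINO encoder."""
    return F.mse_loss(decoder(slots), dino_features.detach())
```

The key design point is that the target tensor holds precomputed DINO patch features of shape (batch, num_patches, feat_dim), so the decoder never has to produce pixels at full resolution.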