Identifiable Object Representations under Spatial Ambiguities
Authors: Avinash Kori, Francesca Toni, Ben Glocker
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on standard benchmarks and novel complex datasets validate our method's robustness and scalability. |
| Researcher Affiliation | Academia | 1Department of Computing, Imperial College London, London, UK. Correspondence to: Avinash Kori <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: View Invariant Slot Attention (VISA) |
| Open Source Code | No | All experimental scripts will be made available on GitHub at a later stage. |
| Open Datasets | Yes | We first evaluate the framework on standard benchmarks, specifically focusing on CLEVR-MV, CLEVR-AUG, GQN (Li et al., 2020); we additionally demonstrate the framework's scalability to highly diverse settings with GSO (Downs et al., 2022) and the proposed datasets MV-MOVIC and MV-MOVID, which are multi-view versions of the MoViC dataset with fixed and varying scene-specific cameras (Greff et al., 2022). |
| Dataset Splits | No | The paper describes in-domain and out-of-domain evaluations based on viewpoint groups, but does not provide specific train/validation/test dataset splits (percentages or counts) for model training or evaluation on the mentioned benchmarks or generated datasets. |
| Hardware Specification | Yes | We run all our experiments on a cluster with NVIDIA L40 48GB GPU cards. |
| Software Dependencies | No | Table 5 mentions optimizers like ADAM and ADAMW, but does not specify version numbers for any software libraries (e.g., Python, PyTorch, CUDA, etc.). |
| Experiment Setup | Yes | In Table 5, we detail all the hyperparameters used in our experiments. In the case of benchmark experiments, we use a trainable CNN encoder as in (Locatello et al., 2020b; Kori et al., 2023), while in the case of the proposed MVMOVI datasets we use a DINO (Caron et al., 2021) encoder to extract image features and change our objective to reconstruct these features rather than the original image, as proposed in (Seitzer et al., 2022). For most hyperparameters we use the values suggested by (Locatello et al., 2020b; Seitzer et al., 2022), based on their ablation results. |
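The Experiment Setup row notes that, for the MVMOVI datasets, the training objective reconstructs frozen DINO features instead of raw pixels (following Seitzer et al., 2022). A minimal sketch of that kind of objective is shown below; it is an illustrative assumption, not the paper's actual implementation, and the function names (`combine_slot_decodes`, `feature_reconstruction_loss`) and array shapes are hypothetical.

```python
import numpy as np

def combine_slot_decodes(slot_feats, slot_alphas):
    """Combine per-slot feature predictions with softmax alpha masks.

    slot_feats:  (K, N, D) per-slot predicted features for N patch tokens
    slot_alphas: (K, N, 1) unnormalised mask logits
    Returns the (N, D) combined feature reconstruction.
    """
    # Softmax over the slot axis so the masks sum to 1 at every token.
    a = np.exp(slot_alphas - slot_alphas.max(axis=0, keepdims=True))
    a = a / a.sum(axis=0, keepdims=True)
    return (a * slot_feats).sum(axis=0)

def feature_reconstruction_loss(pred, target):
    """MSE between reconstructed features and frozen-encoder (e.g. DINO) features."""
    return float(np.mean((pred - target) ** 2))
```

In this sketch the decoder never touches pixels: each slot predicts a feature map and a mask, the masked predictions are summed, and the loss compares the result with the pre-extracted encoder features, which is what "reconstruct these features rather than the original image" amounts to.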