Spatial Reasoning with Denoising Models
Authors: Christopher Wewer, Bartlomiej Pogodzinski, Bernt Schiele, Jan Eric Lenssen
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We introduce Spatial Reasoning Models (SRMs), a framework to perform reasoning over sets of continuous variables via denoising generative models. ... To measure this, we introduce a set of benchmark tasks that test the quality of complex reasoning in generative models and can quantify hallucination. ... We evaluate SRMs for reasoning on three new benchmark datasets that we introduce in Sec. 4.1. |
| Researcher Affiliation | Academia | 1Max Planck Institute for Informatics, Saarland Informatics Campus, Germany. Correspondence to: Christopher Wewer <EMAIL>, Bart Pogodzinski <EMAIL>, Jan Eric Lenssen <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Recursive Sampling of a sum constrained Vector |
| Open Source Code | Yes | Our project website provides additional videos, code, and the benchmark datasets. ... Our framework, code, and benchmarks are available on our project website for further investigation and development. |
| Open Datasets | Yes | We introduce three different datasets to quantify reasoning capabilities. They are aimed at different aspects to be tested. The MNIST Sudoku dataset captures complex (NP-hard) dependencies that need to be understood. The Even Pixels dataset is an easier task that can be solved in a greedy fashion. Finally, we introduce the Counting Polygons / Stars FFHQ dataset, which moves closer to real-world images. ... Our project website provides additional videos, code, and the benchmark datasets. |
| Dataset Splits | No | For testing, we use a held-out dataset split of valid Sudokus and apply random masking of cells with the number of masked ones randomly sampled from the intervals [1, 27], [28, 54], and [55, 81], resulting in three levels of difficulty easy, medium, and hard, respectively. As metrics, we consider accuracy as well as the sum of L1-distances of row-, column-, and block-wise digit histograms to the all ones vector (zero if correct), averaged over all test examples. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. It only mentions general architectural choices like "2D UNets" and "Diffusion Transformers". |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. It mentions methods like "rectified flows (Liu et al., 2023)" but not the underlying software environment or library versions used for implementation. |
| Experiment Setup | Yes | Table 5: Hyperparameters used for all experiments. (MNIST Sudoku, Even Pixels, Counting Polygons/Stars FFHQ) Channels, Depth, Channel multipliers, Head channels, Attention resolution, Parameters, Effective batch size, Iterations, Learning Rate. |
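The paper's Algorithm 1 is cited above only by its title, so its exact procedure is not reproduced here. As a hedged illustration of the general idea of recursively sampling a sum-constrained vector, the sketch below (function name, uniform per-entry sampling, and integer bounds are our own assumptions, not the paper's method) draws each entry from the range that keeps the remaining entries feasible, then recurses on the residual sum:

```python
import random

def sample_sum_constrained(n, total, low=0, high=None):
    """Recursively sample an integer vector of length n with entries in
    [low, high] that sums exactly to `total`.

    Hypothetical sketch: each entry is drawn uniformly from its feasible
    range, so the distribution over vectors is not uniform in general.
    """
    if high is None:
        high = total
    if n == 1:
        # Base case: the last entry must absorb the remaining sum.
        assert low <= total <= high, "infeasible constraint"
        return [total]
    # Feasible range for the first entry so that the remaining n-1
    # entries can still reach the residual sum within [low, high].
    lo = max(low, total - (n - 1) * high)
    hi = min(high, total - (n - 1) * low)
    x = random.randint(lo, hi)
    return [x] + sample_sum_constrained(n - 1, total - x, low, high)

v = sample_sum_constrained(5, 12, low=0, high=9)
```

Any vector returned this way satisfies the sum constraint by construction, which is what makes the recursion terminate correctly.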
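The MNIST Sudoku metric quoted in the Dataset Splits row is described concretely enough to sketch: the sum of L1 distances of row-, column-, and block-wise digit histograms from the all-ones vector, which is zero exactly when the grid is a valid Sudoku fill. A minimal implementation (the function name and NumPy usage are our own; the paper does not specify its code) could be:

```python
import numpy as np

def sudoku_histogram_l1(grid):
    """Sum of L1 distances between the digit histograms of every row,
    column, and 3x3 block and the all-ones vector.

    A valid 9x9 Sudoku contains each digit 1-9 exactly once per unit,
    so every histogram equals the all-ones vector and the score is 0.
    """
    grid = np.asarray(grid)

    def hist_err(cells):
        # Histogram over digits 1..9; index 0 (unfilled) is dropped.
        counts = np.bincount(cells.ravel(), minlength=10)[1:10]
        return int(np.abs(counts - 1).sum())

    err = 0
    for i in range(9):
        err += hist_err(grid[i, :])            # row i
        err += hist_err(grid[:, i])            # column i
        r, c = 3 * (i // 3), 3 * (i % 3)
        err += hist_err(grid[r:r + 3, c:c + 3])  # block i
    return err

# A valid Sudoku built from cyclic row shifts scores exactly 0.
valid = (np.add.outer(np.array([0, 3, 6, 1, 4, 7, 2, 5, 8]),
                      np.arange(9)) % 9) + 1
```

Per the paper, this score is averaged over all test examples to complement plain accuracy, since it also quantifies how close an invalid completion is to being consistent.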