Spatial Reasoning with Denoising Models

Authors: Christopher Wewer, Bartlomiej Pogodzinski, Bernt Schiele, Jan Eric Lenssen

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We introduce Spatial Reasoning Models (SRMs), a framework to perform reasoning over sets of continuous variables via denoising generative models. ... To measure this, we introduce a set of benchmark tasks that test the quality of complex reasoning in generative models and can quantify hallucination. ... We evaluate SRMs for reasoning on three new benchmark datasets that we introduce in Sec. 4.1.
Researcher Affiliation | Academia | Max Planck Institute for Informatics, Saarland Informatics Campus, Germany. Correspondence to: Christopher Wewer <EMAIL>, Bart Pogodzinski <EMAIL>, Jan Eric Lenssen <EMAIL>.
Pseudocode | Yes | Algorithm 1: Recursive Sampling of a Sum-Constrained Vector
Open Source Code | Yes | Our project website provides additional videos, code, and the benchmark datasets. ... Our framework, code, and benchmarks are available on our project website for further investigation and development.
Open Datasets | Yes | We introduce three different datasets to quantify reasoning capabilities. They are aimed at different aspects to be tested. The MNIST Sudoku dataset captures complex (NP-hard) dependencies that need to be understood. The Even Pixels dataset is an easier task that can be solved in a greedy fashion. Finally, we introduce the Counting Polygons / Stars FFHQ dataset, which moves closer to real-world images. ... Our project website provides additional videos, code, and the benchmark datasets.
Dataset Splits | No | For testing, we use a held-out dataset split of valid Sudokus and apply random masking of cells with the number of masked ones randomly sampled from the intervals [1, 27], [28, 54], and [55, 81], resulting in three levels of difficulty easy, medium, and hard, respectively. As metrics, we consider accuracy as well as the sum of L1-distances of row-, column-, and block-wise digit histograms to the all ones vector (zero if correct), averaged over all test examples.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. It only mentions general architectural choices like "2D UNets" and "Diffusion Transformers".
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. It mentions methods like "rectified flows (Liu et al., 2023)" but not the underlying software environment or library versions used for implementation.
Experiment Setup | Yes | Table 5: Hyperparameters used for all experiments, given separately for MNIST Sudoku, Even Pixels, and Counting Polygons/Stars FFHQ. The table lists channels, depth, channel multipliers, head channels, attention resolution, parameters, effective batch size, iterations, and learning rate.
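The Sudoku test protocol described in the Dataset Splits row (difficulty-dependent random masking plus the histogram-based L1 metric) can be sketched in Python with NumPy. This is a minimal, hypothetical sketch, not the authors' code: the names sample_mask, histogram_distance, and mean_histogram_distance are illustrative, and it assumes the masked-cell count is drawn uniformly from each inclusive interval, masked cells are chosen uniformly without replacement, and grids hold digits 1 to 9. The accuracy metric is not sketched because its exact definition is not given here.

```python
import numpy as np

# Inclusive ranges for the number of masked cells per difficulty level,
# taken from the split description; uniform sampling is an assumption.
DIFFICULTY_RANGES = {"easy": (1, 27), "medium": (28, 54), "hard": (55, 81)}


def sample_mask(difficulty, rng):
    """Return a 9x9 boolean mask where True marks a masked (hidden) cell."""
    lo, hi = DIFFICULTY_RANGES[difficulty]
    num_masked = int(rng.integers(lo, hi + 1))  # integers() excludes the upper bound
    mask = np.zeros(81, dtype=bool)
    mask[rng.choice(81, size=num_masked, replace=False)] = True
    return mask.reshape(9, 9)


def histogram_distance(grid):
    """Sum of L1 distances between every row/column/block digit histogram
    and the all-ones vector; 0 for a valid completed Sudoku.

    `grid` is a 9x9 integer array containing digits 1..9.
    """
    grid = np.asarray(grid, dtype=int)
    units = []
    for i in range(9):
        units.append(grid[i, :])  # row i
        units.append(grid[:, i])  # column i
        r, c = 3 * (i // 3), 3 * (i % 3)
        units.append(grid[r:r + 3, c:c + 3].ravel())  # 3x3 block i
    total = 0
    for unit in units:
        counts = np.bincount(unit - 1, minlength=9)  # digit d counted at index d - 1
        total += int(np.abs(counts - 1).sum())
    return total


def mean_histogram_distance(grids):
    """Average the per-grid distance over all predicted test grids."""
    return float(np.mean([histogram_distance(g) for g in grids]))


if __name__ == "__main__":
    # A valid Sudoku built from a standard shifting pattern.
    valid = np.array(
        [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
    )
    print(histogram_distance(valid))  # 0
    broken = valid.copy()
    broken[0, 0] = broken[0, 1]  # duplicate a digit in row 0, column 0, block 0
    print(histogram_distance(broken))  # 6
    print(sample_mask("hard", np.random.default_rng(0)).sum())  # some count in [55, 81]
```

A single wrong cell costs 2 in each of its row, column, and block (one digit counted twice, one missing), which is why the corrupted grid scores 6. Note that a distance of zero only shows the grid is internally consistent; this sketch does not check that the prediction agrees with the unmasked clues.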