Probing the Latent Hierarchical Structure of Data via Diffusion Models

Authors: Antonio Sclocchi, Alessandro Favero, Noam Levi, Matthieu Wyart

ICLR 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Remarkably, we confirm this prediction in both text and image datasets using state-of-the-art diffusion models. Our results show how latent variable changes manifest in the data and establish how to measure these effects in real data using diffusion models." |
| Researcher Affiliation | Academia | Antonio Sclocchi (Institute of Physics, EPFL); Alessandro Favero (Institute of Physics, EPFL); Noam Itzhak Levi (Institute of Physics, EPFL); Matthieu Wyart (Department of Physics and Astronomy, Johns Hopkins) |
| Pseudocode | No | The paper describes algorithms such as Belief Propagation and the diffusion processes in structured text and equations, but does not present them in a clearly labeled "Pseudocode" or "Algorithm" block. |
| Open Source Code | No | The paper contains no explicit statement about releasing code and no link to a code repository for the described methodology. |
| Open Datasets | Yes | "We perform forward-backward experiments with state-of-the-art masked diffusion language models (MDLM) (Sahoo et al., 2024) on WikiText. ... We extend our analysis to computer vision by considering Improved Denoising Diffusion Probabilistic Models (Nichol & Dhariwal, 2021), trained on the ImageNet dataset." |
| Dataset Splits | Yes | "We present the average correlation functions and the susceptibility for vision DDPMs, starting from samples of the ImageNet validation set (Deng et al., 2009)." |
| Hardware Specification | No | The paper reports experiments with language and vision diffusion models but gives no hardware details (e.g., GPU models, CPU types, memory). |
| Software Dependencies | No | The paper names models and tools such as the GPT-2 tokenizer, CLIP ViT-B/32, MDLM, and Improved Denoising Diffusion Probabilistic Models, but specifies no software versions for them or for the underlying languages and libraries (e.g., Python, PyTorch, CUDA). |
| Experiment Setup | Yes | "The results are averaged over NS = 300 samples, each consisting of NT = 128 tokens, with NR = 50 noise realizations for each masking fraction. ... Data obtained with 344 starting images and 128 diffusion trajectories per starting image. ... we divide each image into 7×7 patches and use the last-layer embeddings for each patch from a CLIP ViT-B/32 (Radford et al., 2021) to tokenize the image." |
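The forward-backward averaging described in the Experiment Setup row (NS samples of NT tokens, NR noise realizations per masking fraction, then an average over the fraction of changed tokens) can be sketched as below. This is a minimal illustration, not the paper's code: `forward_backward`, `avg_change_fraction`, and the `toy_denoise` stand-in are hypothetical names, and a real run would replace `toy_denoise` with the MDLM denoiser.

```python
import numpy as np


def forward_backward(tokens, t, denoise, rng):
    """Forward: mask a fraction t of token positions.
    Backward: let the denoiser resample the masked positions."""
    mask = rng.random(tokens.shape) < t
    return np.where(mask, denoise(tokens, mask, rng), tokens)


def avg_change_fraction(samples, t, denoise, n_real=50, seed=0):
    """Average fraction of tokens changed by one forward-backward pass,
    over all samples and n_real noise realizations each."""
    rng = np.random.default_rng(seed)
    fracs = [
        np.mean(forward_backward(s, t, denoise, rng) != s)
        for s in samples
        for _ in range(n_real)
    ]
    return float(np.mean(fracs))


# Hypothetical stand-in denoiser: resamples masked positions uniformly
# over a GPT-2-sized vocabulary (the paper would call the MDLM here).
def toy_denoise(tokens, mask, rng):
    return rng.integers(0, 50257, size=tokens.shape)


rng = np.random.default_rng(1)
samples = [rng.integers(0, 50257, size=128) for _ in range(10)]  # NT = 128
print(avg_change_fraction(samples, t=0.3, denoise=toy_denoise, n_real=5))
```

With a masking fraction of t = 0.3 and a near-uniform resampler, roughly 30% of tokens change per pass; the paper's quantities of interest (correlation functions, susceptibility) are computed over the same kind of sample-and-realization average.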