Probing the Latent Hierarchical Structure of Data via Diffusion Models
Authors: Antonio Sclocchi, Alessandro Favero, Noam Levi, Matthieu Wyart
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Remarkably, we confirm this prediction in both text and image datasets using state-of-the-art diffusion models. Our results show how latent variable changes manifest in the data and establish how to measure these effects in real data using diffusion models. |
| Researcher Affiliation | Academia | Antonio Sclocchi, Institute of Physics, EPFL; Alessandro Favero, Institute of Physics, EPFL; Noam Itzhak Levi, Institute of Physics, EPFL; Matthieu Wyart, Department of Physics and Astronomy, Johns Hopkins University |
| Pseudocode | No | The paper describes algorithms such as Belief Propagation and diffusion processes in a structured text format and with equations, but does not present them within a clearly labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing code or a direct link to a code repository for the methodology described. |
| Open Datasets | Yes | We perform forward-backward experiments with state-of-the-art masked diffusion language models (MDLM) (Sahoo et al., 2024) on WikiText. ... Vision diffusion models We extend our analysis to computer vision by considering Improved Denoising Diffusion Probabilistic Models (Nichol & Dhariwal, 2021), trained on the ImageNet dataset. |
| Dataset Splits | Yes | We present the average correlation functions and the susceptibility for vision DDPMs, starting from samples of the ImageNet validation set (Deng et al., 2009). |
| Hardware Specification | No | The paper mentions that experiments were run for language and vision diffusion models, but no specific hardware details (e.g., GPU models, CPU types, memory) are provided. |
| Software Dependencies | No | The paper mentions models and tools such as "GPT2 tokenizer", "CLIP ViT-B/32", "MDLM", and "Improved Denoising Diffusion Probabilistic Models", but it does not specify software versions for these or for any underlying programming languages or libraries (e.g., Python, PyTorch, CUDA versions) used for implementation. |
| Experiment Setup | Yes | The results are averaged over NS = 300 samples, each consisting of NT = 128 tokens, with NR = 50 noise realizations for each masking fraction. ... Data obtained with 344 starting images and 128 diffusion trajectories per starting image. ... we divide each image into 7 × 7 patches and use the last-layer embeddings for each patch from a CLIP ViT-B/32 (Radford et al., 2021) to tokenize the image. |
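
The 7 × 7 patch grid in the setup above follows directly from the ViT-B/32 architecture: a 224 × 224 input divided into 32 × 32 patches yields (224/32)² = 49 patch tokens, each of which gets a last-layer embedding. A minimal sketch of that arithmetic (the function name `patch_grid` and the constants are illustrative, not from the paper):

```python
# Sketch of the patch-token count behind "7 x 7 patches":
# CLIP ViT-B/32 splits a 224x224 image into non-overlapping 32x32
# patches, so the token grid is (224 // 32) x (224 // 32) = 7 x 7 = 49.
IMAGE_SIZE = 224  # standard CLIP input resolution (assumption)
PATCH_SIZE = 32   # the "/32" in ViT-B/32

def patch_grid(image_size: int, patch_size: int) -> tuple[int, int]:
    """Return the (rows, cols) grid of patch tokens a ViT produces."""
    side = image_size // patch_size
    return side, side

rows, cols = patch_grid(IMAGE_SIZE, PATCH_SIZE)
print(rows, cols, rows * cols)  # 7 7 49
```

In practice the per-patch embeddings would come from the vision encoder's last hidden states (one vector per patch token, excluding the class token); this snippet only makes the grid geometry explicit.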