Diffusion Models are Evolutionary Algorithms
Authors: Yanbo Zhang, Benedikt Hartl, Hananel Hazan, Michael Levin
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct two sets of experiments to study Diffusion Evolution in terms of diversity and solving complex reinforcement learning tasks. Moreover, we utilize techniques from the diffusion models literature to improve Diffusion Evolution. In the first experiment, we adopt an accelerated sampling method (Nichol & Dhariwal, 2021) to significantly reduce the number of iterations. In the second experiment, we propose Latent Space Diffusion Evolution, inspired by latent space diffusion models (Rombach et al., 2022), allowing us to deploy our approach to complex problems with high-dimensional parameter spaces by exploring a lower-dimensional latent space. The experiments show that our Diffusion Evolution algorithm can find diverse solutions on the Himmelblau, Ackley, and Rastrigin functions. In contrast, CMA-ES and Open ES struggle, as they either focus on finding a single solution or get distracted by multiple high-fitness peaks, leading to sub-optimal results. While the MAP-Elite method demonstrates diverse solutions, the corresponding average fitness is lower than that of our method. Our experiments demonstrate that the Diffusion Evolution algorithm can identify high-fitness and diverse solutions and adapt to various fitness landscapes (see Figure 3 and Table 1). |
| Researcher Affiliation | Academia | 1 Allen Discovery Center at Tufts University, Medford, MA, 02155, USA 2 Institute for Theoretical Physics, TU Wien, Austria 3 Wyss Institute for Biologically Inspired Engineering at Harvard University, Boston, MA, 02115, USA |
| Pseudocode | Yes | See the pseudocode of Diffusion Evolution in Algorithm 1. When inversely denoising, i.e., evolving from time T to 0 while increasing α_t, the Gaussian term initially has high variance, allowing global exploration at first. As the evolution progresses, the variance decreases, giving lower weight to distant individuals and leading to local optimization (exploitation). This locality avoids global competition, akin to reproductive isolation in biology, enabling the algorithm to maintain multiple solutions and balance exploration and exploitation. The weighted average in Equation 7 then acts as a gene-recombination operation. Hence, the denoising process of diffusion models can be understood in an evolutionary manner: x̂_0 represents an estimated high-fitness parameter target, while x_t can be considered as diffused from high-fitness points. The first two terms of Equation 5, √α_{t−1} x̂_0 + √(1 − α_{t−1} − σ_t²) ε̂, guide the individuals towards high-fitness targets in small steps. The last term of Equation 5, σ_t w, is an integral part of diffusion models, perturbing the parameters in our approach similarly to random mutations. Algorithm 1 (Diffusion Evolution) — Require: population size N, parameter dimension D, fitness function f, density mapping function g, total evolution steps T, diffusion schedule α and noise schedule σ. Ensure: α_0 ≈ 1, α_T ≈ 0, α_t > α_{t+1}, 0 < σ_t < √(1 − α_{t−1}). 1: [x_T⁽¹⁾, x_T⁽²⁾, …, x_T⁽ᴺ⁾] ~ N(0, I_{N×D}) (initialize population); 2: for t = T, T−1, …, 2 do; 3: for all i ∈ [1, N]: Q_i ← g[f(x_t⁽ⁱ⁾)] (fitness values are cached to avoid repeated evaluations); 4: for i = 1, …, N do; 5: Z ← Σ_j Q_j N(x_t⁽ⁱ⁾; √α_t x_t⁽ʲ⁾, 1 − α_t); 6: x̂_0 ← Z⁻¹ Σ_j Q_j N(x_t⁽ⁱ⁾; √α_t x_t⁽ʲ⁾, 1 − α_t) x_t⁽ʲ⁾; 7: w ~ N(0, I_D); 8: x_{t−1}⁽ⁱ⁾ ← √α_{t−1} x̂_0 + √(1 − α_{t−1} − σ_t²) · (x_t⁽ⁱ⁾ − √α_t x̂_0)/√(1 − α_t) + σ_t w; 9: end for; 10: end for |
| Open Source Code | Yes | Code is available on GitHub: https://github.com/Zhangyanbo/diffusion-evolution |
| Open Datasets | Yes | To compare our method to selected mainstream evolutionary algorithms, we choose five different two-dimensional fitness landscapes as benchmarks: The Rosenbrock and Beale fitness functions have a single optimal point, while the Himmelblau, Ackley, and Rastrigin functions have multiple optimal solutions, see Appendix A.4 for more details; we compare our method to other evolutionary strategies, including CMA-ES (Hansen et al., 2003), Open ES (Salimans et al., 2017), PEPG (Sehnke et al., 2010), and MAP-Elite (Mouret & Clune, 2015). We apply the Diffusion Evolution method to reinforcement learning tasks (Sutton & Barto, 1998) to train neural networks for controlling the cart-pole system (Barto et al., 1983), among other tasks (see Appendix A.5). |
| Dataset Splits | No | The paper mentions that for benchmark experiments, "Each experiment was conducted with a population of 512 for 25 iterations". For the cart-pole task, it states "The task is considered solved when a fitness score (cumulative reward) of 500 is reached consistently over several episodes." However, these describe population management and success criteria for an evolutionary algorithm or RL task, not explicit training/test/validation dataset splits typically found in supervised learning, nor does it specify how the RL environment interactions are divided into such splits. |
| Hardware Specification | No | The authors acknowledge the Tufts University High Performance Compute Cluster and the Vienna Scientific Cluster, which have been utilized for the research reported in this paper. These are general names for computing clusters and do not provide specific hardware details like GPU/CPU models or memory specifications. |
| Software Dependencies | No | The paper does not explicitly list specific software dependencies with their version numbers. While it references various algorithms and models that imply the use of certain programming languages and libraries (e.g., neural networks implying Python with frameworks like PyTorch or TensorFlow), it does not provide the concrete version numbers required for reproducibility. |
| Experiment Setup | Yes | Each experiment was conducted with a population of 512 for 25 iterations, except for the Open ES method, which requires 1000 steps to converge. To quantify diversity, we then calculated the Shannon entropy of the final population by gridding the space and counting the individuals in different grid cells (we select the top-64 fitness individuals, focusing solely on elite individuals). The most time-consuming part of evolutionary algorithms is often the fitness evaluation. In this experiment, we adopt an accelerated sampling method from the diffusion models literature to reduce the number of iterations. As proposed by Nichol & Dhariwal (2021), instead of the default α_t scheduling of DDPM, a cosine scheduling α_t = cos(πt/T)/2 + 1/2 leads to better performance when T is small (see Appendix A.2 for a detailed comparison). For σ_t, we follow the DDIM setting with a slight modification for better control: σ_t = σ_m · sqrt((1 − α_{t−1})/(1 − α_t)) · sqrt(1 − α_t/α_{t−1}), where 0 ≤ σ_m ≤ 1 is a hyperparameter to control the magnitude of noise. We use σ_m = 1 for most experiments, except for the experiment demonstrating the process in Figure 2, which requires a lower noise magnitude σ_m = 0.1 for better visualization. We use a two-layer neural network of 58 parameters to control the cart, with inputs being the current position, velocity, pole angle, and pole angular velocity. The output of the neural network determines whether to move left or right. See more details about the neural network in Appendix A.5.1. Our standard experiment uses a one-hidden-layer neural network with a hidden layer of 8 neurons, resulting in (4 × 8 + 8) + (8 × 2 + 2) = 58 parameters. We also use a deeper neural network with two hidden layers (each with 128 neurons), totaling (4 × 128 + 128) + (128 × 128 + 128) + (128 × 2 + 2) = 17,410 parameters. Both neural networks use the ReLU activation function. Specifically, we rescaled the input to have a standard deviation of one to improve training. |
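
The Algorithm 1 quoted in the Pseudocode row can be sketched in plain NumPy. This is an illustrative re-implementation, not the authors' released code: the function name `diffusion_evolution` is ours, the density mapping g is folded into the fitness function f for brevity, and the cosine α schedule and DDIM-style σ_t quoted in the Experiment Setup row are assumed.

```python
import numpy as np

def diffusion_evolution(f, n_pop=512, dim=2, T=25, sigma_m=1.0, seed=0):
    """Sketch of Diffusion Evolution (Algorithm 1).

    f : fitness function mapping an (N, dim) population to (N,) positive
        scores; the paper's density mapping g is folded into f here.
    """
    rng = np.random.default_rng(seed)
    # Cosine schedule (Nichol & Dhariwal, 2021): alpha_0 = 1, alpha_T = 0.
    t_grid = np.arange(T + 1)
    alpha = np.cos(np.pi * t_grid / T) / 2 + 1 / 2

    x = rng.standard_normal((n_pop, dim))   # line 1: x_T ~ N(0, I)
    for t in range(T, 1, -1):               # line 2: t = T, T-1, ..., 2
        a_t, a_prev = alpha[t], alpha[t - 1]
        # DDIM-style noise scale, controlled by sigma_m in [0, 1].
        sigma_t = sigma_m * np.sqrt((1 - a_prev) / (1 - a_t) * (1 - a_t / a_prev))
        Q = f(x)                            # line 3: fitness, cached once per step
        # Lines 5-6: Gaussian kernel N(x_i; sqrt(a_t) x_j, (1 - a_t) I),
        # up to a constant that cancels in the normalized weighted average.
        d2 = ((x[:, None, :] - np.sqrt(a_t) * x[None, :, :]) ** 2).sum(-1)
        K = np.exp(-d2 / (2 * (1 - a_t))) * Q[None, :]
        x0_hat = K @ x / K.sum(axis=1, keepdims=True)  # fitness-weighted x_0 estimate
        w = rng.standard_normal((n_pop, dim))          # line 7: mutation noise
        # Line 8: step toward the estimated high-fitness target.
        x = (np.sqrt(a_prev) * x0_hat
             + np.sqrt(np.clip(1 - a_prev - sigma_t ** 2, 0, None))
             * (x - np.sqrt(a_t) * x0_hat) / np.sqrt(1 - a_t)
             + sigma_t * w)
    return x
```

On a simple unimodal fitness such as f(x) = exp(−‖x‖²), the population contracts toward the optimum while the early high-variance kernel still allows global exploration, matching the exploration-to-exploitation narrative in the row above.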
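
The two schedules from the Experiment Setup row can be written out directly. The helper names `cosine_alpha` and `ddim_sigma` are hypothetical, and the σ_t expression assumes the standard DDIM form scaled by the σ_m hyperparameter mentioned in that row.

```python
import numpy as np

def cosine_alpha(T):
    """Cosine schedule alpha_t = cos(pi * t / T) / 2 + 1/2 for t = 0..T,
    giving alpha_0 = 1 (clean) down to alpha_T = 0 (pure noise)."""
    t = np.arange(T + 1)
    return np.cos(np.pi * t / T) / 2 + 1 / 2

def ddim_sigma(alpha, sigma_m=1.0):
    """DDIM-style noise scales for steps t = 1..T, scaled by sigma_m in [0, 1].
    sigma_m = 1 recovers DDPM-like noise; sigma_m = 0 is deterministic."""
    a_t, a_prev = alpha[1:], alpha[:-1]
    return sigma_m * np.sqrt((1 - a_prev) / (1 - a_t) * (1 - a_t / a_prev))
```

With T = 25 (the iteration count used in the benchmark experiments), α decreases monotonically from 1 to 0 and every σ_t satisfies σ_t² ≤ 1 − α_{t−1}, so the coefficient under the square root in the update step stays non-negative.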
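
The parameter counts quoted for the cart-pole controllers are easy to verify. `mlp_param_count` is a hypothetical helper, not from the paper's code; it just sums weights and biases per fully connected layer.

```python
def mlp_param_count(layer_sizes):
    """Count weights + biases of a fully connected network with the
    given layer widths, e.g. [4, 8, 2] = 4 inputs, 8 hidden, 2 outputs."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

# 4 observations (cart position, velocity, pole angle, pole angular velocity),
# 2 outputs (move left or right).
standard = mlp_param_count([4, 8, 2])       # (4*8 + 8) + (8*2 + 2) = 58
deep = mlp_param_count([4, 128, 128, 2])    # 640 + 16512 + 258 = 17410
```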