Lifelong Scalable Generative System via Online Maximum Mean Discrepancy

Authors: Fei Ye, Adrian G. Bors

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Comprehensive results show that the proposed approach performs better than the state of the art. We perform a series of experiments on unsupervised image generation, showing that DEMU significantly relieves forgetting in the DDPM model. The contributions of this research study are as follows: ... (4) We perform a series of experiments showing that the proposed methodology achieves state-of-the-art performance."
Researcher Affiliation | Academia | 1 School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu; 2 Department of Computer Science, University of York, York YO10 5GH, UK
Pseudocode | Yes | Algorithm Implementation. "We illustrate the learning procedure of a single model for DEMU in Fig. 2. Training DEMU consists of four steps. Step 1 (Training): Initially, if the memory is empty (M = ∅), we build the first memory unit M1 = {x1} and add it to M. We train the DDPM model ϵθ and the representation network {encϕ, decη} on M ∪ {xi} using the DDPM objective function and the reconstruction loss, respectively. Step 2 (Sample selection): We form the joint set Mk ∪ {xi} and sort it using Eq. (8). The current memory unit Mk is then updated using Eq. (9). Step 3 (Memory expansion): We build the second memory unit M2 at training step T100 using the memory expansion criterion from Eq. (5). Subsequent samples are added to the memory buffers in the same way. Step 4 (Memory reduction): If the memory buffer is full, we repeatedly apply Eq. (10) and Eq. (11) to remove memory units containing redundant information, until the memory reaches its maximum capacity, |M| = D."
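The four-step procedure quoted above can be sketched in code. The snippet below is a minimal, hypothetical illustration of the memory-unit bookkeeping only: Euclidean diversity scoring stands in for the paper's Eqs. (8)-(9), and distance between unit means stands in for the redundancy measure of Eqs. (10)-(11). Function names and signatures are ours, not the authors'.

```python
import numpy as np

def update_memory_unit(unit, batch, capacity):
    """Step 2 (sample selection), simplified: merge the unit with the
    incoming batch, score each sample by its mean distance to the others,
    and keep the `capacity` most diverse samples."""
    joint = np.concatenate([unit, batch])
    dists = np.linalg.norm(joint[:, None, :] - joint[None, :, :], axis=-1)
    scores = dists.mean(axis=1)            # higher = more diverse
    keep = np.argsort(scores)[::-1][:capacity]
    return joint[keep]

def reduce_memory(units, max_units):
    """Step 4 (memory reduction), simplified: while over capacity, find the
    pair of units whose means are closest and drop one of them."""
    while len(units) > max_units:
        means = np.stack([u.mean(axis=0) for u in units])
        dists = np.linalg.norm(means[:, None, :] - means[None, :, :], axis=-1)
        np.fill_diagonal(dists, np.inf)
        i, j = np.unravel_index(np.argmin(dists), dists.shape)
        units.pop(max(i, j))               # keep one of the redundant pair
    return units
```

Real DEMU scores samples and units via the learned representation network rather than raw Euclidean distance; this sketch only mirrors the control flow.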
Open Source Code | Yes | The code is available at https://github.com/dtuzi123/DEMU
Open Datasets | Yes

Datasets | DEMU | DEMU-D | LTS | LGM | R-VAE | R-DDPM | CGKD-GAN | CVA | CGKD-WAE | CGKD-VAE
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
Split MNIST | 20.18 | 19.23 | 71.67 | 66.31 | 55.67 | 63.26 | 54.34 | 21.46 | 47.98 | 48.72
Split Fashion | 43.34 | 37.64 | 128.84 | 109.20 | 103.25 | 82.23 | 85.23 | 67.28 | 87.92 | 88.16
Split SVHN | 62.12 | 53.49 | 87.25 | 72.60 | 65.18 | 87.22 | 101.26 | 57.14 | 100.15 | 102.87
Split CIFAR10 | 83.67 | 78.70 | 124.22 | 177.15 | 155.72 | 106.18 | 115.38 | 74.97 | 162.12 | 163.75
... | | | | | | | | | |
CelebA-3DChair | 81.53 | 79.79 | 186.25 | 241.14 | 210.18 | 183.72 | 132.12 | 142.62 | 154.45 | 156.62
CelebA-CACD | 73.11 | 69.85 | 124.87 | 117.76 | 121.52 | 103.52 | 78.00 | 92.83 | 142.52 | 145.23
Split MINIImageNet | 141.66 | 123.65 | 179.78 | 216.06 | 205.12 | 181.15 | 176.18 | 177.17 | 241.11 | 243.37
Dataset Splits | Yes | "We divide each original dataset into five independent parts, where each part contains samples from two consecutive classes, as in (Aljundi, Kelchtermans, and Tuytelaars 2019), resulting in Split MNIST, Split Fashion, Split SVHN and Split CIFAR10. ... The Split MINIImageNet data stream is divided into 16 tasks, each containing samples from five successive classes."
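The class-incremental splits described above can be generated mechanically. The helper below is a hedged sketch (our own naming, not the paper's released code): it groups class ids into runs of `classes_per_task` consecutive classes and returns the sample indices for each task, yielding five tasks for a ten-class dataset such as MNIST.

```python
import numpy as np

def make_class_incremental_splits(labels, classes_per_task=2):
    """Split a labelled dataset into tasks of consecutive classes.

    Returns a list of index arrays, one per task, mirroring the
    Split MNIST / Split CIFAR10 protocol described in the text."""
    labels = np.asarray(labels)
    classes = np.sort(np.unique(labels))
    tasks = []
    for start in range(0, len(classes), classes_per_task):
        task_classes = classes[start:start + classes_per_task]
        tasks.append(np.where(np.isin(labels, task_classes))[0])
    return tasks
```

With `classes_per_task=5` the same helper reproduces the 16-task MINIImageNet split over 80 classes.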
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions using specific models like the Denoising Diffusion Probabilistic Model (DDPM), Variational Autoencoders (VAEs), and Generative Adversarial Nets (GANs), but does not specify any software libraries or frameworks with version numbers (e.g., PyTorch 1.9, Python 3.8, TensorFlow 2.x).
Experiment Setup | Yes | "Baselines and hyperparameters. ... The batch size for each processing step is b = 64. The maximum memory size for all models is D = 2,000. ... We vary the threshold λ ∈ [0.02, 0.05] in Eq. (5) when training on Split MNIST and examine DEMU's performance. ... The proposed mechanism evaluates expansion signals measured by the MMD distance between the information recorded by each expert and that of the incoming data batch, expanding the model whenever needed. ... λ2 ∈ [0, 3] is a threshold controlling the dynamic expansion process."
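The MMD-based expansion signal described above can be made concrete. The sketch below is our own illustration, not the released implementation: it uses a biased RBF-kernel estimate of squared MMD and triggers expansion when the incoming batch is farther than a threshold λ from every stored memory unit. The kernel bandwidth and all names here are assumptions.

```python
import numpy as np

def rbf_mmd2(x, y, sigma=1.0):
    """Biased estimate of squared MMD between samples x and y,
    using an RBF kernel with bandwidth sigma (an assumed choice)."""
    def k(a, b):
        sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq_dists / (2.0 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

def should_expand(memory_units, batch, lam=0.03, sigma=1.0):
    """Expansion criterion in the spirit of Eq. (5): expand when the
    incoming batch is far (in MMD) from every existing memory unit."""
    return all(rbf_mmd2(np.asarray(u), batch, sigma) > lam
               for u in memory_units)
```

In the paper the comparison is against the information recorded by each expert rather than raw pixels, and λ is swept over [0.02, 0.05]; this snippet only shows the thresholding logic.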