ZeroFlow: Overcoming Catastrophic Forgetting is Easier than You Think

Authors: Tao Feng, Wei Li, Didi Zhu, Hangjie Yuan, Wendi Zheng, Dan Zhang, Jie Tang

ICML 2025

Reproducibility assessment (variable: result, followed by the LLM's supporting response):
Research Type: Experimental
  "To bridge this gap, we introduce ZeroFlow, the first benchmark designed to evaluate gradient-free optimization algorithms for overcoming forgetting. ZeroFlow examines a suite of forward pass-based methods across various algorithms, forgetting scenarios, and datasets. Our results show that forward passes alone can be sufficient to mitigate forgetting."
Researcher Affiliation: Academia
  "Tao Feng 1, Wei Li 1*, Didi Zhu 2, Hangjie Yuan 2, Wendi Zheng 1, Dan Zhang 1, Jie Tang 1. 1 Tsinghua University, 2 Zhejiang University. Correspondence to: Jie Tang <EMAIL>."
Pseudocode: Yes
  "Algorithm 1: Generic formulation of ZO optimization"
Open Source Code: No
  The paper provides a URL for a project page (https://zeroflow-bench.github.io/), but it does not explicitly state that code is available there, nor does the URL point directly to a code repository.
Open Datasets: Yes
  "Both models are initialized with ViT-B/16 pretrained on ImageNet-1K (IN1K), and are subsequently fine-tuned on four downstream tasks of varying complexity, ranging from standard benchmarks such as CIFAR-100 and CUB to more challenging datasets like ImageNet-A and OmniBenchmark, which exhibit a large domain gap from the pretraining distribution (Zhou et al., 2024a;c)."
Dataset Splits: Yes
  "Following (Zhou et al., 2023a), each dataset is evenly split into 10 incremental tasks by class. For instance, OmniBenchmark contains 300 classes, with 30 classes introduced at each stage."
Hardware Specification: No
  The paper mentions memory in GB and the number of GPUs (e.g., "each GPU equipping with 24GB of memory"; "1, 2, 3, and 6 GPUs"), but it does not specify exact GPU, CPU, or processor models.
Software Dependencies: No
  The paper does not name specific software with version numbers. It discusses optimizers such as SGD and Adam, but gives no version details for any libraries or frameworks used.
Experiment Setup: Yes
  "Both models are initialized with ViT-B/16 pretrained on ImageNet-1K (IN1K), and are subsequently fine-tuned on four downstream tasks... ZeroFlow covers 7 forward pass-based methods: ZO-SGD, ZO-SGD-Sign, ZO-SGD-Conserve, ZO-Adam, ZO-Adam-Sign, ZO-Adam-Conserve, Forward-Grad. Unless otherwise specified, the query budget is fixed to 1 for efficiency... The per-epoch runtime in seconds (s). ZO-SGD w/ query budget q = 1, 4 and all other optimizers w/ query budget q = 1... For a learning rate of 0.001... With a higher learning rate of 0.01... replay buffer = 2000."
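To make the "forward pass-based" methods in the checklist concrete, here is a minimal sketch of a ZO-SGD step using the standard two-point SPSA-style gradient estimator with a configurable query budget. This is a generic illustration of zeroth-order SGD, not the paper's implementation; the function name, learning rate, and smoothing parameter `mu` are illustrative choices.

```python
import numpy as np

def zo_sgd_step(loss_fn, theta, lr=0.02, mu=1e-4, query_budget=1, rng=None):
    """One ZO-SGD step: estimate the gradient from forward passes only.

    Uses the two-point estimator (loss(theta + mu*z) - loss(theta - mu*z))
    / (2*mu) * z, averaged over `query_budget` random probe directions.
    No backpropagation is needed, only evaluations of loss_fn.
    """
    rng = np.random.default_rng() if rng is None else rng
    grad_est = np.zeros_like(theta)
    for _ in range(query_budget):
        z = rng.standard_normal(theta.shape)  # random probe direction
        # Two forward passes per probe, no gradients.
        delta = loss_fn(theta + mu * z) - loss_fn(theta - mu * z)
        grad_est += (delta / (2.0 * mu)) * z
    grad_est /= query_budget
    return theta - lr * grad_est
```

With query budget q = 1 (the default in the benchmark's setup), each step costs exactly two forward passes; raising q averages more probes for a lower-variance estimate at proportionally higher cost.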
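The class-incremental protocol in the Dataset Splits row (each dataset evenly split into 10 tasks by class) can be sketched as follows; `split_classes` is a hypothetical helper written for illustration, not code from the paper.

```python
def split_classes(num_classes, num_tasks):
    """Evenly partition class IDs 0..num_classes-1 into incremental tasks.

    For example, 300 classes with 10 tasks yields 10 stages of 30 classes,
    matching the OmniBenchmark split described above.
    """
    per_task, rem = divmod(num_classes, num_tasks)
    assert rem == 0, "classes must divide evenly across tasks"
    return [list(range(i * per_task, (i + 1) * per_task))
            for i in range(num_tasks)]

tasks = split_classes(300, 10)  # 10 incremental stages of 30 classes each
```

Each stage's task then trains only on the classes in its slice, which is what makes earlier classes vulnerable to forgetting.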