ZeroFlow: Overcoming Catastrophic Forgetting is Easier than You Think
Authors: Tao Feng, Wei Li, Didi Zhu, Hangjie Yuan, Wendi Zheng, Dan Zhang, Jie Tang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To bridge this gap, we introduce ZeroFlow, the first benchmark designed to evaluate gradient-free optimization algorithms for overcoming forgetting. ZeroFlow examines a suite of forward pass-based methods across various algorithms, forgetting scenarios, and datasets. Our results show that forward passes alone can be sufficient to mitigate forgetting. |
| Researcher Affiliation | Academia | Tao Feng 1 Wei Li 1 * Didi Zhu 2 Hangjie Yuan 2 Wendi Zheng 1 Dan Zhang 1 Jie Tang 1 1Tsinghua University 2Zhejiang University. Correspondence to: Jie Tang <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Generic formulation of ZO optimization |
| Open Source Code | No | The paper provides a URL for a project page (https://zeroflow-bench.github.io/), but it does not contain an explicit statement that the code is available there, nor is it a direct link to a code repository. |
| Open Datasets | Yes | Both models are initialized with ViT-B/16 pretrained on ImageNet-1K (IN1K), and are subsequently fine-tuned on four downstream tasks of varying complexity, ranging from standard benchmarks such as CIFAR-100 and CUB to more challenging datasets like ImageNet-A and OmniBenchmark, which exhibit a large domain gap from the pretraining distribution (Zhou et al., 2024a;c). |
| Dataset Splits | Yes | Following (Zhou et al., 2023a), each dataset is evenly split into 10 incremental tasks by class. For instance, OmniBenchmark contains 300 classes, with 30 classes introduced at each stage. |
| Hardware Specification | No | The paper mentions memory usage in GB and the number of GPUs (e.g., "each GPU equipped with 24GB of memory", "1, 2, 3, and 6 GPUs"), but it does not specify exact GPU/CPU models or processor types. |
| Software Dependencies | No | The paper does not provide specific software names with version numbers for its implementation. It discusses optimizers like SGD and Adam, but without version details for any software libraries or frameworks used. |
| Experiment Setup | Yes | Both models are initialized with ViT-B/16 pretrained on ImageNet-1K (IN1K), and are subsequently fine-tuned on four downstream tasks... ZeroFlow covers 7 forward pass-based methods: ZO-SGD, ZO-SGD-Sign, ZO-SGD-Conserve, ZO-Adam, ZO-Adam-Sign, ZO-Adam-Conserve, Forward-Grad. Unless otherwise specified, the query budget is fixed to 1 for efficiency... The per-epoch runtime in seconds (s). ZO-SGD w/ query budget q = 1, 4 and all other optimizers w/ query budget q = 1... For a learning rate of 0.001... With a higher learning rate of 0.01... replay buffer = 2000. |
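The benchmark's forward pass-based methods (e.g., ZO-SGD with a query budget q) estimate gradients from function evaluations alone. As a rough illustration of the idea, and not the paper's actual implementation, here is a minimal sketch of a zeroth-order SGD step using a standard two-point (SPSA-style) gradient estimator; the function name `zo_sgd_step` and all hyperparameter defaults are illustrative assumptions.

```python
import numpy as np

def zo_sgd_step(params, loss_fn, lr=0.001, mu=1e-3, q=1, rng=None):
    """One zeroth-order SGD step using only forward passes.

    The gradient is approximated with a two-point estimator:
        g ~= (loss(theta + mu*u) - loss(theta - mu*u)) / (2*mu) * u,
    averaged over q random probe directions (the "query budget").
    Note: this is an illustrative sketch, not the paper's code.
    """
    rng = np.random.default_rng() if rng is None else rng
    grad_est = np.zeros_like(params)
    for _ in range(q):
        u = rng.standard_normal(params.shape)  # random probe direction
        # Two forward passes per query; no backpropagation needed.
        delta = (loss_fn(params + mu * u) - loss_fn(params - mu * u)) / (2 * mu)
        grad_est += delta * u
    grad_est /= q
    return params - lr * grad_est

# Toy usage: minimize a convex quadratic with forward passes only.
theta = np.array([1.0, -2.0])
loss = lambda p: float(np.sum(p ** 2))
rng = np.random.default_rng(0)
for _ in range(50):
    theta = zo_sgd_step(theta, loss, lr=0.05, q=4, rng=rng)
```

The sign and conservative variants in the benchmark modify how this estimate is applied (e.g., using only its sign), while Forward-Grad uses forward-mode directional derivatives instead of finite differences.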