Robust Representation Consistency Model via Contrastive Denoising
Authors: Jiachen Lei, Julius Berner, Jiongxiao Wang, Zhongzhu Chen, Chaowei Xiao, Zhongjie Ba, Kui Ren, Jun Zhu, Anima Anandkumar
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on various datasets and achieve state-of-the-art performance with minimal computation budget during inference. For example, our method outperforms the certified accuracy of diffusion-based methods on ImageNet across all perturbation radii by 5.3% on average, with up to 11.6% at larger radii, while reducing inference costs by 85 on average. Codes are available at: https://github.com/jiachenlei/rRCM. |
| Researcher Affiliation | Collaboration | 1Zhejiang University, 2NVIDIA, 3UW Madison, 4Amazon, 5Shengshu, 6Tsinghua University, 7Caltech |
| Pseudocode | Yes | Algorithm 1: rRCM Pre-training Pseudocode |
| Open Source Code | Yes | Codes are available at: https://github.com/jiachenlei/rRCM. |
| Open Datasets | Yes | In this section, we evaluate our rRCM model on two datasets: ImageNet (Deng et al., 2009) and CIFAR10 (Krizhevsky et al., 2009). |
| Dataset Splits | Yes | Certification. We follow the settings of Carlini et al. (2022). Specifically, on both ImageNet and CIFAR10, we certify a subset that contains 500 images from their test set with confidence 99.9%. |
| Hardware Specification | Yes | We measure the inference latency of all methods on a single A800 GPU. |
| Software Dependencies | No | The paper mentions 'xFormers (https://github.com/facebookresearch/xformers)' and 'DPM-Solver (Lu et al., 2022)' but does not specify version numbers for these software components. It also implicitly uses a deep learning framework, likely PyTorch, but no version is stated. |
| Experiment Setup | Yes | Pre-training During pre-training, we adopt the definition of diffusion models proposed in EDM (Karras et al., 2022) and refer to the implementation of consistency models (Song et al., 2023), including noise schedule, input scaling, time embedding strategy, and time discretization strategy. As for data augmentation strategies, we adopt those utilized in MoCo-v3 (Chen et al., 2021). The temperature value τ in (9) is set to 0.2 for all experiments. By default, we pre-train rRCM-B and rRCM-B-Deep for 600k steps with a batch size of 4096 on the ImageNet dataset. We pre-train rRCM-B for 300k steps on the CIFAR10 dataset, with a batch size of 2048. Subsequently, we fine-tune our rRCM models separately at various noise levels σ ∈ {0.25, 0.5, 1.0}. Specifically, for both ImageNet and CIFAR10, we set η1 in (12) to 10 at the noise level of 0.25, and to 20 for noise levels 0.5 and 1.0. In all experiments, η2 in (12) is fixed as 0.5. To enhance training stability, we apply a dynamic EMA schedule for the target model utilized when computing the contrastive loss. Specifically, we gradually increase the EMA rate from 0.99 to 0.9999 following a pre-defined sigmoid schedule...We present hyper-parameters used in our pre-training experiments in Table 4 and the data augmentation strategies in Table 5. Fine-tuning We fine-tune the pre-trained model following the implementation in (Jeong & Shin, 2020) at three different noise levels σ ∈ {0.25, 0.5, 1.0}, and report the best results at each perturbation radius. We fine-tune the pre-trained model for 150 epochs on ImageNet and 100 epochs on CIFAR10. |
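The experiment-setup excerpt describes a dynamic EMA schedule that ramps the target model's EMA rate from 0.99 to 0.9999 along a "pre-defined sigmoid schedule". The paper excerpt does not give the ramp's midpoint or sharpness, so the following is only a minimal sketch of how such a schedule might look; `sharpness` and the midpoint of 0.5 are assumptions, not values from the paper.

```python
import math

def ema_rate(step: int, total_steps: int,
             start: float = 0.99, end: float = 0.9999,
             sharpness: float = 10.0) -> float:
    """Sigmoid ramp of the target-network EMA rate from `start` to `end`.

    `sharpness` and the midpoint (0.5 of training) are assumptions; the
    excerpt only states a "pre-defined sigmoid schedule" from 0.99 to 0.9999.
    """
    t = step / total_steps                      # training progress in [0, 1]
    s = 1.0 / (1.0 + math.exp(-sharpness * (t - 0.5)))  # logistic squash
    return start + (end - start) * s

def ema_update(target_params, online_params, rate: float):
    """One EMA step: target <- rate * target + (1 - rate) * online."""
    return [rate * tp + (1.0 - rate) * op
            for tp, op in zip(target_params, online_params)]
```

With these defaults the rate starts near 0.99, passes roughly midway between the endpoints at half of training, and saturates near 0.9999 by the final step (600k on ImageNet per the excerpt).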
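The excerpt also states that the temperature τ in the paper's contrastive loss (its Eq. 9) is 0.2. Equation (9) itself is not quoted in the table, so the sketch below assumes a standard InfoNCE-style formulation (cosine-similarity logits scaled by τ, cross-entropy against the positive key); only the value τ = 0.2 comes from the source.

```python
import math

def info_nce(query, keys, pos_index: int, tau: float = 0.2) -> float:
    """InfoNCE-style loss for one query vector.

    `keys[pos_index]` is the positive pair; all other keys act as negatives.
    The exact loss in the paper's Eq. (9) is not quoted here, so this
    standard formulation is an assumption; tau = 0.2 is from the excerpt.
    """
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)

    logits = [cos(query, k) / tau for k in keys]
    m = max(logits)  # subtract the max before exp for numerical stability
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[pos_index]  # -log softmax at the positive
```

A lower τ sharpens the softmax over key similarities, so τ = 0.2 penalizes hard negatives more strongly than the unscaled (τ = 1) case.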