CoCoA-Mix: Confusion-and-Confidence-Aware Mixture Model for Context Optimization
Authors: Dasol Hong, Wooju Lee, Hyun Myung
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that CoCoA-Mix, a mixture model with CoA-loss and CoA-weights, outperforms state-of-the-art methods by enhancing specialization and generalization. Our code is publicly available at https://github.com/url-kaist/CoCoA-Mix. |
| Researcher Affiliation | Academia | 1Urban Robotics Lab, School of Electrical Engineering, Korea Advanced Institute of Science and Technology, Republic of Korea. Correspondence to: Dasol Hong <EMAIL>, Wooju Lee <EMAIL>, Hyun Myung <EMAIL>. |
| Pseudocode | No | The paper describes the proposed method in prose and mathematical equations (e.g., Section 3.1, 3.2, 3.3, 3.4) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is publicly available at https://github.com/url-kaist/CoCoA-Mix. |
| Open Datasets | Yes | We evaluate base-to-new generalization and cross-dataset transfer performance using 11 datasets: ImageNet (Deng et al., 2009), Caltech101 (Fei-Fei et al., 2004), Oxford Pets (Parkhi et al., 2012), Stanford Cars (Krause et al., 2013), Flowers102 (Nilsback & Zisserman, 2008), Food101 (Bossard et al., 2014), FGVCAircraft (Maji et al., 2013), EuroSAT (Helber et al., 2019), UCF101 (Soomro, 2012), DTD (Cimpoi et al., 2014), and SUN397 (Xiao et al., 2010). For FSCIL, we use CIFAR100 (Krizhevsky et al., 2009). |
| Dataset Splits | Yes | Each dataset is evenly split into two disjoint subsets: base classes (Base) for tuning and unseen new classes (New). Following Tao et al. (2020), we split the classes into 60 Base and 40 New classes and adopted a 5-shot 5-way setting, resulting in a total of 9 training sessions. |
| Hardware Specification | No | The paper mentions using CLIP with ViT-B/16 and ViT-L/14 backbones, which are model architectures, not specific hardware components. No details about the GPUs, CPUs, or other computing resources used for experiments are provided. |
| Software Dependencies | No | The paper mentions the Adam optimizer (Kingma, 2014) and SGD for optimization, and refers to a PyTorch library for CAM methods (Gildenblat et al., 2021). However, it does not provide specific version numbers for these software components or other core libraries like PyTorch itself. |
| Experiment Setup | Yes | Training Details The prompt length M is initialized randomly and set to 16 unless specified. ... Prompt tuning is performed using the Adam optimizer (Kingma, 2014) with a learning rate of 0.002. Optimization for the CoA-weights is conducted with SGD. ... Training was conducted over 50 epochs with a batch size of 32. The prompt t was optimized using the Adam optimizer with a learning rate of 0.002 and a weight decay of 5×10⁻⁴. CoA-weights were optimized using SGD with the same learning rate, a momentum of 0.9, and a weight decay of 5×10⁻⁴. The weight for L_CoA was set to w = 5.0, the weight for L_Ent was set to 10.0, and the margin was set to d = 0.2. The prompt length M was set to 16. ... CoCoA-Mix used prompts of length M = 2 per session ... CoA-weights were optimized for 2 epochs in the initial session, and for 100 epochs in all subsequent sessions. The margin d of the loss L_Ent was set to 0.1. |
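The training hyperparameters quoted above can be collected into a short PyTorch sketch. This is a hedged reconstruction, not the authors' released code: `prompt` and `coa_weights` are hypothetical placeholder parameters standing in for the paper's learnable prompt (length M = 16) and per-class CoA-weights, and the loss terms are represented only by their reported weights.

```python
import torch

# Hypothetical learnable parameters (shapes are illustrative assumptions,
# not taken from the paper): a prompt of length M = 16 and per-class
# CoA-weights for a 100-class problem.
prompt = torch.nn.Parameter(torch.randn(16, 512))
coa_weights = torch.nn.Parameter(torch.ones(100))

# Reported optimizer settings: Adam for the prompt, SGD for the CoA-weights,
# both with lr = 0.002 and weight decay = 5e-4; SGD uses momentum = 0.9.
prompt_opt = torch.optim.Adam([prompt], lr=0.002, weight_decay=5e-4)
weight_opt = torch.optim.SGD([coa_weights], lr=0.002, momentum=0.9,
                             weight_decay=5e-4)

# Reported loss weighting and margin for base-to-new generalization:
# total = w * L_CoA + 10.0 * L_Ent, with w = 5.0 and margin d = 0.2.
w_coa, w_ent, margin = 5.0, 10.0, 0.2

# Reported schedule: 50 epochs, batch size 32 (training loop omitted).
num_epochs, batch_size = 50, 32
```

This sketch only pins down what the table verifies as reported; anything about model wiring or the loss implementations would have to come from the released repository.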