CoCoA-Mix: Confusion-and-Confidence-Aware Mixture Model for Context Optimization
Authors: Dasol Hong, Wooju Lee, Hyun Myung
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that CoCoA-Mix, a mixture model with CoA-loss and CoA-weights, outperforms state-of-the-art methods by enhancing specialization and generalization. Our code is publicly available at https://github.com/url-kaist/CoCoA-Mix. |
| Researcher Affiliation | Academia | 1Urban Robotics Lab, School of Electrical Engineering, Korea Advanced Institute of Science and Technology, Republic of Korea. Correspondence to: Dasol Hong <EMAIL>, Wooju Lee <EMAIL>, Hyun Myung <EMAIL>. |
| Pseudocode | No | The paper describes the proposed method in prose and mathematical equations (e.g., Section 3.1, 3.2, 3.3, 3.4) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is publicly available at https://github.com/url-kaist/CoCoA-Mix. |
| Open Datasets | Yes | We evaluate base-to-new generalization and cross-dataset transfer performance using 11 datasets: ImageNet (Deng et al., 2009), Caltech101 (Fei-Fei et al., 2004), Oxford Pets (Parkhi et al., 2012), Stanford Cars (Krause et al., 2013), Flowers102 (Nilsback & Zisserman, 2008), Food101 (Bossard et al., 2014), FGVCAircraft (Maji et al., 2013), EuroSAT (Helber et al., 2019), UCF101 (Soomro, 2012), DTD (Cimpoi et al., 2014), and SUN397 (Xiao et al., 2010). For FSCIL, we use CIFAR100 (Krizhevsky et al., 2009). |
| Dataset Splits | Yes | Each dataset is evenly split into two disjoint subsets: base classes (Base) for tuning and unseen new classes (New). Following Tao et al. (2020), we split the classes into 60 Base and 40 New classes and adopted a 5-shot 5-way setting, resulting in a total of 9 training sessions. |
| Hardware Specification | No | The paper mentions using CLIP with ViT-B/16 and ViT-L/14 backbones, which are model architectures, not specific hardware components. No details about the GPUs, CPUs, or other computing resources used for experiments are provided. |
| Software Dependencies | No | The paper mentions the Adam optimizer (Kingma, 2014) and SGD for optimization, and refers to a PyTorch library for CAM methods (Gildenblat et al., 2021). However, it does not provide specific version numbers for these software components or other core libraries like PyTorch itself. |
| Experiment Setup | Yes | Training Details The prompt length M is initialized randomly and set to 16 unless specified. ... Prompt tuning is performed using the Adam optimizer (Kingma, 2014) with a learning rate of 0.002. Optimization for the CoA-weights is conducted with SGD. ... Training was conducted over 50 epochs with a batch size of 32. The prompt t was optimized using the Adam optimizer with a learning rate of 0.002 and a weight decay of 5×10⁻⁴. CoA-weights were optimized using SGD with the same learning rate, a momentum of 0.9, and a weight decay of 5×10⁻⁴. The weight for L_CoA was set to w = 5.0, the weight for L_Ent was set to 10.0, and the margin was set to d = 0.2. The prompt length M was set to 16. ... CoCoA-Mix used prompts of length M = 2 per session ... CoA-weights were optimized for 2 epochs in the initial session, and for 100 epochs in all subsequent sessions. The margin d of the loss L_Ent was set to 0.1. |
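The training hyperparameters quoted above can be collected into a short PyTorch sketch. This is a hedged reconstruction, not the authors' released code: `prompt` and `coa_weights` are hypothetical placeholder parameters standing in for the paper's learnable prompt (length M = 16) and per-class CoA-weights, and the loss terms are represented only by their reported weights.

```python
import torch

# Hypothetical learnable parameters (shapes are illustrative assumptions,
# not taken from the paper): a prompt of length M = 16 and per-class
# CoA-weights for a 100-class problem.
prompt = torch.nn.Parameter(torch.randn(16, 512))
coa_weights = torch.nn.Parameter(torch.ones(100))

# Reported optimizer settings: Adam for the prompt, SGD for the CoA-weights,
# both with lr = 0.002 and weight decay = 5e-4; SGD uses momentum = 0.9.
prompt_opt = torch.optim.Adam([prompt], lr=0.002, weight_decay=5e-4)
weight_opt = torch.optim.SGD([coa_weights], lr=0.002, momentum=0.9,
                             weight_decay=5e-4)

# Reported loss weighting and margin for base-to-new generalization:
# total = w * L_CoA + 10.0 * L_Ent, with w = 5.0 and margin d = 0.2.
w_coa, w_ent, margin = 5.0, 10.0, 0.2

# Reported schedule: 50 epochs, batch size 32 (training loop omitted).
num_epochs, batch_size = 50, 32
```

This sketch only pins down what the table verifies as reported; anything about model wiring or the loss implementations would have to come from the released repository.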