Concept Matching with Agent for Out-of-Distribution Detection

Authors: Yuxiao Lee, Xiaofeng Cao, Jingcai Guo, Wei Ye, Qing Guo, Yi Chang

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive experimental results showcase the superior performance of CMA over both zero-shot and training-required methods in a diverse array of real-world scenarios. We conducted experiments on various datasets with distinct ID scenarios and demonstrated that CMA achieves superior performance across a wide range of real-world tasks.
Researcher Affiliation | Academia | (1) School of Artificial Intelligence, Jilin University, China; (2) The Hong Kong Polytechnic University; (3) College of Electronic and Information Engineering, Tongji University, China; (4) CFAR and IHPC, Agency for Science, Technology and Research (A*STAR), Singapore; (5) Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, Ministry of Education, China
Pseudocode | No | The paper describes its method using textual explanations and mathematical formulas (e.g., S_CMA(x; Y_in, Y_ntc, T, I) in Section 3.3) but does not include any clearly labeled pseudocode or algorithm blocks.
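Since the paper provides no pseudocode, the following is a minimal illustrative sketch of what a CLIP-style zero-shot OOD score of this general shape (ID labels plus negative text concepts, compared against an image embedding) can look like. This is not a reconstruction of the paper's actual S_CMA formula; the function name, argument names, and scoring choice here are our assumptions.

```python
import numpy as np

def zero_shot_ood_score(image_emb, id_text_embs, neg_text_embs, temperature=1.0):
    """Illustrative zero-shot OOD score (not the paper's S_CMA).

    All inputs are L2-normalized embeddings: one image vector, a matrix of
    ID-label text embeddings, and a matrix of negative-concept text
    embeddings. The score is the softmax probability mass assigned to the
    ID labels; a low value suggests the image is out-of-distribution.
    """
    sims = np.concatenate([id_text_embs @ image_emb, neg_text_embs @ image_emb])
    sims = sims / temperature
    sims -= sims.max()  # numerical stability before exponentiation
    probs = np.exp(sims) / np.exp(sims).sum()
    return float(probs[: len(id_text_embs)].sum())
```

An image embedding that aligns closely with an ID-label embedding pushes the score toward 1, while alignment with the negative concepts pushes it toward 0.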
Open Source Code | Yes | Code: https://github.com/yuxiaoLeeMarks/CMA
Open Datasets | Yes | Datasets: We conducted a comprehensive evaluation of the performance of our method across various dimensions and compared it with widely employed OOD detection algorithms. (1) We assessed our approach on the ImageNet-1k OOD benchmark. This benchmark utilizes the large-scale visual dataset ImageNet-1k (Deng et al. 2009) as the ID data and four OOD datasets (including subsets of iNaturalist (Van Horn et al. 2018), SUN (Xiao et al. 2010), Places (Zhou et al. 2017), and Textures (Cimpoi et al. 2014))... (2) We evaluated our method on various small-scale datasets. Specifically, we considered the following ID datasets: Fashion-MNIST (Xiao, Rasul, and Vollgraf 2017), STL-10 (Coates, Ng, and Lee 2011), Oxford-IIIT Pet (Parkhi et al. 2012), Food-101 (Bossard, Guillaumin, and Van Gool 2014), CUB-200 (Wah et al. 2011), PlantVillage (Hughes, Salathé et al. 2015), LFW (Huang et al. 2012), Stanford Dogs (Khosla et al. 2011), FGVC-Aircraft (Maji et al. 2013), Grocery Store (Klasson, Zhang, and Kjellström 2019), and CIFAR-10 (Krizhevsky, Hinton et al. 2009).
Dataset Splits | No | The paper mentions using 'ImageNet-1k as the ID data and four OOD datasets' for evaluation, along with 'subsets of ImageNet-1k, namely ImageNet-10 and ImageNet-20'. However, it does not specify explicit training, validation, or testing splits (e.g., percentages or sample counts) for the datasets used in its experiments, which would be needed to reproduce the data partitioning.
Hardware Specification | No | The paper mentions using 'CLIP-B/16 as the foundational evaluation model' but does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper mentions using 'CLIP (Radford et al. 2021)' and 'CLIP-B/16' as the pre-trained model backbone, which uses a 'ViT-B/16 transformer (Dosovitskiy et al. 2020)' and a 'masked self-attention Transformer (Vaswani et al. 2017)'. These are models and architectures, but no specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions) are provided.
Experiment Setup | Yes | Additionally, unless otherwise specified, the temperature coefficient is uniformly set to 1 across all algorithms. In the Section 5 Discussion, 'empirical analysis on ImageNet-1k reveals optimal performance at k = 1', where k = M/N (M denotes agent count).
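To make concrete where a temperature coefficient of T = 1 enters a softmax-based confidence score, here is a minimal sketch of a maximum-softmax-probability (MSP) style baseline score. This is a generic baseline for illustration, not CMA itself, and the function name is our own assumption.

```python
import numpy as np

def msp_score(logits: np.ndarray, temperature: float = 1.0) -> float:
    """Maximum softmax probability (MSP) confidence score.

    `logits` are similarity scores between an image embedding and the
    candidate label embeddings; `temperature` plays the role of the
    coefficient the report says is uniformly set to 1. Lower scores
    indicate less confident, potentially out-of-distribution inputs.
    """
    scaled = logits / temperature
    scaled -= scaled.max()  # numerical stability before exponentiation
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return float(probs.max())
```

Raising the temperature flattens the softmax distribution and lowers the score; with T = 1 the raw similarity gaps are used directly.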