Memory-Modular Classification: Learning to Generalize with Memory Replacement
Authors: Dahyun Kang, Ahmet Iscen, Eunchan Jo, Sua Choi, Minsu Cho, Cordelia Schmid
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate the promising performance and versatility of our approach in handling diverse classification tasks, including zero-shot/few-shot classification of unseen classes, fine-grained classification, and class-incremental classification. ... Table 1 compares zero-shot baselines and MML on cross-dataset transfer. ... Table 2 compares MML and other zero-shot models on the zero-shot CUB benchmark ... Table 3 presents the ablation study of the main model components of MML. |
| Researcher Affiliation | Collaboration | Dahyun Kang (POSTECH), Ahmet Iscen (Google DeepMind), Eunchan Jo, Sua Choi, Minsu Cho (POSTECH), Cordelia Schmid (Google DeepMind) |
| Pseudocode | No | The paper describes the methodology using textual explanations and mathematical equations (e.g., Eq. 1, 3, 4, 5), accompanied by a high-level architectural diagram (Figure 2). However, it does not include any explicitly labeled pseudocode blocks, algorithms, or structured, step-by-step procedures in a code-like format. |
| Open Source Code | No | We will make our attached code and data publicly available once accepted. |
| Open Datasets | Yes | To construct the external image memory for ImageNet derivatives, we employ a readily available web-crawled image dataset, WebVision ver. 2 (Li et al., 2017). WebVision is collected from Google and Flickr by the keyword search of the 1000 class names of ImageNet1K (Russakovsky et al., 2015). ... For single-dataset zero-shot classification, ImageNet-S and CUB are used, where the classes of each dataset are split into disjoint sets for few-shot training and zero-shot testing. We adopt the existing zero-shot classification CUB benchmark (Wah et al., 2011; Akata et al., 2013) ... For text memory, we query Wikipedia (Tian et al., 2022; Hu et al., 2023a; Naeem et al., 2023) for each class name and retrieve the corresponding article text by web crawling. ... The total classes of these datasets amount to 1,310 classes and their details including the references are found in Table 10. |
| Dataset Splits | Yes | Similarly, we introduce an ImageNet (Russakovsky et al., 2015) split such that it comprises 600/200/200 classes for train/validation/test and call it ImageNet-S (S stands for class split). We use 16 images per class for training, i.e., 9.6K training images. ... We adopt the existing zero-shot classification CUB benchmark (Wah et al., 2011; Akata et al., 2013) of which classes are split into 150/50 bird species classes for train/validation. ... We adopt a public benchmark, ImageNet100-Base0-Inc10 (Rebuffi et al., 2017), where 10 unseen classes and their annotated samples are sequentially given for 10 consecutive stages. |
| Hardware Specification | Yes | For all training and testing, we use a single 2080 Ti or an RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions using "PyTorch (Paszke et al., 2017) built-in top-K module" and the "CLIP encoder (Radford et al., 2021)", but it does not specify concrete version numbers for PyTorch or other critical software libraries, which are necessary for reproducible software dependencies. |
| Experiment Setup | Yes | For training, we use a batch size of 256, a learning rate of 1e-6 and weight decay of 5e-4 on a single 2080 Ti or an RTX 3090 GPU for all training and testing. We retrieve 32 NNs from both the image and text memory. We use M = 16 for prototype construction and set the logit temperature τ = 16, which is chosen via hyperparameter search. |
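The hyperparameters quoted in the Experiment Setup row can be collected into a minimal configuration sketch. Only the numeric values (batch size, learning rate, weight decay, number of retrieved neighbors, M, and τ) come from the paper; the dataclass structure and field names are illustrative assumptions, not the authors' code.

```python
# Hypothetical configuration sketch for reproducing the reported MML
# training setup. Values are taken from the paper's Experiment Setup
# quote; all names and the structure itself are assumptions.
from dataclasses import dataclass


@dataclass(frozen=True)
class MMLTrainConfig:
    batch_size: int = 256           # "batch size of 256"
    learning_rate: float = 1e-6     # "learning rate of 1e-6"
    weight_decay: float = 5e-4      # "weight decay of 5e-4"
    num_neighbors: int = 32         # 32 NNs from each of image and text memory
    prototype_m: int = 16           # M = 16 for prototype construction
    logit_temperature: float = 16.0 # tau = 16, chosen via hyperparameter search


cfg = MMLTrainConfig()
print(cfg)
```

Freezing the dataclass keeps the reported settings immutable, so a reproduction run cannot silently drift from the published values.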