GenOL: Generating Diverse Examples for Name-only Online Learning
Authors: Minhyuk Seo, Seongwon Cho, Minjae Lee, Diganta Misra, Hyeonbeom Choi, Seon Joo Kim, Jonghyun Choi
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate that the proposed GenOL outperforms prior arts, even a model trained with fully supervised data by large margins, in various tasks, including image recognition and multi-modal visual reasoning. Code is available at https://github.com/snumprlab/genol. ... 5 Experiments; 5.1 Experimental Setup; 5.2 Quantitative Analysis; 5.3 Analysis of Bias; 5.4 Qualitative Analysis; 5.5 Comparison of Computational and Memory Cost; 5.6 Ablation Study |
| Researcher Affiliation | Academia | Minhyuk Seo (KU Leuven, Seoul National University); Seongwon Cho (Seoul National University); Minjae Lee (Seoul National University); Diganta Misra (ELLIS MPI-IS Tübingen); Hyeonbeom Choi (Seoul National University); Seon Joo Kim (Yonsei University); Jonghyun Choi (Seoul National University) |
| Pseudocode | Yes | A.17 Pseudocode for the GenOL: Algorithm 1 GenOL; Algorithm 2 RPG; Algorithm 3 HIRPG; Algorithm 4 Set of Generators G; Algorithm 5 Ensembler |
| Open Source Code | Yes | Code is available at https://github.com/snumprlab/genol. |
| Open Datasets | Yes | Datasets. We evaluate GenOL's domain generalization in CIL using PACS (Zhou et al., 2020), CIFAR10-W (Sun et al., 2024), ImageNet-R (Hendrycks et al., 2021), and DomainNet (Neyshabur et al., 2020)... For MVCIL experiments, we use Bongard-HOI (Jiang et al., 2022) and Bongard-OpenWorld (Wu et al., 2024a). |
| Dataset Splits | Yes | Continual Learning Setups. We empirically validate GenOL by comparing it with state-of-the-art methods in name-only class-incremental learning (CIL) and name-only multi-modal visual-concept incremental learning (MVCIL) setups (Seo et al., 2025), where class names and concepts (e.g., "ride a bike", "kick a ball") are encountered incrementally. ... We provide the details of the task split in Section A.2. ... Table 7: Task configurations for the CIL setup on each domain generalization dataset. ... Table 8: Task configurations for the MVCIL setup on each Bongard benchmark. ... For PACS, the first task includes 3 classes, while the subsequent tasks include data for 2 classes each. |
| Hardware Specification | Yes | Additionally, it significantly accelerates data collection, e.g., generating DomainNet with SDXL (Podell et al., 2023) on 8 NVIDIA RTX 4090 GPUs takes only 80 hours... Generative baselines, such as Glide-Syn, LE, CHB, SC, CCG, and our proposed GenOL, utilize 32 RTX 4090 GPUs for image generation. |
| Software Dependencies | No | The paper mentions specific models like 'GPT-4 model', 'LLaMA-3-8B', 'Qwen-3-8B' and algorithms like 'Adam optimizer', 'RandAugment', 'ResNet18', 'Vision Transformer'. However, it does not provide specific version numbers for general software components such as Python, PyTorch, or CUDA libraries, which are required for a reproducible description of ancillary software. |
| Experiment Setup | Yes | Hyperparameters. τ, the temperature of the softmax function in CONAN, is uniformly set to 0.5 across all datasets. For L, the truncation ratio used in RMD score normalization, we set it to 5% for all experiments. For diverse prompt generation baselines, as well as GenOL, we generate 50 different prompts for all baselines across all benchmarks, including HIRPG, to ensure a fair comparison. Specifically, in GenOL, we set depth D = 2 and K = 7 for all setups to generate 50 prompts using HIRPG. For the optimizer and the learning rate (LR) scheduler in the CIL setup, we employed the Adam optimizer with an initial LR of 0.0003 and the Constant LR scheduler, respectively, following prior works (Koh et al., 2023; Seo et al., 2024). In the MVCIL setup, we use the Adam optimizer with LR 5×10⁻⁵ and the Constant LR scheduler. Following (Koh et al., 2021; 2023; Kim et al., 2024a), we conduct batch training for each incoming sample. Specifically, for PACS, CIFAR-10, DomainNet, and ImageNet, the number of batch iterations per incoming sample is set to 2, 2, 3, and 0.5, respectively, with batch sizes of 16, 16, 128, and 256. Episodic memory sizes are configured as 200, 2000, 10000, and 80000 for PACS, CIFAR-10-W, DomainNet, and ImageNet, respectively. Episodic memory sizes for MVCIL setups: the number of batch iterations per incoming sample is set to 0.5, with a batch size of 2 and a memory size of 500 in both Bongard-HOI and Bongard-OpenWorld. |
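The per-dataset training settings quoted in the Experiment Setup row can be collected into a small configuration table for reference. The sketch below is illustrative only: the dictionary and function names (`CIL_CONFIG`, `get_config`) are our own, but the values are taken directly from the quote above.

```python
# Illustrative summary of the reported GenOL training hyperparameters.
# Structure and names are our own; values come from the paper's quoted setup.
CIL_CONFIG = {
    # dataset: iterations per incoming sample, batch size, episodic memory size
    "PACS":       {"iters_per_sample": 2,   "batch_size": 16,  "memory_size": 200},
    "CIFAR-10-W": {"iters_per_sample": 2,   "batch_size": 16,  "memory_size": 2000},
    "DomainNet":  {"iters_per_sample": 3,   "batch_size": 128, "memory_size": 10000},
    "ImageNet":   {"iters_per_sample": 0.5, "batch_size": 256, "memory_size": 80000},
}

# Optimizer settings reported for each setup.
CIL_OPTIMIZER = {"name": "Adam", "lr": 3e-4, "scheduler": "ConstantLR"}
MVCIL_OPTIMIZER = {"name": "Adam", "lr": 5e-5, "scheduler": "ConstantLR"}


def get_config(dataset: str) -> dict:
    """Return the reported CIL settings for a dataset; raises KeyError if unknown."""
    return CIL_CONFIG[dataset]


print(get_config("DomainNet"))
```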