CLIBD: Bridging Vision and Genomics for Biodiversity Monitoring at Scale
Authors: ZeMing Gong, Austin Wang, Xiaoliang Huo, Joakim Bruslund Haurum, Scott C. Lowe, Graham W. Taylor, Angel Chang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose CLIBD...Our experiments show our pretrained embeddings that align modalities can (1) improve on the representational power of image and DNA embeddings alone by obtaining higher taxonomic classification accuracy and (2) provide a bridge from image to DNA to enable image-to-DNA based retrieval. 5 EXPERIMENTS: We evaluate the model's ability to retrieve correct taxonomic labels using images and DNA barcodes from the BIOSCAN-1M dataset [23]. |
| Researcher Affiliation | Academia | Simon Fraser University¹, Aalborg University², Vector Institute³, University of Guelph⁴, Alberta Machine Intelligence Institute (Amii)⁵; {joha}@create.aau.dk, {scott.lowe}@vectorinstitute.ai, {gwtaylor}@uoguelph.ca |
| Pseudocode | No | The paper describes the contrastive learning scheme and inference process using textual descriptions and mathematical formulas, for example, in Section 3.1 'TRAINING' and '3.2 INFERENCE'. It also uses Figure 1 to illustrate the overview of CLIBD, but no explicitly labeled pseudocode or algorithm blocks are present. |
| Open Source Code | Yes | https://bioscan-ml.github.io/clibd/ |
| Open Datasets | Yes | The BIOSCAN-1M dataset [23] is a curated collection of over one million insect data records sourced from a biodiversity monitoring workflow. Each record in the dataset includes a high-quality insect image, expert-annotated taxonomic label, and a DNA barcode. Reference [23]: Zahra Gharaee, ZeMing Gong, Nicholas Pellegrino, Iuliia Zarubiieva, Joakim Bruslund Haurum, Scott Lowe, Jaclyn McKeown, Chris Ho, Joschka McLeod, Yi-Yun Wei, Jireh Agda, Sujeevan Ratnasingham, Dirk Steinke, Angel Chang, Graham W. Taylor, and Paul Fieguth. A step towards worldwide biodiversity assessment: The BIOSCAN-1M insect dataset. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine (eds.), Advances in Neural Information Processing Systems, volume 36, pp. 43593-43619. Curran Associates, Inc., 2023. URL https://proceedings.neurips.cc/paper_files/paper/2023/file/87dbbdc3a685a97ad28489a1d57c45c1-Paper-Datasets_and_Benchmarks.pdf |
| Dataset Splits | Yes | Data partitioning. We split BIOSCAN-1M into train/val/test sets to evaluate zero-shot classification and model generalization to unseen species. Records for well-represented species (at least 9 records) are partitioned at 80/20 ratio into seen and unseen, with seen records allocated to each of the splits and unseen records allocated to val and test. All records without species labels are used in contrastive pretraining, and species with 2 to 8 records are divided between the unseen splits in the val and test sets...For the seen species, we subdivide the records at a 70/10/10/10 ratio into train/val/test/key, where the keys for the seen species are shared across all splits. The unseen species for each of validation and test are split evenly between queries and keys. |
| Hardware Specification | Yes | Models were trained on four 80GB A100 GPUs for 50 epochs with batch size 2000, using the Adam optimizer [33] and one-cycle learning rate schedule [57] with learning rate from 1e-6 to 5e-5. |
| Software Dependencies | No | For each modality we use a pretrained model to initialize our encoders. Images: ViT-B¹ pretrained on ImageNet-21k and fine-tuned on ImageNet-1k [21]. (¹Loaded as vit_base_patch16_224 in the timm library.) DNA barcodes: BarcodeBERT [2]...Text: we use the pretrained BERT-Small [68] for taxonomic labels. The paper mentions software libraries and models such as the timm library, BarcodeBERT, and BERT-Small, but it does not specify version numbers for these software components or for other dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | Models were trained on four 80GB A100 GPUs for 50 epochs with batch size 2000, using the Adam optimizer [33] and one-cycle learning rate schedule [57] with learning rate from 1e-6 to 5e-5. For efficient training, we use automatic mixed precision (AMP). |
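The dataset-split procedure quoted under "Dataset Splits" can be sketched in plain Python. This is an illustrative reconstruction, not the authors' code: the thresholds (at least 9 records for seen species, the 80/20 seen/unseen and 70/10/10/10 train/val/test/key ratios) come from the paper, while the function name, data layout, and random-shuffle mechanics are assumptions.

```python
# Sketch of the BIOSCAN-1M partitioning scheme described in the
# "Dataset Splits" row. Ratios follow the paper; all names are illustrative.
import random

def partition_species(records_by_species, seed=0):
    """Split species into seen/unseen, then subdivide their records.

    records_by_species: dict mapping species name -> list of record ids.
    Returns a dict mapping split name -> list of record ids.
    """
    rng = random.Random(seed)
    splits = {"train": [], "val_seen": [], "test_seen": [], "key": [],
              "unseen": []}
    # Species with at least 9 records are candidates for the seen set.
    well_represented = sorted(s for s, r in records_by_species.items()
                              if len(r) >= 9)
    rng.shuffle(well_represented)
    n_seen = int(0.8 * len(well_represented))  # 80/20 seen/unseen
    seen = set(well_represented[:n_seen])
    for species in sorted(records_by_species):
        recs = list(records_by_species[species])
        rng.shuffle(recs)
        if species in seen:
            # Subdivide seen-species records 70/10/10/10.
            n = len(recs)
            a, b, c = int(0.7 * n), int(0.8 * n), int(0.9 * n)
            splits["train"] += recs[:a]
            splits["val_seen"] += recs[a:b]
            splits["test_seen"] += recs[b:c]
            splits["key"] += recs[c:]
        else:
            # Unseen species (including those with 2-8 records) are
            # later divided between val/test queries and keys.
            splits["unseen"] += recs
    return splits
```

A fuller implementation would further split the "unseen" pool evenly between queries and keys for each of val and test, as the quoted passage describes.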
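The one-cycle learning-rate schedule quoted in the hardware and setup rows can be sketched as a pure function of the training step. Only the endpoints (initial LR 1e-6, peak 5e-5) are stated in the paper; the 30% warmup fraction and the cosine annealing shape below are assumptions chosen to illustrate the typical one-cycle form.

```python
# Minimal sketch of a one-cycle LR schedule with the paper's endpoints
# (1e-6 rising to 5e-5). Warmup fraction and cosine shape are assumptions.
import math

def one_cycle_lr(step, total_steps, lr_min=1e-6, lr_max=5e-5, warmup_frac=0.3):
    warmup_steps = int(warmup_frac * total_steps)
    if step < warmup_steps:
        # Linear warmup from lr_min to the peak lr_max.
        return lr_min + (lr_max - lr_min) * step / warmup_steps
    # Cosine annealing from lr_max back down toward lr_min.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return lr_min + (lr_max - lr_min) * 0.5 * (1 + math.cos(math.pi * progress))
```

In practice this corresponds to a scheduler such as PyTorch's `OneCycleLR` stepped once per batch over the 50 epochs reported above.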