CLIBD: Bridging Vision and Genomics for Biodiversity Monitoring at Scale

Authors: ZeMing Gong, Austin Wang, Xiaoliang Huo, Joakim Bruslund Haurum, Scott C. Lowe, Graham W. Taylor, Angel Chang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose CLIBD... Our experiments show our pretrained embeddings that align modalities can (1) improve on the representational power of image and DNA embeddings alone by obtaining higher taxonomic classification accuracy and (2) provide a bridge from image to DNA to enable image-to-DNA-based retrieval. (Section 5, Experiments) We evaluate the model's ability to retrieve correct taxonomic labels using images and DNA barcodes from the BIOSCAN-1M dataset [23].
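The image-to-DNA retrieval the quoted passage describes reduces to nearest-neighbor search under cosine similarity in the shared embedding space. A minimal stdlib sketch under that assumption — the embedding values, dimensions, and function names are illustrative, not from the CLIBD codebase:

```python
import math
import random

random.seed(0)

def l2_normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def image_to_dna_retrieval(image_embs, dna_embs):
    """For each image embedding, return the index of the DNA embedding
    with the highest cosine similarity in the shared space."""
    dna_unit = [l2_normalize(d) for d in dna_embs]
    out = []
    for q in image_embs:
        qu = l2_normalize(q)
        sims = [sum(a * b for a, b in zip(qu, d)) for d in dna_unit]
        out.append(max(range(len(sims)), key=sims.__getitem__))
    return out

# Hypothetical aligned embeddings: 4 image queries, 6 DNA keys, 32-d.
image_embs = [[random.gauss(0, 1) for _ in range(32)] for _ in range(4)]
dna_embs = [[random.gauss(0, 1) for _ in range(32)] for _ in range(6)]
print(image_to_dna_retrieval(image_embs, dna_embs))
```

In practice the retrieved DNA key's taxonomic label is then assigned to the query image, which is how the paper turns retrieval into classification.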
Researcher Affiliation | Academia | Simon Fraser University (1), Aalborg University (2), Vector Institute (3), University of Guelph (4), Alberta Machine Intelligence Institute (Amii) (5). EMAIL, {joha}@create.aau.dk, {scott.lowe}@vectorinstitute.ai, {gwtaylor}@uoguelph.ca
Pseudocode | No | The paper describes the contrastive learning scheme and inference process using textual descriptions and mathematical formulas (e.g., Section 3.1, Training, and Section 3.2, Inference), and uses Figure 1 to illustrate an overview of CLIBD, but no explicitly labeled pseudocode or algorithm blocks are present.
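Since the contrastive objective is given only in prose and equations, the following is a generic symmetric InfoNCE sketch over a pairwise similarity matrix whose diagonal holds the matched (image, DNA) pairs — an illustration of the standard CLIP-style loss, not the paper's exact formulation (the temperature value is an assumption):

```python
import math

def infonce_loss(sim, temperature=0.1):
    """Symmetric InfoNCE: cross-entropy toward the diagonal, averaged
    over rows (image->DNA) and columns (DNA->image)."""
    n = len(sim)

    def ce(rows):
        total = 0.0
        for i, row in enumerate(rows):
            logits = [s / temperature for s in row]
            m = max(logits)  # shift for numerical stability
            logz = m + math.log(sum(math.exp(l - m) for l in logits))
            total += logz - logits[i]
        return total / n

    cols = [[sim[j][i] for j in range(n)] for i in range(n)]
    return 0.5 * (ce(sim) + ce(cols))

# Perfectly aligned pairs (identity similarity matrix) yield a near-zero loss.
aligned = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
print(infonce_loss(aligned))
```

Shuffling the pairing (moving the high similarities off the diagonal) drives the loss up, which is what the training signal exploits.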
Open Source Code | Yes | https://bioscan-ml.github.io/clibd/
Open Datasets | Yes | The BIOSCAN-1M dataset [23] is a curated collection of over one million insect data records sourced from a biodiversity monitoring workflow. Each record in the dataset includes a high-quality insect image, expert-annotated taxonomic label, and a DNA barcode. Reference [23]: Zahra Gharaee, ZeMing Gong, Nicholas Pellegrino, Iuliia Zarubiieva, Joakim Bruslund Haurum, Scott Lowe, Jaclyn McKeown, Chris Ho, Joschka McLeod, Yi-Yun Wei, Jireh Agda, Sujeevan Ratnasingham, Dirk Steinke, Angel Chang, Graham W. Taylor, and Paul Fieguth. A step towards worldwide biodiversity assessment: The BIOSCAN-1M insect dataset. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine (eds.), Advances in Neural Information Processing Systems, volume 36, pp. 43593-43619. Curran Associates, Inc., 2023. URL https://proceedings.neurips.cc/paper_files/paper/2023/file/87dbbdc3a685a97ad28489a1d57c45c1-Paper-Datasets_and_Benchmarks.pdf.
Dataset Splits | Yes | Data partitioning. We split BIOSCAN-1M into train/val/test sets to evaluate zero-shot classification and model generalization to unseen species. Records for well-represented species (at least 9 records) are partitioned at an 80/20 ratio into seen and unseen, with seen records allocated to each of the splits and unseen records allocated to val and test. All records without species labels are used in contrastive pretraining, and species with 2 to 8 records are divided between the unseen splits in the val and test sets... For the seen species, we subdivide the records at a 70/10/10/10 ratio into train/val/test/key, where the keys for the seen species are shared across all splits. The unseen species for each of validation and test are split evenly between queries and keys.
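The species-level gating described above (record-count thresholds deciding which pool each species feeds) can be sketched as follows; the toy labels and the cut-point arithmetic are illustrative assumptions, not the BIOSCAN-1M implementation:

```python
from collections import Counter, defaultdict

# Toy records: one species label per record (None = no species label).
# Names and counts are made up for illustration.
labels = (
    ["Apis mellifera"] * 10      # well represented: >= 9 records
    + ["Bombus terrestris"] * 4  # 2-8 records: goes to the unseen splits
    + [None] * 3                 # unlabeled: contrastive pretraining only
)

counts = Counter(l for l in labels if l is not None)
pretrain_only = [i for i, l in enumerate(labels) if l is None]
seen_species = {s for s, c in counts.items() if c >= 9}
unseen_species = {s for s, c in counts.items() if 2 <= c <= 8}

# Records of well-represented species are split 80/20 into seen/unseen.
by_species = defaultdict(list)
for i, l in enumerate(labels):
    if l in seen_species:
        by_species[l].append(i)

seen_records, unseen_records = [], []
for s, idxs in by_species.items():
    cut = int(0.8 * len(idxs))
    seen_records += idxs[:cut]
    unseen_records += idxs[cut:]

print(len(seen_records), len(unseen_records), len(pretrain_only))  # 8 2 3
```

The subsequent 70/10/10/10 train/val/test/key subdivision of the seen records follows the same per-species slicing pattern.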
Hardware Specification | Yes | Models were trained on four 80GB A100 GPUs for 50 epochs with batch size 2000, using the Adam optimizer [33] and a one-cycle learning rate schedule [57] with learning rate from 1e-6 to 5e-5.
Software Dependencies | No | For each modality we use a pretrained model to initialize our encoders. Images: ViT-B pretrained on ImageNet-21k and fine-tuned on ImageNet-1k [21] (loaded as vit_base_patch16_224 in the timm library). DNA barcodes: BarcodeBERT [2]... Text: we use the pretrained BERT-Small [68] for taxonomic labels. The paper names software libraries and models such as the timm library, BarcodeBERT, and BERT-Small, but it does not specify version numbers for these components or for other dependencies such as Python, PyTorch, or CUDA.
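BarcodeBERT-style DNA encoders typically consume barcodes as non-overlapping k-mer tokens rather than raw character strings; a stdlib sketch of that preprocessing step (the k value and function name are assumptions, not details from the paper):

```python
def kmer_tokenize(seq, k=4):
    """Split a DNA barcode into non-overlapping k-mers; any trailing
    residue shorter than k is dropped in this sketch."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, k)]

print(kmer_tokenize("ACGTTGCAACGT"))  # ['ACGT', 'TGCA', 'ACGT']
```

Each k-mer then maps to an entry in the DNA model's vocabulary, analogous to subword tokens for the BERT-Small text encoder.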
Experiment Setup | Yes | Models were trained on four 80GB A100 GPUs for 50 epochs with batch size 2000, using the Adam optimizer [33] and a one-cycle learning rate schedule [57] with learning rate from 1e-6 to 5e-5. For efficient training, we use automatic mixed precision (AMP).
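The quoted schedule (learning rate swept from 1e-6 up to 5e-5 and back over training) can be sketched as a piecewise-linear one-cycle policy; the warm-up fraction and the linear (rather than cosine) shape here are assumptions, not details from the paper:

```python
def one_cycle_lr(step, total_steps, lr_min=1e-6, lr_max=5e-5, pct_warmup=0.3):
    """Linear warm-up from lr_min to lr_max, then linear decay back down."""
    warmup = max(1, int(pct_warmup * total_steps))
    if step < warmup:
        return lr_min + (lr_max - lr_min) * step / warmup
    frac = (step - warmup) / max(1, total_steps - warmup)
    return lr_max - (lr_max - lr_min) * frac

total = 1000
lrs = [one_cycle_lr(s, total) for s in range(total)]
print(lrs[0], max(lrs))  # starts at 1e-6, peaks at 5e-5
```

In a PyTorch pipeline this shape would normally come from the built-in `torch.optim.lr_scheduler.OneCycleLR`, which the paper's [57] citation (Smith's one-cycle policy) suggests.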