Hierarchically Encapsulated Representation for Protocol Design in Self-Driving Labs

Authors: Yu-Zhe Shi, Mingchen Liu, Fanxu Meng, Qiao Xu, Zhangqian Bi, Kun He, Lecheng Ruan, Qining Wang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The results demonstrate that the proposed method could effectively complement Large Language Models in the protocol design process, serving as an auxiliary module in the realm of machine-assisted scientific exploration. The complete quantitative results across the four domains, the three tasks, and the six dimensions of evaluation metrics are presented in Appx. B. Through paired samples t-test, we find that EE+ and EI+ significantly outperform other alternative approaches (EE+ outperforms EE: t(278) = 8.007, µd < 0, p < .0001; EI+ outperforms EI: t(278) = 8.397, µd < 0, p < .0001; EE+ outperforms II: t(278) = 24.493, µd < 0, p < .0001; EI+ outperforms II: t(278) = 23.855, µd < 0, p < .0001; see Fig. 3C-E).
Researcher Affiliation | Academia | Yu-Zhe Shi (1), Mingchen Liu (2), Fanxu Meng (1), Qiao Xu (1), Zhangqian Bi (2), Kun He (2), Lecheng Ruan (1), Qining Wang (1). (1) Department of Advanced Manufacturing and Robotics, Peking University; (2) School of Computer Science and Technology, Huazhong University of Science and Technology. Equal contribution is indicated for some authors.
Pseudocode | Yes | Algorithm 1 Reciprocative Verification
Open Source Code | Yes | The project page with supplementary files for reproducing the results of this paper will be available at https://autodsl.org/procedure/papers/iclr25shi.html.
Open Datasets | Yes | The corpora C for the automatic generation of representations (Sec. 3.1) and the corpora for selecting the testing set (Sec. 4.1) are both retrieved from open-sourced websites run by top-tier publishers, including Nature’s Protocol Exchange, Cell’s STAR Protocols, Bio-protocol, Wiley’s Current Protocols, and JoVE.
Dataset Splits | Yes | The testing set includes 140 new protocols and 1757 steps in total, across the domains of Genetics, Medical, Bioengineering, and Ecology, with 23% for planning, 52% for modification, and 25% for adjustment (see Tab. 1 and Fig. 3A for details).
Hardware Specification | Yes | The design of the DSLs was executed on a MacBook with an M2 chip, running 1,000 iterations to ensure convergence.
Software Dependencies | No | The protocol pre-processing steps begin by reading all JSON files of the protocols. Each protocol is then split sentence-by-sentence using spaCy, with the constraint that every sentence is longer than ten characters. ... Afterwards, we use sklearn to identify potentially similar entity pairs by calculating the cosine similarity of the candidate entities, and then passing these entity pairs to the GPT model for synonym detection... We primarily used GPT-4o mini with OpenAI’s Batch API for preprocessing...
Experiment Setup | Yes | The design of the DSLs was executed on a MacBook with an M2 chip, running 1,000 iterations to ensure convergence. This process required an average of 55 seconds per iteration for the operation-centric view DSL and an average of 2 seconds per iteration for the product-centric view DSL.
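
The Research Type entry reports paired samples t-tests of the form t(278). As a rough illustration of how such a statistic is computed, the sketch below derives t and the degrees of freedom from two equal-length lists of scores. The function name `paired_t_statistic` and all score values are hypothetical and are not taken from the paper; the sign of t depends on the order in which the two samples are subtracted, and the paper's own convention is not stated here. The p-value is omitted because the Python standard library has no t-distribution CDF; a library routine such as `scipy.stats.ttest_rel` could supply it.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Return (t, df) for a paired samples t-test on two equal-length lists."""
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    n = len(a)
    if n < 2:
        raise ValueError("need at least two pairs")
    # Per-pair differences, taken as a - b.
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance")
    # t = mean difference divided by its standard error.
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical scores for the same five test items under two approaches.
baseline = [0.52, 0.61, 0.48, 0.70, 0.55]
augmented = [0.60, 0.66, 0.59, 0.71, 0.63]

t, df = paired_t_statistic(baseline, augmented)
print(f"t({df}) = {t:.3f}")
```

Because the degrees of freedom equal the number of pairs minus one, a statistic reported as t(278) corresponds to 279 paired observations.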