ChemAgent: Self-updating Memories in Large Language Models Improves Chemical Reasoning
Authors: Xiangru Tang, Tianyu Hu, Muyang Ye, Daniel Shao, Xunjian Yin, Siru Ouyang, Wangchunshu Zhou, Pan Lu, Zhuosheng Zhang, Yilun Zhao, Arman Cohan, Mark Gerstein
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on four chemical reasoning datasets from SciBench demonstrate that ChemAgent achieves performance gains of up to 46% (GPT-4), significantly outperforming existing methods. Our experiments are conducted on four chemical reasoning datasets from SciBench (Wang et al., 2024a) with GPT-3.5, GPT-4 (OpenAI et al., 2024), and open-source models like Llama3 (Llama Team, 2024). |
| Researcher Affiliation | Academia | 1 Yale University, 2 UIUC, 3 Stanford University, 4 Shanghai Jiao Tong University |
| Pseudocode | Yes | 2.4 LIBRARY CONSTRUCTION. Algorithm 1: Library Construction. Input: development set D, LLM F, prompts {p_split, p_ref, p_rank}. Output: static memory M consisting of units U = {condition, question, solution}. for (P, S) in D do |
| Open Source Code | Yes | Our code can be found at https://github.com/gersteinlab/chemagent. |
| Open Datasets | Yes | Our experiments are conducted on four chemical reasoning datasets from SciBench (Wang et al., 2024a) with GPT-3.5, GPT-4 (OpenAI et al., 2024), and open-source models like Llama3 (Llama Team, 2024). |
| Dataset Splits | Yes | Each dataset is divided into a development set (Dd) and a test set (Dt), with exact sizes provided in Table 6. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'GPT-3.5, GPT-4, and Llama3' as models but does not provide specific version numbers for any software libraries, frameworks, or programming languages used in the implementation of ChemAgent. |
| Experiment Setup | Yes | During the reasoning stage, we configure the planning memory (Mp) to provide a maximum of two related memory instances (2-shot) for each query, and the execution memory (Me) to provide up to four related instances (4-shot). However, during the construction of the library, only the knowledge memory (Mk) is used, as the standard solutions are already available in the development set (Dd). We evaluate the accuracy by comparing their outputs with the correct answers, using a relative tolerance of 0.01. |
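The quoted Algorithm 1 excerpt outlines library construction: each development-set pair (P, S) is decomposed by prompted LLM calls into a memory unit of {condition, question, solution}. The sketch below illustrates that loop under stated assumptions: `llm` is a hypothetical stand-in for the model F, and the prompt strings are placeholders, not ChemAgent's actual `p_split`/`p_ref`/`p_rank` prompts.

```python
def build_library(dev_set, llm):
    """Hypothetical sketch of Algorithm 1 (library construction).

    dev_set: iterable of (problem, solution) pairs from the development set D.
    llm: callable(prompt: str) -> str, a stand-in for the LLM F.
    Returns the static memory M as a list of {condition, question, solution} units.
    """
    memory = []
    for problem, solution in dev_set:
        # Stand-in for p_split: separate the problem into conditions and question.
        condition = llm(f"Extract the given conditions from: {problem}")
        question = llm(f"Extract the question being asked in: {problem}")
        # Stand-in for p_ref: refine the reference solution into a reusable form.
        refined = llm(f"Rewrite this solution as reusable steps: {solution}")
        memory.append({"condition": condition,
                       "question": question,
                       "solution": refined})
    return memory


# Toy stand-in "LLM" (echoes the text after the prompt) so the sketch runs offline.
mock_llm = lambda prompt: prompt.split(": ", 1)[1]
library = build_library(
    [("2 mol ideal gas at 300 K; find pressure", "Use PV = nRT")], mock_llm
)
```

The paper's pipeline also ranks and stores these units for retrieval (`p_rank`), which this sketch omits.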
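The evaluation protocol compares model outputs to reference answers under a relative tolerance of 0.01. A minimal sketch of that check, assuming the standard symmetric relative-tolerance test provided by Python's `math.isclose` (the paper does not specify its exact tolerance formula):

```python
import math


def is_correct(predicted: float, reference: float, rel_tol: float = 0.01) -> bool:
    """Mark a numeric prediction correct if it is within 1% relative tolerance."""
    return math.isclose(predicted, reference, rel_tol=rel_tol)


def accuracy(predictions, references, rel_tol: float = 0.01) -> float:
    """Fraction of predictions within tolerance of their reference answers."""
    hits = sum(is_correct(p, r, rel_tol) for p, r in zip(predictions, references))
    return hits / len(references)
```

For example, `is_correct(100.5, 100.0)` passes (0.5% off) while `is_correct(102.0, 100.0)` fails (2% off).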