Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
What can large language models do for sustainable food?
Authors: Anna Thomas, Adam Yee, Andrew Mayne, Maya B. Mathur, Dan Jurafsky, Kristina Gligorić
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate six LLMs on four tasks in our typology. For example, for a sustainable protein design task, food science experts estimated that collaboration with an LLM can reduce time spent by 45% on average, compared to 22% for collaboration with another expert human food scientist. However, for a sustainable menu design task, LLMs produce suboptimal solutions when instructed to consider both human satisfaction and climate impacts. We propose a general framework for integrating LLMs with combinatorial optimization to improve reasoning capabilities. Our approach decreases emissions of food choices by 79% in a hypothetical restaurant while maintaining participants' satisfaction with their set of choices. Our results demonstrate LLMs' potential, supported by optimization techniques, to accelerate sustainable food development and adoption. |
| Researcher Affiliation | Collaboration | 1Stanford University 2Umai Works. Correspondence to: Anna T. Thomas <EMAIL>, Kristina Gligorić <EMAIL>. |
| Pseudocode | Yes | We can then solve (1) via the following steps: 1. Generate the ground set U = {u_1, ..., u_N}, e.g. a diverse set of recipes or exercises. 2. Obtain the estimates p̂(u_i) for all i ∈ {1, ..., N} via an LLM. 3. Solve the combinatorial optimization problem using standard techniques, e.g. submodular optimization or integer programming, depending on the forms of f(x) and g_i(x). This will yield a subset S ⊆ U. |
| Open Source Code | Yes | Our code is available at https://github.com/thomasat/llms-sustainable-food. |
| Open Datasets | Yes | NECTAR Sustainable Protein Dataset. The NECTAR Initiative's (nectar.org) sensory panel data¹ is freely available to academic researchers. The dataset, which will continue to expand in size, consists of 47 products across five categories. ... ¹Access can be requested here. Food.com Recipe Dataset. The Food.com dataset² contains 522,517 recipes, including ingredients and preparation instructions. We use the associated 1,401,982 reviews, containing ratings and text, to capture online users' preferences. ... ²Publicly available here. |
| Dataset Splits | No | The paper uses pre-trained LLMs in a zero-shot setting for evaluation and describes the construction of evaluation pairs (e.g., "This yielded 495 pairs", "We created a set of 500 recipe pairs"). However, it does not provide traditional train/validation/test splits for *training* models, as the LLMs are used off-the-shelf. The experimental setup for human subjects involves random assignment to menus, which is not a dataset split for model training. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running its experiments. It only mentions the LLMs used, which are external models. |
| Software Dependencies | No | The paper mentions evaluating specific LLMs (e.g., Claude 3.5 Sonnet, Gemini 1.5 Pro, GPT-3.5 Turbo, GPT-4o, Llama 3.1 70b Instruct, and o1-preview) and using Python's 'difflib'. However, it does not provide specific version numbers for any key software components or libraries, which are required for a reproducible description of ancillary software. |
| Experiment Setup | Yes | All evaluations were in a zero-shot setting. For experimental design: o1-preview was instructed to limit its response to 250 words. For menu design: Each LLM was instructed to reduce emissions of food choices by 75% while maintaining satisfaction, cost, nutrition, preparation time, and animal welfare... We set K = 36... We prompt the LLM to generate 20 additional recipes. Thus, N = 56. Finally, we set λ = 100. |
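The three-step recipe quoted in the Pseudocode row (generate a ground set, score items with an LLM, then solve a constrained selection problem) can be sketched in a few lines of Python. This is a hedged illustration, not the paper's implementation: the item data is fabricated, `estimate_preference` is a stand-in for an actual LLM call, and a simple greedy heuristic replaces the submodular optimization or integer programming the paper describes.

```python
def estimate_preference(item):
    """Stand-in for step 2: an LLM call returning an estimated preference score."""
    return item["rating"]  # in practice: prompt an LLM and parse its response


def greedy_select(items, k, emissions_budget):
    """Step 3 (simplified): pick up to k items maximizing estimated preference
    while keeping total emissions under a budget."""
    scored = sorted(items, key=estimate_preference, reverse=True)
    chosen, total_emissions = [], 0.0
    for item in scored:
        if len(chosen) == k:
            break
        if total_emissions + item["emissions"] <= emissions_budget:
            chosen.append(item)
            total_emissions += item["emissions"]
    return chosen


# Step 1: a toy ground set U (all names and numbers are illustrative only).
ground_set = [
    {"name": "lentil curry",  "rating": 4.6, "emissions": 1.2},
    {"name": "beef stew",     "rating": 4.4, "emissions": 9.8},
    {"name": "tofu stir-fry", "rating": 4.1, "emissions": 1.0},
    {"name": "chickpea bowl", "rating": 3.9, "emissions": 0.8},
]

menu = greedy_select(ground_set, k=2, emissions_budget=3.0)
print([m["name"] for m in menu])  # -> ['lentil curry', 'tofu stir-fry']
```

In the paper's menu design setting, the ground set mixes existing and LLM-generated recipes (N = 56, K = 36), and the objective trades off satisfaction against emissions with a penalty weight λ = 100; the greedy loop above is only a minimal stand-in for that formulation.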