Discovering Symbolic Cognitive Models from Human and Animal Behavior
Authors: Pablo Samuel Castro, Nenad Tomasev, Ankit Anand, Navodita Sharma, Rishika Mohanta, Aparna Dev, Kuba Perlin, Siddhant Jain, Kyle Levin, Noemi Elteto, Will Dabney, Alexander Novikov, Glenn C Turner, Maria K Eckstein, Nathaniel D. Daw, Kevin J Miller, Kim Stachenfeld
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We consider datasets from three species performing a classic reward-learning task that has been the focus of substantial modeling effort, and find that the discovered programs outperform state-of-the-art cognitive models for each. The discovered programs can readily be interpreted as hypotheses about human and animal cognition, instantiating interpretable symbolic learning and decision-making algorithms. Figure 1. Discovered models outperform human-designed models. We evaluate the best program discovered by CogFunSearch for each dataset, using average normalized likelihood of the choices made by held-out test subjects, and compare it to the best existing model from the neuroscience and psychology literature (all p < 0.002, signed-rank test). |
| Researcher Affiliation | Collaboration | 1. Google DeepMind; 2. Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, VA, USA |
| Pseudocode | Yes | The paper includes structured code blocks in Appendix E ('Baseline Programs'), Appendix G ('Best Discovered Programs'), and Appendix H ('Seed program comparison'), such as: 'def agent( params: chex.Array, choice: int, reward: int, agent_state: Optional[chex.Array], ) -> Tuple[chex.Array, chex.Array]:' in section E.1. |
| Open Source Code | No | The paper does not provide an explicit statement about the release of their source code or a link to a repository for the methodology described in this paper. It mentions using FunSearch and Python community tools, but not the specific code implemented for this work. |
| Open Datasets | Yes | Human Dataset (Fig. 3A; Eckstein et al. 2024) considers human participants performing a four-alternative task with graded rewards. Rat Dataset (Fig. 3B; Miller et al. 2021) considers rats performing a two-armed bandit task with binary rewards. Fruit Fly Dataset (Fig. 3C; Mohanta 2022; Rajagopalan et al. 2023) considers fruit flies performing a two-armed bandit task with binary rewards. |
| Dataset Splits | Yes | In particular, for each subject i, we split its sessions into even and odd sets d_{i,even} := {s_{i,0}, s_{i,2}, ..., s_{i,M-1}} and d_{i,odd} := {s_{i,1}, s_{i,3}, ..., s_{i,M}}, respectively. For the fruit fly dataset, since we have only one session per subject, we forego this additional level of variation and treat the dataset as though it were multiple sessions from a single subject with a single θ. We maintain a group of held-out subjects, D_test, in order to validate our discovered programs. Figure 9. Organizing data for train and test. In Human and Rat datasets we use half of the subjects for training, and half for testing; for each train subject, we use half of its sessions for parameter fitting and half for evaluation. For the Fly dataset we use half the subjects for training and half for testing, and proceed similarly as for the other datasets. |
| Hardware Specification | No | The paper does not explicitly state the specific hardware used for running its experiments. It mentions using LLMs like Gemini 1.5 Flash, which implies high-performance computing, but no details on CPU, GPU models, or other hardware specifications are provided. |
| Software Dependencies | Yes | The authors would also like to thank the Python community (Van Rossum & Drake Jr, 1995; Oliphant, 2007) for developing tools that enabled this work, including NumPy (Harris et al., 2020), Matplotlib (Hunter, 2007), Jupyter (Kluyver et al., 2016), Pandas (McKinney, 2013) and JAX (Bradbury et al., 2018b). CogFunSearch's programs must be implemented in JAX (Bradbury et al., 2018a) so that they are differentiable. |
| Experiment Setup | Yes | We use the AdaBelief optimizer with learning rate 5×10⁻², which is run until convergence or until 10,000 steps of gradient descent are reached. In order to test convergence, we compare the current score at iteration k, Ω_k, to the previously recorded score Ω_{k−100} every 100 steps. If the relative change in score \|(Ω_k − Ω_{k−100})/Ω_{k−100}\| is less than a convergence threshold of 10⁻², we conclude that parameter fitting has converged. Specifically, we train a GRU model (Cho et al., 2014) over d_{i,even}, run a sweep on the number of hidden units (over {1, 2, 4, 8, 16, 32, 64, 128}), and use early-stopping to select the best parameters. All the variants were trained with the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 1e−4. |
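The convergence rule quoted in the Experiment Setup row (check every 100 steps, stop when the relative score change drops below 10⁻² or after 10,000 steps) can be sketched as a small driver loop. This is a minimal illustration, not the authors' code: the `step` and `score` callables are hypothetical stand-ins for the actual AdaBelief update and loss evaluation over the JAX-implemented programs.

```python
def fit_until_converged(step, score, max_steps=10_000,
                        check_every=100, tol=1e-2):
    """Run `step` until the relative change in `score` over each
    `check_every`-step window falls below `tol`, or until `max_steps`
    is reached. Mirrors the convergence test described in the paper;
    `step`/`score` are hypothetical placeholders for the optimizer
    update and the fitting objective."""
    prev = None
    for k in range(1, max_steps + 1):
        step()
        if k % check_every == 0:
            cur = score()
            # Relative change |(Ω_k − Ω_{k−100}) / Ω_{k−100}| < 10⁻²
            if prev is not None and abs((cur - prev) / prev) < tol:
                return k  # converged
            prev = cur
    return max_steps  # hit the step budget without converging


# Toy usage: plain gradient descent on f(x) = x², offset so the
# score stays bounded away from zero in the relative-change test.
state = {"x": 10.0}

def toy_step():
    state["x"] -= 0.01 * 2 * state["x"]

def toy_score():
    return state["x"] ** 2 + 1.0

steps_taken = fit_until_converged(toy_step, toy_score)
```

The windowed relative-change criterion is scale-free, so the same threshold works whether the likelihood score is large or small, which matters when the same fitting loop is reused across datasets with different numbers of trials.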