Combining Induction and Transduction for Abstract Reasoning
Authors: Wen-Ding Li, Keya Hu, Carter Larsen, Yuqing Wu, Simon Alford, Caleb Woo, Spencer Dunn, Hao Tang, Wei-Long Zheng, Yewen Pu, Kevin Ellis
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study this question on ARC by training neural models for induction (inferring latent functions) and transduction (directly predicting the test output for a given test input). We train on synthetically generated variations of Python programs that solve ARC training tasks. We find inductive and transductive models solve different kinds of test problems, despite having the same training problems and sharing the same neural architecture: Inductive program synthesis excels at precise computations, and at composing multiple concepts, while transduction succeeds on fuzzier perceptual concepts. Ensembling them approaches human-level performance on ARC. |
| Researcher Affiliation | Collaboration | 1 Cornell, 2 Shanghai Jiao Tong University, 4 Autodesk |
| Pseudocode | No | The paper describes methods and processes in natural language and provides Python code examples in the appendix, but it does not contain explicit pseudocode or algorithm blocks with structured, non-code-like steps. |
| Open Source Code | Yes | Our code, data, and model weights are freely available at https://github.com/xu3kev/BARC. |
| Open Datasets | Yes | Testing these neural methods requires a large dataset of function-learning problems, which is challenging to generate because not only must we make novel functions, but also good inputs to those functions. ... To address this challenge, we first generate a deterministic Python function for f, and then a probabilistic program for sampling inputs to f, finally executing those programs to produce input-outputs. ... Our code, data, and model weights are freely available at https://github.com/xu3kev/BARC. |
| Dataset Splits | Yes | We report performance on the 400-problem public validation split of ARC, which is harder than the training split. |
| Hardware Specification | Yes | per device batch size: 8; device: 8x A100; epochs: 3; weight decay: 0; learning rate scheduler type: cosine |
| Software Dependencies | Yes | Therefore the induction model must generate Python code, so we initialize our models with Llama3.1-8B-instruct (Dubey et al., 2024) because it was pretrained on source code. Our preliminary experiments suggested Llama3.1-8B-instruct was better than Mistral-7B-v0.3, Qwen2-7B-Instruct, and deepseek-coder-6.7b-instruct ... Unless otherwise mentioned, we create synthetic datasets with GPT-4o-mini and ada-002. |
| Experiment Setup | Yes | Fine-tuning hyperparameters: training type: lora finetune; lora rank: 64; lora alpha: 64; learning rate: 2e-4; gradient accumulation steps: 2; per device batch size: 8; device: 8x A100; epochs: 3; weight decay: 0; learning rate scheduler type: cosine ... Inference hyperparameters: temperature: 0.8 (1.0 for the full-data fine-tuned model); beam width: (1) engineer results: 40, (2) 100k data scale: 20, (3) all other experiment results: 3 |
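The ensembling idea quoted in the Research Type row (induced Python programs checked against the training pairs, with transduction as a fallback) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' released implementation: the candidate-program interface, the all-training-pairs acceptance rule, and the toy grid encoding are assumptions for the sketch.

```python
def solve_with_ensemble(train_pairs, test_input, induced_programs, transduction_guess):
    """Return the output of the first induced program that reproduces
    every training input-output pair; otherwise fall back to the
    transduction model's direct prediction."""
    for program in induced_programs:
        try:
            if all(program(x) == y for x, y in train_pairs):
                return program(test_input)
        except Exception:
            continue  # a candidate program may crash on some inputs
    return transduction_guess

# Toy example: the latent function doubles every cell value.
train_pairs = [([1, 2], [2, 4]), ([3], [6])]
candidates = [
    lambda g: [v + 1 for v in g],  # inconsistent hypothesis, rejected
    lambda g: [v * 2 for v in g],  # consistent hypothesis, accepted
]
result = solve_with_ensemble(train_pairs, [5, 7], candidates, transduction_guess=[0])
# -> [10, 14]
```

The key design point, per the abstract, is that induction wins on precise, compositional computations (a program either fits all training pairs or it does not), while transduction covers fuzzier perceptual tasks where no candidate program survives the check.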