Emergent Response Planning in LLMs
Authors: Zhichen Dong, Zhanhui Zhou, Zhixuan Liu, Chao Yang, Chaochao Lu
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present experimental results across six tasks, showing that LLM hidden prompt representations encode rich information about upcoming responses and can be used to probe and predict global response attributes. Insight 1: Models present emergent planning on structure, content, and behavior attributes, which can be probed with high accuracy (Fig. 2). Our in-dataset probing experiments (where probes are trained and tested on different splits of the same prompt dataset) reveal that both base and fine-tuned models encode structure, content, and behavior attributes, with fine-tuned models showing superior performance. |
| Researcher Affiliation | Academia | 1Shanghai Artificial Intelligence Laboratory 2Work done while at Shanghai Artificial Intelligence Laboratory. Correspondence to: Chao Yang <EMAIL>, Zhichen Dong <EMAIL>, Zhanhui Zhou <EMAIL>. |
| Pseudocode | No | The paper describes methods but does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not explicitly state that source code for the described methodology is provided or made publicly available, nor does it include any repository links. |
| Open Datasets | Yes | Datasets and links: Ultrachat (Ding et al., 2023) https://huggingface.co/datasets/stingning/ultrachat ; Alpaca Eval (Taori et al., 2023) https://huggingface.co/datasets/tatsu-lab/alpaca ; GSM8K (Cobbe et al., 2021) https://huggingface.co/datasets/openai/gsm8k ; MATH (Saxton et al., 2019) https://huggingface.co/datasets/deepmind/math_dataset ; TinyStories (Eldan & Li, 2023) https://huggingface.co/datasets/roneneldan/TinyStories ; ROCStories (Mostafazadeh et al., 2016) https://huggingface.co/datasets/Ximing/ROCStories ; CommonsenseQA (Talmor et al., 2019) https://huggingface.co/datasets/tau/commonsense_qa ; Social IQA (Sap et al., 2019) https://huggingface.co/datasets/allenai/social_i_qa ; MedMCQA (Pal et al., 2022) https://huggingface.co/datasets/openlifescienceai/medmcqa ; ARC-Challenge (Clark et al., 2018) https://huggingface.co/datasets/allenai/ai2_arc ; CREAK (Onoe et al., 2021) https://huggingface.co/datasets/amydeng2000/CREAK ; FEVER (Thorne et al., 2018) https://huggingface.co/datasets/fever/fever |
| Dataset Splits | Yes | Datasets are split 60 : 20 : 20 for train-validation-test. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions) that are required to reproduce the experiments. |
| Experiment Setup | Yes | We train one-hidden-layer MLPs with ReLU activation, with hidden sizes chosen among W = {1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024}. The output size is 1 for regression and the number of classes with a softmax layer for classification. Each probe is trained for 400 epochs using MSE loss for regression and cross-entropy loss for classification. Datasets are split 60 : 20 : 20 for train-validation-test. We perform a grid search over MLP hidden sizes W and representation layers H (as inputs to the probes), reporting the test scores for the best hyperparameters. |
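The probe design quoted above (a one-hidden-layer ReLU MLP with a softmax head, trained for 400 epochs with cross-entropy loss on hidden representations) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the synthetic data, learning rate, and variable names are assumptions, and real probes would consume actual LLM hidden states and sweep the hidden size W and representation layer H as the paper describes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-ins: D = hidden-state dim, W = probe hidden size
# (one value from the paper's grid), C = number of classes, N = samples.
D, W, C, N = 64, 32, 3, 300
X = rng.normal(size=(N, D))          # stand-in for prompt hidden states
true_w = rng.normal(size=(D, C))
y = (X @ true_w).argmax(axis=1)      # synthetic attribute labels

# One-hidden-layer MLP probe: Linear -> ReLU -> Linear -> softmax.
W1 = rng.normal(scale=0.1, size=(D, W)); b1 = np.zeros(W)
W2 = rng.normal(scale=0.1, size=(W, C)); b2 = np.zeros(C)

def forward(X):
    h = np.maximum(X @ W1 + b1, 0.0)                  # ReLU hidden layer
    logits = h @ W2 + b2
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    return h, p / p.sum(axis=1, keepdims=True)        # softmax probs

lr = 0.5                              # assumed; the paper does not state it
losses = []
for _ in range(400):                  # 400 epochs, as in the paper
    h, p = forward(X)
    losses.append(-np.log(p[np.arange(N), y] + 1e-12).mean())
    g = p.copy(); g[np.arange(N), y] -= 1.0; g /= N   # dCE/dlogits
    dW2 = h.T @ g;  db2 = g.sum(axis=0)
    dh = g @ W2.T;  dh[h <= 0] = 0.0                  # ReLU backprop
    dW1 = X.T @ dh; db1 = dh.sum(axis=0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

acc = (forward(X)[1].argmax(axis=1) == y).mean()
```

For regression-style attributes, the same probe would instead use an output size of 1 and MSE loss; the paper's grid search over W and representation layers H would wrap this training loop, selecting hyperparameters on the validation split of the 60:20:20 partition.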