Adaptive Elicitation of Latent Information Using Natural Language
Authors: Jimmy Wang, Thomas P Zollo, Richard Zemel, Hongseok Namkoong
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments focus on three applications: the Twenty Questions game (using our novel and publicly available dataset, described below), opinion polling, and student assessment. In each scenario, the objective is to adaptively select questions that reveal as much information as possible with respect to a separate (though potentially overlapping) set of target questions. ... Overall results for our method and 2 baselines across all 3 datasets are shown in Figure 2. The top row of plots records accuracy on the target questions, while the bottom row records perplexity (or negative log-likelihood loss). |
| Researcher Affiliation | Academia | 1Columbia University. Correspondence to: Jimmy Wang <EMAIL>, Thomas Zollo <EMAIL>. |
| Pseudocode | No | The paper describes algorithms and procedures like 'Greedy Selection' and 'Lookahead / Monte Carlo Planning' in narrative text within Section 2.4, but it does not present them in a structured pseudocode or algorithm block format. |
| Open Source Code | Yes | Our code is available at https://github.com/namkoong-lab/adaptive-elicitation. |
| Open Datasets | Yes | To operationalize this game for benchmarking, we construct a novel Twenty Questions dataset from a curated set of objects in the THINGS database (Hebart et al., 2019)... Our dataset is publicly available, including the complete set of objects, curated questions, generated answers, and relevant metadata. OpinionQA (Santurkar et al., 2023) Originally created to evaluate the alignment of LLM opinions... EEDI Tutoring Dataset (Wang et al., 2020) EEDI is an online educational and tutoring platform... |
| Dataset Splits | Yes | We first split the training datasets by entity into train, validation, and test with a 70%, 15%, 15% split. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used for running the experiments. It mentions using 'Llama-3.1-8B model in FP16 precision' but no hardware specifications. |
| Software Dependencies | No | The paper mentions using a 'pre-trained Llama-3.1-8B model' and 'LoRA (Hu et al., 2021) to finetune our model', and 'AdamW (Loshchilov & Hutter, 2019) optimizer', along with 'Alibaba-NLP/gte-large-en-v1.5 as our embedding model'. While these refer to specific models and techniques, the paper does not provide version numbers for underlying software libraries or programming languages (e.g., Python, PyTorch/TensorFlow, CUDA). |
| Experiment Setup | Yes | We initialize a pre-trained Llama-3.1-8B model in FP16 precision and use LoRA (Hu et al., 2021) to finetune our model with parameters α = 24, rank = 8, and dropout = 0.1. Additional details are shown in Appendix C.1. ... We initialize the AdamW (Loshchilov & Hutter, 2019) optimizer with learning rate of 0.0001 and β = (0.9, 0.95), weight decay of 0.1, and we use a linear warmup for the learning rate after which we use a cosine scheduler. We train our model for 10,000 epochs with a batch size of 4 and block size of 1024, after which we take the checkpoint with the lowest validation loss. |
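The quoted setup describes a linear warmup followed by a cosine learning-rate schedule with AdamW. The paper excerpt does not report the warmup length, so the sketch below is only an illustration of that schedule shape: `warmup_steps` and `total_steps` are assumed values, while the base learning rate of 1e-4 comes from the quoted setup.

```python
import math

def lr_at_step(step, base_lr=1e-4, warmup_steps=100, total_steps=10_000):
    """Linear warmup to base_lr, then cosine decay toward zero.

    warmup_steps is an assumption; the paper excerpt only says a
    'linear warmup ... after which we use a cosine scheduler'.
    """
    if step < warmup_steps:
        # Linear ramp from base_lr/warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Cosine decay: base_lr at progress=0, zero at progress=1.
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

In PyTorch this shape is commonly realized by chaining `LinearLR` and `CosineAnnealingLR` with `SequentialLR` around an `AdamW` optimizer constructed with the quoted hyperparameters (`lr=1e-4`, `betas=(0.9, 0.95)`, `weight_decay=0.1`).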