Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training

Authors: Maximillian Chen, Ruoxi Sun, Tomas Pfister, Sercan Arik

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate ACT's efficacy in data-efficient tuning scenarios, even when no action labels are available, using multiple real-world conversational tasks: tabular-grounded question answering, machine reading comprehension, and AmbigSQL, a novel task for disambiguating information-seeking requests for complex SQL generation for data-analysis agents. Additionally, we propose evaluating LLMs' ability to function as conversational agents by examining whether they can implicitly recognize and reason about ambiguity in conversation. ACT demonstrates substantial conversation-modeling improvements over standard tuning approaches such as supervised fine-tuning and DPO.
Researcher Affiliation | Industry | Maximillian Chen (1,2), Ruoxi Sun (1), Tomas Pfister (1), Sercan Ö. Arık (1); 1 Google, 2 Columbia University. EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1 (Building Contrastive Action Pairs): input Dataset D, Conditional Generation Model M, Action Space S, Action Annotation Agent G. Algorithm 2 (ACT: Action-Based Contrastive Self-Training): input Initial Policy Model πθ0, Action Contrast Dataset Dpref, Number of Batches B, Action Classifier A, User Simulator U, Task Heuristic H, Heuristic Tolerance ϵ.
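The row above lists only the inputs of Algorithm 1. As an illustrative sketch of how those inputs could fit together (not the authors' code — `generate`, `annotate_action`, and the binary action space are all hypothetical stand-ins), the pairing loop might look like:

```python
# Hypothetical sketch of Algorithm 1 (Building Contrastive Action Pairs).
# All names here are illustrative stand-ins, not the paper's API.

ACTION_SPACE = ["CLARIFY", "ANSWER"]  # S: assumed binary action space


def build_contrastive_pairs(dataset, generate, annotate_action):
    """dataset: list of (context, gold_response) turns (D).
    generate: conditional generation model M, context -> response.
    annotate_action: action annotation agent G, response -> action in S.
    Returns DPO-style preference records (prompt, chosen, rejected)."""
    pairs = []
    for context, gold_response in dataset:
        gold_action = annotate_action(gold_response)
        sampled = generate(context)
        sampled_action = annotate_action(sampled)
        # Only turns where the sampled action contradicts the gold action
        # yield a contrastive pair: gold response wins, sample loses.
        if sampled_action != gold_action:
            pairs.append({"prompt": context,
                          "chosen": gold_response,
                          "rejected": sampled})
    return pairs
```

The dictionary layout mirrors the common prompt/chosen/rejected format used by preference-tuning libraries, which keeps the pairs directly usable for the DPO-style objective the paper compares against.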
Open Source Code | No | The code to create AmbigSQL will be released publicly.
Open Datasets | Yes | PACIFIC (Deng et al., 2022): MIT Open-Source License. https://github.com/dengyang17/PACIFIC/tree/main Abg-CoQA (Guo et al., 2021): MIT Open-Source License. https://github.com/MeiqiGuo/AKBC2021-Abg-CoQA Spider (Yu et al., 2018): CC BY-SA 4.0. https://yale-lily.github.io/spider
Dataset Splits | Yes | Table A10: Overview of AmbigSQL, an ambiguous text-to-SQL dataset synthesized from Spider.

                            Train   Dev     Test
Num. Unambiguous Requests   7,000   1,034   1,034
Num. Ambiguous Requests     7,000   1,034   1,034
Num. Unique Schemas         1,056   145     145
Types of Ambiguity          3       3       3
Hardware Specification | Yes | We conduct all experiments using one Google Compute Engine virtual machine with 8x 80GB A100 GPUs.
Software Dependencies | No | The paper mentions software such as PyTorch (Paszke et al., 2019), Hugging Face Transformers (Wolf et al., 2020), and the Vertex AI SDK, but provides only general citations and licenses, with no explicit version numbers for these dependencies.
Experiment Setup | Yes | For all of our SFT experiments with Zephyr, Mistral, and Gemma, we tune the model for up to 8 epochs, choosing the best-performing model with learning rates from {1e-4, 2e-5, 1e-5} using the AdamW optimizer. For our SFT experiments with Gemini Pro, we use the Vertex AI API and tune for up to 4 epochs with an adapter size of 4. For all of our RL tuning experiments, we allow the model to train for up to 12 epochs and select the checkpoint that yields the highest reward margin on the validation set... For all experiments, we use a batch size of 4 and a maximum sequence length of 1,280. Hyperparameters for Equation 2: for experiments with Zephyr 7B on PACIFIC, we achieve our strongest results using β = 0.01 and a learning rate of 5e-7; on AmbigSQL, we use β = 0.5 and a learning rate of 5e-7.
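The β values above parameterize the Equation 2 objective; assuming it takes the standard DPO form (the paper explicitly compares against DPO), a minimal sketch of how β enters the per-pair loss — with all argument names illustrative, not the authors' code — is:

```python
import math


def dpo_loss(pi_chosen_logp, pi_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.01):
    """Standard per-pair DPO loss (assumed form of Equation 2).
    beta scales the policy-vs-reference log-ratios, controlling how far
    the policy may drift from the reference model; the row above reports
    beta = 0.01 on PACIFIC and beta = 0.5 on AmbigSQL."""
    # Implicit rewards: scaled log-ratios of policy vs. reference.
    chosen_reward = beta * (pi_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (pi_rejected_logp - ref_rejected_logp)
    # -log sigmoid(margin): small when the chosen response outscores
    # the rejected one under the policy relative to the reference.
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

With equal log-probabilities the margin is zero and the loss is log 2; a larger β amplifies the same log-probability gap, which is consistent with tuning β per dataset as described above.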