Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models

Authors: Zhenyu Pan, Haozheng Luo, Manling Li, Han Liu

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically, we exploit both public benchmarks and a Web3 case study to demonstrate the capability of CoA over other methods. ... In this section, we compare the performance of our Chain-of-Action framework with state-of-the-art baselines across public benchmarks. Subsequently, we provide a detailed analysis of our launched case study: a Question Answering (QA) application in the Web3 domain."
Researcher Affiliation | Academia | Zhenyu Pan, Haozheng Luo, Manling Li, Han Liu — Department of Computer Science, Northwestern University, Evanston, IL 60208, USA; Department of Statistics and Data Science, Northwestern University, Evanston, IL 60208, USA
Pseudocode | Yes | Appendix B (Algorithms), Algorithm 1: "Description of Actions Workflow"
Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code or a link to a code repository.
Open Datasets | Yes | "We select 4 classic, 1 long-form, and 1 open-domain QA task." The classic QA tasks are web-based QA (WQA) [2], general QA (DATE, General Knowledge, Social QA (SoQA)), TruthfulQA [24], StrategyQA (SQA) [6], and fact checking (FEVER [26]). The long-form QA task is ASQA [25], the first long-form QA dataset focusing on ambiguous factoid questions. The open-domain QA task is QReCC [1], testing the ability to handle context-dependent queries across different domains.
Dataset Splits | No | The paper references public benchmarks (WQA [2], TruthfulQA [24], StrategyQA (SQA) [6], FEVER [26], ASQA [25], QReCC [1]) but does not explicitly state training/validation/test split percentages or the splitting methodology.
Hardware Specification | Yes | "All experiments are carried out on a cluster, with the exception of the distributed compute node experiment. Each node within the cluster is equipped with one NVIDIA GeForce RTX 2080 Ti GPU and six 8-core Intel Xeon Silver 4214 processors running at 2.20 GHz. The combined RAM capacity across the cluster nodes amounts to 755 GB, and the operating system employed is Ubuntu 18.04."
Software Dependencies | No | The paper mentions 'gpt-3.5-turbo' and 'GPT-4' as models and 'LangChain' for the ReAct implementation, but does not provide specific version numbers for these or other software dependencies such as Python, PyTorch, or TensorFlow.
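Since the paper omits version numbers, a reader attempting a reproduction would need to record them at run time. The snippet below is a generic sketch of such a version snapshot; the package names (`openai`, `langchain`) are assumptions based on the tools the paper mentions, and either may be absent locally.

```python
# Generic sketch: snapshot dependency versions that the paper does not report.
# Package names below are assumptions based on tools the paper mentions;
# missing packages are reported rather than raising an error.
import sys
from importlib import metadata


def snapshot_versions(packages):
    """Return {name: version or 'not installed'}, including the Python version."""
    versions = {"python": sys.version.split()[0]}
    for pkg in packages:
        try:
            versions[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            versions[pkg] = "not installed"
    return versions


report = snapshot_versions(["openai", "langchain"])
```

Logging such a snapshot alongside experiment outputs would make the "Software Dependencies" criterion straightforward to satisfy.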
Experiment Setup | Yes | "Below, we provide a list of all the hyperparameters used in our experiments." Table 8 (hyperparameters used in the task): temperature = 0.0, max_length = 1000, top_p = 1.0, n_clusters = 5, retrieval_number = 3, seed = 1.
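To illustrate how the Table 8 values would enter an experiment, the sketch below maps the generation-time hyperparameters onto a chat-completion request payload. This is a hypothetical reconstruction, not the authors' code: the model name is taken from the Software Dependencies row, and the retrieval-side parameters (n_clusters, retrieval_number) are assumed to configure CoA's retrieval step rather than the LLM call.

```python
# Hypothetical sketch: Table 8 hyperparameters as a request payload
# (the paper releases no code; structure and names are illustrative).
HYPERPARAMS = {
    "temperature": 0.0,     # greedy decoding for reproducible answers
    "max_length": 1000,     # maximum tokens in the generated answer
    "top_p": 1.0,           # no nucleus-sampling truncation
    "n_clusters": 5,        # assumed: clustering of retrieved passages
    "retrieval_number": 3,  # assumed: passages retrieved per query
    "seed": 1,              # fixed seed for repeatable sampling
}


def build_request(question: str, model: str = "gpt-3.5-turbo") -> dict:
    """Assemble a chat-completion payload from the generation-time
    hyperparameters; retrieval parameters are used elsewhere in the pipeline."""
    payload = {k: HYPERPARAMS[k] for k in ("temperature", "top_p", "seed")}
    payload.update({
        "model": model,
        "max_tokens": HYPERPARAMS["max_length"],
        "messages": [{"role": "user", "content": question}],
    })
    return payload


req = build_request("Who proposed Chain-of-Action?")
```

With temperature 0.0 and a fixed seed, repeated runs of the same query should produce near-identical outputs, which is presumably why these values were chosen for a benchmarked system.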