Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models
Authors: Zhenyu Pan, Haozheng Luo, Manling Li, Han Liu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we exploit both public benchmarks and a Web3 case study to demonstrate the capability of CoA over other methods. ... In this section, we compare the performance of our Chain-of-Action framework with state-of-the-art baselines across public benchmarks. Subsequently, we provide a detailed analysis of our launched case study: a Question Answering (QA) application in the Web3 domain. |
| Researcher Affiliation | Academia | Zhenyu Pan, Haozheng Luo, Manling Li, Han Liu. Department of Computer Science, Northwestern University, Evanston, IL 60208, USA; Department of Statistics and Data Science, Northwestern University, Evanston, IL 60208, USA |
| Pseudocode | Yes | B ALGORITHMS Algorithm 1 Description of Actions Workflow |
| Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code or a link to a code repository. |
| Open Datasets | Yes | We select 4 classic, 1 long-form, and 1 open-domain QA task. Four classic QA tasks that include web-based QA (WQA) [2], general QA (DATE, General Knowledge, Social QA (SoQA)), TruthfulQA [24], StrategyQA (SQA) [6], and Fact Checking (FEVER [26]). Long-form QA task is the first long-form QA dataset focusing on ambiguous factoid questions, ASQA [25]. Open-domain QA task is QReCC [1], testing the ability to handle context-dependent queries across different domains. |
| Dataset Splits | No | The paper references public benchmarks (WQA [2], TruthfulQA [24], StrategyQA (SQA) [6], FEVER [26], ASQA [25], QReCC [1]), but does not explicitly provide the specific training/test/validation split percentages or methodology within its text. |
| Hardware Specification | Yes | All experiments are carried out on a cluster, with the exception of the distributed compute node experiment. Each node within the cluster is equipped with 1 NVIDIA GeForce RTX 2080 Ti GPU and 6 8-core Intel Xeon Silver 4214 processors running at 2.20GHz. The combined RAM capacity across the cluster nodes amounts to 755GB, and the operating system employed is Ubuntu 18.04. |
| Software Dependencies | No | The paper mentions using 'gpt-3.5-turbo' and 'GPT-4' as models and 'LangChain' for the ReAct implementation, but does not provide specific version numbers for these or other software dependencies such as Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | Below, we provide a list of all the hyperparameters used in our experiments. Table 8 (Hyperparameters used in the task): temperature = 0.0, max_length = 1000, top_p = 1.0, n_clusters = 5, retrieval_number = 3, seed = 1. |
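The hyperparameters reported under Experiment Setup (Table 8 of the paper) can be captured as a single configuration dictionary. The sketch below is a hypothetical illustration, not the authors' code: the key names and values come from the report, while the dictionary structure, the `COA_HYPERPARAMS` name, and the `as_generation_kwargs` helper are assumptions introduced here for clarity.

```python
# Hypothetical configuration mirroring the paper's Table 8.
# Key names and values are from the report; everything else
# (variable names, helper function) is illustrative.
COA_HYPERPARAMS = {
    "temperature": 0.0,     # deterministic decoding
    "max_length": 1000,     # cap on generated tokens
    "top_p": 1.0,           # nucleus sampling effectively disabled
    "n_clusters": 5,        # clustering of retrieved evidence
    "retrieval_number": 3,  # passages retrieved per query
    "seed": 1,              # fixed seed for reproducibility
}

def as_generation_kwargs(params: dict) -> dict:
    """Separate decoding settings from retrieval settings (sketch).

    Only temperature, max_length, top_p, and seed would be passed
    to a text-generation call; n_clusters and retrieval_number
    belong to the retrieval side of the pipeline.
    """
    decoding_keys = {"temperature", "max_length", "top_p", "seed"}
    return {k: v for k, v in params.items() if k in decoding_keys}
```

Splitting the config this way makes explicit that temperature = 0.0 and seed = 1 are what pin down deterministic decoding, while n_clusters and retrieval_number only shape which evidence reaches the model.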