OrcaLoca: An LLM Agent Framework for Software Issue Localization
Authors: Zhongming Yu, Hejia Zhang, Yujie Zhao, Hanxian Huang, Matrix Yao, Ke Ding, Jishen Zhao
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that ORCALOCA becomes the new open-source state-of-the-art (SOTA) in function match rate (65.33%) on SWE-bench Lite. It also improves the final resolved rate of an open-source framework by 6.33 percentage points through its patch generation integration. |
| Researcher Affiliation | Collaboration | 1University of California, San Diego, USA 2Intel Corporation. Correspondence to: Jishen Zhao <EMAIL>. |
| Pseudocode | Yes | To have a better understanding of Figure 2, we provide a core algorithm pseudocode in Algorithm 1. It summarizes the essential components discussed in Sections 3.2, 3.3, and 3.4. |
| Open Source Code | Yes | ORCALOCA is available at https://github.com/fishmingyu/OrcaLoca. |
| Open Datasets | Yes | SWE-bench (Jimenez et al., 2023) is a widely used dataset for evaluating the ability of LLM systems to address real-world software engineering challenges. It comprises 2,294 task instances derived from 12 popular Python repositories, where each task requires a patch to resolve the issue described in its corresponding GitHub issue. |
| Dataset Splits | No | The paper describes subsets of the SWE-bench dataset, such as SWE-bench Lite (300 instances), SWE-bench Verified (500 instances), and SWE-bench Common (93 instances), used for evaluation. However, it does not specify explicit training/test/validation splits for model development or how these instances are partitioned for their own experimental processes beyond being evaluation benchmarks. |
| Hardware Specification | No | This research was partially conducted using computational resources provided by the Google Cloud Platform (GCP) Credits Award. However, specific hardware details like GPU/CPU models or memory amounts are not provided. |
| Software Dependencies | Yes | ORCALOCA is built on the LlamaIndex framework (Liu, 2022), which supports various foundation models. For our experiments, we used Claude-3.5-Sonnet-20241022 (Anthropic, 2024) as the underlying model, with a sampling temperature set to 0.1 to prioritize deterministic results. [...] We then generate and execute a reproduction snippet using an LLM and record its execution trace with VizTracer (Gao, 2025). |
| Experiment Setup | Yes | For our experiments, we used Claude-3.5-Sonnet-20241022 (Anthropic, 2024) as the underlying model, with a sampling temperature set to 0.1 to prioritize deterministic results. For the top-k values used in action decomposition (Section 3.3), we set k = 3 for class decomposition and k = 2 for file decomposition. In the context pruning (Section 3.4), the context window size is configured to retain 12 entries (top-k). [...] For the repair process, we generated 40 patches (1 at a temperature of 0 and the rest at 0.8) with the str_replace_format argument set. [...] Regression tests were filtered with a temperature of 0, while reproduction tests were generated using 40 samples (1 at a temperature of 0 and the rest at 0.8). |
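The patch-sampling schedule reported in the setup (40 samples: one at temperature 0, the remainder at 0.8) can be sketched as below. The function name and signature are illustrative assumptions, not code from the paper's repository:

```python
def sampling_temperatures(num_samples: int = 40,
                          diverse_temperature: float = 0.8) -> list[float]:
    """Sketch of the reported sampling schedule: one greedy sample at
    temperature 0 for determinism, with the remaining samples drawn at a
    higher temperature to encourage patch diversity.

    Hypothetical helper for illustration only.
    """
    return [0.0] + [diverse_temperature] * (num_samples - 1)

# The same schedule is reported for both patch generation and
# reproduction-test generation (40 samples each).
temps = sampling_temperatures()
print(len(temps), temps[0], temps[-1])
```

A deterministic first sample gives one reproducible candidate, while the higher-temperature samples trade determinism for coverage of alternative patches.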