CursorCore: Assist Programming through Aligning Anything
Authors: Hao Jiang, Qi Liu, Rui Li, Shengyu Ye, Shijin Wang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we propose a new framework that comprehensively integrates these information sources, and collect data to train models and evaluate their performance. Firstly, to thoroughly evaluate how well models align with different types of information and the quality of their outputs, we introduce a new benchmark, APEval (Assist Programming Eval), to comprehensively assess the performance of models in programming assistance tasks. Then, for data collection, we develop a data generation pipeline, Programming-Instruct, which synthesizes training data from diverse sources, such as GitHub and online judge platforms. This pipeline can automatically generate various types of messages throughout the programming process. Finally, using this pipeline, we generate 219K samples, fine-tune multiple models, and develop the CursorCore series. We show that CursorCore outperforms other models of comparable size. |
| Researcher Affiliation | Collaboration | 1State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China 2Institute of Artificial Intelligence, Hefei Comprehensive National Science Center 3iFLYTEK Co., Ltd. |
| Pseudocode | No | The paper describes methods and processes in narrative text and uses code examples in figures (e.g., Figure 1, Figure 2) to illustrate concepts, but does not contain any structured pseudocode or algorithm blocks with step-by-step procedures. |
| Open Source Code | Yes | Code, models and data are freely available at https://github.com/TechxGenus/CursorCore. |
| Open Datasets | Yes | For AI Programmer, we gather code snippets from datasets such as the Stack (Kocetkov et al., 2023) and OSS-Instruct (Wei et al., 2023b), then prompt LLMs to generate the programming process. For Git Commit data, we collect relevant information from EditPackFT (Cassano et al., 2023b) (a filtered version of CommitPackFT (Muennighoff et al., 2024)) and further refine it through post-processing and filtering. Regarding Online Judge Submission data, we source the programming process from the CodeNet dataset (Puri et al., 2021). ... we also incorporate the Evol-Instruct dataset (ISE-UIUC, 2023) collected using the GPT series (Ouyang et al., 2022) |
| Dataset Splits | No | The paper states that 219K samples are generated for training data, and a new benchmark APEval is introduced for evaluation. While APEval's collection process is described, there is no explicit mention of how the 219K training samples were split into training, validation, or test sets for the CursorCore models' development. The paper mentions evaluating on APEval's Python version using the test set created by EvalPlus (Liu et al., 2023), but this refers to the evaluation benchmark, not the internal splits of their model's training data. |
| Hardware Specification | Yes | For Mistral-Large-Instruct, we quantize the model using the GPTQ (Frantar et al., 2022) algorithm and deploy it locally with SGLang (Zheng et al., 2023a) and Marlin kernel (Frantar et al., 2024) on 4 Nvidia RTX 4090 GPUs. |
| Software Dependencies | Yes | Our models are trained for 2 epochs using the Transformers library (Wolf et al., 2020). We enhance memory efficiency and speed with techniques such as DeepSpeed ZeRO-3 (Rajbhandari et al., 2019), ZeRO-Offload (Ren et al., 2021), FlashAttention-2 (Dao, 2024), and Triton kernels (Hsu et al., 2024). ... The training process employs the Adafactor optimizer (Shazeer & Stern, 2018) with a learning rate of 5e-5, coupled with a cosine scheduler featuring 15 warm-up steps. |
| Experiment Setup | Yes | Our models are trained for 2 epochs using the Transformers library (Wolf et al., 2020). ... The training process employs the Adafactor optimizer (Shazeer & Stern, 2018) with a learning rate of 5e-5, coupled with a cosine scheduler featuring 15 warm-up steps. |
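The learning-rate schedule quoted above (peak 5e-5, cosine decay, 15 warm-up steps) can be sketched as a standalone function. This is an illustrative reconstruction, not the authors' code: the paper does not publish the schedule implementation, the `total_steps` value here is a hypothetical placeholder, and in practice the Transformers library's built-in cosine scheduler would compute this internally.

```python
import math

# Hyperparameters reported in the paper's experiment setup.
PEAK_LR = 5e-5
WARMUP_STEPS = 15

def lr_at(step: int, total_steps: int) -> float:
    """Linear warm-up for WARMUP_STEPS steps, then cosine decay to zero.

    `total_steps` is a placeholder; the actual value depends on the
    dataset size, batch size, and the 2 training epochs.
    """
    if step < WARMUP_STEPS:
        # Linear ramp from PEAK_LR/WARMUP_STEPS up to PEAK_LR.
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    # Fraction of the post-warm-up schedule completed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))
```

For example, `lr_at(14, 1000)` returns the peak rate 5e-5 at the end of warm-up, and the rate decays smoothly toward zero as `step` approaches `total_steps`.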