NextCoder: Robust Adaptation of Code LMs to Diverse Code Edits
Authors: Tushar Aggarwal, Swayam Singh, Abhijeet Awasthi, Aditya Kanade, Nagarajan Natarajan
ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using our approach, we obtain a new series of models NextCoder (adapted from QwenCoder-2.5) that achieves strong results on five code-editing benchmarks, outperforming models of comparable size and even several larger ones. We show the generality of our approach on two model families (DeepSeekCoder and QwenCoder), compare against other fine-tuning approaches, and demonstrate robustness by showing retention of code-generation and general problem-solving abilities post adaptation. |
| Researcher Affiliation | Industry | 1Microsoft Research India. Correspondence to: Tushar Aggarwal <EMAIL>, Swayam Singh <EMAIL>, Abhijeet Awasthi <EMAIL>, Aditya Kanade <EMAIL>, Nagarajan Natarajan <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 SeleKT: Selective Knowledge Transfer. Require: Base LM weights θ_base, training data D, epochs E, periodicity M, sparsity α. Ensure: Final fine-tuned weights θ_FT. 1: Initialize θ ← θ_base. 2: for epoch e = 1 to E do 3: for each minibatch D[s] do 4: θ ← TrainStep(θ, D[s]) [Dense Gradients] 5: if s mod M = 0 then 6: Compute task vector: τ ← θ − θ_base 7: Select top-αN parameters: m_i ← 1 if i ∈ top-k(\|τ\|, αN), else 0 8: θ ← θ_base + m ⊙ τ [Sparse Projection] 9: end if 10: end for 11: end for 12: return θ as θ_FT. |
| Open Source Code | Yes | We open-source the models, synthetic dataset, and implementation at aka.ms/nextcoder. |
| Open Datasets | Yes | We open-source the models, synthetic dataset, and implementation at aka.ms/nextcoder. |
| Dataset Splits | No | In addition to the synthetic data (Table 1), we used 127K instances from CommitPackFT to fine-tune our models. The paper does not specify explicit training/validation/test splits for this combined dataset. |
| Hardware Specification | Yes | For fine-tuning and inference, we use 8 NVIDIA H100 GPUs, each with 80GB of VRAM. For data generation using GPT-4o (version 2024-05-13), we use the OpenAI API. Following Singhal et al. (2024), we perform run-time evaluations for NoFunEval on an Azure NC16 VM. |
| Software Dependencies | No | The paper mentions using the 'AdamW optimizer', a 'Warmup LR scheduler', 'DeepSpeed' (Rajbhandari et al., 2020), and 'bfloat16' for memory optimizations, but does not provide specific version numbers for these software components or libraries. |
| Experiment Setup | Yes | We fine-tune for 3 epochs, across all our experiments, using the AdamW optimizer (Loshchilov & Hutter, 2017) with a learning rate of 10^-5, and a Warmup LR scheduler (Kim et al., 2021) with a warmup ratio of 0.1. For efficient memory management, we used sample packing with a maximum sequence length of 8192 tokens for DeepSeekCoder-6.7B and 16384 tokens for QwenCoder variants, with batch sizes of 4 and 1 per GPU, respectively. Gradient accumulation steps were set to 4, resulting in respective effective batch sizes of 64 and 32. We fix the periodicity to 1 epoch in the SeleKT algorithm unless specified otherwise, i.e., M = total number of mini-batches. We set sparsity α = 0.05 per layer. |
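The sparse-projection step at the heart of the SeleKT pseudocode (compute the task vector τ = θ − θ_base, keep only the top-α fraction of its entries by magnitude, and add the masked vector back to the base weights) can be sketched in plain NumPy. This is a hedged illustration, not the paper's implementation: the function name `selekt_project` and the flat-array treatment of parameters are assumptions; the paper applies the mask per layer.

```python
import numpy as np

def selekt_project(theta: np.ndarray, theta_base: np.ndarray, alpha: float) -> np.ndarray:
    """Illustrative sketch of SeleKT's sparse projection (not the official code).

    Keeps only the top-(alpha * N) entries of the task vector tau = theta - theta_base
    by absolute magnitude, zeroing the rest, then re-adds them to the base weights.
    """
    tau = theta - theta_base                       # task vector
    n = tau.size
    k = max(1, int(alpha * n))                     # number of parameters to retain
    # Indices of the k largest-magnitude entries of tau (unordered partition is enough).
    top_idx = np.argpartition(np.abs(tau).ravel(), -k)[-k:]
    mask = np.zeros(n)
    mask[top_idx] = 1.0                            # m_i = 1 iff i in top-k(|tau|, alpha*N)
    return theta_base + (mask * tau.ravel()).reshape(tau.shape)
```

With α = 0.5 on a 4-parameter toy vector, only the two largest-magnitude deltas survive the projection; the other parameters are reset to their base values. In training this step runs every M minibatches, keeping the fine-tuned model close to the base model while preserving the most important edits.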