Open-source Code Graph Models enhance repository-level code generation tasks by integrating code graph structures into LLMs’ attention mechanisms, achieving high performance without agent-based approaches.
Recent advances in Large Language Models (LLMs) have shown promise in
function-level code generation, yet repository-level software engineering tasks
remain challenging. Current solutions predominantly rely on proprietary LLM
agents, which introduce unpredictability and limit accessibility, raising
concerns about data privacy and model customization. This paper investigates
whether open-source LLMs can effectively address repository-level tasks without
requiring agent-based approaches. We demonstrate this is possible by enabling
LLMs to comprehend functions and files within codebases through their semantic
information and structural dependencies. To this end, we introduce Code Graph
Models (CGMs), which integrate repository code graph structures into the LLM’s
attention mechanism and map node attributes to the LLM’s input space using a
specialized adapter. When combined with an agentless graph RAG framework, our
approach achieves a 43.00% resolution rate on the SWE-bench Lite benchmark
using the open-source Qwen2.5-72B model. This performance ranks first among
open weight models, second among methods with open-source systems, and eighth
overall, surpassing the previous best open-source model-based method by 12.33%.