We introduce the 🤗 MigrationBench dataset, a benchmark tailored for repository-level code migration, specifically from Java 8 to Java 17 or other long-term support (LTS) versions.
1. Dataset
MigrationBench comprises a large-scale collection of GitHub repositories, organized into three subsets:
🤗 AmazonScience/migration-bench-java-full contains 5,102 repos
Each repo has a test directory or at least one test case
🤗 AmazonScience/migration-bench-java-selected contains 300 repos
A curated subset of 🤗 migration-bench-java-full
🤗 AmazonScience/migration-bench-java-utg contains 4,814 repos
The unit test generation (utg) dataset, disjoint from 🤗 migration-bench-java-full
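The three subsets listed above are hosted on the Hugging Face Hub, so they can presumably be fetched with the `datasets` library. The following is a minimal sketch, not an official loader: the dataset IDs come from the list above, while the short keys (`full`, `selected`, `utg`) and the helper names `dataset_id` and `load_subset` are hypothetical and chosen only for illustration. No split or column names are assumed.

```python
# Dataset IDs as listed above. The short keys are labels invented for this sketch.
MIGRATION_BENCH_SUBSETS = {
    "full": "AmazonScience/migration-bench-java-full",
    "selected": "AmazonScience/migration-bench-java-selected",
    "utg": "AmazonScience/migration-bench-java-utg",
}


def dataset_id(subset: str) -> str:
    """Map a short subset label to its Hugging Face dataset ID."""
    try:
        return MIGRATION_BENCH_SUBSETS[subset]
    except KeyError:
        valid = ", ".join(sorted(MIGRATION_BENCH_SUBSETS))
        raise ValueError(f"Unknown subset {subset!r}; expected one of: {valid}") from None


def load_subset(subset: str):
    """Download one subset from the Hub (hypothetical helper).

    Requires `pip install datasets` and network access. The import is kept
    local so that the ID mapping above can be used without the dependency.
    """
    from datasets import load_dataset

    return load_dataset(dataset_id(subset))
```

For example, `load_subset("selected")` would download the 300-repo curated subset. Printing the returned object is a simple way to inspect which splits and columns it actually provides.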
2. Evaluation Framework
To enable standardized and rigorous evaluation of LLM performance on this complex task, we provide a comprehensive open-source evaluation framework, available at: https://github.com/amazon-science/MigrationBench.
3. Baseline: Code Migration with LLMs
Inspired by "Teaching Large Language Models to Self-Debug", we introduce SD-Feedback and demonstrate that LLMs can effectively tackle repository-level code migration from Java 8 to 17.
On the selected subset, using Claude-3.5-Sonnet-v2, SD-Feedback achieves success rates (pass@1) of 62.33% and 27.33% for minimal and maximal migration, respectively.
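To make the pass@1 figures concrete, the sketch below converts repository counts into a success rate. It assumes one migration attempt per repository and that the rate is computed over all 300 repos in the selected subset. The counts 187 and 82 are not stated in the source; they are inferred here because they are the whole numbers that reproduce the reported percentages under that assumption. The function name `pass_at_1` is hypothetical.

```python
def pass_at_1(successes: int, total: int) -> float:
    """Success rate in percent, rounded to two decimals.

    Assumes a single attempt per repository, so pass@1 reduces to the
    fraction of repositories that were migrated successfully.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= successes <= total:
        raise ValueError("successes must be between 0 and total")
    return round(100.0 * successes / total, 2)


# Inferred counts (an assumption, not reported in the source).
SELECTED_SUBSET_SIZE = 300
minimal_rate = pass_at_1(187, SELECTED_SUBSET_SIZE)  # minimal migration
maximal_rate = pass_at_1(82, SELECTED_SUBSET_SIZE)   # maximal migration
print(minimal_rate, maximal_rate)
```

Under these assumptions, the reported rates would correspond to roughly 187 and 82 successfully migrated repositories out of 300.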