Instruction-guided video editing has emerged as a rapidly advancing research
direction, offering new opportunities for intuitive content transformation
while also posing significant challenges for systematic evaluation. Existing
video editing benchmarks do not adequately support the evaluation of
instruction-guided video editing, and they further suffer from limited source
diversity, narrow task coverage and incomplete evaluation metrics. To address
these limitations, we introduce IVEBench, a modern benchmark suite specifically
designed for instruction-guided video editing assessment. IVEBench comprises a
diverse database of 600 high-quality source videos that span seven semantic
dimensions and range in length from 32 to 1,024 frames. It further includes
eight categories of editing tasks with 35 subcategories, whose
prompts are generated and refined through large language models and expert
review. Crucially, IVEBench establishes a three-dimensional evaluation protocol
encompassing video quality, instruction compliance and video fidelity,
integrating both traditional metrics and multimodal large language model-based
assessments. Extensive experiments demonstrate the effectiveness of IVEBench in
benchmarking state-of-the-art instruction-guided video editing methods and show
that it provides comprehensive, human-aligned evaluation outcomes.