SpineBench: A Clinically Salient, Level-Aware Benchmark Powered By The SpineMed-450k Corpus - Takara TLDR

Spine disorders affect 619 million people globally and are a leading cause of
disability, yet AI-assisted diagnosis remains limited by the lack of
level-aware, multimodal datasets. Clinical decision-making for spine disorders
requires sophisticated reasoning across X-ray, CT, and MRI at specific
vertebral levels. However, progress has been constrained by the absence of
traceable, clinically-grounded instruction data and standardized,
spine-specific benchmarks. To address this, we introduce SpineMed, an ecosystem
co-designed with practicing spine surgeons. It features SpineMed-450k, the
first large-scale dataset explicitly designed for vertebral-level reasoning
across imaging modalities with over 450,000 instruction instances, and
SpineBench, a clinically-grounded evaluation framework. SpineMed-450k is
curated from diverse sources, including textbooks, guidelines, open datasets,
and ~1,000 de-identified hospital cases, using a clinician-in-the-loop pipeline
with a two-stage LLM generation method (draft and revision) to ensure
high-quality, traceable data for question-answering, multi-turn consultations,
and report generation. SpineBench evaluates models on clinically salient axes,
including level identification, pathology assessment, and surgical planning.
Our comprehensive evaluation of several recently advanced large vision-language
models (LVLMs) on SpineBench reveals systematic weaknesses in fine-grained,
level-specific reasoning. In contrast, our model fine-tuned on SpineMed-450k
demonstrates consistent and significant improvements across all tasks.
Clinician assessments confirm the diagnostic clarity and practical utility of
our model’s outputs.

Source link

What's Hot

How Deepseek 3.2 Reduces Costs While Boosting AI Performance

HCLTech joins MIT Media Lab US to collaborate on AI research

Q3 Venture Funding Jumps 38% As More Massive Rounds Go To AI Giants And Exits Gain Steam

SpineBench: A Clinically Salient, Level-Aware Benchmark Powered by the SpineMed-450k Corpus – Takara TLDR

FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents – Takara TLDR

Improving GUI Grounding with Explicit Position-to-Coordinate Mapping – Takara TLDR

Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation – Takara TLDR

Sotheby’s to Sell René Magritte Held in Same Collection for 100 years

Former ARTnews Publisher Dies at 97

National Gallery of Art Closes as a Result of Government Shutdown

Almine Rech Closes London Gallery After More Than a Decade

How Deepseek 3.2 Reduces Costs While Boosting AI Performance

HCLTech joins MIT Media Lab US to collaborate on AI research

Q3 Venture Funding Jumps 38% As More Massive Rounds Go To AI Giants And Exits Gain Steam

What's Hot

SpineBench: A Clinically Salient, Level-Aware Benchmark Powered by the SpineMed-450k Corpus – Takara TLDR

Related Posts

Subscribe to Updates