With more than 37,000 stars on GitHub and counting, Docling is one of IBM Research’s most popular toolkits. It answers a simple yet critical question in AI pre-training and fine-tuning: how do you get clean, structured data from unstructured documents?
“How hard can it be? Well, it can be very hard,” said Peter Staar, a Principal Research Staff Member at IBM Research in Zurich and chair of Docling’s technical steering committee at the Linux Foundation, during a recent interview.
The Docling team marked an ambitious first year, building tools for document conversion, precision extraction and local deployment. It also collaborated with Red Hat on the launch of the Docling OpenShift Operator and launched SmolDocling, an ultra-compact vision-language model for end-to-end multi-modal document conversion.
Docling, since donated to the Linux Foundation, continues to grow with a push into agentic AI. “We’re building systems that can generate documents dynamically,” Staar said.
IBM Think spoke with Staar about Docling’s evolution, from ideation to open-sourcing the toolkit.