Recent advances in Large Vision-Language Models (LVLMs) have spurred
significant progress in the document parsing task. Compared with traditional
pipeline-based methods, end-to-end paradigms excel at converting PDF images
into structured outputs by integrating Optical Character Recognition (OCR),
table recognition, mathematical formula recognition, and other capabilities.
However, the absence of explicit analytical stages for document layout and
reading order limits LVLMs' ability to handle complex document types such as
multi-column newspapers or posters. To address
this limitation, in this report we propose Logics-Parsing, an end-to-end
LVLM-based model augmented with reinforcement learning. Our model incorporates
meticulously designed reward mechanisms to optimize complex layout analysis and
reading order inference. In addition, we broaden the model's versatility by
incorporating diverse data types, such as chemical formulas and handwritten
Chinese characters, into supervised fine-tuning. Finally, to enable rigorous
evaluation of our approach, we introduce LogicsParsingBench, a curated set of
1,078 page-level PDF images spanning nine major categories and over twenty
sub-categories, which will be released later. Comprehensive experiments
conducted on LogicsParsingBench validate the efficacy and
state-of-the-art (SOTA) performance of our model across diverse
document analysis scenarios. Project Page:
https://github.com/alibaba/Logics-Parsing