Browsing: Hugging Face
Semi-structured tables, widely used in real-world applications (e.g., financial reports, medical records, transactional orders), often involve flexible and complex layouts…
Speech tokenizers serve as foundational components for speech language models, yet current designs exhibit several limitations, including: 1) dependence on…
Compositional visual reasoning has emerged as a key research frontier in multimodal AI, aiming to endow machines with the human-like…
Evaluating natural language generation (NLG) systems remains a core challenge of natural language processing (NLP), further complicated by the rise…
In this paper, we introduce a novel learning paradigm for adaptive Large Language Model (LLM) agents that eliminates the need…
As large language models (LLMs) are increasingly deployed in real-world applications, the need to selectively remove unwanted knowledge while preserving…
Accurate diagnosis with medical large language models is hindered by knowledge gaps and hallucinations. Retrieval and tool-augmented methods help, but…
Multi-Head Latent Attention (MLA), introduced in DeepSeek-V2, compresses key-value states into a low-rank latent vector, caching only this vector to…
Recently, Vision-Language-Action (VLA) models have demonstrated strong performance on a range of robotic tasks. These models rely on multimodal inputs,…
LLMs have shown strong performance on human-centric reasoning tasks. While previous evaluations have explored whether LLMs can infer intentions or…