Browsing: Hugging Face
Rule-based reinforcement learning applied to multimodal large language models demonstrates effective generalization in visual tasks, particularly using jigsaw puzzles, outperforming…
A unified vision-language-action framework, LoHoVLA, combines a large pretrained vision-language model with hierarchical closed-loop control to improve…
MagiCodec, a Transformer-based audio codec, enhances semantic tokenization while maintaining high reconstruction quality, improving compatibility with generative models. Neural audio…
Prolonged reinforcement learning training (ProRL) uncovers novel reasoning strategies in language models, outperforming base models and suggesting meaningful expansion of…
See examples and results at: https://leililab.github.io/HardTests/ RLVR is not just about RL, it’s more about VR! Particularly for LLM coding,…
CAPTCHAs have been a critical bottleneck for deploying web agents in real-world applications, often blocking them from completing end-to-end automation…
Vision-language models exhibit strong biases in counting and identification tasks, demonstrating a failure mode that persists even with additional…
A study reveals that Large Language Models (LLMs) struggle with expressing uncertainty accurately and introduces MetaFaith, a prompt-based method that…
We present v1, a lightweight extension to Multimodal Large Language Models (MLLMs) that enables selective visual revisitation during inference. While…
A comprehensive TTS benchmark, EmergentTTS-Eval, automates test-case generation and evaluation using LLMs and LALM to assess nuanced and semantically complex…