Browsing: arXiv AI
arXiv:2504.10165v1 Announce Type: cross Abstract: Live tracking of wildlife via high-resolution video processing directly onboard drones is widely unexplored and…
arXiv:2504.08727v1 Announce Type: cross Abstract: We present a system using Multimodal LLMs (MLLMs) to analyze a large database with tens…
[Submitted on 3 May 2024 (v1), last revised 11 Apr 2025 (this version, v3)]
[Submitted on 7 Apr 2025 (v1), last revised 11 Apr 2025 (this version, v3)] Authors: Yu Yue, Yufeng Yuan, Qiying Yu,…
arXiv:2504.07257v1 Announce Type: new Abstract: Reinforcement learning (RL) agents have shown remarkable performance in various environments, where they can discover…
arXiv:2504.07424v1 Announce Type: new Abstract: Instruction-based Image Editing (IIE) models have made significant improvements due to the progress of multimodal…
arXiv:2504.07463v1 Announce Type: new Abstract: Supporting learners’ understanding of taught skills in online settings is a longstanding challenge. While exercises…
[Submitted on 11 Jan 2025 (v1), last revised 10 Apr 2025 (this version, v3)]
[Submitted on 3 Apr 2025 (v1), last revised 10 Apr 2025 (this version, v2)]
arXiv:2504.07836v1 Announce Type: cross Abstract: Visual grounding (VG) aims to localize target objects in an image based on natural language…