MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation
Zichen Zhu and 15 other authors
Abstract: Existing Multimodal Large Language Model (MLLM)-based agents face significant challenges in handling complex GUI (Graphical User Interface) interactions on mobile devices. These challenges arise from the dynamic and structured nature of GUI environments, which integrate text, images, and spatial relationships, as well as from the variability of action spaces across different pages and tasks. To address these limitations, we propose MobA, a novel MLLM-based mobile assistant system. MobA introduces an adaptive planning module that incorporates a reflection mechanism for error recovery and dynamically adjusts plans to match the real environment context and the action module's execution capacity. Additionally, a multifaceted memory module provides comprehensive memory support to enhance adaptability and efficiency. We also present MobBench, a dataset designed for complex mobile interactions. Experimental results on MobBench and AndroidArena demonstrate MobA's ability to handle dynamic GUI environments and perform complex mobile tasks.
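The abstract describes a plan-act-reflect loop backed by a memory module. As a rough illustration only, the sketch below shows what such a loop could look like in Python; every name in it (Memory, AdaptivePlanner, env.observe, env.execute, env.task_done) is a hypothetical placeholder, not an interface from the paper, whose actual design is given in the full text.

```python
# Minimal sketch of a plan-act-reflect agent loop with memory, as suggested
# by the abstract. All classes and methods are hypothetical illustrations;
# MobA's real interfaces are defined in the paper and its code release.
from dataclasses import dataclass, field

@dataclass
class Memory:
    """Multifaceted memory: records each step and its outcome."""
    history: list = field(default_factory=list)

    def record(self, step, outcome):
        self.history.append((step, outcome))

class AdaptivePlanner:
    """Re-plans against the current GUI state instead of a fixed script."""
    def plan(self, goal, gui_state, memory):
        # In a real system this would query an MLLM with the screen, goal,
        # and memory; here we return a placeholder next action.
        return {"action": "tap", "target": "search_bar"}

    def reflect(self, step, error, memory):
        # Reflection mechanism: record the failure so later plans avoid it.
        memory.record(step, f"failed: {error}")

def run_task(goal, env, planner, memory, max_steps=20):
    """Drive one task: observe, plan, execute, and recover from errors."""
    for _ in range(max_steps):
        state = env.observe()            # current screenshot / view hierarchy
        step = planner.plan(goal, state, memory)
        try:
            env.execute(step)            # action module acts on the device
            memory.record(step, "ok")
        except RuntimeError as err:      # error recovery via reflection
            planner.reflect(step, err, memory)
        if env.task_done(goal):
            return True
    return False
```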
Submission history
From: Zichen Zhu
[v1] Thu, 17 Oct 2024 16:53:50 UTC (2,904 KB)
[v2] Sun, 2 Mar 2025 07:34:35 UTC (3,016 KB)
[v3] Tue, 13 May 2025 06:25:09 UTC (3,031 KB)