YouTube
Bilibili
Google Chrome and DingTalk
Search.NBA.FMVP.and.send.to.friend.mp4
Word
Write.an.introduction.of.Alibaba.in.Word.mp4
Mobile-Agent-v2.mp4
Mobile-Agent.mp4
- 🔥🔥[9.26] Mobile-Agent-v2 was accepted by The Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS 2024).
- 🔥[8.23] We released PC-Agent, a PC operation assistant supporting both Mac and Windows, built on the Mobile-Agent-v2 framework.
- 🔥[7.29] Mobile-Agent won the Best Demo Award at the 23rd China National Conference on Computational Linguistics (CCL 2024). At CCL 2024 we demonstrated the upcoming open-source Mobile-Agent-v3, which has a smaller memory footprint (8 GB), faster inference (10-15 s per operation), and runs on open-source models. See the 📺Demo section above for a video demo.
- [6.27] We released a demo on Hugging Face and ModelScope where you can upload phone screenshots to try Mobile-Agent-v2 immediately, with no model or device configuration required.
- [6.4] Modelscope-Agent now supports Mobile-Agent-V2, based on the Android ADB Env; please check the application.
- [6.4] We released Mobile-Agent-v2, a new-generation mobile device operation assistant with effective navigation via multi-agent collaboration.
- [3.10] Mobile-Agent was accepted by the ICLR 2024 Workshop on Large Language Model (LLM) Agents.
- Mobile-Agent-v3
- Mobile-Agent-v2 - Mobile device operation assistant with effective navigation via multi-agent collaboration
- Mobile-Agent - Autonomous mobile device operation agent with visual perception
If you find Mobile-Agent useful for your research and applications, please cite using this BibTeX:
@article{wang2024mobile2,
title={Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration},
author={Wang, Junyang and Xu, Haiyang and Jia, Haitao and Zhang, Xi and Yan, Ming and Shen, Weizhou and Zhang, Ji and Huang, Fei and Sang, Jitao},
journal={arXiv preprint arXiv:2406.01014},
year={2024}
}
@article{wang2024mobile,
title={Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception},
author={Wang, Junyang and Xu, Haiyang and Ye, Jiabo and Yan, Ming and Shen, Weizhou and Zhang, Ji and Huang, Fei and Sang, Jitao},
journal={arXiv preprint arXiv:2401.16158},
year={2024}
}
- AppAgent: Multimodal Agents as Smartphone Users
- mPLUG-Owl & mPLUG-Owl2: Modularized Multimodal Large Language Model
- Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
- GroundingDINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
- CLIP: Contrastive Language-Image Pretraining