Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs

Keen You, Haotian Zhang, Eldon Schoop, Floris Weers, Amanda Swearngin, Jeffrey Nichols 0001, Yinfei Yang, Zhe Gan. Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs. In Ales Leonardis, Elisa Ricci 0001, Stefan Roth 0001, Olga Russakovsky, Torsten Sattler, Gül Varol, editors, Computer Vision - ECCV 2024 - 18th European Conference, Milan, Italy, September 29-October 4, 2024, Proceedings, Part LXIV. Volume 15122 of Lecture Notes in Computer Science, pages 240-255, Springer, 2024. [doi]

Abstract

Abstract is missing.

  翻译: