Python
Analysis and evaluation of the “reasoning–execution” gap in multimodal GUI agents. This repo provides inference scripts for several models (UI-TARS, GUI-Owl, AgentCPM-GUI), exact-match (EM) evaluation, chain-of-thought (CoT) reasoning, GTA annotation and evaluation, and quadrant analysis utilities.
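To make the quadrant analysis concrete, here is a minimal sketch of how per-step results might be bucketed. It assumes each step carries two boolean flags: `em` (the executed action matched the ground truth) and `gta` (the reasoning was judged to agree with the ground-truth action). The function names, quadrant labels, and input layout are hypothetical and are not taken from the repo's scripts.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical quadrant labels; the repo may name these differently.
QUADRANTS = (
    "both_correct",
    "reasoning_ok_execution_wrong",
    "execution_ok_reasoning_wrong",
    "both_wrong",
)


def quadrant(em: bool, gta: bool) -> str:
    """Map one step's (EM, GTA) flags to a quadrant label."""
    if em and gta:
        return "both_correct"
    if gta:  # reasoning agrees with ground truth, executed action does not
        return "reasoning_ok_execution_wrong"
    if em:  # executed action matches, reasoning does not
        return "execution_ok_reasoning_wrong"
    return "both_wrong"


def quadrant_rates(flags: Iterable[Tuple[bool, bool]]) -> Dict[str, float]:
    """Return the fraction of steps that fall in each quadrant."""
    steps = list(flags)
    counts = Counter(quadrant(em, gta) for em, gta in steps)
    total = len(steps)
    return {name: (counts[name] / total if total else 0.0) for name in QUADRANTS}


if __name__ == "__main__":
    # Toy input, one (em, gta) pair per step; not real evaluation data.
    demo = [(True, True), (False, True), (True, False), (False, False)]
    for name, rate in quadrant_rates(demo).items():
        print(f"{name}: {rate:.2f}")
```

Under these assumptions, the two off-diagonal quadrants are where the reasoning and the executed action disagree, so their rates give one view of the gap.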