Python

Reasoning-Executing-Gaps

Analysis and evaluation of the “reasoning–execution” gap for multimodal GUI agents. This repo provides inference scripts for multiple models (UI-TARS, GUI-Owl, AgentCPM-GUI), EM evaluation, CoT reasoning and GTA annotation/evaluation, plus quadrant analysis utilities.
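
As a rough illustration of what a quadrant analysis over EM and GTA results might look like, here is a minimal sketch. It assumes each evaluated step yields two booleans, one for whether the executed action passed EM and one for whether the reasoning passed GTA. The function names, quadrant labels, and input format are hypothetical and are not taken from this repo's scripts.

```python
from collections import Counter

# Hypothetical quadrant labels; the repo's own utilities may name them differently.
QUADRANTS = ("both_correct", "reasoning_only", "execution_only", "both_wrong")


def quadrant(em_correct: bool, gta_correct: bool) -> str:
    """Assign one step to a quadrant from its EM and GTA outcomes (assumed booleans)."""
    if em_correct and gta_correct:
        return "both_correct"
    if gta_correct:
        # Reasoning passed GTA but the executed action failed EM.
        return "reasoning_only"
    if em_correct:
        # Executed action passed EM but the reasoning failed GTA.
        return "execution_only"
    return "both_wrong"


def quadrant_rates(steps):
    """Return the fraction of steps in each quadrant.

    `steps` is an iterable of (em_correct, gta_correct) pairs.
    An empty input returns a rate of 0.0 for every quadrant.
    """
    steps = list(steps)
    counts = Counter(quadrant(em, gta) for em, gta in steps)
    total = len(steps)
    return {name: (counts[name] / total if total else 0.0) for name in QUADRANTS}


if __name__ == "__main__":
    # Toy data for illustration only, not real evaluation results.
    toy_steps = [(True, True), (True, False), (False, True), (False, False)]
    print(quadrant_rates(toy_steps))
```

In this sketch, the two off-diagonal quadrants are the ones where reasoning and execution disagree, which is where a reasoning-execution gap would show up.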

Last push: 9 months ago

Recent commits

Initial commit (3774aa0, lingzhong, 9 months ago)

Top contributors

LZ-Dong (1 commit)