vision

AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials
From the Least to the Most: Building a Plug-and-Play Visual Reasoner via Data Synthesis
MMC: Iterative Refinement of VLM Reasoning via MCTS-based Multimodal Critique
MobileVLM: A Vision-Language Model for Better Intra- and Inter-UI Understanding
OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
PG-Agent: An Agent Powered by Page Graph
UI-E2I-Synth: Advancing GUI Grounding with Large-Scale Instruction Synthesis
Vision-G1: Towards General Vision Language Reasoning with Multi-Domain Data Curation
Weaving Context Across Images: Improving Vision-Language Models through Focus-Centric Visual Chains
WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning
WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement
WebSailor: Navigating Super-human Reasoning for Web Agent
WebSynthesis: World-Model-Guided MCTS for Efficient WebUI-Trajectory Synthesis
WebVLN: Vision-and-Language Navigation on Websites
You Only Look at Screens: Multimodal Chain-of-Action Agents

Last updated 5 months ago