vision
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials
From the Least to the Most: Building a Plug-and-Play Visual Reasoner via Data Synthesis
MMC: Iterative Refinement of VLM Reasoning via MCTS-based Multimodal Critique
MobileVLM: A Vision-Language Model for Better Intra- and Inter-UI Understanding
OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
PG-Agent: An Agent Powered by Page Graph
UI-E2I-Synth: Advancing GUI Grounding with Large-Scale Instruction Synthesis
Vision-G1: Towards General Vision Language Reasoning with Multi-Domain Data Curation
Weaving Context Across Images: Improving Vision-Language Models through Focus-Centric Visual Chains
WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning
WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement
WebSailor: Navigating Super-human Reasoning for Web Agent
WebSynthesis: World-Model-Guided MCTS for Efficient WebUI-Trajectory Synthesis
WebVLN: Vision-and-Language Navigation on Websites
You Only Look at Screens: Multimodal Chain-of-Action Agents