
vision

• AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials
• From the Least to the Most: Building a Plug-and-Play Visual Reasoner via Data Synthesis
• MMC: Iterative Refinement of VLM Reasoning via MCTS-based Multimodal Critique
• MobileVLM: A Vision-Language Model for Better Intra- and Inter-UI Understanding
• OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use
• OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
• OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
• PG-Agent: An Agent Powered by Page Graph
• UI-E2I-Synth: Advancing GUI Grounding with Large-Scale Instruction Synthesis
• Vision-G1: Towards General Vision Language Reasoning with Multi-Domain Data Curation
• Weaving Context Across Images: Improving Vision-Language Models through Focus-Centric Visual Chains
• WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning
• WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement
• WebSailor: Navigating Super-human Reasoning for Web Agent
• WebSynthesis: World-Model-Guided MCTS for Efficient WebUI-Trajectory Synthesis
• WebVLN: Vision-and-Language Navigation on Websites
• You Only Look at Screens: Multimodal Chain-of-Action Agents

Last updated 3 months ago