DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle

Fangyu Lei1,2,3*§, Jinxiang Meng1,2*, Yiming Huang5, Junjie Zhao3, Yitong Zhang6,
Jianwen Luo1,2, Xin Zou3, Ruiyi Yang3, Wenbo Shi3, Yan Gao3, Shizhu He1,2,
Zuo Wang3, Qian Liu4, Yang Wang3, Ke Wang3,†, Jun Zhao1,2, Kang Liu1,2,†
1Institute of Automation, CAS   2University of Chinese Academy of Sciences
3ByteDance Seed   4TikTok   5UC San Diego   6NUS
*Equal Contribution, §Work done at ByteDance Seed, †Corresponding authors

Due to legal requirements, we will release the full baseline and evaluation code on Dec 8. Stay tuned!

Overview

Real-world enterprise data intelligence workflows encompass data engineering (DE), which turns raw sources into analysis-ready tables, and data analysis (DA), which converts those tables into decision-oriented insights. We introduce DAComp, a benchmark of 210 tasks that mirrors these complex workflows.

Data Engineering (DE) tasks require repository-level engineering on industrial schemas, including designing and building multi-stage SQL pipelines. Data Analysis (DA) tasks pose open-ended business problems that demand strategic planning and insight synthesis. Our experiments reveal that even state-of-the-art agents falter, with DE success rates under 20% and DA scores averaging below 40%.

[Figure: DAComp overview]

🧩 Case Studies

Scenario: Designing a Data Warehouse Blueprint

Given a vague business requirement, the agent must design a comprehensive Architecture Blueprint. This involves identifying necessary data sources, defining the grain for Staging/Intermediate/Marts layers, and specifying strict data contracts before writing any code.

[Figure: Architecture Blueprint]
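
To make the blueprint idea concrete, here is a minimal sketch of how the sources, grain, and data contract of each model could be written down and checked before any SQL exists. Everything in it is an illustrative assumption rather than DAComp's actual task format: the class names `ColumnContract` and `ModelSpec`, the function `validate_blueprint`, and the example tables are all hypothetical.

```python
from dataclasses import dataclass, field

# The three layers named in the scenario above.
LAYERS = ("staging", "intermediate", "marts")


@dataclass
class ColumnContract:
    """One column of a model's data contract (hypothetical structure)."""
    name: str
    dtype: str
    nullable: bool = False


@dataclass
class ModelSpec:
    """Blueprint entry for one model (hypothetical structure)."""
    name: str
    layer: str
    grain: list[str]      # columns that uniquely identify a row
    sources: list[str]    # raw tables (staging) or upstream models
    columns: list[ColumnContract] = field(default_factory=list)


def validate_blueprint(models: list[ModelSpec]) -> list[str]:
    """Return readable problems; an empty list means the blueprint is consistent."""
    problems = []
    known = {m.name for m in models}
    for m in models:
        if m.layer not in LAYERS:
            problems.append(f"{m.name}: unknown layer {m.layer!r}")
        if not m.grain:
            problems.append(f"{m.name}: grain is not defined")
        by_name = {c.name: c for c in m.columns}
        for g in m.grain:
            if g not in by_name:
                problems.append(f"{m.name}: grain column {g!r} missing from contract")
            elif by_name[g].nullable:
                problems.append(f"{m.name}: grain column {g!r} must not be nullable")
        # Staging models read raw tables; later layers must read known models.
        if m.layer in ("intermediate", "marts"):
            for s in m.sources:
                if s not in known:
                    problems.append(f"{m.name}: unknown upstream model {s!r}")
    return problems


# Hypothetical blueprint for a small order-revenue warehouse.
blueprint = [
    ModelSpec("stg_orders", "staging", ["order_id"], ["raw_orders"], [
        ColumnContract("order_id", "INTEGER"),
        ColumnContract("customer_id", "INTEGER"),
        ColumnContract("amount", "REAL"),
        ColumnContract("order_date", "DATE"),
    ]),
    ModelSpec("stg_customers", "staging", ["customer_id"], ["raw_customers"], [
        ColumnContract("customer_id", "INTEGER"),
        ColumnContract("region", "TEXT", nullable=True),
    ]),
    ModelSpec("int_orders_enriched", "intermediate", ["order_id"],
              ["stg_orders", "stg_customers"], [
        ColumnContract("order_id", "INTEGER"),
        ColumnContract("region", "TEXT", nullable=True),
        ColumnContract("amount", "REAL"),
        ColumnContract("order_date", "DATE"),
    ]),
    ModelSpec("mart_daily_revenue", "marts", ["order_date", "region"],
              ["int_orders_enriched"], [
        ColumnContract("order_date", "DATE"),
        ColumnContract("region", "TEXT"),
        ColumnContract("revenue", "REAL"),
    ]),
]

print(validate_blueprint(blueprint))
```

Declaring the grain explicitly is what lets the contract be checked mechanically: a grain column that is nullable or absent from the contract is flagged before any pipeline code is written.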

Scenario: Building Pipelines from Scratch

The agent must build a multi-layer SQL data pipeline from scratch. It generates SQL files for the Staging (cleaning), Intermediate (logic), and Marts (aggregation) layers, and all 30+ files must pass compilation and data integrity tests.

[Figure: Implementation DAG]
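
The three-layer structure can be illustrated at toy scale. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table names, columns, and sample rows are invented for illustration, and a real DAComp task involves 30+ SQL files on industrial schemas rather than three statements.

```python
import sqlite3

# Each entry is (model name, SQL), listed in dependency order.
# All names and columns are hypothetical.
PIPELINE = [
    # Staging: cast types, normalise text, drop rows without a key.
    ("stg_orders", """
        CREATE VIEW stg_orders AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               LOWER(TRIM(customer))     AS customer,
               CAST(amount AS REAL)      AS amount,
               DATE(ordered_at)          AS order_date
        FROM raw_orders
        WHERE order_id IS NOT NULL
    """),
    # Intermediate: logic at the (customer, order_date) grain.
    ("int_customer_orders", """
        CREATE VIEW int_customer_orders AS
        SELECT customer, order_date,
               COUNT(*)    AS n_orders,
               SUM(amount) AS revenue
        FROM stg_orders
        GROUP BY customer, order_date
    """),
    # Marts: aggregation to one row per order_date.
    ("mart_daily_revenue", """
        CREATE TABLE mart_daily_revenue AS
        SELECT order_date,
               SUM(revenue)  AS revenue,
               SUM(n_orders) AS n_orders
        FROM int_customer_orders
        GROUP BY order_date
    """),
]


def load_raw(conn):
    """Create a messy raw table with text-typed columns and one bad row."""
    conn.execute(
        "CREATE TABLE raw_orders "
        "(order_id TEXT, customer TEXT, amount TEXT, ordered_at TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", [
        ("1", " Alice ", "10.5", "2024-01-01 09:00:00"),
        ("2", "BOB", "20", "2024-01-01 12:30:00"),
        ("3", "alice", "5", "2024-01-02 08:15:00"),
        (None, "ghost", "99", "2024-01-02 10:00:00"),
    ])


def build_pipeline(conn):
    """Run every model in order; a SQL error here is a 'compilation' failure."""
    for _name, sql in PIPELINE:
        conn.execute(sql)


def run_integrity_tests(conn):
    """Check that the mart grain is unique and its columns are not null."""
    dupes = conn.execute(
        "SELECT COUNT(*) FROM (SELECT order_date FROM mart_daily_revenue "
        "GROUP BY order_date HAVING COUNT(*) > 1)"
    ).fetchone()[0]
    nulls = conn.execute(
        "SELECT COUNT(*) FROM mart_daily_revenue "
        "WHERE order_date IS NULL OR revenue IS NULL"
    ).fetchone()[0]
    return {"unique_order_date": dupes == 0, "not_null": nulls == 0}


conn = sqlite3.connect(":memory:")
load_raw(conn)
build_pipeline(conn)
print(conn.execute(
    "SELECT order_date, revenue, n_orders "
    "FROM mart_daily_revenue ORDER BY order_date"
).fetchall())
print(run_integrity_tests(conn))
```

The row with a missing `order_id` is filtered out in staging, so it never reaches the mart; the integrity tests then confirm the mart has exactly one row per `order_date`.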

Scenario: Evolving Business Logic

A new requirement demands evolving the existing pipeline. The agent must interpret the request, identify the impact scope, and add or modify specific SQL files to update the DAG without breaking existing dependencies.

[Figure: Evolution Pipeline]
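
Identifying the impact scope amounts to a graph traversal: starting from the models a requirement touches, collect everything downstream and rebuild it in dependency order. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (3.9+). The DAG and the function name `impact_scope` are hypothetical and are not taken from the benchmark's own tooling.

```python
from graphlib import TopologicalSorter

# Hypothetical DAG: each model maps to the set of models it reads from.
DEPENDS_ON = {
    "stg_orders": set(),
    "stg_customers": set(),
    "int_customer_orders": {"stg_orders", "stg_customers"},
    "mart_daily_revenue": {"int_customer_orders"},
    "mart_customer_ltv": {"int_customer_orders"},
    "mart_customer_directory": {"stg_customers"},
}


def impact_scope(changed, depends_on):
    """Return the changed models plus all downstream models, parents first."""
    # Invert the edges so we can walk from a model to its consumers.
    children = {model: set() for model in depends_on}
    for model, parents in depends_on.items():
        for parent in parents:
            children.setdefault(parent, set()).add(model)

    # Depth-first walk collecting every model reachable from the changes.
    affected, stack = set(), list(changed)
    while stack:
        node = stack.pop()
        if node in affected:
            continue
        affected.add(node)
        stack.extend(children.get(node, ()))

    # static_order() yields each model after all of its upstream models.
    order = TopologicalSorter(depends_on).static_order()
    return [model for model in order if model in affected]


print(impact_scope({"stg_orders"}, DEPENDS_ON))
```

In this toy DAG, changing `stg_orders` leaves `stg_customers` and `mart_customer_directory` out of the rebuild list, which is the sense in which an update can avoid disturbing unrelated dependencies.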

🏆 Leaderboard

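| Rank | Model | Framework | Architecture Score | Type |
|---|---|---|---|---|
| 1 | GPT-5 | DE-Agent | 63.93 | Proprietary |
| 2 | DeepSeek-V3.1 | DE-Agent | 52.66 | Open |
| 3 | Gemini-2.5-Pro | DE-Agent | 51.96 | Proprietary |
| 4 | Qwen3-Coder | DE-Agent | 51.43 | Open |
| 5 | Qwen3-235B-A22B | DE-Agent | 50.73 | Open |
| 6 | o3 | DE-Agent | 48.32 | Proprietary |
| 7 | Qwen3-8B | DE-Agent | 45.12 | Open |

| Rank | Model | Framework | CFS Score | CS Score | Type |
|---|---|---|---|---|---|
| 1 | GPT-5 | DE-Agent | 30.79 | 61.98 | Proprietary |
| 2 | Gemini-2.5-Pro | DE-Agent | 27.66 | 55.32 | Proprietary |
| 3 | Qwen3-Coder | DE-Agent | 23.64 | 54.21 | Open |
| 4 | DeepSeek-V3.1 | DE-Agent | 22.33 | 50.04 | Open |
| 5 | o3 | DE-Agent | 15.07 | 35.55 | Proprietary |
| 6 | Qwen3-235B-A22B | DE-Agent | 2.43 | 20.15 | Open |
| 7 | Qwen3-8B | DE-Agent | 1.31 | 15.33 | Open |

| Rank | Model | Framework | CFS Score | Success Rate (SR) | Type |
|---|---|---|---|---|---|
| 1 | GPT-5 | DE-Agent | 38.75 | 20.00% | Proprietary |
| 2 | Qwen3-Coder | DE-Agent | 27.12 | 12.00% | Open |
| 3 | o3 | DE-Agent | 24.42 | 6.00% | Proprietary |
| 4 | DeepSeek-V3.1 | DE-Agent | 24.11 | 10.00% | Open |
| 5 | Gemini-2.5-Pro | DE-Agent | 23.97 | 8.00% | Proprietary |
| 6 | Qwen3-8B | DE-Agent | 15.89 | 2.00% | Open |
| 7 | Qwen3-235B-A22B | DE-Agent | 12.43 | 2.00% | Open |

| Rank | Model | Framework | DA Score | Type |
|---|---|---|---|---|
| 1 | GPT-5 | DA-Agent | 50.84 | Proprietary |
| 2 | GPT-5 | OpenHands | 46.99 | Proprietary |
| 3 | Kimi-K2 | DA-Agent | 41.89 | Proprietary |
| 4 | Gemini-2.5-Pro | DA-Agent | 34.70 | Proprietary |
| 5 | DeepSeek-V3.1 | DA-Agent | 34.33 | Open |
| 6 | DeepSeek-V3.1 | OpenHands | 33.87 | Open |
| 7 | Gemini-2.5-Pro | OpenHands | 33.38 | Proprietary |
| 8 | o3 | DA-Agent | 28.20 | Proprietary |
| 9 | o3 | OpenHands | 26.57 | Proprietary |
| 10 | Qwen3-Coder | DA-Agent | 25.13 | Open |
| 11 | Qwen3-Coder | OpenHands | 24.28 | Open |
| 12 | Doubao-Seed-1.6 | DA-Agent | 20.74 | Proprietary |
| 13 | Qwen3-235B-A22B | DA-Agent | 13.25 | Open |
| 14 | Qwen3-235B-A22B | OpenHands | 12.43 | Open |
| 15 | Qwen3-8B | DA-Agent | 4.47 | Open |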

* DA Score is the aggregate of Completeness, Accuracy, Insightfulness, Readability, Analytical Depth, and Visualization scores.
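
This page does not state how the six dimension scores are combined, so the following is only a sketch of one plausible aggregation: a weighted mean that defaults to equal weights. The equal weighting, the function name `aggregate_da_score`, and the sample numbers are all assumptions made for illustration; consult the paper or the released evaluation code for the actual rubric.

```python
# The six dimensions named in the footnote above.
DIMENSIONS = (
    "completeness",
    "accuracy",
    "insightfulness",
    "readability",
    "analytical_depth",
    "visualization",
)


def aggregate_da_score(scores, weights=None):
    """Combine per-dimension scores into one number via a weighted mean.

    ASSUMPTION: the weighted mean and the equal default weights are
    placeholders; the benchmark's real aggregation is not given here.
    """
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimension scores: {missing}")
    if weights is None:
        weights = {d: 1.0 for d in DIMENSIONS}
    total_weight = sum(weights[d] for d in DIMENSIONS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[d] * weights[d] for d in DIMENSIONS) / total_weight


# Made-up scores for a single hypothetical report, not benchmark results.
example_scores = {
    "completeness": 60.0,
    "accuracy": 40.0,
    "insightfulness": 50.0,
    "readability": 50.0,
    "analytical_depth": 30.0,
    "visualization": 70.0,
}
print(aggregate_da_score(example_scores))
```

Passing an explicit `weights` dictionary lets a reader substitute whatever weighting the official rubric turns out to use without changing the rest of the function.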

BibTeX

@misc{lei2025dacompbenchmarkingdataagents,
      title={DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle}, 
      author={Fangyu Lei and Jinxiang Meng and Yiming Huang and Junjie Zhao and Yitong Zhang and Jianwen Luo and Xin Zou and Ruiyi Yang and Wenbo Shi and Yan Gao and Shizhu He and Zuo Wang and Qian Liu and Yang Wang and Ke Wang and Jun Zhao and Kang Liu},
      year={2025},
      eprint={2512.04324},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.04324}, 
}