The State of YC AI Agents (2026)
Most teams have agents in production. Almost none know if they're actually working

We surveyed YC companies building AI agent products and found that the hardest part is no longer getting agents into production — it's figuring out whether agents are actually helping users so adoption and usage can scale.
Two signals from the data stood out.
First, most agent deployments are still early in scale: 89% of production systems handle fewer than 10k conversations per month.
Second, even teams already using tools like LangSmith, Langfuse, or Grafana report that understanding and improving agent behavior is still the hardest operational problem.
To better understand how YC companies are deploying agents today, we asked founders about:
- usage volume
- agent use cases
- interaction patterns
- architecture design
- operational challenges
The full report is exclusive to survey respondents, but here are a few key signals that emerged from the data.
🚀 1) Most agent startups already have production deployments
- 86% of respondents already have agents live in production
- 14% are still in development
This suggests YC teams are moving beyond demos quickly. Agents are already embedded into real workflows and products. This shouldn’t be a big surprise these days — but what comes after deployment (adoption, usage, churn) is what really matters.
📊 2) Most production agents are still early in scale
Among the companies with agents live in production, conversation volume is still mostly small:
- 89% handle fewer than 10k conversations per month
- Only one respondent reported volumes in the 1M–10M/month range
So while YC companies are already shipping agents to production, most deployments are still early in scale. This surprised me, considering all the LinkedIn, Reddit, and Twitter claims about how agents are everywhere doing everything. Good reminder not to spend too much time comparing yourself to what you see online.
⚙️ 3) What YC agents are actually doing
The most common agent use cases (multiple select) were:
- 62% data extraction / processing
- 62% workflow automation
- 38% research and analysis
- 38% content generation
- 33% search / retrieval
- 29% customer support
This suggests many YC agent products focus primarily on structured work and operational tasks, where outcome success is more clearly defined. This becomes relevant when we look at the key challenges facing agent products today.
🔄 4) Agents are both product features and background systems
One interesting pattern from the survey is that agents are not just chat interfaces — many run as background systems inside the product.
Across responses:
- 76% expose agents directly to customers
- 48% run agents in the background performing automated tasks
- 33% run agents primarily for internal teams
This suggests many products now include both:
- interactive agent experiences (chat, copilots, assistants)
- background agents that execute tasks, process data, or automate workflows
In other words, agents are increasingly becoming core product infrastructure, not just user-facing chat features.
🧠 5) Agent architectures are converging on hybrid systems
Teams reported using a mix of architectural patterns:
- 76% use iterative reasoning loops
- 57% use deterministic workflows or state machines
- 38% use multi-agent systems
- 33% use single-pass tool calling
- 24% use planner–executor architectures
The interesting part is the overlap.
Many teams reported using both iterative loops and deterministic workflows, suggesting a common architecture pattern is emerging: deterministic workflow orchestration combined with agent reasoning loops.
In practice, workflows often control the high-level product logic, while agent loops handle reasoning and tool use inside each step.
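To make that concrete, here is a minimal sketch of the hybrid pattern, not drawn from any respondent's stack: `call_model`, the tool names, and the workflow steps are hypothetical stand-ins for whatever model client and tools a team actually uses.

```python
# Minimal sketch of "deterministic workflow orchestration + agent reasoning loops".
# call_model, TOOLS, and WORKFLOW are hypothetical placeholders, not a real API.

def call_model(messages, tools):
    """Placeholder for an LLM call that returns either a tool request or a final answer."""
    # A real system would call a model provider here; this stub just finishes.
    return {"type": "final", "content": "done"}

# Tools the agent loop may call inside a step (hypothetical examples).
TOOLS = {
    "fetch_invoice": lambda args: {"invoice_id": args["id"], "total": 120.0},
    "extract_fields": lambda args: {"vendor": "Acme", "amount": args.get("total")},
}

def agent_loop(task, max_iters=5):
    """Iterative reasoning loop: ask the model, run any requested tool, feed the result back."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_iters):
        response = call_model(messages, tools=list(TOOLS))
        if response["type"] == "final":
            return response["content"]
        result = TOOLS[response["tool"]](response.get("args", {}))
        messages.append({"role": "tool", "content": str(result)})
    raise RuntimeError("agent did not converge")

# Deterministic workflow: fixed, ordered steps that own the product logic.
WORKFLOW = [
    "Pull the invoice referenced in the ticket",
    "Extract vendor, amount, and due date",
    "Draft a summary for the approver",
]

def run_workflow(ticket_id):
    """The workflow controls sequencing; each step delegates reasoning and tool use to the loop."""
    return [agent_loop(f"{step} (ticket {ticket_id})") for step in WORKFLOW]

if __name__ == "__main__":
    print(run_workflow("T-1042"))
```

The design point is the separation of concerns: the workflow owns ordering, retries, and product logic, while the loop inside each step is free to reason and call tools.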
🧪 6) The biggest challenge: Evaluation tooling
Now comes the fun part — the pain points.
We asked respondents what their biggest challenge is with building, running, and scaling production agents. When we clustered the free-text responses, they grouped into three broad categories:
- Agent quality & improvement (evals, reliability, monitoring)
- System performance & infrastructure (latency, cost, traffic/availability, voice limitations)
- Product & organizational constraints (discovering use cases, time/resources)
The largest cluster by far was the first one: agent quality and improvement.
About 38% of respondents explicitly mentioned evaluation challenges, including:
- building eval suites
- running A/B tests
- improving agent behavior over time
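To ground what "building eval suites" means in practice, here is a minimal, hypothetical sketch: a handful of fixed prompts with expected properties of the answer, run against the agent's entry point. `run_agent` and the cases are placeholders, not anyone's actual suite.

```python
# Minimal sketch of an eval suite. run_agent and EVAL_CASES are hypothetical
# placeholders; real suites are usually larger and scored, not just pass/fail.

def run_agent(prompt: str) -> str:
    """Placeholder for the agent's entry point."""
    return "The invoice total is $120.00"

EVAL_CASES = [
    # (prompt, substrings the answer is expected to contain)
    ("What is the total on invoice INV-001?", ["$120.00"]),
    ("Who is the vendor on invoice INV-001?", ["Acme"]),
]

def run_evals():
    failures = []
    for prompt, must_contain in EVAL_CASES:
        answer = run_agent(prompt)
        missing = [s for s in must_contain if s not in answer]
        if missing:
            failures.append((prompt, missing))
    # A/B tests and regression checks build on the same idea: run the suite
    # against two agent configurations and compare pass rates over time.
    print(f"{len(EVAL_CASES) - len(failures)}/{len(EVAL_CASES)} cases passed")
    return failures

if __name__ == "__main__":
    run_evals()
```

Keeping a suite like this representative of real traffic is the hard part, which is where the maintenance burden respondents describe starts to bite.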
What’s interesting is that many founders mentioned using systems like LangSmith, Langfuse, Braintrust, or internal observability tooling, yet still struggle to understand what their agents are doing in the wild — and whether they are actually helping users.
(You can bet I forwarded this insight to our team at Voker — it’s a clear signal that reinforces our thesis that the space needs a purpose-built Agent Analytics platform.)
As product builders, when people tell us their pain, we should always ask: what’s the pain behind the pain?
When I did this, two takeaways stood out.
1️⃣ “Evals” are a symptom of a deeper problem, not the full solution.
Many respondents already have evaluation tools in their stack. What they’re really describing is the broader problem:
AI promises magic, but delivers lumpy outcomes — sometimes magical, sometimes frustrating.
Evals help address part of this, but they’re only one tool in the box. In a previous survey we ran, a super-majority of respondents said evals often under-deliver because keeping them up to date becomes an impossible task.
2️⃣ Quality problems ultimately show up as product problems.
Why does quality matter? Because adoption, usage, and churn are on the line.
2025 was the year agents got into production.
2026 is the year teams have to harden and optimize them.
A huge wave of churn may be coming as the unbelievable promise of “Ask me anything” (the slogan behind almost every agentic product out there) starts to come under scrutiny from real users.
An early indicator is the low interaction volumes we saw in this survey.
The lagging indicator is the C-word: churn.
That’s why I believe founders are pointing to agent quality and improvement as their biggest challenge.
🧾 Final thoughts
A few early signals from this dataset:
- YC companies are already shipping agents to production (duh)
- Most deployments are still early in scale
- Agents are primarily used for automation and structured work, not the unbounded intelligence we’re often promised
- Architectures appear to be converging around hybrid workflow + agent systems, so it’s not as simple as “just prompt a model”
- The most common operational challenge is evaluating and improving agent behavior
In the world of agent evaluation tools, here’s a quick cheat sheet:
- Observability tools help engineers debug individual traces.
- Evals tools help prevent unintended agent regressions when configurations change.
- Agent Analytics tools (Voker) help engineers, product managers, and business leaders measure whether agents are actually helping users and identify performance patterns that need attention.
I plan to run this survey again next year to track how the agent space evolves.
In the meantime, if you’re building an Agent-First product, submit your response to our survey and I’ll send you the full report. We’ll continue updating the analysis as more data comes in.
And of course — if you have agents in production and are struggling to understand or improve agent behavior, let’s talk.
