
At the start of the year, “2025: The Year of Agents” sounded like headline bait. Eight months in, it feels like an understatement. Sam Altman said agents would “join the workforce” in 2025. Satya Nadella expects them to replace segments of knowledge work. Marc Benioff wants Salesforce to be the “#1 provider of digital labour.” That’s not future tense anymore - today, agents are moving tickets, shipping code, and digesting documents.
What actually changed this year? A handful of innovations did the heavy lifting: vibe coding agents, browser use, stronger reasoning models, deep-research agents, and models that excel at agentic benchmarks. Together, they have unlocked billions of dollars in enterprise value, both in revenue for the companies building them and in productivity gains for the businesses deploying them.
The wins so far have clear similarities: agents thrive when tasks are structured, data is rich, and results are easy to judge. Beyond that, things get harder. Agents still struggle with nuance, real-world conditions, and the enterprise demand for reliability and oversight.
To understand what’s working, what isn’t, and what’s next, we will explore:
- A framework for measuring agent utility: what have been the most compelling and successful use cases so far, and why have they worked?
- Agent limitations - looking past the hype: where have AI agents fallen behind expectations or struggled to gain traction, and why?
- The next frontier: what breakthroughs are required to unlock tougher domains, such as robotics and real-world decision-making?
A framework for measuring agent utility
Some agents we expected to thrive in 2025 did, but others did not. Why?
The use cases where AI agents dominated had these three traits in common:
- Ease of task verification - the outputs were straightforward to check. For example, reasoning models began with math and coding, where exact answers could be quickly verified and replicated.
- Rich and abundant data - the models were trained on large amounts of high-quality data, much of it publicly available and already curated.
- Clarity of scope - the tasks were well-defined and tightly bounded, meaning real-world data wasn’t too different from the training data.
Perhaps the sector that showcases this success the most is software development. Ironically, computer scientists have long joked that “the first thing to get automated is the domain we know best” - coding.
Coding automation
When I was deciding what to study at university, computer science felt like a safe bet - a stable path with strong career prospects. Over the last few months, however, The New York Times and Wall Street Journal reported that unemployment among CS graduates in the US has risen to 6.1% - more than twice the rate for biology or art history majors.
Whilst I do believe (and hope) this is just a temporary readjustment for CS grads, it’s been striking to see the positive impact this technology has had on so many people. New tools like Cursor and Lovable have enabled those with little to no background in software engineering to build apps and platforms of near-professional quality - in a fraction of the time, at a fraction of the cost, and with far fewer resources.
It’s no surprise, then, that Cursor was one of the fastest companies in history to reach $100M ARR, and surged past $500M ARR this June. Lovable isn’t far behind - the company claims to power over 10% of all new websites monthly and is projected to hit $250M ARR by year-end.
They have been able to develop extremely powerful and reliable AI coding agents because:
- Training data for the models is plentiful: GitHub repos, documentation, code review history, and compiler output all offer dense, structured data.
- Verifiability is high: success can be measured via test outcomes, performance metrics, or bug reduction.
- Feedback loops are rapid: errors surface quickly, enabling fast iteration (a toy sketch of this verify-and-retry loop follows the list).
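To make the verification point concrete, here is a minimal sketch of the loop that makes coding such a friendly domain for agents: a candidate implementation is executed against a test, and the pass/fail signal feeds straight back into the next attempt. The `generate_patch` function below is a hypothetical stand-in for a coding agent, not any vendor’s API, and the test is deliberately trivial.

```python
# Toy sketch: why code is easy to verify. A candidate implementation is run
# against a unit test, and the objective pass/fail signal drives retries.
# `generate_patch` is a hypothetical stand-in for a real coding agent.

def generate_patch(attempt: int) -> str:
    """Pretend coding agent: returns a (possibly buggy) implementation of add()."""
    if attempt == 0:
        return "def add(a, b):\n    return a - b"   # buggy first try
    return "def add(a, b):\n    return a + b"       # corrected on retry


def passes_tests(source: str) -> bool:
    """Cheap, objective verification: execute the candidate and run the test."""
    namespace: dict = {}
    exec(source, namespace)                # run the candidate implementation
    return namespace["add"](2, 3) == 5     # the ground truth is just a unit test


for attempt in range(3):
    if passes_tests(generate_patch(attempt)):
        print(f"verified on attempt {attempt}")
        break
```

Contrast this with evaluating a legal memo or a medical recommendation, where no equivalent one-line check exists - which is exactly the verifiability gap explored later in this piece.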
Customer service & revenue workflows
Another clear success story for agentic AI is in customer service, where agents are replacing brittle chatbots with context-aware support. Notably, Gartner predicts that by 2029, agentic AI will autonomously resolve 80% of common customer issues without human intervention. These systems don’t just respond; they resolve.
This is a KPI-rich, high-volume sector where performance is judged on resolution time, customer satisfaction scores, and retention, making it easy to tell good outcomes from bad. It’s no surprise these workflows have transitioned rapidly from rule-based logic to agentic orchestration.
Parloa is a great example of this. The German startup recently achieved unicorn status after raising $120M from General Catalyst, EQT, and others. Their AI platform is used regularly by multiple Fortune 200 companies to automate and scale enterprise customer communications across channels such as email, chat, and instant messaging.
Deep research and document-centric work
Sectors that rely on structured document analysis - particularly highly regulated industries with substantial text-based documentation, such as law, healthcare, and finance - have also been reaping the benefits of AI agents. These systems can ingest, synthesise, and analyse large volumes of text, delivering faster, more consistent insights for professionals across these functions.
This trend has been especially evident with the meteoric rise of legal tech startups, which have seen huge funding rounds over the past year. US-based Harvey, for example, raised a $300M Series E round led by Kleiner Perkins (having only been founded three years ago), and here in Europe, Legora (formerly Leya) raised $80M from Iconiq and General Catalyst. Startups like these - alongside newer players like Flank and Wordsmith - are transforming legal operations by doing much of what junior associates used to do, faster, at scale, and with ever-improving reliability. Looking ahead, AI2027 predicts that by early 2026, we will see the rise of research-optimised agents capable of accelerating AI innovation itself, producing R&D outcomes faster and more reliably than human teams. This next wave could enable breakthroughs not just in legal and compliance work, but in science, medicine, and engineering.
Enterprise knowledge and analytics agents are also gaining steam. Platforms like Glean, which reached $100M ARR by early 2025 and now supports 100 million agent actions annually, are enabling teams to turn internal data into dashboards, graphs, personalised insights, and automated workflows. Similarly, AI assistant startups like Onyx - which connects across Google Drive, Slack, Salesforce, GitHub, and more - raised a $10M seed round by promising insight-driven enterprise search with lower hallucination rates. And in workflow automation, n8n (backed by Felicis, Sequoia, and others) has grown its community to more than 100,000 monthly active users, positioning itself as a backbone for orchestrating multi-agent systems across enterprise stacks.
Mapping what’s to come
As we said earlier, most successful AI agents to date share three qualities:
- Outputs are highly verifiable
- There is abundant, high-quality training data
- Tasks are well-defined and bounded
But what happens when we move away from these conditions? To make sense of which use cases have thrived and which have further to go, we mapped them across two dimensions (a toy sketch of this mapping follows below):
- X-axis: Accessibility of training data - volume, quality, structure, and historical labels that show “what good looks like.”
- Y-axis: Verifiability of outcomes - how cheaply and confidently we can score correctness or business impact after each action.
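To make the framework concrete, the toy sketch below scores a handful of use cases on both axes and buckets them into quadrants. The use cases and scores are purely illustrative assumptions for the sake of the example, not measurements from this piece.

```python
# Toy sketch of the two-axis framework. All scores are illustrative guesses
# on a 0-1 scale: (training data accessibility, outcome verifiability).

USE_CASES = {
    "Coding agents":         (0.9, 0.9),
    "Customer service":      (0.8, 0.7),
    "Legal document review": (0.7, 0.6),
    "Personal assistants":   (0.3, 0.4),
    "Embodied robotics":     (0.2, 0.2),
}


def quadrant(data: float, verifiability: float, threshold: float = 0.5) -> str:
    """Bucket a use case into one of the four quadrants of the matrix."""
    if data >= threshold and verifiability >= threshold:
        return "sweet spot: agents already thriving"
    if data >= threshold:
        return "data-rich but hard to verify"
    if verifiability >= threshold:
        return "verifiable but data-poor"
    return "frontier: scarce data and weak verification"


for name, (data, verif) in USE_CASES.items():
    print(f"{name:22s} -> {quadrant(data, verif)}")
```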


Alongside this framework, it’s helpful to see the companies driving real progress. Below is a market map of some of the most promising startups building with agentic AI today.

Agent limitations - looking past the hype
While momentum is strong, unlocking the next level of agentic performance across all industries will require solving several foundational challenges:
Lack of human behavioural data
Agents need a deep understanding of user context, decision-making processes, and nuanced personal preferences - the kinds of cues humans absorb intuitively but that rarely exist in structured data. This information is essential for building agents that can handle open-ended, personalised tasks like scheduling, triaging inboxes, or anticipating intent. Startups like Perplexity and Strawberry are experimenting with browsers that passively capture ambient signals across workflows, creating a continuous feed of context. Big tech is moving the same way: OpenAI is reportedly working with Sir Jony Ive on a hardware device designed for ambient assistance, and Apple’s recent product updates point toward deeper on-device intelligence that can tap into personal context. Together, these moves signal a coming shift from reactive tools to truly situationally aware agents that live alongside the user.
Limited verifiability
In high-stakes domains like healthcare or finance, we often lack a consistent, objective ground truth. Was a doctor’s diagnosis truly the optimal one, or just the most likely given incomplete information? Did a financial advisor’s recommendation maximise long-term outcomes, or merely align with short-term constraints? Even if human experts agree on an action, we rarely store that judgment in a structured, queryable way, let alone capture the full reasoning chain behind it. Without reliable labelled preference data or clear performance metrics, it becomes nearly impossible to evaluate, reward, or improve an agent’s decision-making. The result: progress stalls, models overfit to proxy metrics, and “success” gets defined by what’s easiest to measure rather than what actually matters. This also applies to broader domains that require a level of subjectivity. For example, how do we consistently decide what the best writing style looks like?
Real-world limitations
In the bottom-left of our matrix sits embodied AI. Robotics remains constrained by the physical world: unlike internet-scale text and code, real-world interaction data is scarce, environments vary widely, and verification requires time, sensors, and often human safety protocols. These workflows are evolving, though, and advances in simulation, sensor technology, and low-cost robotics could unlock new forms of agentic intelligence in physical environments.
Trust, hallucinations, and regulation
Enterprises need guarantees, not just capabilities. Startups like Alinia, Cala AI, and LatticeFlow are tackling hallucination mitigation, model alignment, and secure deployment pipelines. In regulated industries, especially, agents must be compliant, traceable, and safe by design. The future here lies in hybrid models that combine automation with human judgment, bridging trust and scale.
The next frontier
Agentic AI is not a passing trend - it’s redefining how businesses are built. We’re moving from prompt-based interfaces to truly collaborative, goal-driven systems.
The early success stories show what’s possible when data quality and task clarity intersect. However, unlocking the next generation of AI agents will require deeper changes.
Going forward, we expect several trends to define the second half of 2025 and beyond:
- Vertical agents will go deeper. Domain-specific agents will increasingly demonstrate real, measurable ROI by combining LLMs with proprietary knowledge and workflows.
- Ambient data capture will evolve from an enabler to a category-defining advantage. The next breakthroughs in personalisation and proactivity will come from systems that learn passively through observation - be it across browser sessions, documents, calendars, or even conversations stored in long-term memory.
- Agent orchestration frameworks will evolve from novelty to necessity. The most advanced use cases will rely on teams of agents working in coordination, governed by high-level goals, and connected across the tools enterprises already use. A key enabler here is the rise of interoperability standards like Anthropic’s Model Context Protocol (MCP), which allows agents to plug directly into thousands of applications without bespoke integrations. Startups such as LangChain, CrewAI, and n8n are already experimenting with frameworks that let multiple agents collaborate and automate end-to-end workflows. As these orchestration layers mature, we’ll move from simple tasks to complex, multi-agent systems that can span entire organisations - bringing both new capabilities and new challenges around reliability, security, and governance (a minimal sketch of this pattern follows the list).
- Enterprise AI procurement will shift from exploration to platforms. Rather than experimenting with LLM wrappers, companies will demand robust, modular agent platforms with strong security, observability, and integration standards.
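To ground the orchestration trend mentioned above, here is a minimal, framework-agnostic sketch of a planner/worker/reviewer pipeline governed by a single high-level goal. The `call_llm` function and the agent roles are hypothetical placeholders, not the actual LangChain, CrewAI, n8n, or MCP interfaces.

```python
# Minimal, framework-agnostic sketch of multi-agent orchestration.
# `call_llm` is a hypothetical stand-in for any chat-completion API;
# this is not the LangChain, CrewAI, n8n, or MCP interface.
from dataclasses import dataclass


def call_llm(prompt: str) -> str:
    """Placeholder for a real model call (e.g. an HTTP request to an LLM provider)."""
    return f"<model output for: {prompt[:60]}...>"


@dataclass
class Agent:
    name: str
    system_role: str

    def run(self, task: str, context: str = "") -> str:
        prompt = f"{self.system_role}\n\nContext:\n{context}\n\nTask:\n{task}"
        return call_llm(prompt)


def orchestrate(goal: str) -> str:
    """Sequential planner -> worker -> reviewer pipeline driven by one goal."""
    planner = Agent("planner", "You break a business goal into concrete steps.")
    worker = Agent("worker", "You execute the plan step by step using available tools.")
    reviewer = Agent("reviewer", "You check the output against the original goal.")

    plan = planner.run(goal)
    draft = worker.run("Execute the plan.", context=plan)
    return reviewer.run("Review the draft against the goal.",
                        context=f"Goal: {goal}\n\nDraft: {draft}")


if __name__ == "__main__":
    print(orchestrate("Summarise this week's support tickets and flag churn risks"))
```

In a production system, each `Agent.run` call would be a real model invocation with tool access, and the hand-offs would carry structured state rather than raw strings - which is precisely where standards like MCP and the orchestration frameworks above come in.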
It’s been an extraordinary year for AI agents. While there’s still plenty to solve, the speed of progress makes it hard to imagine these changes are far off. The real question now is how we choose to shape them: will agents become partners we trust, labour we delegate, or something altogether different?
If you’re a founder building in this space, or an investor looking to dive deeper - get in touch! I’d love to connect.