"Streamline your chat and voice assistant development with CI/CD."
Founded by Brooke Hopkins
Teams are racing to market with AI agents, but slow manual testing is holding them back. Engineers spend hours evaluating agents by hand and playing whack-a-mole, only to discover that fixing one issue introduces another.
Coval builds automated simulation and evaluation for AI agents, inspired by the autonomous vehicle industry, to boost test coverage, speed up development, and validate consistent performance.
They have a waitlist, but YC companies go first! Grab some time here: https://bit.ly/coval-demo
Coval's Story
Before starting Coval, Brooke led the evaluation job infrastructure team at Waymo. She coded the first versions of their dataset storage and other foundational simulation systems, and her team built all of the dev tools for launching and running evals.
Through conversations with hundreds of engineering teams at startups and enterprises, Brooke has seen that AI agents—models that operate independently and handle complex tasks—face challenges similar to those in self-driving.
In the early days, autonomous vehicle companies relied heavily on manual evaluation, testing self-driving cars on racetracks and city streets (remember when autonomous cars still had safety drivers?). As these companies scaled, a significant shift happened: they moved towards simulating every code change in a "virtual" environment, using the vast amounts of data they had collected. The new approach dramatically improved vehicle behavior, and today hundreds of autonomous cars zip around the streets of San Francisco.
This story mirrors what's happening today with AI agents across various industries. Teams are coming up with promising prototypes but often hit a wall when it comes to their reliability.
In a future where AI agents execute much of our work, from sending emails to prescribing medication, untested systems pose risks that could severely throttle progress.
At Waymo, Brooke developed tools that tested each code modification made by engineers, ensuring that every change improved the Waymo Driver's performance. She believes this methodical approach was key in helping the team address edge cases and maintain peak performance, and it ultimately cemented Waymo's status as a leader in the autonomous vehicle space.
Now, at Coval, they’re taking these proven strategies and adapting them in a completely new way to speed up the development of AI agents. The goal is to help engineers build agent experiences that genuinely work for users in the real world.
Automated simulation and evaluation are critical to trusting agents with impactful tasks across industries.
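To make the idea concrete, here is a minimal sketch of what automated simulation and evaluation for a chat agent can look like: scripted scenarios are replayed against the agent and each transcript is scored against a success criterion, so every code change can be checked for regressions. All names here (`Scenario`, `run_simulation`, the toy agent) are hypothetical illustrations, not Coval's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class Scenario:
    """One simulated conversation with a pass/fail criterion."""
    name: str
    user_turns: List[str]   # scripted user messages
    must_contain: str       # substring the agent's replies must include

def run_simulation(agent: Callable[[str], str], scenario: Scenario) -> bool:
    # Replay the scripted turns against the agent and check the transcript.
    transcript = [agent(turn) for turn in scenario.user_turns]
    return any(scenario.must_contain in reply for reply in transcript)

def evaluate(agent: Callable[[str], str],
             scenarios: List[Scenario]) -> Tuple[Dict[str, bool], float]:
    # Run every scenario and report a per-scenario result plus a pass rate.
    results = {s.name: run_simulation(agent, s) for s in scenarios}
    pass_rate = sum(results.values()) / len(results)
    return results, pass_rate

def toy_agent(message: str) -> str:
    # Stand-in for a real chat or voice agent.
    if "refund" in message:
        return "I can help with your refund."
    return "How can I help?"

scenarios = [
    Scenario("refund_request", ["I want a refund"], must_contain="refund"),
    Scenario("greeting", ["Hello"], must_contain="help"),
]

results, pass_rate = evaluate(toy_agent, scenarios)
print(results, pass_rate)
```

Running this kind of suite on every code change, rather than manually re-testing conversations, is the shift the autonomous vehicle industry made from track testing to simulation.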