Atla Launches Selene: The World’s Most Accurate LLM-as-a-Judge
Atla recently launched Selene!
"Half of AI’s answers are brilliant, half aren’t. Atla trained a model to tell them apart."
TL;DR Meet Selene, a state-of-the-art LLM Judge trained specifically to evaluate AI responses. According to Atla, Selene beats all frontier models from leading labs across 11 commonly used benchmarks for evaluators. Atla has announced the release of:
• API/SDK - Integrate Selene into your AI workflow
• Alignment Platform - Build custom evaluation metrics for your use case
Get started using Selene for free.

Watch the demo here.
Founded by Maurice Burger & Roman Engeler
Who The Team Is & Why They’re Building This
Atla is a small, highly technical team of AI researchers and engineers, with folks from leading AI labs and startups. Their mission is to enable the safe development of AGI. As models grow more powerful, a ‘frontier evaluator’ that keeps pace with frontier AI is needed. The team at Atla sees Selene as a stepping stone toward scalable oversight of powerful AI.
The Problem
Generative AI is unpredictable. Even the best models occasionally hallucinate, contradict themselves, or produce unsafe outputs. Many teams use the same general-purpose LLMs that generate their outputs to evaluate those outputs, but these models weren’t trained to be judges. That leads to:
- Inaccurate evaluations and inefficient iteration cycles in development.
- Risky, unpredictable AI behavior in production.
The Solution
- A SOTA model for evals: Selene outperforms all frontier models (OpenAI’s o-series, Claude 3.5 Sonnet, DeepSeek R1, etc.) across 11 benchmarks covering scoring, classification, and pairwise comparison.

- A platform to align the evaluator: Adapt Selene to your exact evaluation criteria—like “detect medical advice,” “flag legal errors,” or “judge whether the agent upgraded its workflow correctly.”
Selene works seamlessly with popular frameworks like DeepEval (YC W25) and Langfuse (YC W23): just add it to your pipeline. It also runs faster than GPT-4o and Claude 3.5 Sonnet.
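To make "add it to your pipeline" concrete, here is a minimal sketch of what a judge step can look like in an eval loop. It does not use Atla's actual API or SDK: `Verdict`, `Judge`, `stub_judge`, `gate`, and `prefer` are hypothetical names, and the stub stands in for a real evaluator call so the example runs offline. It shows two of the evaluation modes mentioned above, scoring against a custom criterion and pairwise comparison.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Verdict:
    score: int      # 1 (poor) to 5 (excellent); the scale is an assumption
    critique: str   # short explanation of the score


# A judge maps (prompt, response, criteria) to a Verdict.
# In a real pipeline this would wrap a call to an evaluator model.
Judge = Callable[[str, str, str], Verdict]


def stub_judge(prompt: str, response: str, criteria: str) -> Verdict:
    """Hypothetical placeholder judge: penalizes dosage-style advice."""
    if "mg" in response.lower():
        return Verdict(1, "Response contains dosage-style medical advice.")
    return Verdict(5, "No medical advice detected.")


def gate(judge: Judge, prompt: str, responses: List[str],
         criteria: str, min_score: int = 4) -> List[str]:
    """Keep only responses the judge scores at or above min_score."""
    return [r for r in responses
            if judge(prompt, r, criteria).score >= min_score]


def prefer(judge: Judge, prompt: str, a: str, b: str, criteria: str) -> str:
    """Pairwise comparison built from two absolute scores; ties go to a."""
    score_a = judge(prompt, a, criteria).score
    score_b = judge(prompt, b, criteria).score
    return a if score_a >= score_b else b


if __name__ == "__main__":
    candidates = [
        "Take 400 mg of ibuprofen every four hours.",
        "I can't give medical advice; please see a doctor.",
    ]
    kept = gate(stub_judge, "I have a headache.", candidates,
                "detect medical advice")
    print(kept)
```

Keeping the judge behind a plain callable means the stub can be swapped for a real evaluator later without touching the gating or comparison logic.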
Learn More
🌐 Visit www.atla-ai.com to learn more.
🎁 Try Selene for free → Integrate the API into your eval pipeline.
✨ Try the Alignment Platform → Craft a custom eval for your application.
⭐ Discord → Leave feedback, get to know the team, and brainstorm cool ideas.
👣 Follow Atla on LinkedIn & X.