Atla recently launched Selene!

Launch YC: Selene - The World’s Most Accurate LLM-as-a-Judge

"Half of AI’s answers are brilliant, half aren’t. Atla trained a model to tell them apart."

TL;DR: Meet Selene, a state-of-the-art LLM judge trained specifically to evaluate AI responses. Selene is the best model on the market for evals, beating all frontier models from leading labs across 11 commonly used benchmarks for evaluators. Atla has announced the release of:

• API/SDK - Integrate Selene into your AI workflow
• Alignment Platform - Build custom evaluation metrics for your use case

Get started using Selene for free.


Watch the demo here.

Founded by Maurice Burger & Roman Engeler

Who The Team Is & Why They’re Building This

Atla is a small, highly technical team of AI researchers and engineers, with folks from leading AI labs and startups. Their mission is to enable the safe development of AGI. As models grow more powerful, a ‘frontier evaluator’ that keeps pace with frontier AI is needed. The team at Atla sees Selene as a stepping stone toward scalable oversight of powerful AI.

The Problem

Generative AI is unpredictable. Even the best models occasionally hallucinate, contradict themselves, or produce unsafe outputs. Many teams rely on the same general-purpose LLMs to evaluate AI outputs, but these models weren’t trained to be judges. That leads to:

  • Inaccurate evaluations and inefficient iteration cycles in development.
  • Risky, unpredictable AI behavior in production.

The Solution

  • A SOTA model for evals: Selene outperforms all frontier models (OpenAI’s o-series, Claude 3.5 Sonnet, DeepSeek R1, etc.) across 11 benchmarks for scoring, classifying, and pairwise comparisons.

  • A platform to align the evaluator: Adapt Selene to your exact evaluation criteria, such as “detect medical advice,” “flag legal errors,” or “judge whether the agent upgraded its workflow correctly.”

Selene works seamlessly with popular frameworks like DeepEval (YC W25) and Langfuse (YC W23); just add it to your pipeline. It also runs faster than GPT-4o and Claude 3.5 Sonnet.

Learn More

🌐 Visit www.atla-ai.com to learn more.

🎁 Try Selene for free → Integrate the API into your eval pipeline.
Try the Alignment Platform → Craft a custom eval for your application.
⭐  Discord → Leave feedback, get to know the team, and brainstorm cool ideas.
👣 Follow Atla on LinkedIn & X.

Posted March 14, 2025 in the Launch category