Fondo | Atla Launches Selene: The World’s Most Accurate LLM-as-a-Judge

Atla recently launched Selene!

Launch YC: Selene - The World’s Most Accurate LLM-as-a-Judge

"Half of AI’s answers are brilliant, half aren’t. Atla trained a model to tell them apart."

TL;DR Meet Selene, a state-of-the-art LLM Judge trained specifically to evaluate AI responses. Selene is the best model on the market for evals, beating all frontier models from leading labs across 11 commonly used benchmarks for evaluators. Atla has announced the release of:

• API/SDK - Integrate Selene into your AI workflow
• Alignment Platform - Build custom evaluation metrics for your use case

Get started using Selene for free.

Watch the demo here.

Founded by Maurice Burger & Roman Engeler

Who The Team Is & Why They’re Building This

Atla is a small, highly technical team of AI researchers and engineers from leading AI labs and startups. Their mission is to enable the safe development of AGI. As models grow more powerful, evaluation needs a ‘frontier evaluator’ that keeps pace with frontier AI. The team at Atla sees Selene as a stepping stone toward scalable oversight of powerful AI.

The Problem

Generative AI is unpredictable. Even the best models occasionally hallucinate, contradict themselves, or produce unsafe outputs. Many teams rely on the same general-purpose LLMs to evaluate AI outputs, but these models weren’t trained to be judges. That leads to:

• Inaccurate evaluations and inefficient iteration cycles in development.
• Risky, unpredictable AI behavior in production.

The Solution

A SOTA model for evals: Selene outperforms all frontier models (OpenAI’s o-series, Claude 3.5 Sonnet, DeepSeek R1, etc.) across 11 benchmarks covering scoring, classification, and pairwise comparison.

A platform to align the evaluator: Adapt Selene to your exact evaluation criteria—like “detect medical advice,” “flag legal errors,” or “judge whether the agent upgraded its workflow correctly.”


Selene works seamlessly with popular frameworks like DeepEval (YC W25) and Langfuse (YC W23) — just add it to your pipeline. And it runs faster than GPT-4o and Claude 3.5 Sonnet.
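To make "add it to your pipeline" concrete, here is a minimal Python sketch of an LLM-as-a-judge gate driven by a custom criterion such as "detect medical advice." The post does not show Atla's actual API, so everything here is an assumption for illustration: the `Verdict` type, the `Judge` signature, the 1 to 5 score scale, and `stub_judge`, which is a keyword-based stand-in for a real judge call rather than anything Selene does.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    score: int     # assumed scale: 1 (fails the criteria) to 5 (fully meets them)
    critique: str  # short natural-language justification


# Hypothetical judge signature: (criteria, user_input, model_output) -> Verdict.
# A real integration would wrap a call to a judge model behind this interface.
Judge = Callable[[str, str, str], Verdict]


def stub_judge(criteria: str, user_input: str, model_output: str) -> Verdict:
    """Stand-in for a real judge; flags outputs that mention a dosage in mg."""
    if "mg" in model_output.lower():
        return Verdict(score=1, critique="Output contains dosage advice.")
    return Verdict(score=5, critique="No medical advice detected.")


def gate(judge: Judge, criteria: str, user_input: str, model_output: str,
         min_score: int = 4) -> tuple[bool, Verdict]:
    """Return (passed, verdict) so callers can block or log failing outputs."""
    verdict = judge(criteria, user_input, model_output)
    return verdict.score >= min_score, verdict


criteria = "Detect medical advice: score 1 if the response gives medical advice, 5 if not."
ok, verdict = gate(stub_judge, criteria, "I have a headache.", "Take 400 mg of ibuprofen.")
print(ok, verdict.critique)
```

Keeping the judge behind a plain callable means the evaluation criteria live in one string and the judge implementation can be swapped without touching the rest of the pipeline.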


Learn More


🌐 Visit www.atla-ai.com to learn more.

🎁 Try Selene for free → Integrate the API into your eval pipeline.

✨ Try the Alignment Platform → Craft a custom eval for your application.

⭐ Discord → Leave feedback, get to know the team, and brainstorm cool ideas.

👣 Follow Atla on LinkedIn & X.

Posted March 14, 2025

Launch

David J. Phillips, CEO & Founder


About The Author

David is the CEO & Founder of Fondo (YC W18). He is an angel investor in Rippling, Flexport, Liquid Death, and 85+ other startups. David began his career as an accountant at Deloitte before learning to code and becoming a founder. Previously, he co-founded Hackbright, which has trained 1,000+ software engineers and placed them at tech companies including Slack, Disney, and Uber, and which was acquired by Capella Education (NASDAQ: CPLA) in 2016.
