Mundo AI recently launched!

Launch YC: Mundo AI - High Quality Multilingual Training Data for AI Models

"Non-English datasets up to 10,000x larger than anything open source"

TLDR: AI models are great at English but struggle with almost every other language. So Mundo AI is building the world’s largest and highest-quality multilingual data library to help AI labs build better non-English models.


Founded by Jason Liao, Kenneth Wu, Naijide Anwaer & Garreth Lee

Jason Liao helped build a record-breaking fraud detection AI model at Tsinghua University. Before that, he led a quant research team at a $60B quant hedge fund.

Kenneth Wu was a quant at Canada’s largest quant fund. He previously worked as a software engineer at Amazon Web Services and as an analyst at the Ontario Teachers’ Pension Plan.

Naijide Anwaer was the youngest Platform PM at Binance US. He speaks 4 languages.

Garreth Lee was an ML engineer and the first Indonesian at Hugging Face, where he helped build the world’s best open pre-training dataset. He was previously a member of technical staff at Cohere.

Also, a shoutout to their founding PM, Ahnaf Muqset Haque.

The Story

When Jason was working on AI research abroad, he found it incredibly difficult to find training data in non-English languages. Because of this, his peers were all working on English models rather than models in their native languages.

The Problem

After speaking with researchers and entrepreneurs around the world, it became clear to the team that AI usability lagged dramatically in non-English languages, even for major languages like Hindi and Arabic. This is because of a severe shortage of high-quality training data in those languages. That leaves the 75% of the world that does not speak English out of the AI revolution.

Data has been a major bottleneck for researchers and AI labs building multilingual AI models, and the demand for better and larger datasets is only increasing.

Current workarounds such as synthetic data and machine translation simply don’t achieve the desired results, and open-source efforts fail to produce datasets in the quantity and quality required.

How They Are Solving This

Mundo AI works directly with native speakers to create entirely novel, high-quality datasets. They do this by setting up end-to-end operations in the countries where a language's native speakers live, and by using their proprietary software platform to streamline data collection, generation, annotation, and quality assurance.

Demo Video

https://www.youtube.com/watch?v=zZiilPrhDJs

Learn More

🌐 Visit mundoai.world to learn more.
🤝 Do you know any researchers or data partnership managers at any AI labs? They would love to get in touch! They are trying to learn as much as they can about the data bottlenecks that are preventing researchers from making progress. Reach the founders here.
👣 Follow Mundo AI on LinkedIn & X.

Posted March 14, 2025 in Launch category