Fondo | Mundo AI Launches: High Quality Multilingual Training Data for AI Models

Launch YC: Mundo AI - High Quality Multilingual Training Data for AI Models

"Non-English datasets up to 10,000x larger than anything open source"

TLDR: AI models are great at English, but struggle with almost every other language. So, Mundo AI is building the world’s largest and highest quality multilingual data library to help AI labs build better non-English models.

Founded by Jason Liao, Kenneth Wu, Naijide Anwaer & Garreth Lee

Jason Liao helped build a record-breaking fraud detection AI model at Tsinghua University. Before that, he led a quant research team at a $60B quant hedge fund.

Kenneth Wu was a quant at Canada’s largest quant fund. He previously worked as a software engineer at Amazon Web Services and as an analyst at the Ontario Teachers’ Pension Plan.

Naijide Anwaer was the youngest Platform PM at Binance US. He speaks 4 languages.

Garreth Lee was an ML engineer and the first Indonesian at Hugging Face, where he helped build the world’s best open pre-training dataset. Previously, he was a member of technical staff at Cohere.

Also, a shoutout to their founding PM, Ahnaf Muqset Haque.

The Story

While working on AI research abroad, Jason found it incredibly difficult to obtain training data in non-English languages. As a result, his peers were all working on English models rather than models in their native languages.

The Problem

After speaking with researchers and entrepreneurs around the world, it became clear to the team that AI usability lags dramatically in non-English languages, even for major languages like Hindi and Arabic. The cause is a severe shortage of high quality training data in those languages. This leaves the 75% of the world that does not speak English out of the AI revolution.

Data has been a major bottleneck for researchers and AI labs building multilingual AI models, and the demand for better and larger datasets is only increasing.

Current workarounds such as synthetic data and machine translation simply don’t achieve the desired results, and open-source efforts fail to produce datasets in the quantity and quality required.

How are they solving this

Mundo AI works directly with native speakers to create entirely novel, high quality datasets. They do this by setting up end-to-end operations in the countries where a language's native speakers live, and by using their proprietary software platform to streamline data collection, generation, annotation, and quality assurance.

Demo Video

https://www.youtube.com/watch?v=zZiilPrhDJs

Learn More

🌐 Visit mundoai.world to learn more.

🤝 Do you know any researchers or data partnership managers at any AI labs? They would love to get in touch! They are trying to learn as much as they can about the data bottlenecks that are preventing researchers from making progress. Reach the founders here.

👣 Follow Mundo AI on LinkedIn & X.

Posted

March 14, 2025

Launch

David J. Phillips

CEO & Founder

About The Author

David is the CEO & Founder of Fondo (YC W18). He is an angel investor in Rippling, Flexport, Liquid Death, and 85+ other startups. David began his career as an accountant at Deloitte before learning to code and becoming a founder. Previously, he was co-founder of Hackbright, which has trained and placed 1,000+ software engineers at tech companies including Slack, Disney, and Uber, and which was acquired by Capella Education (NASDAQ: CPLA) in 2016.
