
Launch YC: [sync.] we built the most natural lipsync model in the world, again.

"lipsync any video to any audio"

tldr;
  • lipsync-2 is the most advanced video-to-video lipsyncing model in the world
  • It’s zero-shot, so you don’t need to wait for an “actor”, “clone”, or “avatar” to train before using it.
  • Even so, it learns and reproduces each speaker’s unique style of speech
  • It works across live-action, animated, and AI-generated humans
  • Thousands of developers use it to build video translation, word-level editing of video, and character re-animation workflows today (including generating realistic AI UGC)
  • We’re launching our YC deal: 4 months of our Scale Plan for free, plus $1000 in credits 🚀

https://www.youtube.com/watch?v=j5iJ2k05ltc


Founded by
Prady Modukuru, Prajwal K R, Pavan Reddy & Rudrabha Mukhopadhyay

What did they build?

The team built lipsync-2, the first in a new generation of zero-shot lipsyncing models. It seamlessly edits any person's lip movements in a video to match any audio, without needing to be trained or fine-tuned on that person.

Zero-shot lipsync models are versatile because they edit any arbitrary person and voice without having to train or fine-tune on every speaker. But traditionally they can lose traits unique to the person, like their speaking style, skin textures, teeth, etc.

With lipsync-2, they introduce a new capability in zero-shot lipsync: style preservation. The model learns how a person speaks by watching them speak in the input video. A spatiotemporal transformer encodes the different mouth shapes in the input video into a style representation, and a generative transformer synthesizes new mouth movements by conditioning on the new target speech and the learned style representation.
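To make the two-stage design concrete, here's a minimal PyTorch sketch of the idea. It's our illustration under stated assumptions, not sync.'s actual architecture: the module names, dimensions, and the mean-pooling step are all invented for the example.

```python
# Sketch of the two-stage idea: a spatiotemporal encoder distills the
# speaker's mouth shapes into a style embedding, and a generative transformer
# conditions on that embedding plus the new target-speech features.
# All names, dimensions, and pooling choices are illustrative assumptions.
import torch
import torch.nn as nn

class StyleEncoder(nn.Module):
    """Encodes a sequence of mouth-region frame features into one style vector."""
    def __init__(self, frame_dim=512, d_model=256, nhead=4, layers=4):
        super().__init__()
        self.proj = nn.Linear(frame_dim, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)

    def forward(self, frames):            # frames: (B, T, frame_dim)
        h = self.encoder(self.proj(frames))
        return h.mean(dim=1)              # (B, d_model) pooled style embedding

class LipSyncGenerator(nn.Module):
    """Generates mouth-motion features conditioned on audio + style."""
    def __init__(self, audio_dim=80, d_model=256, nhead=4, layers=6, out_dim=512):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, d_model)
        gen_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.backbone = nn.TransformerEncoder(gen_layer, num_layers=layers)
        self.head = nn.Linear(d_model, out_dim)

    def forward(self, audio_feats, style):    # audio_feats: (B, T, audio_dim)
        h = self.audio_proj(audio_feats)
        h = h + style.unsqueeze(1)            # broadcast style over every timestep
        return self.head(self.backbone(h))    # (B, T, out_dim) mouth features

frames = torch.randn(1, 120, 512)   # mouth-crop features from the input video
audio  = torch.randn(1, 120, 80)    # mel-spectrogram frames of the new speech
style  = StyleEncoder()(frames)
motion = LipSyncGenerator()(audio, style)
print(motion.shape)                 # torch.Size([1, 120, 512])
```

The split is what makes style preservation compatible with zero-shot use: the style embedding is computed fresh from each input video at inference time, so there's no per-speaker training step.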

How can you use it?

The team built a simple API that lets you build workflows around their core lipsyncing models. You submit a video and an audio file (or a script and a voice ID to generate the audio from), and get a response with the final output.
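To give a feel for the flow, here's a hypothetical submit-then-poll sketch in Python. The base URL, endpoint paths, payload fields, header name, and status values are illustrative assumptions; sync.'s API docs have the real schema.

```python
# Hypothetical sketch of the submit-then-poll flow described above.
# Endpoint paths, payload fields, and headers are assumptions, not the real API.
import time
import requests

API_KEY = "YOUR_API_KEY"                 # placeholder
BASE = "https://api.sync.so/v2"          # assumed base URL
headers = {"x-api-key": API_KEY}         # assumed auth header

# 1. Submit a job: a source video plus the new audio to lipsync it to.
job = requests.post(
    f"{BASE}/generate",
    headers=headers,
    json={
        "model": "lipsync-2",
        "input": [
            {"type": "video", "url": "https://example.com/talk.mp4"},
            {"type": "audio", "url": "https://example.com/dub.wav"},
        ],
    },
).json()

# 2. Poll until the job finishes, then grab the output video URL.
while True:
    status = requests.get(f"{BASE}/generate/{job['id']}", headers=headers).json()
    if status["status"] in ("COMPLETED", "FAILED"):   # assumed status values
        break
    time.sleep(5)

print(status.get("outputUrl"))
```

Submitting a script plus voice ID instead of an audio file would follow the same pattern.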

They see thousands of developers and businesses integrating their APIs to build generative video workflows into their products and services.

[1] Video translation

Notice how, even across different languages, lipsync-2 preserves Nicolas Cage's speaking style. It's the first zero-shot lipsyncing model to achieve this.

https://youtu.be/GaCoHy99zT4

They can even handle long videos with multiple speakers: they built a state-of-the-art active speaker detection pipeline that associates each unique voice with a unique face, and only applies lipsync when it detects that person is actively speaking.

https://www.youtube.com/watch?v=ZaXbiKdoBz8
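The bookkeeping at the heart of that pipeline can be sketched in a few lines. Everything below is an illustration of the idea rather than sync.'s implementation: the diarized segments and face IDs stand in for the outputs of real voice and face models, and `lipsync_fn` is a hypothetical callback.

```python
# Illustrative sketch: diarize audio into per-speaker segments, map each voice
# to a tracked face, and only lipsync a face while its speaker is active.
from dataclasses import dataclass

@dataclass
class Segment:
    speaker: str    # diarized voice label, e.g. "spk0"
    start: float    # seconds
    end: float

def active_speaker_edit(segments, voice_to_face, lipsync_fn):
    """Apply lipsync_fn(face_id, start, end) only while that face's voice speaks."""
    edits = []
    for seg in segments:
        face_id = voice_to_face.get(seg.speaker)
        if face_id is not None:                      # skip off-screen voices
            edits.append(lipsync_fn(face_id, seg.start, seg.end))
    return edits

# Toy run: two diarized voices mapped to two tracked faces.
segments = [Segment("spk0", 0.0, 4.2), Segment("spk1", 4.2, 9.0)]
voice_to_face = {"spk0": "face_A", "spk1": "face_B"}
print(active_speaker_edit(segments, voice_to_face,
                          lambda f, s, e: f"lipsync {f} from {s}s to {e}s"))
```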

It also works across animated characters, from Pixar-level animation to AI-generated characters.

https://www.youtube.com/watch?v=F_6lGFl6bcA

But translation is only the beginning. With the power to edit dialogue in any video in post-production, they are on the cusp of reimagining how we create, edit, and consume video forever.

[2] Record once and edit dialogue to use forever.

https://youtu.be/HJR4BbhZ8Uo

Imagine a world where you only ever have to hit record once. lipsync-2 is the only model that lets you edit dialogue while preserving the original speaker's style, without needing to train or fine-tune beforehand.

[3] AI video

In an age where we can generate any video by typing a few lines of text, we don’t have to limit ourselves to what we can capture with a camera.

https://youtube.com/shorts/KnzWtu3niKQ

Their YC deal

For any YC company, they're giving away their Scale Plan for free for 4 months, plus $1000 to spend on usage.

With the Scale Plan you get up to 15 concurrent jobs and can process videos up to 30 minutes long at a time. Leveraged maximally, that's the ability to generate around ~90 minutes of video per hour, every hour.
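As a back-of-the-envelope check on that figure (our arithmetic, not an official benchmark): at full concurrency, output per hour is just job completions per hour times clip length. The per-job wall-clock time below is an assumed free parameter, with a default chosen so the result matches the quoted ~90 minutes.

```python
# Throughput at saturation = (completions per hour) x (minutes per clip).
# concurrency and clip_minutes come from the Scale Plan limits quoted above;
# job_hours (wall-clock time per job) is an assumption, not a measured number.
def video_minutes_per_hour(concurrency=15, clip_minutes=30, job_hours=5.0):
    jobs_per_hour = concurrency / job_hours
    return jobs_per_hour * clip_minutes

print(video_minutes_per_hour())  # 90.0 under the assumed per-job time
```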

Launch an AI admaker, video translation tool, or any other content generation workflow you want and serve viral load with speed, reliability, and best-in-class quality.

Email the founders here and they will get you set up.

So why does this matter?

At sync, AI lipsync is just the beginning.

We live in an extraordinary age.

A high schooler can craft a masterpiece with an iPhone. A studio can produce a movie at a tenth of the cost, 10x faster. Every video can be distributed worldwide in any language, instantly. Video is becoming as malleable as text.

But we have two fundamental problems to tackle before this is a reality:

[1] Large video models are great at generating entirely new scenes and worlds, but they struggle with precise control and fine-grained edits. The ability to make subtle, intentional adjustments (the kind that separates good content from great content) doesn't exist yet.

[2] If video generation is world modeling, each human is a world unto themselves. We each have idiosyncrasies that make us unique — building primitives to capture, express, and modify them with high precision is the key to breaking through the uncanny valley.

sync is excited about lipsync-2, and about what's coming next. Reach out to the founders here if you have any questions or are curious about their roadmap.


Learn More

🌐 Visit sync.so to learn more

👥 Follow sync. on LinkedIn & X

Posted April 3, 2025 in Launch
