
Launch YC: [sync.] we built the most natural lipsync model in the world, again.

"lipsync any video to any audio"

tldr;
  • lipsync-2 is the most advanced video-to-video lipsyncing model in the world
  • It’s zero-shot, so you don’t need to wait for an “actor”, “clone”, or “avatar” to train before using it.
  • Even so, it learns and reproduces each speaker’s unique style of speech
  • It works across live-action, animated, and AI-generated humans
  • Thousands of developers use it to build video translation, word-level editing of video, and character re-animation workflows today (including generating realistic AI UGC)
  • We’re launching our YC deal: 4 months of our Scale Plan for free, plus $1000 in credits 🚀

https://www.youtube.com/watch?v=j5iJ2k05ltc


Founded by
Prady Modukuru, Prajwal K R, Pavan Reddy & Rudrabha Mukhopadhyay

What did they build?

The team built lipsync-2, the first in a new generation of zero-shot lipsyncing models. It seamlessly edits any person's lip movements in a video to match any audio, without needing to be trained or fine-tuned on that person.

Zero-shot lipsync models are versatile because they edit any arbitrary person and voice without having to train or fine-tune on every speaker. But traditionally they can lose traits unique to the person, like their speaking style, skin textures, teeth, etc.

With lipsync-2, they introduce a new capability in zero-shot lipsync: style preservation. The model learns how a person speaks by watching them speak in the input video. A spatiotemporal transformer encodes the different mouth shapes in the input video into a style representation, and a generative transformer synthesizes new mouth movements by conditioning on the new target speech and the learned style representation.
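To make the two-stage design concrete, here's a minimal PyTorch sketch of the idea. It's our illustration under stated assumptions, not sync.'s actual architecture: the module names, dimensions, and the mean-pooling step are all invented for the example.

```python
# Sketch of the two-stage idea: a spatiotemporal encoder distills the
# speaker's mouth shapes into a style embedding, and a generative transformer
# conditions on that embedding plus the new target-speech features.
# All names, dimensions, and pooling choices are illustrative assumptions.
import torch
import torch.nn as nn

class StyleEncoder(nn.Module):
    """Encodes a sequence of mouth-region frame features into one style vector."""
    def __init__(self, frame_dim=512, d_model=256, nhead=4, layers=4):
        super().__init__()
        self.proj = nn.Linear(frame_dim, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)

    def forward(self, frames):            # frames: (B, T, frame_dim)
        h = self.encoder(self.proj(frames))
        return h.mean(dim=1)              # (B, d_model) pooled style embedding

class LipSyncGenerator(nn.Module):
    """Generates mouth-motion features conditioned on audio + style."""
    def __init__(self, audio_dim=80, d_model=256, nhead=4, layers=6, out_dim=512):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, d_model)
        gen_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.backbone = nn.TransformerEncoder(gen_layer, num_layers=layers)
        self.head = nn.Linear(d_model, out_dim)

    def forward(self, audio_feats, style):    # audio_feats: (B, T, audio_dim)
        h = self.audio_proj(audio_feats)
        h = h + style.unsqueeze(1)            # broadcast style over every timestep
        return self.head(self.backbone(h))    # (B, T, out_dim) mouth features

frames = torch.randn(1, 120, 512)   # mouth-crop features from the input video
audio  = torch.randn(1, 120, 80)    # mel-spectrogram frames of the new speech
style  = StyleEncoder()(frames)
motion = LipSyncGenerator()(audio, style)
print(motion.shape)                 # torch.Size([1, 120, 512])
```

The split is what makes style preservation compatible with zero-shot use: the style embedding is computed fresh from each input video at inference time, so there's no per-speaker training step.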

How can you use it?

The team built a simple API that lets you build workflows around their core lipsyncing models. You submit a video and an audio file (or a script and a voice ID to generate the audio from), and get a response with the final output.
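To give a feel for the flow, here's a hypothetical submit-then-poll sketch in Python. The base URL, endpoint paths, payload fields, header name, and status values are illustrative assumptions; sync.'s API docs have the real schema.

```python
# Hypothetical sketch of the submit-then-poll flow described above.
# Endpoint paths, payload fields, and headers are assumptions, not the real API.
import time
import requests

API_KEY = "YOUR_API_KEY"                 # placeholder
BASE = "https://api.sync.so/v2"          # assumed base URL
headers = {"x-api-key": API_KEY}         # assumed auth header

# 1. Submit a job: a source video plus the new audio to lipsync it to.
job = requests.post(
    f"{BASE}/generate",
    headers=headers,
    json={
        "model": "lipsync-2",
        "input": [
            {"type": "video", "url": "https://example.com/talk.mp4"},
            {"type": "audio", "url": "https://example.com/dub.wav"},
        ],
    },
).json()

# 2. Poll until the job finishes, then grab the output video URL.
while True:
    status = requests.get(f"{BASE}/generate/{job['id']}", headers=headers).json()
    if status["status"] in ("COMPLETED", "FAILED"):   # assumed status values
        break
    time.sleep(5)

print(status.get("outputUrl"))
```

Submitting a script plus voice ID instead of an audio file would follow the same pattern.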

They see thousands of developers and businesses integrating their APIs to build generative video workflows into their products and services.

[1] Video translation

Notice how, even across different languages, lipsync-2 preserves Nicolas Cage's speaking style. It's the first zero-shot lipsyncing model to achieve this.

https://youtu.be/GaCoHy99zT4

They can even handle long videos with multiple speakers: they built a state-of-the-art active speaker detection pipeline that associates each unique voice with a unique face, and only applies lipsync when it detects that person is actively speaking.

https://www.youtube.com/watch?v=ZaXbiKdoBz8
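The bookkeeping at the heart of that pipeline can be sketched in a few lines. Everything below is an illustration of the idea rather than sync.'s implementation: the diarized segments and face IDs stand in for the outputs of real voice and face models, and `lipsync_fn` is a hypothetical callback.

```python
# Illustrative sketch: diarize audio into per-speaker segments, map each voice
# to a tracked face, and only lipsync a face while its speaker is active.
from dataclasses import dataclass

@dataclass
class Segment:
    speaker: str    # diarized voice label, e.g. "spk0"
    start: float    # seconds
    end: float

def active_speaker_edit(segments, voice_to_face, lipsync_fn):
    """Apply lipsync_fn(face_id, start, end) only while that face's voice speaks."""
    edits = []
    for seg in segments:
        face_id = voice_to_face.get(seg.speaker)
        if face_id is not None:                      # skip off-screen voices
            edits.append(lipsync_fn(face_id, seg.start, seg.end))
    return edits

# Toy run: two diarized voices mapped to two tracked faces.
segments = [Segment("spk0", 0.0, 4.2), Segment("spk1", 4.2, 9.0)]
voice_to_face = {"spk0": "face_A", "spk1": "face_B"}
print(active_speaker_edit(segments, voice_to_face,
                          lambda f, s, e: f"lipsync {f} from {s}s to {e}s"))
```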

It also works across animated characters, from Pixar-level animation to AI-generated characters.

https://www.youtube.com/watch?v=F_6lGFl6bcA

But translation is only the beginning. With the power to edit dialogue in any video in post-production, they are on the cusp of reimagining how we create, edit, and consume video forever.

[2] Record once and edit dialogue to use forever.

https://youtu.be/HJR4BbhZ8Uo

Imagine a world where you only ever have to hit record once. lipsync-2 is the only model that lets you edit dialogue while preserving the original speaker's style, without needing to train or fine-tune beforehand.

[3] AI video

In an age where we can generate any video by typing a few lines of text, we don’t have to limit ourselves to what we can capture with a camera.

https://youtube.com/shorts/KnzWtu3niKQ

Their YC deal

For any YC company, they're giving away their Scale Plan for free for 4 months, plus $1000 to spend on usage.

With the Scale Plan you get up to 15 concurrent jobs and can process videos up to 30 minutes long at a time. Leveraged maximally, that's the ability to generate around ~90 minutes of video per hour, every hour.
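As a back-of-the-envelope check on that figure (our arithmetic, not an official benchmark): at full concurrency, output per hour is just job completions per hour times clip length. The per-job wall-clock time below is an assumed free parameter, with a default chosen so the result matches the quoted ~90 minutes.

```python
# Throughput at saturation = (completions per hour) x (minutes per clip).
# concurrency and clip_minutes come from the Scale Plan limits quoted above;
# job_hours (wall-clock time per job) is an assumption, not a measured number.
def video_minutes_per_hour(concurrency=15, clip_minutes=30, job_hours=5.0):
    jobs_per_hour = concurrency / job_hours
    return jobs_per_hour * clip_minutes

print(video_minutes_per_hour())  # 90.0 under the assumed per-job time
```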

Launch an AI admaker, video translation tool, or any other content generation workflow you want and serve viral load with speed, reliability, and best-in-class quality.

Email the founders here and they will get you set up.

So why does this matter?

At sync, AI lipsync is just the beginning.

We live in an extraordinary age.

A high schooler can craft a masterpiece with an iPhone. A studio can produce a movie at a tenth of the cost, 10x faster. Every video can be distributed worldwide in any language, instantly. Video is becoming as malleable as text.

But we have two fundamental problems to tackle before this is a reality:

[1] Large video models are great at generating entirely new scenes and worlds, but they struggle with precise control and fine-grained edits. The ability to make subtle, intentional adjustments (the kind that separates good content from great content) doesn't exist yet.

[2] If video generation is world modeling, each human is a world unto themselves. We each have idiosyncrasies that make us unique — building primitives to capture, express, and modify them with high precision is the key to breaking through the uncanny valley.

sync is excited about lipsync-2, and about what's coming next. Reach out to the founders here if you have any questions or are curious about their roadmap.


Learn More

🌐 Visit sync.so to learn more

👥 Follow sync. on LinkedIn & X

Posted April 3, 2025 in Launch
