Launch HN: Uplift (YC S25) – Voice models for under-served languages

106 points by zaidqureshi 2 days ago


Hi HN, we are Zaid, Muhammad and Hammad, the co-founders of Uplift AI (https://upliftai.org). We build models that speak underserved languages — today: Urdu, Sindhi, and Balochi.

A billion people worldwide can't read. In countries like Pakistan – the 5th most populous country – 42% of adults are illiterate. This holds back the entire economy: patients can't read medical reports, parents can't help with homework, banks can't go fully digital, farmers can't research best practices, and people navigate smartphone apps by memorizing button sequences. Voice AI interfaces can fix all of this, and we think this may turn out to be one of the great benefits of modern AI.

Right now, existing voice models barely work for these languages, and big tech is moving slowly.

Uplift AI was originally a side project to make datasets for translation and voice models. For us it was a "cool side thing" to work on, not an "important full-time thing." With some initial data we hacked together an Urdu voice bot on WhatsApp and gave it to one domestic worker. Within two days, 800 people were using it. When we dug deeper into understanding the users, we learned that text interfaces simply don't work for a huge number of people. So we started Uplift AI to solve this problem full time.

The most challenging part is that all the building blocks needed for great voice models are broken for these languages. For example, if you are creating a speech synthesis model, you typically scrape a lot of data from YouTube and auto-label it using a transcription model… all very easy to do in English. But it doesn't work in under-served languages, because the transcription models are not accurate.
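
For context, here's a minimal sketch of that standard pipeline, using yt-dlp and OpenAI's open-source Whisper as illustrative stand-ins (not necessarily our stack). In English this works well; for these languages, the auto-labels come out too noisy to train on:

    # Sketch of the standard "scrape + auto-label" pipeline. yt-dlp and
    # openai-whisper are illustrative stand-ins; the URL is a placeholder.
    import subprocess
    import whisper

    def download_audio(url: str, out: str = "clip.mp3") -> str:
        # Pull audio-only from a public video with yt-dlp.
        subprocess.run(
            ["yt-dlp", "-x", "--audio-format", "mp3", "-o", out, url],
            check=True,
        )
        return out

    def auto_label(path: str) -> str:
        # Auto-transcribe the clip. This is the step that breaks for
        # under-served languages: the transcripts come out too noisy.
        model = whisper.load_model("medium")
        result = model.transcribe(path, language="ur")  # "ur" = Urdu
        return result["text"]

    if __name__ == "__main__":
        audio = download_audio("https://www.youtube.com/watch?v=PLACEHOLDER")
        print(auto_label(audio))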

There are many other challenges. For example, when you hire human transcribers to label the data, they often don't have any spell checkers for their languages, and this creates lots of noise in the data… making it hard to train models with limited data. There are many more challenges around phonemes, silence detection, diacritization, etc.

We solve these problems by building great internal tooling for data labeling. We also source our own data rather than buying it. This is counterintuitive, but it's a big advantage over companies that buy data and then train: by sourcing our own data we create the right data distributions and get much better models with much less data. And by doing the entire pipeline in-house (data, labeling, training, deploying), we are able to make much faster progress.

Today we publicly offer text-to-speech APIs for Urdu, Sindhi, and Balochi. Here's a video that shows this: https://www.loom.com/share/dcd5020967444c228e9c127151e7a9f5.
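
To give a feel for the integration, here's a minimal sketch of a REST text-to-speech call. The endpoint path, auth header, and field names below are placeholders rather than our documented API; see https://upliftai.org for the actual reference:

    # Minimal sketch of a REST text-to-speech call. The endpoint path,
    # auth header, and JSON field names are placeholders, not the
    # documented API -- see https://upliftai.org for the real reference.
    import requests

    API_URL = "https://api.upliftai.org/v1/synthesize"  # placeholder
    API_KEY = "YOUR_API_KEY"

    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"text": "سلام، دنیا", "voice": "urdu-1"},  # placeholder fields
        timeout=30,
    )
    resp.raise_for_status()

    # Assumes the API returns raw audio bytes in the response body.
    with open("output.wav", "wb") as f:
        f.write(resp.content)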

Khan Academy is using our tech to dub videos to Urdu (https://ur.khanacademy.org).

Our models excel at informational use cases (like AI bots) but need more work on emotive use cases like poetry.

We have been giving a lot of people private access in beta, and today we are launching our models publicly. We believe this is the fastest way for us to learn which areas are not performing well so we can fix them quickly.

We'd love to hear from all of you, especially about your experiences with under-served languages (not just the Pakistani ones we're starting with), and your comments in general.

primitivesuave - a day ago

The output quality is remarkable. You mentioned that there are 1 billion illiterate people who would benefit from this, and I would add that there are at least 1 billion additional people who would benefit because they speak a regional dialect. There are many countries across the developing world where the AI tools and translation apps only produce output in the official government dialect (e.g. the Thai spoken in Bangkok, the Hindi spoken in Delhi, or the Mandarin spoken in Beijing). It would be interesting to see how a voice model could be "fine tuned" to better serve a specific regional dialect.

jnmandal - 2 days ago

Looks really cool, exciting to see. I have two questions around this:

1. Given that you are concerned with providing access to a class of folks who are traditionally ignored by technologists, do you plan to make these models usable offline? For example, an illiterate person I know from Uttarakhand: his home village is not connected to a road. Interestingly, he does speak Hindi, but his native language, I believe, is something more obscure. To get home, he walks five hours from the terminus of a road. Connectivity is obviously both limited and intermittent. A usable device might need the voice interface embedded on it. Any plans for this?

2. I have minimal understanding of this, but as someone who learned Hindi/Urdu as a foreign language in the US, I am often in mixed conversation with both Indians and Pakistanis. There never seem to be any issues with communication. I have heard that certain terms (like, for example, "khub suraat", "shukria", "kitaab") are more Urdu than Hindi. I also studied Arabic, Farsi, and Swahili, so I am familiar with these as loanwords from Arabic and/or Persian, but in practice I hear Hindi speakers using these terms often. Is the primary value-add here political? Is it an accent thing? Thanks in advance for any explanation. This is still very much a mystery to me.

_waqas_ali_ - 2 days ago

As a Sindhi speaker myself, amazing stuff. The output is so good. This unlocks the vastness of the internet for millions of people. I am imagining something like NotebookLM but for under-served languages, or a hotline where people can call and talk/learn about anything. Do you guys have plans to create B2C products yourselves?

tugdual - 6 hours ago

This is what my Master's project was about, applied to Wolof. I trained XTTSv2 and had solid results with less than 20h of paired data that wasn't of the highest quality either - hmu: tkerjan@outlook.com

pavlov - 2 days ago

Nice! Clearly a big and underserved market for voice AI solutions.

Would be nice to have some code examples for using your TTS API with Pipecat.
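
Something like this rough sketch is what I have in mind: a custom Pipecat TTS service wrapping a hypothetical Uplift endpoint (the endpoint URL and field names are made up, and Pipecat's base-class/frame import paths have moved between versions, so check the docs):

    # Rough sketch only: the endpoint and JSON fields are made up, and
    # Pipecat's import paths differ across versions -- check the docs.
    import aiohttp

    from pipecat.frames.frames import (
        TTSAudioRawFrame,
        TTSStartedFrame,
        TTSStoppedFrame,
    )
    from pipecat.services.tts_service import TTSService

    class UpliftTTSService(TTSService):
        """Hypothetical wrapper around an Uplift REST TTS endpoint."""

        def __init__(self, *, api_key: str, voice: str, **kwargs):
            super().__init__(**kwargs)
            self._api_key = api_key
            self._voice = voice

        async def run_tts(self, text: str):
            # Yield frames the way Pipecat's built-in TTS services do.
            yield TTSStartedFrame()
            async with aiohttp.ClientSession() as session:
                async with session.post(
                    "https://api.upliftai.org/v1/synthesize",  # placeholder
                    headers={"Authorization": f"Bearer {self._api_key}"},
                    json={"text": text, "voice": self._voice},
                ) as resp:
                    audio = await resp.read()
            yield TTSAudioRawFrame(audio=audio, sample_rate=24000, num_channels=1)
            yield TTSStoppedFrame()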

willwade - a day ago

Your datasets: are they public? For more under-represented languages we DON'T need closed voice models - what the world really needs is open voice data repositories (e.g. TTS-ready voice banks AND phonemization DBs in projects like Mozilla Common Voice). Why? Because commercial demand is so small that these languages are not commercially viable - but we DO need TTS for assistive technology purposes, and that has very little $$$ associated with it.

(Saying that Urdu is NOT a small population so well done..!)

nojs - 2 days ago

Nice, this is really needed. Would be cool to see some of the less common regional Chinese dialects, which are widely spoken and often the only language older people speak. And even just more accurate regional accents for Mandarin.

Lienetic - 2 days ago

Very cool, congrats on the launch! What's your plan for when one of the larger players like ElevenLabs or Google adds support for these languages? I would guess the reason why they haven't is because they don't see a large opportunity. How are you thinking about it?

sanman8119 - 2 days ago

Would love to see Malayalam here one day!

aneeqdhk - a day ago

Any plans for speech-to-text? I want to automatically generate subtitles for videos that have Urdu audio.

adz_6891 - 2 days ago

This is really cool. Congrats on the launch. Would be interested to know which low-resource languages in Sub-Saharan Africa you'd be working on, particularly in Nigeria and South Africa.

asadm - a day ago

Congrats on the launch, I have been solely funding a dataset for Sindhi on Common Voice. Did you check that out by any chance?

akshayp29 - 2 days ago

Pretty cool! Do you think the model would be good at other under-served languages as well? Or is it hypertuned to just these?

moinism - 2 days ago

Congrats on the launch! Having support for regional voices is going to open up so many opportunities.

Bilal_io - 2 days ago

Congratulations on the launch! I really hope it doesn't get used to launch misinformation campaigns against the country.

Are you aware of any effort to educate and fight against misinformation in Pakistan?

ks2048 - a day ago

Nice work.

Have you looked at the MMS models from Meta and how do they compare?

By public release, do you mean offering an API, or have you considered a Hugging Face model release? I understand why that might not be best for your business model - but what would be your goal from a business perspective?
