IISc and Hugging Face Team Up: Revolutionizing AI with Project Vaani in 2025


A Partnership for Inclusive AI

In a groundbreaking move for artificial intelligence, the Indian Institute of Science (IISc) and Hugging Face have joined forces to supercharge Project Vaani, an initiative capturing India’s rich linguistic tapestry. Launched in 2022 with Google’s backing, Vaani aims to build an open-source, multimodal, multilingual dataset that mirrors the diversity of India’s 1.4 billion voices. Now, in 2025, this collaboration with Hugging Face is set to amplify its global reach, empowering developers worldwide to craft AI solutions that resonate with India’s cultural and linguistic mosaic.

Both IISc and Hugging Face share a vision of democratizing AI through open science. By making Vaani accessible on Hugging Face’s platform, they’re breaking barriers, fostering innovation, and ensuring AI speaks the languages of India—from bustling cities to remote villages.

Vaani Unveiled: A Dataset Like No Other

Project Vaani stands out with its geo-centric approach, collecting spontaneous speech and images from 80 districts across India in Phase 1 alone. As of February 2025, it boasts over 16,000 hours of audio from 84,600 speakers, covering 54 languages, with 790 hours transcribed. This isn’t just data—it’s a living archive of dialects, accents, and real-life conversations, paired with 70,000 images for multimodal applications. From Tamil in the south to Assamese in the northeast, Vaani captures India’s linguistic soul.
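
To put these figures in proportion, here is a quick back-of-the-envelope calculation in Python. It uses only the numbers quoted above and treats "over 16,000 hours" as exactly 16,000, so the results are rough approximations rather than official project statistics. The variable names are illustrative.

```python
# Figures quoted in the article (Phase 1, as of February 2025).
# "over 16,000 hours" is treated as exactly 16,000 for this rough estimate.
total_hours = 16_000
transcribed_hours = 790
speakers = 84_600
districts = 80

# Share of the audio that already has transcripts.
transcribed_share = transcribed_hours / total_hours

# Average amount of speech contributed per speaker, in minutes.
minutes_per_speaker = total_hours * 60 / speakers

# Average hours of audio collected per district.
hours_per_district = total_hours / districts

print(f"Transcribed share: {transcribed_share:.1%}")
print(f"Average per speaker: {minutes_per_speaker:.1f} minutes")
print(f"Average per district: {hours_per_district:.0f} hours")
```

On these numbers, roughly 5% of the audio is transcribed so far, each speaker contributes about 11 minutes on average, and each district accounts for about 200 hours.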

Hugging Face hosts this treasure trove, offering subsets like transcribed audio for speech recognition and raw data for broader research. It’s a goldmine for building AI that understands India’s diversity, available to anyone with a Hugging Face account and an access token.
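
For readers who want to try this, here is a minimal sketch of how one subset might be opened with the Hugging Face `datasets` library and an access token. Several things here are assumptions not stated in this article: the repository id `ARTPARK-IISc/Vaani`, the `train` split name, and the helper names `vaani_load_kwargs` and `load_vaani_subset`, which are hypothetical. Subset names should be copied from the dataset card.

```python
import os

# ASSUMPTION: the repository id is not given in this article.
# Confirm it on the dataset card before use.
VAANI_REPO_ID = "ARTPARK-IISc/Vaani"


def vaani_load_kwargs(subset, token=None, streaming=True):
    """Build keyword arguments for datasets.load_dataset (hypothetical helper)."""
    # Fall back to the HF_TOKEN environment variable if no token is passed.
    token = token or os.environ.get("HF_TOKEN")
    if not token:
        raise ValueError("Set HF_TOKEN or pass token=; an access token is required.")
    return {
        "path": VAANI_REPO_ID,
        "name": subset,          # subset (config) name, as listed on the dataset card
        "split": "train",        # ASSUMPTION: split name
        "streaming": streaming,  # iterate without downloading everything first
        "token": token,
    }


def load_vaani_subset(subset, token=None):
    """Open one subset; needs the `datasets` package and network access."""
    # Imported lazily so the helper above works without the package installed.
    from datasets import load_dataset

    return load_dataset(**vaani_load_kwargs(subset, token))
```

Calling `load_vaani_subset` with a subset name taken from the dataset card should return a streaming dataset that can be iterated with a `for` loop. The exact column names depend on the subset, so inspect the first record before building on it.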

Why This Matters for AI Development

India’s 22 constitutionally scheduled languages and hundreds of dialects pose a unique challenge, and opportunity, for AI. Most language models lean heavily on English or other widely spoken languages, leaving Indic languages underrepresented. Vaani flips the script, providing a robust dataset for training models in speech recognition, language modeling, and even speaker verification. With its vast speaker pool and real-world audio, it’s ideal for creating AI that’s not just smart but inclusive.

The collaboration amplifies this impact. Hugging Face’s platform, with over 1 million models and datasets, ensures Vaani reaches a global audience, sparking research and applications that could transform digital access in India—from education tools to voice assistants.

Beyond Phase 1: The Road Ahead

Phase 1 is just the beginning. As of February 2025, IISc and ARTPARK, with Google’s support, have expanded Vaani into Phase 2, which covers all Indian states. The goal? Over 150,000 hours of speech, fully transcribed in local scripts, reflecting India’s urban-rural, age, and gender diversity. Hugging Face’s role will grow as it hosts new subsets, and community feedback is welcome at vaanicontact@gmail.com to help refine and expand the project.

This partnership isn’t static—it’s a call to action. Developers, researchers, and innovators are urged to dive in, build with Vaani, and share insights, driving AI that truly serves India’s billion-plus population.

A Vision for the Future

As of March 16, 2025, the IISc-Hugging Face collaboration is a beacon for open-source AI. It’s not just about data—it’s about empowerment, bridging digital divides, and honoring India’s heritage through technology. Whether it’s enhancing multimodal large language models or crafting code-switching speech systems, Vaani and Hugging Face are paving the way for an AI future that’s as diverse as the world it serves.

© 2025 Proaitools. All rights reserved.