AI

In a world where AI is synonymous with power-hungry GPU compute and web-scale data, can India, a nation of 1.4 billion voices, 22 official languages, and relentless jugaad, forge a smarter path? The answer lies not in chasing Silicon Valley’s playbook, but in reimagining intelligence itself.

AI is the dawn of the second machine age. We are moving on from intelligent human beings operating dumb machines to intelligent machines doing the work for humans. Asimov’s “Three Laws of Robotics” and the Turing test are not enough anymore.

Winners in AI are being defined by a mix of GPU access, the amount of data scraped, the research done and models created before regulation kicks in, and a head start in the market. LLMs need lots of data, so data is often scraped, even without the creator’s permission. OpenAI and others have shown one path to AI, which involves scraping data from the Internet and using cheap labor to collect and clean it. The mega corporations that have created LLMs are also pushing for regulation that would make it difficult for others to build AI systems. LLMs work best in English, making that the default language of communication with AI.

Above all, they’ve demonstrated that brute force matters: LLMs consume a lot of power for training and inference, so building them takes a great deal of GPU compute. Because of this, GPU access, especially access to NVIDIA GPUs, is defining the winners. Even governments are hoarding GPUs or imposing embargoes on them.

The goal of established AI companies appears to be to make people believe that: 

1. This is the only way to build AI: by scraping data and using brute force.

2. No one else can build AI because they have all the compute and data.

They are wrong on both counts.

There are many other ways, and better ways, in which AI can be built by the people and for the people. To understand how, let us first understand what AI systems actually do. Stephen Wolfram has clearly explained the process in his essay “What Is ChatGPT Doing … and Why Does It Work?” It boils down to one thing.

It’s Just Adding One Word at a Time

Transformer architectures have shown that, given lots of data, this one task of predicting the next word is enough to build AI systems that can do a remarkable range of things.

Nobody expected this to work so well. But it did. Now we know that predicting the next word is enough. What we are not doing is looking for better architectures that can predict the next word without the cost of transformers.

The fact that next word prediction works “suggests something that’s at least scientifically very important: that human language (and the patterns of thinking behind it) are somehow simpler and more “law-like” in their structure than we thought. ChatGPT has implicitly discovered it. But we can potentially explicitly expose it, with semantic grammar, computational language, etc.”
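To make "adding one word at a time" concrete, here is a minimal sketch of the simplest possible next-word predictor: a bigram model that only counts which word tends to follow which. It is not how a transformer works and not anything described by Wolfram or used in production; the tiny corpus and the function names (`train_bigram`, `predict_next`, `generate`) are made up for illustration. The point is only that the generation loop itself is trivial, and all the cost sits in how the next word is predicted.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count, for every word, which words follow it in the training text."""
    follow_counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current_word, next_word in zip(words, words[1:]):
            follow_counts[current_word][next_word] += 1
    return follow_counts


def predict_next(follow_counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = follow_counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(follow_counts, start_word, max_words=8):
    """Grow a sentence by repeatedly appending the predicted next word."""
    words = [start_word.lower()]
    for _ in range(max_words - 1):
        next_word = predict_next(follow_counts, words[-1])
        if next_word is None:
            break
        words.append(next_word)
    return " ".join(words)


# Hypothetical three-sentence training set, purely for illustration.
corpus = [
    "the farmer waits for the monsoon",
    "the monsoon brings rain to the fields",
    "the farmer sows rice after the monsoon",
]
model = train_bigram(corpus)
print(generate(model, "the", max_words=4))
```

A model this small only parrots its training text, but the loop in `generate` is the same one an LLM runs: predict one word, append it, repeat.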

AI in this society needs to be unbiased, or at least explain the bias. Current models cannot do this. 

AI needs to be more accessible to all languages. AI needs to be developed more sustainably.

“Language is not a mere string of words. It has a suggestive power well beyond the lexical meaning” - Ngũgĩ wa Thiong’o

“Learning (in English) for a colonial child became a cerebral activity. Not an emotionally felt experience” - Ngũgĩ wa Thiong’o

Language is deeply intertwined with cultural nuances. Existing models lack this crucial understanding and risk perpetuating a homogenisation that fails to cater to the diversity of the Global South.

It is not worth it to fight the compute and data war. We have already started late and it will be hard to catch up to the US and China.

With these ideas, let us try to set up some guiding principles for AI in India. I present to you the Four Pillars of Bharat’s AI Revolution.

Pillar 1. Don’t chase compute. Chase better algorithms:

If we let compute define the winner, we have already lost. We can never have as much compute as the US or China, so we need an alternative. Let’s say we have a billion sentences (everything ever written, and so on). That’s our training set, and the task is to predict the next token given a set of context tokens. Can we come together and find a mathematical equation that predicts the next token, one that does not need 175 billion mathematical operations just to find the next word?

For inspiration, we can look at the ISRO Mars mission, which cost about 10 times less than comparable efforts. We can also look at China and DeepSeek. Faced with a possible embargo on NVIDIA GPUs, the Chinese teams came together and built much more efficient models that can be trained at a fraction of the cost. When Rajan Anandan asked Sam Altman whether a team of super engineers could create a ChatGPT with $10 million, Sam said, “It’s totally hopeless”. Well, the DeepSeek team reported building a ChatGPT-class model for about $6 million. So it’s possible.

We have the mathematicians to make it happen. It’s what India does best: build new, more efficient things with fewer resources, especially when we know that something is possible and we just have to recreate it.

Pillar 2. Collect data responsibly. Collect the right data:

The current approach is to be a data hog: scrape all the data, with or without permission, and if that’s not enough, generate synthetic data. A good AI system does not need so much data.

This data-hungry mindset is neither ethical nor necessary. A better approach focuses on collecting the right data, responsibly and thoughtfully, to build high-quality AI systems that are efficient, ethical, and inclusive.

We’ve already seen this in action with speech recognition systems, where smaller, well-curated datasets have delivered outstanding results. Similarly, research into small language models (SLMs) is proving that high-quality, domain-specific data—such as content from textbooks or structured, context-rich sources—can be more impactful than indiscriminate data collection.

India, with its people, linguistic diversity, and cultural richness, has a unique advantage. To unlock this potential, we must engage people directly in the process of data collection for AI systems, particularly for underrepresented languages and dialects. This is where grassroots initiatives like Swecha’s projects stand out:

  1. Chandamama Kathalu: Swecha created an open dataset of culturally relevant stories in Telugu, emphasizing the importance of preserving linguistic heritage while simultaneously contributing valuable training data for natural language processing (NLP) systems. The dataset was collected by engineering college students in a single day in a “Datathon”. This project shows how community-driven efforts can create datasets rooted in local culture and context, making AI systems more inclusive and representative.
  2. Gonthuka Project: This project focused on collecting Automatic Speech Recognition (ASR) data for Indian languages, highlighting how collaboration and participation can fill critical gaps in AI development. This project was also a community-driven effort where students and other volunteers travelled the breadth of Telangana to collect voices with different dialects and accents. The “prize” for the data collection and data sharing was a free ticket to a concert. By involving communities with innovative practices in the data collection process, Gonthuka not only provided high-quality speech data but also empowered people to contribute actively to the future of AI in their languages.

These examples illustrate a crucial point: if we don’t take the lead in creating and curating data for our languages and cultures, no one else will. By mobilizing communities to contribute, we not only preserve our heritage but also create opportunities to design AI systems that work for all of India, not just for a few dominant languages or regions. And by involving students in this pipeline we can create skilled engineers who understand the AI pipeline of data collection, cleaning and model building.

Responsible data collection is about more than just ethics—it’s about sustainability, quality, and relevance. When done thoughtfully, with the involvement of the very people the systems are meant to serve, it leads to AI that is more ethical, equitable, and effective. Through efforts like Swecha’s and broader national initiatives, India can set a global benchmark for responsible, community-driven AI development.

Pillar 3. Use open source

One of the big reasons we don’t need to chase compute is open-source models. Open-source models provide transparency by granting access to their weights, enabling researchers and developers to analyze in depth how these systems function. This insight can lead to better understanding and improvement of AI systems without starting from scratch, because the costly pre-training has already been done.

The mapping of language to a high-dimensional space is already trained. We can use this space and fine-tune it for our purposes. Because we can look under the hood of open-source models, we can take them apart and make them better. Case in point: the Kokoro TTS model proved this last week by taking the open-source StyleTTS model, removing layers, and creating an efficient model with just 82 million parameters, which is now topping the TTS Spaces Arena.

Open-source models allow for tailored adaptations to specific contexts, such as fine-tuning for unique languages, regional dialects, or cultural nuances. This localized customization ensures that AI solutions are inclusive and effective across diverse populations.

Releasing datasets and models under open-source licenses further amplifies the collective progress of the AI community. With proper licensing, this approach fosters collaboration while maintaining ethical and legal standards. By encouraging shared innovation, we can accelerate advancements in AI and democratize access to powerful tools that were once limited to a few.

Pillar 4. Use AI as an augmentation technology rather than a replacement technology

Much of the hype around AI casts it as a replacement technology (it will replace drivers, radiologists, call centers, etc.). India will be better served by treating AI as an augmentation technology. How can AI help our teachers? How can AI help our doctors reach rural areas? How can AI help our farmers? How can AI help our IT engineers do their jobs better? The other hype is that AI will lead to abundance. We have to understand what this means for India. The strength of India is its people and its natural resources. How can we use AI to get the best out of us?

Instead of displacing workers, AI can amplify their capabilities, enabling them to perform their roles more efficiently and effectively. For instance:

  • Teachers: AI can serve as a powerful assistant to educators, providing personalized learning plans for students, automating routine administrative tasks like grading, and offering real-time insights into student performance. This frees teachers to focus more on mentoring and creative instruction, which machines cannot replicate. This is already being piloted in states like Telangana, where we are running into a problem: the stories and examples that AI creates to explain concepts are westernized. Our kids cannot relate to things like ice skating or teddy bears. We need our own localized AI for teaching.
  • Doctors: AI-powered diagnostic tools can help doctors reach underserved rural areas by enabling telemedicine and offering early detection capabilities through automated scans. These systems don’t replace doctors but extend their reach, allowing them to provide quality care to remote populations. The major problem here is access to India-specific data. Governments should come up with policies that give researchers access to data while protecting patients’ privacy. Some experiments with sandboxes are under way, but they have to be scaled up quickly.
  • Farmers: AI can assist farmers through precision agriculture technologies, offering real-time weather updates, crop health monitoring, and predictive analytics to optimize yields. This technology complements the farmer’s expertise, ensuring more sustainable and profitable farming practices. A core focus here should be to see how we can create small efficient models which are accessible to the farmers at their locations.
  • IT Engineers: For IT professionals, AI can automate repetitive coding tasks, streamline debugging processes, and offer insights for better system design. Engineers can then focus on more strategic and creative challenges, enhancing innovation and job satisfaction. Global capability centers (GCCs) should realize that the general process of outsourcing is going to be upended. Their processes have to be updated to bring AI coding assistants into their pipelines.

It’s also essential to address the reality that automation can and will transform certain roles. This is why the focus must be on reskilling and upskilling the workforce. By equipping workers with AI literacy and training, we ensure they can collaborate with AI systems rather than compete with them.

When positioned as an augmentation technology, AI becomes a partner to human ingenuity rather than its adversary. For India, with its diverse challenges and massive workforce, we need to find approaches that can drive inclusive growth, foster innovation, and create new opportunities that uplift rather than displace.

With these four pillars, we can set India on a path to AI leadership.

Steps are already being taken in all these areas. We have already shown that binary embeddings are enough for sentence embeddings and vector DBs, reducing costs by a factor of 8. We have shown a path to collecting data responsibly, gathering 1.5 million audio sentences in a couple of weeks. We have built compute-efficient models that run ASR systems on mobile phones. It’s just a matter of time before we build models as capable as LLMs that run on our phones. After all, a child with a 2,000-word vocabulary can do most of the things an LLM does.
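As a rough illustration of the binary-embedding idea mentioned above, the sketch below keeps only the sign of each dimension of a float embedding, packs the signs into an integer, and compares vectors by Hamming distance. This is an assumption about the general technique (sign-based quantization), not a description of the actual system or of how the cost figure was measured; the 8-dimensional vectors, their values, and the function names are hypothetical.

```python
def binarize(vector):
    """Pack the signs of a float vector into one integer, one bit per dimension."""
    bits = 0
    for value in vector:
        bits = (bits << 1) | (1 if value > 0 else 0)
    return bits


def hamming_distance(a, b):
    """Number of bit positions in which two packed embeddings differ."""
    return bin(a ^ b).count("1")


def nearest(query_bits, index):
    """Return the key in `index` whose binary embedding is closest to the query."""
    return min(index, key=lambda name: hamming_distance(query_bits, index[name]))


# Toy 8-dimensional "sentence embeddings"; the values are made up for illustration.
embeddings = {
    "monsoon forecast": [0.9, -0.2, 0.4, -0.7, 0.1, 0.8, -0.3, -0.5],
    "rice prices": [-0.6, 0.3, -0.1, 0.5, -0.9, -0.2, 0.7, 0.4],
}
index = {name: binarize(vec) for name, vec in embeddings.items()}

query = [0.7, -0.1, 0.2, -0.9, -0.3, 0.6, -0.2, -0.4]
print(nearest(binarize(query), index))
```

Each dimension now takes one bit instead of a multi-byte float, and the comparison is an XOR plus a bit count, which is the kind of operation that runs comfortably on a phone.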

The West’s AI dreams of godlike machines; China’s of social control. India’s mission? To craft intelligence that doesn’t just compute but comprehends—where a farmer’s AI ally knows the monsoon like his grandfather’s bones, and a child’s chatbot explains math through peacock feathers, not penguins. This isn’t just AI for India—it’s wisdom engineering for the world. The future of AI in India is not about winning a GPU war—it’s about leading a revolution in responsible, accessible, and culturally inclusive AI. The race isn’t to the biggest—it’s to the smartest. And India is ready to claim its place. The question is no longer whether we can compete, but rather: Are we ready to lead?

About the author

Chaitanya Chokkareddy

CTO, Ozonetel