India's AI Push Now Depends On A Less Glamorous Task: Defining Data Properly

India's artificial intelligence ambitions may depend on a task that sounds dull but is fundamental: making government data mean the same thing across departments.

Arjun Malhotra

Published June 10, 2026

India's artificial intelligence ambitions may depend on a task that sounds dull but is fundamental: making government data mean the same thing across departments. Business Standard reported that Saurabh Garg, secretary in the Ministry of Statistics and Programme Implementation, said the Centre is working on common definitions, classifications and standards to make public datasets interoperable and useful for AI applications. Speaking at an event on AI and digital public infrastructure organised by the National Council of Applied Economic Research, Garg described semantic interoperability as a key challenge.

The issue is simple to state and hard to fix. Different ministries may use different definitions for the same concept. Business Standard reported Garg's example of "pucca houses", where one department may focus on wall material, another on roofing, and another on flooring because of public-health implications. If an AI system or welfare platform tries to combine those datasets without understanding the difference, the result can be misleading or unfair.
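The mismatch can be illustrated with a minimal sketch in Python. Everything in it is hypothetical and chosen only for illustration: the material codes, the three rule functions and the household record do not come from any ministry's actual classification. It shows only how a single household can be "pucca" under one department's working definition and not under another's.

```python
from dataclasses import dataclass

# Hypothetical set of "durable" materials; not an official classification.
DURABLE = {"brick", "concrete", "stone"}


@dataclass(frozen=True)
class Household:
    household_id: str
    wall: str
    roof: str
    floor: str


# Each rule stands in for one department's working definition of "pucca".
def pucca_by_wall(h: Household) -> bool:
    return h.wall in DURABLE


def pucca_by_roof(h: Household) -> bool:
    return h.roof in DURABLE


def pucca_by_floor(h: Household) -> bool:
    return h.floor in DURABLE


# Illustrative labels for the three definitions.
DEFINITIONS = {
    "wall_based": pucca_by_wall,
    "roof_based": pucca_by_roof,
    "floor_based": pucca_by_floor,
}


def classify(h: Household) -> dict[str, bool]:
    """Apply every definition to the same household."""
    return {name: rule(h) for name, rule in DEFINITIONS.items()}


def definitions_disagree(h: Household) -> bool:
    """True when at least two definitions give different answers."""
    return len(set(classify(h).values())) > 1


# A made-up household: brick walls, thatched roof, mud floor.
example = Household("H-001", wall="brick", roof="thatch", floor="mud")
print(classify(example))
print(definitions_disagree(example))
```

A system that merged the three departments' "pucca" columns without recording which rule produced each value would have no way to notice that this household is counted differently in each dataset.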

This matters because India has already built important digital public infrastructure: Aadhaar, UPI, DigiLocker, Account Aggregator and other systems that allow identity, payments and data exchange at scale. Those systems help move information. The next challenge is meaning. Data that can be transferred is not automatically data that can be understood. AI systems need consistent definitions, identifiers, metadata and quality standards before they can support public services responsibly.

Business Standard reported Garg as saying the raw material of AI is data, and that data must be interoperable, harmonised and machine-readable. That point often gets lost in public conversations that focus on models, chips and apps. A powerful AI model can still make poor recommendations if the underlying data is fragmented, contradictory or poorly labelled. In government use, that can affect welfare targeting, urban planning, health services, education systems and economic statistics.

The risks are not only technical. If datasets are combined badly, citizens can be misclassified. A household may be counted as having a permanent home under one definition but not another. A beneficiary may be included in one scheme and excluded from another. A district may appear better served than it really is because departments are measuring different things. AI can scale these mistakes quickly if standards are weak.

The Centre's work on common classifications and metadata should therefore be treated as governance reform, not back-office housekeeping. It requires ministries to agree on terms, publish standards, maintain data quality and update systems when definitions change. It also requires privacy safeguards. Making data interoperable does not mean making every dataset freely available without controls. Public benefit and confidentiality have to be balanced.

For startups and technology companies, better public data could create opportunities to build tools for agriculture, health, climate, logistics, education and financial inclusion. But those opportunities depend on trust. Businesses need reliable APIs and documentation. Citizens need assurance that data use is lawful and limited. Researchers need datasets that are findable, accessible, interoperable and reusable, while still respecting privacy.

India's AI ambitions will not be realised through headline-grabbing model launches alone. They will depend on patient data architecture. Common definitions may not trend online, but they decide whether AI systems can understand the country they are meant to serve.

Arjun Malhotra reports for The Indian Daily Post on technology and policy.