Asian News International (ANI) Media, a prominent multimedia news agency in India, has filed a copyright infringement suit against OpenAI at the Delhi High Court (ANI Media Pvt Ltd v OpenAI Inc & Anr, CS(COMM) 1028/2024). This is the first time that a generative-AI platform has faced copyright infringement allegations in India. ANI claims that ChatGPT – OpenAI’s large language model (LLM) – has been illegally scraping both freely available and paywalled copyrighted content from its website.
The case has drawn attention across many sectors, with companies from the publishing, music and tech industries submitting intervention applications.
OpenAI maintains data scraping for LLM training does not amount to infringement
OpenAI has commenced its arguments before the court, reiterating that extracting information does not necessarily amount to infringement and that there is no general prohibition on data use. According to OpenAI, its LLM is trained on non-expressive elements of data, which cannot be copyrighted: the model breaks down the data into “tokens”, which strictly contain “non-expressive” syntactic, semantic and grammatical information, along with any patterns or correlations. These tokens are then converted into vectors by assigning machine-readable numbers, which the LLM uses to make predictions and decisions, forming the basis of responses generated by ChatGPT.
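The tokenisation step described above can be illustrated with a deliberately simplified sketch. It is not OpenAI's actual pipeline: the whitespace tokeniser, the function names (`tokenize`, `build_vocab`, `make_embeddings`) and the random four-dimensional vectors are all assumptions made for illustration, whereas production models typically use subword tokenisers and vectors learned during training. The sketch shows only the general idea of mapping text to token IDs and then to numeric vectors.

```python
import random

def tokenize(text):
    """Split text into lowercase word tokens (a toy stand-in for a real subword tokeniser)."""
    return text.lower().split()

def build_vocab(tokens):
    """Assign each distinct token an integer ID in order of first appearance."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

def make_embeddings(vocab_size, dim=4, seed=0):
    """Create one small vector per token ID (random here; real models learn these values)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]

text = "the court heard the case"
tokens = tokenize(text)                  # text -> tokens
vocab = build_vocab(tokens)              # tokens -> machine-readable numbers
ids = [vocab[t] for t in tokens]
embeddings = make_embeddings(len(vocab))
vectors = [embeddings[i] for i in ids]   # numbers -> vectors

print(tokens)  # ['the', 'court', 'heard', 'the', 'case']
print(ids)     # [0, 1, 2, 0, 3]
```

In this toy version, both occurrences of "the" map to the same ID and therefore the same vector. Whether such a representation retains any protected expression is the question before the court, and the sketch takes no position on it.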
OpenAI has compared the situation to reading a book: it claims that reading constitutes using a book’s content but does not amount to infringement. OpenAI also asserted that it refines its models to prevent regurgitation and that the models no longer have access to the original training data after the pre-training phase.
Experts offer divided opinions before the court
Two copyright experts appointed as impartial advisors in the suit have submitted written opinions and arguments at the Delhi High Court. One maintains that “storing of copyrighted material” for training AI models is permitted under Indian copyright laws and the court must decide whether OpenAI used the material for any purposes beyond this. However, the other has advised the court that OpenAI’s unauthorised use of copyrighted works constitutes infringement, and it cannot claim fair use because it is not a news agency and does not use the material for criticism or review. Both agree that the court has jurisdiction to adjudicate the suit since ANI Media conducts business in Delhi.
Meanwhile, in response to a query in a parliamentary session on 7 February 2025, the Ministry of Electronics and Information Technology has submitted that:
…web scraping with respect to any publicly available user data by any intermediary including social media companies for training Artificial Intelligence models or for any other purpose is regulated under the Information Technology Act, 2000.
Notably, Section 43 of the Information Technology Act penalises unauthorised access, downloading or extraction of data from a computer system. Thus, scraping data without consent could constitute a violation of the act.
The ministry also clarified that the Digital Personal Data Protection Act 2023 requires organisations to implement robust compliance measures such as seeking clear, informed consent before processing data – including web scraping of publicly available user data.
Third parties submit intervention applications highlighting impact on their industries
Multiple stakeholders have recently submitted applications seeking intervention in the suit, presenting distinct concerns regarding the impact of AI-generated content on their respective industries. They advocate for protection of copyrighted works and compensation for unauthorised use.
The Federation of Indian Publishers, which represents more than 80% of India’s publishing sector, has submitted that OpenAI’s use of copyrighted content without authorisation diminishes the economic value of literary works and endangers the publishing industry. Similarly, the Digital News Publishers Association, which represents leading Indian print and television media companies, has asserted that OpenAI’s model illegally extracts and repurposes content, eroding the credibility and financial sustainability of journalism.
The Indian Music Industry – a trust representing distributors in India’s recording industry – has claimed that training AI models on copyrighted songs without a licence amounts to infringement and leads to financial losses for rights holders.
The Indian Governance and Policy Project, a technology-focused think tank, has submitted an intervention application highlighting regulatory concerns and the need for a balanced AI policy framework to address copyright and fair-use issues. By contrast, Flux Labs AI – a start-up company that is also seeking intervention – has told the court that imposing mandatory licensing fees for the use of copyrighted works to train AI models could hinder innovation and stifle competition for smaller companies. It will be interesting to see how the case progresses.