EDITORIAL 1: AI & copyright law
Context
Generative AI models are trained on internet data, often including copyrighted content without permission. Two U.S. court rulings have favored tech companies, but key legal questions—like whether this use is fair or infringing—remain unresolved.
The issue
- At a very basic level, AI models such as ChatGPT and Gemini identify patterns from massive amounts of data.
- Their ability to generate passages, scenes, videos, and songs in response to prompts depends on the quality of the data they have been trained on.
- At least 21 ongoing U.S. lawsuits accuse tech companies of “theft” for training AI on copyrighted work, while the companies defend their actions as “fair use” for creating transformative models.
Generative AI
- It refers to AI systems capable of creating new content, whether text, images, or code. It is driven largely by advances in Large Language Models (LLMs).
- The models are trained on massive datasets, often scraped from the open internet. These datasets frequently include copyrighted material, sometimes without the explicit consent of the copyright holders.
- GenAI tools are now being used in mainstream journalism, advertising, entertainment, and education.
- Their spread has raised ethical and legal concerns over whether training AI on such data constitutes fair use or a breach of copyright law.
The Copyright Issues
- AI companies use web scraping methods to train their LLMs on a vast array of data, including both public and copyrighted content.
- In the USA, OpenAI faces several such lawsuits, in which it has invoked ‘fair use’ and ‘fair learning in education’ as defences under American copyright law.
- The United States has clarified that purely AI-generated works are not eligible for copyright protection.
- This has prompted creators to ensure ‘substantial human authorship’ in AI-assisted works so that those works remain copyrightable.
- Japan has explicitly stated that AI training using copyrighted data does not infringe copyright as long as it is non-consumptive and for machine learning purposes.
Legal Complexities in India
- India’s copyright framework is significantly different from the US model, as India follows an enumerated exceptions approach, not the flexible US ‘fair use’ test.
- Educational exceptions in India are narrowly defined and confined to classroom use. This leaves AI developers little room to rely on them and may favor right-holders in litigation.
Way Forward
- Regulatory frameworks need to evolve and ensure that original human creators are respected, credited, and, where appropriate, compensated.
- Copyright law stands at a pivotal moment. By reaffirming the core principles of copyright and ensuring fair treatment of all players in the AI ecosystem, the law can continue to serve its dual purpose—protecting creators while promoting learning and innovation.