CEO of image library speaks out amid outrage over the gathering of material for AI companies’ ‘training data’.
Rishi Sunak must decide between supporting the UK’s creative sectors and risking everything on an artificial intelligence boom, according to Getty Images CEO Craig Peters. His comments come as the creative and media industries voice frustration over the use of their content as “training data” for AI companies. Getty Images is currently suing an AI image generator for copyright infringement in the UK and US. Peters calls the trade-off perplexing, pointing out that the creative industries account for around 10% of the UK’s GDP, far more than AI contributes.
In 2023, the government set out its aim of removing obstacles that prevent AI firms from using copyrighted material to train their models. The commitment followed a consultation by the Intellectual Property Office, and marked a shift from an earlier proposal for a broad copyright exception for text and data mining. Viscount Camrose, the parliamentary under-secretary of state for artificial intelligence and intellectual property, said the approach was balanced and pragmatic, intended to keep the UK at the forefront of AI while supporting the creative sectors.
The use of copyrighted material in AI training has come under growing scrutiny. In the US, the New York Times is suing OpenAI, the creator of ChatGPT, and Microsoft for incorporating its news stories into the training data for their AI systems. Although OpenAI has not disclosed the data used to train GPT-4, the newspaper was able to extract verbatim quotes from NYT articles using the AI system.
OpenAI, in a court filing, argued that constructing AI systems without employing copyrighted materials is impossible. The organization emphasized that restricting training data to public domain content from over a century ago might be an interesting experiment but would not meet the contemporary needs of citizens.
Peters takes a contrary view. Getty Images, working with Nvidia, has developed its own image generation AI, trained exclusively on licensed imagery. Peters argues the collaboration refutes claims that such technologies are incompatible with licensing requirements, and he dismisses as mere speculation the notion that incorporating a license is unfeasible, saying different strategies are worth exploring.
Even within the industry, perspectives are shifting. Books3, a dataset of pirated ebooks, was hosted by an AI group with a controversial copyright takedown policy: a video featuring clothed women simulating explicit acts while singing. After objections from the affected authors, the dataset was quietly taken down, but not before it had been used to train various AIs, including Meta’s LLaMA.
In addition to legal actions initiated by Getty Images and the New York Times, several other lawsuits are underway against AI companies, focusing on potential infringements related to their training data.
In September, OpenAI was sued by 17 authors, including John Grisham, Jodi Picoult, and George RR Martin, who accused the organization of “systematic theft on a mass scale.” Earlier, in January of the previous year, a group of artists filed a lawsuit against two image generators, one of the first such cases to enter the US legal system.
However, how courts or governments ultimately choose to regulate the use of copyrighted material for training AI systems may not conclusively settle the issue. Numerous AI models, both text-generating LLMs and image generators, have been released as “open source,” allowing free download, sharing, and reuse without oversight. Even if restrictions were placed on using copyrighted material to train new systems, that would not erase existing models from the internet, and it would do little to stop people retraining, improving, and re-releasing them in the future.
Peters expresses optimism that the outcome is not predetermined. He stated, “Those responsible for creating and disseminating the code are ultimately bound by legal entities and are subject to them. The issue of what runs on your laptop or phone may pose more uncertainty, but individual responsibility comes into play there.”