Many, if not most, generative AI tech vendors argue that fair use entitles them to train AI models on copyrighted material scraped from the internet — even if they don’t get permission from the rightsholders. But some vendors, such as OpenAI, are hedging their bets — perhaps wary of the outcome of pending relevant lawsuits.
OpenAI today announced that it’s reached an agreement with Axel Springer, the Berlin-based owner of publications including Business Insider and Politico, to train its generative AI models on the publisher’s content and add recent Axel Springer-published articles to OpenAI’s viral AI-powered chatbot ChatGPT.
It’s OpenAI’s second such arrangement with a news organization, after the startup said that it would license some of The Associated Press’ archives for model training.
Going forward, ChatGPT users will get summaries of “selected” articles from Axel Springer’s publications — including stories normally gated behind a paywall. The snippets will be accompanied both by attribution and links to the full articles.
In return, Axel Springer will receive payments of an unspecified size and frequency from OpenAI. The deal is valid for several years, and — while it doesn’t commit either side to exclusivity — Axel Springer says that it’ll preserve its existing AI-driven ventures “that build upon OpenAI’s technology.”
“We’re excited to have shaped this global partnership between Axel Springer and OpenAI — the first of its kind,” Axel Springer CEO Mathias Döpfner said in a canned statement. “We want to explore the opportunities of AI-empowered journalism — to bring quality, societal relevance and the business model of journalism to the next level.”
Setting aside the publishers tapping generative AI for questionable content strategies of their own, publishers and generative AI vendors have a testy relationship, with the former alleging copyright infringement and increasingly concerned about generative models cannibalizing traffic. For example, Google’s new generative AI-powered search experience, called SGE, has pushed links that appear in traditional search further down results pages — potentially reducing traffic to those links by as much as 40%.
Publishers also object to vendors training their models on content without compensation agreements in place — particularly in light of reports that tech giants including Google are experimenting with AI tools to summarize news. According to one recent survey, hundreds of news organizations are now using code to block OpenAI, Google and others from scanning their websites for training data.
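That blocking typically happens at the crawler level through a site’s robots.txt file. As an illustrative sketch (not drawn from this article), a publisher opting out of AI training crawls might list the publicly documented crawler tokens — GPTBot for OpenAI, Google-Extended for Google’s AI-training opt-out, CCBot for Common Crawl — like so:

```
# Hypothetical robots.txt directives illustrating the kind of blocking described above.
# GPTBot is OpenAI's documented crawler; Google-Extended is Google's documented
# AI-training opt-out token; CCBot is Common Crawl's crawler.
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /
```

Because robots.txt is advisory rather than enforceable, it’s a stopgap — which is part of why publishers are also pushing for licensing deals and regulation.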
In August, several media organizations including Getty Images, The Associated Press, the National Press Photographers Association and The Authors Guild published an open letter calling for more transparency and copyright protection in AI. In the letter, the signatories urged policymakers to consider regulations that necessitate transparency into training datasets and allow media companies to negotiate with AI model operators, among other suggestions.
“[Current] practices impair the media industry’s core business models, which are predicated on readership and viewership (such as subscriptions), licensing, and advertising,” the letter reads. “In addition to violating copyright law, the resulting impact is to meaningfully reduce media diversity and impair the financial viability of companies to invest in media coverage, further reducing the public’s access to high-quality and trustworthy information.”