The New York Times Wants OpenAI And Microsoft To Pay For Training Data

The New York Times is suing OpenAI and Microsoft, OpenAI’s close partner and investor, for allegedly violating copyright law by training generative AI models on content from The Times.

The Times claims in the lawsuit, filed in the Federal District Court in Manhattan, that millions of its articles were used without permission to train AI models, including those that power Microsoft’s Copilot and OpenAI’s wildly popular ChatGPT. The Times demands that Microsoft and OpenAI “destroy” training data and models that contain the offending content, and it seeks “billions of dollars in statutory and actual damages” resulting from their “unlawful copying and use of The Times’s uniquely valuable works.”

According to The Times’ complaint, “there will be a vacuum that no computer or artificial intelligence can fill if The Times and other news organizations cannot produce and protect their independent journalism.” The complaint adds that less journalism would be produced, at a significant cost to society.

To create essays, code, emails, articles, and more, generative AI models “learn” from examples. Companies such as OpenAI scrape the web for millions or billions of these examples to add to their training sets. Some of the examples are in the public domain. Others aren’t, or they’re covered by restrictive licenses that require citation or specific forms of compensation.

Vendors contend that the fair use doctrine offers a blanket defense for their web-scraping practices. Copyright holders disagree, and hundreds of news organizations now use code to stop OpenAI, Google, and other companies from scanning their websites for training data.
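The article does not name the blocking mechanism, but one widely used approach is a robots.txt file that tells specific crawlers to stay away. The sketch below assumes that approach. GPTBot and Google-Extended are the crawler tokens that OpenAI and Google document for this purpose; the robots.txt contents, the example.com URL, and the crawlers_allowed helper are hypothetical and used only for illustration. Note that robots.txt is advisory, so it only works against crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative news site (not any real publisher's file).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def crawlers_allowed(robots_txt, agents, url):
    """Return a mapping of crawler name -> whether robots.txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = crawlers_allowed(
    ROBOTS_TXT,
    ["GPTBot", "Google-Extended", "Googlebot"],
    "https://example.com/2023/12/27/story.html",
)
for agent, allowed in result.items():
    print(agent, "allowed" if allowed else "blocked")
```

In this sketch the two AI-training crawlers are blocked while the ordinary search crawler is still allowed, which reflects the choice a publisher faces: opting out of training data collection without disappearing from search results.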

The Times’ lawsuit is the latest in a growing string of legal fights arising from the conflict between AI vendors and news outlets.

In July, comedian Sarah Silverman joined two lawsuits alleging that Meta and OpenAI “ingested” Silverman’s memoir to train their AI models. In a separate suit, thousands of authors, including John Grisham and Jonathan Franzen, claim OpenAI used their writing as training data without their knowledge or consent. Additionally, several programmers are suing Microsoft, OpenAI, and GitHub over Copilot, an AI-powered code-generating tool that they claim was built using their IP-protected code.

The Times is not the first publisher to sue generative AI vendors over alleged intellectual property violations involving written works, but it is the largest to do so to date. It is also among the first to highlight the potential damage to its brand from “hallucinations,” or made-up facts produced by generative AI models.

The Times’ complaint cites multiple instances in which Microsoft’s Bing Chat (now Copilot), which is powered by an OpenAI model, returned false information attributed to The Times. One example is a list of “the 15 most heart-healthy foods,” 12 of which were not mentioned in any article published by The Times.

The Times also argues that OpenAI and Microsoft are effectively using The Times’ works to build competitors to news publishers, hurting The Times’ business by offering information that was previously accessible only with a subscription. The Times claims that this information is not always cited, is sometimes monetized, and is stripped of the affiliate links that The Times relies on for commissions.

As The Times’ complaint suggests, generative AI models tend to regurgitate training data, for example by reproducing near-verbatim passages from articles. Beyond mere regurgitation, OpenAI has on at least one occasion inadvertently enabled ChatGPT users to get around paywalled news content.

The impact on publisher web traffic and the news subscription business is central to a related lawsuit that publishers brought against Google earlier this month. The plaintiffs in that case, like The Times, contend that Google’s GenAI experiments, such as its AI-powered Bard chatbot and Search Generative Experience, siphon off publishers’ content, readers, and ad revenue through anticompetitive means.

There is some support for the publishers’ claims. According to a recent model from The Atlantic, if a search engine like Google integrated AI into search, it could answer a user’s query 75% of the time without requiring a click-through to the publisher’s website. Publishers involved in the Google lawsuit estimate that they could lose up to 40% of their traffic.

Rather than taking vendors to court, several news organizations have decided to sign licensing agreements with them. The Associated Press reached an agreement with OpenAI in July, and Axel Springer, the German publisher that owns Politico and Business Insider, followed this month.

The Times says in its complaint that it tried to reach a licensing agreement with Microsoft and OpenAI in April but that the negotiations ultimately failed.

(Information Source: Techcrunch.com)
