OpenAI defends itself: training is fair use & regurgitation is a “rare bug” that the New York Times may have exploited

In a blog post, OpenAI forcefully defended itself against the lawsuit the New York Times filed against it and Microsoft on Dec. 27, 2023. This is a stunning response to a lawsuit. Parties, especially corporate defendants, are usually tight-lipped about litigation and save their responses for court.

The most stunning part of OpenAI’s post was its detailed response to the examples of “memorized” or “regurgitated” New York Times articles that the Times compiled in Exhibit J to its complaint. As we discussed, Exhibit J impressively compiles 100 examples in which the Times prompted ChatGPT with the opening lines of its articles, and ChatGPT apparently regurgitated the rest of each article.

In a prior post, we suggested that this must be a glitch in ChatGPT and that feeding it lines of text from existing articles isn’t really what “prompts” are for. We tried to replicate the results using the same technique, and it didn’t work for us, suggesting the loophole has been at least partly fixed.

OpenAI’s blog post describes the glitch as a “rare bug” of “inadvertent memorization” and “regurgitation” that can occur “when particular content appears more than once in training data,” such as apparently with the “years-old articles [of the NYT] that have proliferated on multiple third-party websites.”

OpenAI notes that, under its terms of use, users of ChatGPT are not permitted to “intentionally manipulat[e] our models to regurgitate” content. (I have taken a quick look at both versions of the Terms of Use, dated March 14, 2023 and Jan. 31, 2024, and do not immediately find an express provision covering this regurgitation activity. But I have not closely reviewed everything, and it may be covered by a general provision.)

But OpenAI doesn’t stop there. It then suggests that the NYT was being very strategic, if not sneaky, in compiling the examples:

Along the way, they had mentioned seeing some regurgitation of their content but repeatedly refused to share any examples, despite our commitment to investigate and fix any issues. We’ve demonstrated how seriously we treat this as a priority, such as in July when we took down a ChatGPT feature immediately after we learned it could reproduce real-time content in unintended ways.

Interestingly, the regurgitations The New York Times induced appear to be from years-old articles that have proliferated on multiple third-party websites. It seems they intentionally manipulated prompts, often including lengthy excerpts of articles, in order to get our model to regurgitate. Even when using such prompts, our models don’t typically behave the way The New York Times insinuates, which suggests they either instructed the model to regurgitate or cherry-picked their examples from many attempts.

OpenAI blog post, Jan. 8, 2024

OpenAI also reveals that, while the two sides were negotiating a license, the NYT gave OpenAI no advance notice that it was filing a complaint; OpenAI learned about the lawsuit by reading about it in the NYT.

Well, this blog post is unlikely to smooth over the negotiations with the New York Times.

Everyone, buckle up for litigation.
