Microsoft just filed its reply to the New York Times’ opposition to the defendants’ partial motion to dismiss. Microsoft, quite forcefully, argues that the NYT lawsuit “fails to allege a specific instance of an end-user infringing The Times’s works, or Microsoft’s knowledge of any such act.”
This is a lurking issue in the case. Although the NYT Complaint compiled 100 examples of regurgitation of NYT articles allegedly by ChatGPT, those examples were generated by the NYT using prompts consisting of portions of NYT articles.
The atmospherics of the 100 examples of regurgitated articles seem damning for the defendants. But, as I questioned then, it’s unclear how probative these examples are, given that (1) they were self-generated by the NYT and (2) they don’t reflect the common way in which users of ChatGPT write prompts. And why would a user copy and paste a paragraph of a New York Times article into ChatGPT, presumably already having a copy of the article?
Microsoft’s reply drives home this point: the lack of evidence of any “real world” infringement of a New York Times article by a user of ChatGPT.
Of course, the New York Times can argue that the 100 examples of regurgitation are further evidence that the articles were used to train OpenAI’s large language model. But presumably that point would inevitably be determined through discovery of what datasets were used by OpenAI. The 100 examples seem far more salient as potential evidence of infringement in the output of ChatGPT. But Microsoft’s reply casts doubt on whether they are probative at all for that issue.
