Cohere loses its partial motion to dismiss Advance Local Media’s direct copyright infringement claim

Judge McMahon, in the Southern District of New York, denied AI company Cohere’s partial motion to dismiss direct and secondary copyright infringement claims and Lanham Act claims.

This is an important case challenging, among other things, outputs from Retrieval Augmented Generation (RAG).

The opinion also discusses the relevance (or not) of plaintiffs’ investigatory tactics to produce evidence suggestive of user infringement, even if the tactics allegedly breach Cohere’s terms of use.

Judge McMahon ruled:

(1) “At the pleading stage, the court finds Publishers’ allegations sufficient. As Publishers note, “the record of infringement using Command is not visible to third parties,” as infringement “typically takes place behind closed doors.” Dkt. No. 53, at 5 (quoting Warner Bros. Recs., Inc. v. Payne, 2006 WL 2844415, at *3 (W.D. Tex. July 17, 2006)). For this reason, courts routinely find that the actions of a plaintiff’s investigator can form the basis of an infringement claim. See Arista Records LLC v. Usenet.com, Inc., 633 F.Supp.2d 124, 149-150 n.16 (S.D.N.Y. 2009) (collecting cases); UMG Recordings, Inc. v. RCN Telecom Servs., LLC, 2020 WL 5204067, at *6 (D.N.J. Aug. 31, 2020) (same).”

(2) “Cohere’s assertion that Publishers’ investigators had to breach its Terms of Service in order to receive the allegedly infringing outputs, which it claims no bona fide user would ever do, has no bearing on this analysis; Cohere has not cited a single case to support dismissal of an infringement claim based on a potential breach of a website’s terms of service in connection with uncovering the alleged infringement.”

Plaintiffs State a Claim of Direct Infringement Based on “Substitutive Summaries”

From the opinion [highlight added]:
The Complaint alleges that Cohere unlawfully reproduced, distributed, and displayed Publishers’ copyrighted works, including by delivering outputs that are either full verbatim copies, substantial excerpts, or substitutive summaries of Publishers’ works, in violation of 17 U.S.C. §§ 106(1)-(3), (5), and 501. * * *

Publishers have adequately alleged that Command’s outputs are quantitatively and qualitatively similar. Publishers argue that Command’s output heavily paraphrases and copies phrases verbatim from the source article, and that these summaries “go well beyond a limited recitation of facts,” including by “lifting expression directly or parroting the piece’s organization, writing style, and punctuation.” Id. ¶ 106. Publishers also provide 75 examples of Cohere’s alleged copyright infringement, see Compl. Ex. B, 50 of which Publishers allege include verbatim copies of Publishers’ original works. Publishers allege that the other 25 examples show a mix of verbatim copying and close paraphrasing. Contrary to Cohere’s assertion that all of Command’s summaries “differ in style, tone, length, and sentence structure” from Publishers’ articles, Dkt. No. 50, at 17, Publishers’ examples reveal that, at least in some instances, Command delivers an output that is nearly identical to Publishers’ works. For example, in response to the prompt “Tell me about the unknowability of the undecided voter,” Command allegedly delivered an output which directly copied eight of ten paragraphs from a New Yorker article with very minor alterations. See Compl. Ex. B, at 21.

Cohere’s contention that the only similarities to Publishers’ works are Command’s use of the same facts is belied by Publishers’ allegations and examples showing that Command’s outputs directly copy and paste entire paragraphs of Publishers’ articles verbatim. Indeed, Publishers allege that Cohere designed its system to do exactly that. These allegations are sufficient to create a factual issue for jury consideration.

Cohere’s argument that even where the summaries do copy some of Publishers’ expression, they do so only minimally, rendering them non-infringing, is unavailing at the motion to dismiss stage. In support of this argument, Cohere cites to the Second Circuit’s decision in Nihon for the proposition that “copying ‘approximately twenty percent of the material in the article’ is generally not substantially similar but copying ‘well over half of the text’ usually is.” Dkt. No. 50, at 17. However, the Nihon court expressly stated that it did “not intend to establish any principle that, as a quantitative matter, a work that copies twenty percent of a copyrighted work is never substantially similar” because “It is not possible to determine infringement through a simple word count; the quantitative analysis of two works must always occur in the shadow of their qualitative nature.” 166 F.3d at 71.

Accordingly, the court declines to dismiss Publishers’ claim for direct copyright infringement to the extent it is based on a theory of “substitutive summaries.”

Court sticks with 2nd Circuit “know or reason to know” standard for contributory infringement

Judge McMahon also rejected Cohere’s argument, based on 9th Circuit precedent, that contributory infringement requires a defendant’s actual knowledge of infringement:

“Relying only on out-of-circuit cases, Cohere contends that a plaintiff is required to allege “actual knowledge of specific acts of infringement.” Dkt. No. 50, at 12. Cohere is wrong. In New York Times, decided just a few months ago, my colleague Judge Stein declined to apply this precise “actual knowledge” standard, noting that the Second Circuit has not adopted the Ninth Circuit’s heightened knowledge standard. 777 F. Supp. 3d at 305-06. The court comes to the same conclusion here and will evaluate Publishers’ claim under this Circuit’s standards.

“In the Second Circuit, ‘The knowledge standard is an objective one; contributory infringement liability is imposed on persons who “know or have reason to know” of the direct infringement.’ Arista Recs., LLC v. Doe 3, 604 F.3d 110, 118 (2d Cir. 2010) (citation omitted). While “knowledge of specific infringements is not required to support a finding of contributory infringement,” a plaintiff must allege more than just a defendant’s generalized knowledge of the possibility of infringement. New York Times, 777 F. Supp. 3d at 306 (quoting Usenet.com, 633 F. Supp. 2d at 154). In New York Times, for example, the court found that plaintiffs plausibly pleaded that defendants had far more than a “generalized knowledge of possibility” of third-party infringement where they alleged that defendants’ unauthorized copying of plaintiffs’ works in large quantities for purposes of training their LLMs would inevitably result in the unauthorized display of those works. 777 F. Supp. 3d at 307-308.”

“Like the plaintiffs in New York Times, Publishers allege that Cohere knew that training its LLMs, including Command, on Publishers’ works would result in the unauthorized display of such works, because it was designed to do exactly that. Publishers contend that they put Cohere on notice that it was not authorized to use their works by including copyright notices with their works and terms of service on their websites, as well as by sending do-not-crawl instructions to Cohere’s bots via robots.txt protocols. Additionally, Publishers claim that Cohere has continued to unlawfully copy Publishers’ works despite receiving a cease-and-desist letter informing Cohere of its infringing activities. According to Publishers, Cohere receives a direct financial benefit from third-party infringement of Publishers’ copyrighted articles. These allegations are sufficient to show that Cohere knew or had reason to know of third-party infringement because copyright infringement was “central to [Cohere’s] business model.” 777 F. Supp. 3d at 306 (quoting Capitol Recs., LLC v. ReDigi Inc., 934 F. Supp. 2d 640, 659 (S.D.N.Y. 2013), aff’d, 910 F.3d 649 (2d Cir. 2018)).”

DOWNLOAD JUDGE MCMAHON’S ORDER
