
Should Anthropic get discovery of Concord Music’s investigatory prompts and the outputs it generated on Claude? That seems relevant to the normal use of Claude and its guardrails.

An important discovery dispute is brewing in Concord Music v. Anthropic.

Anthropic is seeking discovery of the prompts Concord Music used to investigate for potentially infringing outputs, plus the outputs Concord derived. “Because Publishers’ contentions can only be properly tested and rebutted with a granular understanding of their undisclosed prompts and outputs, Anthropic, Judge Lee, and the fact-finder are entitled to the full picture.”

Concord Music argues that the prompts its investigator used and the outputs generated are protected attorney work product, as Magistrate Judge van Keulen already held (ECF No. 478).

My take on discovery of the investigation underlying investigatory evidence of infringement

This is a festering issue in many of these AI copyright lawsuits. What if the Plaintiffs’ examples of allegedly infringing outputs rely heavily, if not solely, on the Plaintiffs’ own investigator’s generation of those outputs?

If a plaintiff relies on evidence of alleged infringement obtained by an investigator who generated alleged infringing outputs, I think that reliance puts into issue the investigator’s prompts and methods used to generate or manufacture the evidence of alleged infringement.

Plus, for an expert to be qualified to testify under Federal Rule of Evidence 702 and Daubert, the testimony must be (i) “based on sufficient facts or data”; (ii) “the product of reliable principles and methods”; and (iii) reflect “a reliable application of the principles and methods to the facts of the case.”

The defendant has a right to contest every aspect of the expert’s putative qualifications, including the expert’s “facts or data,” and “methods” on which the opinion is based.

Why? For at least four independent reasons.

First, the defendant has a right to contest the credibility of the investigator. Imagine an investigator completely fabricated at least some examples of alleged infringement. (Cf. Shepard Fairey case.) The defendant should be able to examine the investigator’s methods, including all the prompts used and outputs generated by the investigator.


Second, the probative value of investigatory evidence of alleged infringement is potentially less than evidence of alleged infringement in real life or “in the wild” by users. In other words, it’s one thing if the AI generator routinely generates infringing outputs for all users in real life, in normal use of the AI. But it’s a completely different thing if the only infringing outputs are prompted by an investigator using potentially adversarial techniques and non-normal use of the AI generator. (The latter is somewhat analogous to unforeseeable product misuse in products liability.)

The defendant should be allowed to show the investigatory evidence has limited, if any, probative value because it does not reflect how users use the AI in real life. The more the evidence shows that the investigator had to resort to herculean adversarial efforts to generate only a relatively small number of infringing outputs, the more it would help the defendant prove this contention.

This issue of the probative value of investigatory evidence versus evidence in real life was cogently discussed in the Getty Images v. Stability AI UK decision.

Third, and relatedly, the total universe of prompts used and outputs generated by the investigator is directly relevant to the scope of alleged infringement and the effectiveness of guardrails, as well as the fair use defense (see Authors Guild v. Google: “The result of these restrictions [by Google] is, so far as the record demonstrates, that a searcher cannot succeed, even after long extended effort to multiply what can be revealed, in revealing through a snippet search what could usefully serve as a competing substitute for the original.”). For example, if the investigator used 1 million prompts to generate only 0.01% allegedly infringing outputs and 99.99% non-infringing outputs, this evidence is highly relevant to both infringement and fair use. A jury might conclude the plaintiff should receive the low end of statutory damages. Or a jury might conclude that the relative scarcity of infringing outputs supports the reasonableness of the amount of use of the plaintiffs’ works in the fair use analysis. See Authors Guild (“As snippet view never reveals more than one snippet per page in response to repeated searches for the same term, it is at least difficult, and often impossible, for a searcher to gain access to more than a single snippet’s worth of an extended, continuous discussion of the term.”).
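To make the hypothetical concrete, here is a minimal sketch of the arithmetic behind those percentages. The figures (1 million prompts, a 0.01% infringement rate) are my illustrative numbers for the hypothetical, not data from the case record.

```python
# Illustrative arithmetic for the hypothetical above (not actual case data).
total_prompts = 1_000_000      # prompts the hypothetical investigator ran
infringing_rate = 0.0001       # 0.01% of outputs allegedly infringing

infringing_outputs = round(total_prompts * infringing_rate)
non_infringing_outputs = total_prompts - infringing_outputs

print(infringing_outputs)      # 100 allegedly infringing outputs
print(non_infringing_outputs)  # 999900 non-infringing outputs (99.99%)
```

On these assumed numbers, only about 100 of a million prompts would yield allegedly infringing outputs, which is the kind of ratio the defendant would want the fact-finder to see.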

Fourth, as Anthropic correctly argues (Letter at p. 1), the implementation of effective guardrails can be evidence that tends to negate willfulness of any infringement. If a defendant instituted effective guardrails to stop infringing outputs, it arguably was not engaging in willful infringement.

DOWNLOAD THE DISCOVERY DISPUTE LETTER


