- Two judges issued orders finding fair use by Anthropic and Meta in training their AI models.
- But their decisions were not clear victories for the AI companies.
- Judge Alsup ruled that Anthropic’s downloading of “pirated books” from online shadow libraries was not fair use, and all but ruled that the downloads were infringing. The case will go to trial on damages.
- Judge Chhabria ruled that the book authors failed to present sufficient evidence to raise a genuine issue of material fact on the market harm he expects to arise under a new theory of market dilution.
- Judge Chhabria even stated, in dicta, that most training of AI models on copyrighted works without permission is likely illegal and not fair use because of this new theory of market dilution.
Below we compare and contrast the two decisions in Bartz v. Anthropic and Kadrey v. Meta. The key differences are indicated by an asterisk*.
Table. Comparison of Judge Alsup’s and Judge Chhabria’s decisions on fair use
| Fair Use Factor | Judge Alsup (Bartz v. Anthropic) | Judge Chhabria (Kadrey v. Meta) |
| --- | --- | --- |
| Is downloading “pirated” book datasets a separate use from training the model? | *Separate use: Anthropic’s downloading of pirated books to build a permanent library was not fair use. Infringing. Trial on damages. | *Same use: Meta’s downloading was for the ultimate purpose of training its AI model. Fair use. |
| (1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes | Favors fair use (+). Training LLMs is exceedingly “transformative — spectacularly so” because it maps statistical relationships to produce technology that generates new, noninfringing outputs. Emphasized this is “among the most transformative many of us will see in our lifetimes.” No evidence of output infringing authors’ works. Cites Google Books decision. | Favors fair use (+). Meta’s use of the plaintiffs’ books had a “further purpose” and “different character” than the books and was highly transformative. The purpose of Meta’s copying was to train its LLMs, which are innovative tools that can be used to generate diverse text and perform a wide range of functions. Users can ask Llama to edit an email they have written, translate an excerpt from or into a foreign language, write a skit based on a hypothetical scenario, or do any number of other tasks. The purpose of the plaintiffs’ books, by contrast, is to be read for entertainment or education. Commercial use tends to be less important when the secondary use is highly transformative. Cites Oracle decision. |
| (2) the nature of the copyrighted work; | Disfavors fair use (-). Creative expressive books. | Disfavors fair use (-). But second factor weighs less if transformative purpose. |
| (3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole | Favors fair use (+). Compelling benefits of training the LLMs on strong examples were not offset by revelations to the public of any portion of the works themselves. What was copied was therefore especially reasonable and compelling. | Favors fair use (+). The amount that Meta copied was reasonable given its relationship to Meta’s transformative purpose. See Oracle, 593 U.S. at 34. Everyone agrees that LLMs work better if trained on more high-quality material. |
| (4) the effect of the use upon the potential market for or value of the copyrighted work. | Favors fair use (+). No cognizable market harm. Authors concede that training LLMs did not result in any exact copies, or even infringing knockoffs, of their works being provided to the public. *Rejects new theory of copyright market dilution: the Copyright Act seeks to advance original works of authorship, not to protect authors against competition. Rejects lost licensing as a market that the Copyright Act entitles authors to exploit. | Favors fair use (+). Slight public benefit: likely helps Llama create new expression. *But accepts new theory of copyright market dilution: harm by helping to enable the rapid generation of countless works that compete with the originals, even if those works aren’t themselves infringing. Finds, however, that Plaintiffs failed to present sufficient evidence to create a genuine issue of material fact. Rejects lost licensing as a market that authors are entitled to exploit (circularity problem). |
Bottom line: These decisions are a mixed bag. Although both find fair use, the rulings come with a huge asterisk. Most, if not all, of the AI companies that created LLMs, such as OpenAI and Microsoft, allegedly used so-called pirated books datasets (e.g., Library Genesis or Books3). Under Judge Alsup’s decision, acquiring those “pirated copies” would probably be treated as infringing, although Judge Alsup did not decide whether that would be so if the company had not built a permanent library. By contrast, Judge Chhabria concluded that the pirated copies can be justified if they were used to train the LLM, as he found Meta did. But Judge Chhabria embraced a new theory of market dilution under Factor 4 of fair use: the idea that using copyrighted books to train an LLM might harm the market for those works “by helping to enable the rapid generation of countless works that compete with the originals, even if those works aren’t themselves infringing.” Judge Chhabria spent pages speculating that, in most cases, market dilution will mean that AI training is illegal.
“No matter how transformative LLM training may be, it’s hard to imagine that it can be fair use to use copyrighted books to develop a tool to make billions or trillions of dollars while enabling the creation of a potentially endless stream of competing works that could significantly harm the market for those books,” Judge Chhabria concluded.