- Judge Alsup’s ruling is the first major judicial decision on fair use and generative AI training.
- The decision was Solomonic: the pirated books ruling was a win for the plaintiff book authors, and the ruling on fair use in AI training was a win for Anthropic.
- The decision singles out the acquisition of an “initial copy” from a “pirated source” and strongly suggests that such an acquisition should disqualify a defendant from fair use, even when the copy is later used for the fair use purpose of training AI models.
- Assuming this approach is right for the sake of argument, it leaves important questions unanswered, including the meaning of “pirated.”
Judge William Alsup issued the first decision recognizing that using copyrighted materials to train AI models is a transformative fair use. Although many other cases raise the same question against different AI companies, including Meta, Microsoft, OpenAI, and Google, this first decision provides at least one precedent tackling this novel question of law. Ultimately, the Supreme Court will (likely) have the final say. But, for now, it’s worth exploring Judge Alsup’s Solomonic decision as a potential legal pathway to fair use analysis of AI training.
The Solomonic decision
I will focus here on the two biggest parts of the split decision:
(1) Win for the plaintiff book authors on pirated books in the library: Anthropic’s acquisition of pirated copies of books from so-called shadow libraries that others assembled online (Books3, LibGen, and Pirate Library Mirror), and its indefinite storage of those copies in a central library at Anthropic, are copyright infringement. (Because the plaintiffs did not move for summary judgment, there’s no ruling of infringement, but the opinion all but says so.) “A separate justification was required for each use. None is even offered here except for Anthropic’s pocketbook and convenience.” The case now heads to trial on damages, including willfulness. “That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages.”
(2) Win for Anthropic on fair use in AI training: The copies subsequently used to train Anthropic’s AI models were fair use. The training had an “exceedingly transformative purpose” (“spectacularly so”): to create a “technology … among the most transformative many of us will see in our lifetimes.”
This article focuses on the first part of the decision, while a subsequent article will analyze the fair use part (all or nearly all of which I agree with).
The pirated sites and sources of the initial copy used
Most of the media attention has focused on (2), the fair use decision. No doubt it is incredibly important. But not enough media attention has been devoted to (1), the pirated books ruling. Let’s not forget that the plaintiffs can recover statutory damages only per work, not per number of copies. That means they would recover the same amount even if Judge Alsup had ruled that the training copies were not fair use (discounting for any egregiousness, based on the extent and nature of the copying, that a jury might factor in to pick the dollar amount within the statutory range, the maximum being $150,000 per work for willful infringement). Here the outer limit of Anthropic’s liability exposure is roughly 1 trillion 50 billion dollars ($1,050,000,000,000).
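The outer-limit figure can be checked with simple arithmetic. The short Python sketch below is only an illustration: it takes the $150,000 per-work maximum and the $1,050,000,000,000 total from the text and backs out the number of works those two figures imply. That work count (about seven million) is an inference from the two figures, not a number stated above, and the names `MAX_WILLFUL_PER_WORK`, `OUTER_LIMIT`, and `max_exposure` are hypothetical labels chosen for this example.

```python
# Figures taken from the text above.
MAX_WILLFUL_PER_WORK = 150_000        # statutory maximum per work for willful infringement
OUTER_LIMIT = 1_050_000_000_000       # outer limit of exposure stated in the text

# Number of works implied by those two figures.
# This is an inference from the arithmetic, not a count stated in the text.
implied_works = OUTER_LIMIT // MAX_WILLFUL_PER_WORK


def max_exposure(works: int, per_work: int = MAX_WILLFUL_PER_WORK) -> int:
    """Upper bound on statutory damages, counted per work rather than per copy."""
    return works * per_work


print(f"{implied_works:,}")                 # 7,000,000
print(f"{max_exposure(implied_works):,}")   # 1,050,000,000,000
```

Because statutory damages are counted per work, additional infringing copies of the same work would not change the result of `max_exposure`; only the number of works and the per-work amount a jury picks within the statutory range would.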

Judge Alsup’s stern language
It’s also important to recognize that Judge Alsup used very stern language describing Anthropic’s acquisition of pirated books for its library. Judge Alsup said Anthropic “stole” the books and engaged in “theft,” including in his conclusion:
- From the start, Anthropic “ha[d] many places from which” it could have purchased books, but it preferred to steal them to avoid “legal/practice/business slog,” as cofounder and chief executive officer Dario Amodei put it (see Opp. Exh. 27).
- Not every person who merely intends to make a fair use of a work is thereby entitled to a full copy in the meantime, nor even to steal a copy so that achieving this fair use is especially simple or cost-effective.
- But Anthropic did not do those things — instead it stole the works for its central library by downloading them from pirated libraries.
- Here, piracy was the point: To build a central library that one could have paid for, just as Anthropic later did, but without paying for it.
- That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages.
And, from the very opening line, the entire opinion repeatedly refers to the “pirate sites” and “pirated” sources and copies of the books. Few copyright decisions I recall contain such a strong rebuke of the defendant’s conduct as stealing. Tim McFarlin counted 58 “pirate” references and at least 7 references to “stealing” in the opinion.
Perhaps the use of this stern language about “stealing,” “theft,” and “pirate” is justified. But Judge Alsup’s opinion does not explain these terms, or why they are more appropriate to use here than “copyright infringement” or “unauthorized copies.”
In Dowling v. U.S., in interpreting a criminal law that prohibits the transport of goods knowing they have been “stolen, converted or taken by fraud,” the Supreme Court cautioned against equating theft with copyright infringement:
It follows that interference with copyright does not easily equate with theft, conversion, or fraud. The Copyright Act even employs a separate term of art to define one who misappropriates a copyright: “ ‘Anyone who violates any of the exclusive rights of the copyright owner,’ that is, anyone who trespasses into his exclusive domain by using or authorizing the use of the copyrighted work in one of the five ways set forth in the statute, ‘is an infringer of the copyright.’ [17 U.S.C.] § 501(a).” Sony Corp., supra, 464 U.S., at 433, 104 S.Ct., at 784. There is no dispute in this case that Dowling’s unauthorized inclusion on his bootleg albums of performances of copyrighted compositions constituted infringement of those copyrights. It is less clear, however, that the taking that occurs when an infringer arrogates the use of another’s protected work comfortably fits the terms associated with physical removal employed by § 2314. The infringer invades a statutorily defined province guaranteed to the copyright holder alone. But he does not assume physical control over the copyright; nor does he wholly deprive its owner of its use. While one may colloquially link infringement with some general notion of wrongful appropriation, infringement plainly implicates a more complex set of property interests than does run-of-the-mill theft, conversion, or fraud.
Dowling v. U.S., 473 U.S. 207 (1985).
What makes an initial copy a pirated copy?
Putting word choice aside, Judge Alsup’s decision places special attention on the initial copy acquired by the defendant for a potential fair use later. Because all of the AI lawsuits involve AI training that required datasets, focusing on the initial copy, and on whether it was “pirated,” may affect all of the other AI cases involving fair use defenses.
Although he limited his decision to situations involving the building of a permanent library, he strongly suggested the use of any “pirated copies” should be disqualifying under fair use for AI training: “Such piracy of otherwise available copies is inherently, irredeemably infringing even if the pirated copies are immediately used for the transformative use and immediately discarded” (19).
Important questions left open
I disagree with that suggestion, especially its sweeping, categorical nature for fair use, but will save my analysis for a future post. Let’s assume that acquiring “pirated copies” is infringing, even if the ultimate goal is to train an AI model. That raises two questions:
(1) what makes a copy “pirated,” and
(2) is it different from being an unauthorized copy?
These questions are incredibly important because a fair use defense will always involve a defendant’s unauthorized copying of the plaintiff’s work. Indeed, that’s the whole question: whether the unauthorized copying is fair use.
In Harper & Row, the facts involved a stolen, or “purloined,” physical manuscript of a book, President Ford’s memoirs. Perhaps the “pirated” digital copies of books are functionally equivalent to the purloined manuscript because someone took the digital copies dishonestly, such as without paying for them when they were offered for sale. That does sound like a theft. And that does sound like what Judge Alsup had in mind when he contrasted Anthropic’s use of pirated books with Google’s use of books presumably lawfully acquired by university libraries in the Google Books case:
Nor does the initial copying here even resemble the full-text copying in the Google Books cases. There, libraries of authorized copies already had been assembled, and all copies therefrom were made for direct employment in a one-to-one further fair use — whether the transformative use of pointing to the works themselves, the use of providing the works in formats for print-disabled patrons, or the use of insuring against going out of print, getting lost, and becoming otherwise unavailable. HathiTrust, 755 F.3d at 97, 101, 103; Google, 804 F.3d at 206, 216–18, 228 (further distinguishing search and snippet uses, which “test[ed] the boundaries of fair use”). Not so here concerning the pirated copies. No authorized copies existed from which Anthropic made its first copies.
Judge Alsup’s suggestion seems like it could require the defendant to make a lawful acquisition of the initial copy, which might mean paying for it. But if that requirement is applied too broadly, it could devolve into just a licensing requirement, thereby swallowing the whole doctrine of fair use. For example, should scraping content from the Internet be considered lawful or unlawful acquisition of the initial copy? What if the scraping violates the websites’ terms of use or a paywall? Piracy?
My sense is that people don’t call web scraping “piracy,” at least not as commonly as “pirate” is used to describe the controversial shadow libraries. After all, search engines like Google are built by scraping websites across the internet. People don’t usually refer to that as piracy. But, if Judge Alsup’s approach to “pirated” initial copies is adopted in other cases, such as the image generator cases, courts will have to better define what distinguishes a pirated copy from a merely unauthorized copy.
One possibility is that copyrighted content offered for viewing or listening only by sale or payment by consumers is “pirated” if someone takes it without paying, whereas content offered freely online is not “pirated” when someone merely screenshots it or makes a copy, such as on their phone. If the latter were piracy, everyone would be a pirate. But what about nonprofits that create datasets, such as LAION? Are they engaging in “piracy” of online content? Under the proposed distinction of “viewing … only by sale or payment,” no. However, I’m sure some will disagree with that distinction and will argue for a general requirement of a lawfully acquired copy for fair use.
Of course, we could reject the use of the terminology “pirate” as obfuscating the underlying complexities of copyright disputes. I favor that approach and believe Dowling supports it or, at least, cautions against equating infringement with theft. In this article, however, I wanted to examine what Judge Alsup’s approach might look like if applied more generally to other AI suits.
For more on my approach, see my prior post below (which does not discuss library-building because that issue had not received much attention in the litigation):
