In today’s controversial decision by Magistrate Judge Wang finding that OpenAI waived its attorney-client privilege as to communications on why it deleted the Books1 and Books2 datasets (which were used to train earlier OpenAI models, according to an OpenAI paper at p. 9), Magistrate Judge Wang rejected one of OpenAI’s arguments based on Bartz v. Anthropic.
In footnote 20 on pp. 20-21 of her opinion, Magistrate Judge Wang even characterized OpenAI’s argument as “bizarre[]” and a “gross misunderstanding of Judge Alsup’s decision” in the passage below:


Although this issue appeared to be secondary in Judge Wang’s opinion on waiver of attorney-client privilege, it is nonetheless important to understand the potential reasons for an AI company to delete datasets after they have been used to train an AI model. As I read the OpenAI letter that Judge Wang pointed to, OpenAI cited Bartz v. Anthropic for this limited reason. There is at least one legitimate reason for an AI company to delete datasets after training: doing so bolsters the company’s argument that its use of the works was reasonable in relation to the purpose of training. It also avoids building a permanent central library, which, under Bartz v. Anthropic, is now more vulnerable to a claim of copyright infringement.
My take on Judge Alsup’s decision
I believe OpenAI has the correct interpretation of Judge Alsup’s decision in Bartz v. Anthropic. Judge Alsup’s decision on fair use is premised on the expectation that the AI company engaged in the AI training does not store copies of copyrighted works indefinitely or in a general permanent library. Indeed, permanent library building from pirated copies is effectively what doomed one half of Anthropic’s fair use defense and led to Anthropic’s settlement of the lawsuit.
On p. 14, Judge Alsup explained Anthropic’s library building as a separate use from training:

In this key passage, what Judge Alsup finds problematic about Anthropic’s conduct is its building a central library and “keep[ing] [copies] forever with no further accounting.”
Thus, as Judge Alsup’s opinion makes clear, these “permanent library copies” were separate uses from the “further use” (5) of “training LLMs.”
Training copies were fair uses. But permanent library copies were not.
On p. 10, Judge Alsup explained the rationale for taking a use-by-use approach to fair use consistent with Andy Warhol Foundation v. Goldsmith: “Sometimes, the copying involves many uses ….,” as it did in Google Books.
In Bartz, Judge Alsup found 3 separate uses of books: (1) library building and permanent retention by Anthropic of copies of books from shadow libraries, (2) AI training by Anthropic with copies of those books, and (3) Anthropic’s digitization of physical books. Judge Alsup held that Anthropic’s uses of copies for (2) and (3) were fair uses, for different reasons. But uses for library-building with pirated copies were not fair use, at least not on the record on summary judgment (but could be re-argued by Anthropic at trial).
Because the plaintiffs did not move for summary judgment, Judge Alsup concluded that “[w]e will have a trial” on “the pirated copies used to create Anthropic’s central library and the resulting damages, actual or statutory (including for willfulness).” But Judge Alsup also said: “Nothing is foreclosed as to any other copies flowing from library copies for uses other than for training LLMs.” (32). Training copies were fair use. But all the library copies had to go to trial.
And, if that were not clear enough, after the proposed class settlement was reached and later preliminarily approved, Judge Alsup himself clarified in a proposed summary of the lawsuit that Anthropic could have raised a fair use defense again on the pirated library copies at trial, and a jury could have found in Anthropic’s favor on fair use, as noted in the highlighted passages below:

***

But what about the Bartz dictum Magistrate Judge Wang quotes?
In footnote 20 of her opinion, Judge Wang quotes the following dictum from Judge Alsup’s Bartz decision:

But this passage is just dictum. Judge Alsup expressly declined to reach that question and based his holding on the more limited ground that the “permanent library copies” retained by Anthropic were not fair use, even though the copies used for the further purpose of training were fair use. As Judge Alsup explained:

Thus, based on Judge Alsup’s holding that the AI training copies were fair use but the permanent library copies were separate and not fair use (on summary judgment), OpenAI’s suggestion that the deletion of datasets (versus permanent retention of them) would be more consistent with fair use in the training copies under Bartz is correct.
A fair use in the training copies would not shield infringement claims against datasets that were permanently stored in an internal library even beyond the actual training. In other words, the permanent storage after training was completed would need to be justified by an additional fair use purpose. Otherwise, it might be infringement under Bartz.
In light of Judge Alsup’s fair use order, his subsequent proposed summary of the case, and the availability of Anthropic’s fair use defense for the library copies at trial, OpenAI has the correct understanding of Judge Alsup’s ruling, in my view. Under Bartz, AI companies have a better chance of establishing fair use in AI training if they do not retain the copies from datasets indefinitely after the training is done. Building a permanent internal library was not fair use in Bartz, at least on the summary judgment record.
Thus, an AI company’s deletion of datasets — and avoidance of building a permanent library of datasets from copyrighted works — bolsters the company’s fair use defense in training. Of course, that still leaves the big issue whether downloading from shadow libraries is a separate use from training. Judge Alsup concluded it was in Bartz, but Judge Chhabria concluded it was all part of the same purpose for training on the summary judgment record in Kadrey v. Meta.
Judge Alsup’s discussion of Texaco confirms this conclusion
Judge Alsup’s discussion of the central library in American Geophysical Union v. Texaco further supports this reading of Bartz. In the passage below, Judge Alsup found problematic the retention of copies “in a central library even after its transformative use had been completed”:

In short, under Bartz, AI companies risk infringement if they store copies “in a central library even after its transformative use had been completed.”
OpenAI’s letter sounds more limited than Magistrate Judge Wang’s interpretation
Finally, it’s worth noting that OpenAI’s citation of Bartz v. Anthropic seems far more limited than Magistrate Judge Wang’s interpretation of OpenAI’s argument in ECF 615 at 1.
Magistrate Judge Wang’s interpretation of OpenAI’s letter
Judge Wang asserted: “OpenAI, bizarrely, cites to Judge Alsup’s decision in Bartz v. Anthropic PBC to somehow suggest that downloading pirated copies of books is lawful as long as they are subsequently used for training an LLM. (See ECF 615 at 1 (citing Bartz v. Anthropic PBC, 787 F. Supp. 3d 1007, 1026 (2025)).”
What OpenAI’s letter says
But OpenAI’s letter only stated this: “Plaintiffs have already obtained non-privileged facts on the acquisition and use of the Books1 and Books2 datasets. And OpenAI has provided discovery relating to the fact that it removed those datasets after deciding not to use them to train LLMs. Cf. Bartz v. Anthropic PBC, 2025 WL 1741691, at *8 (N.D. Cal. June 23, 2025) (faulting Anthropic for ‘retain[ing]’ book copies ‘even after deciding it would not use them to train any LLM’ (emphasis in original)).”
That is the only mention of Bartz in OpenAI’s letter: a Cf. citation with a parenthetical.
I don’t think OpenAI’s letter “suggest[s] that downloading pirated copies of books is lawful as long as they are subsequently used for training an LLM.” The letter makes no argument on the lawfulness of the initial downloading at all. Instead, the letter focuses on the issue of waiver of attorney-client privilege — what is at issue — and the disputed reasons that OpenAI deleted Books1 and Books2.
OpenAI’s parenthetical summary of the point from Bartz is accurate: on Anthropic’s motion for summary judgment of fair use, Judge Alsup did fault Anthropic for “retain[ing]” book copies “even after deciding it would not use them to train any LLM” (emphasis in original). As explained above, Anthropic’s retention (non-deletion) of copies that were never used for training, or that were kept even after training, is what doomed one half of Anthropic’s fair use defense.
Put simply, Anthropic lost on fair use in downloading copies from shadow libraries and permanently storing them in a general library. That’s what Bartz holds.