
OpenAI’s proposed reply reveals more on Books 1, 2 datasets used to train GPT-3

In its proposed reply in support of its Rule 72(a) objection to Magistrate Judge Wang’s ruling that OpenAI waived its attorney-client privilege, OpenAI gives us a fuller description of the timeline in which it used the Books 1 and 2 datasets:

The Book Author Class Plaintiffs also describe their own timeline in their brief:

Our Own Timeline

Earlier, we created our own timeline. Nearly all of it appears to be accurate.

However, (1) if the Plaintiffs’ description is correct, the initial downloading in 2018 might have been done by someone other than Benjamin Mann, who is first named in the unredacted activity in October 2019 above. Based on his deposition in Bartz v. Anthropic, it still could have been Mann, but we can’t see the redacted portions. (2) The training of GPT-3 must have occurred by 2020 at the latest, given that OpenAI’s research paper was first posted online on Thu, 28 May 2020 17:29:03 UTC.

Timeline of Alleged Infringements by OpenAI and Key Events

We have labeled the 2018 downloading “Alleged Willful Infringement 1” and the circa 2020 training “Alleged Willful Infringement 2” for clarity.

The training with Books 1 and 2 may have spanned the period from 2018 to late 2021, but the precise timeline is unclear from the publicly filed briefs. According to the Gratz letter for OpenAI, the datasets were “deleted in or around mid-2022.”

This timeline shows that OpenAI’s deletion of Books 1 and 2 occurred some time after OpenAI downloaded the datasets and later used them to train GPT-3 in 2020. Indeed, Plaintiffs’ own brief states the deletion came “[t]hree years later,” meaning three years after the datasets were downloaded and Mann started to use them:
