
Magistrate Judge Sallie Kim clarifies Nazemian’s discovery of The Pile dataset from NVIDIA

On October 2, 2025, Magistrate Judge Sallie Kim clarified the scope of permitted discovery related to The Pile dataset.

Judge Kim had previously granted “the motion to compel discovery regarding the use of The Pile to train datasets in the Nemo Megatron family beyond the four large language models specifically named in the Complaint.”

Judge Kim has now clarified that this discovery does not apply to “all large language models developed (alone or jointly) by [Defendant] using a prototype or version of the Nemo Megatron framework and trained on The Pile, including but not limited to the MT-NLG model, without any further arbitrary restrictions.”

In other words, if an LLM is not in the Nemo Megatron family, it’s not discoverable, even if it was developed earlier using a prototype or version of the Nemo Megatron framework. And the MT-NLG model is not part of the lawsuit or discovery.
