On October 2, 2025, Magistrate Judge Sallie Kim clarified the scope of permitted discovery related to The Pile dataset.
Judge Kim had previously granted “the motion to compel discovery regarding the use of The Pile to train datasets in the Nemo Megatron family beyond the four large language models specifically named in the Complaint.”
Judge Kim has now clarified that this discovery does not apply to “all large language models developed (alone or jointly) by [Defendant] using a prototype or version of the Nemo Megatron framework and trained on The Pile, including but not limited to the MT-NLG model, without any further arbitrary restrictions.”
In other words, an LLM that is not in the Nemo Megatron family is not discoverable, even if it was developed earlier using a prototype or version of the Nemo Megatron framework. The MT-NLG model is likewise not part of the lawsuit or discovery.

