In cases filed by Darius H. James, Together Computer and Cerebras may raise the issue of whether training AI models with copyrighted works in academic research is fair use

Book author Darius H. James filed three copyright lawsuits against three different defendants: Together Computer, Cerebras Systems, and Snowflake.

In their respective Joint Case Management Statements, both Together Computer and Cerebras Systems may have just raised an important legal question: Is training AI models with copyrighted works in academic research a fair use?

Here’s the relevant part of Together Computer’s statement, which concerns its compiling of the Red Pajama dataset with researchers from Stanford University, Université de Montréal, ETH Zurich, and Ontocord.ai:

And here’s Cerebras Systems’ relevant statement:

My law review article, “Fair Use and the Origin of AI Training”

In my law review article “Fair Use and the Origin of AI Training,” I explain why courts should treat AI training with copyrighted works at academic institutions as serving a transformative, or different, purpose.

I also argue that the courts in the AI copyright litigation should answer this question, even if the case involves an AI company engaged in commercial development of AI models.

Here’s a flavor of my analysis:

Put simply, asking whether using copyrighted works to train AI models at academic institutions serves a transformative purpose will assist the courts in understanding why the training is taking place. Is it transformative? Does it provide a public benefit?

The issue of transformativeness is distinct from the (non)commerciality of the use.

So, if AI training is transformative in purpose when conducted at academic institutions, it should be transformative in purpose when conducted at for-profit companies. In both settings, the goal is to research and develop a better AI model.

Chat GPT Is Eating the World