The Trial of AI: preliminary thoughts on the copyright claims in Sarah Andersen v. Stability AI, Midjourney + DeviantArt

On Friday, attorneys for three artists (Sarah Andersen, Kelley McKernan, and Karla Ortiz) filed a class action against three companies whose AI image generators let users create art through text commands. (DALL-E, developed by OpenAI, is probably the most famous AI image generator, but it wasn’t named in this lawsuit.) You can download the Complaint here. Today, we offer some preliminary thoughts on the first two copyright claims.

Direct infringement claim

Count I, Direct Infringement Claim: The AI generators allegedly used copyrighted works, including the plaintiffs’, without permission during machine learning by training on a dataset of unlicensed copies of those works.

Mark Lemley and Bryan Casey argue that AI’s use of copyrighted works for machine learning should be treated as fair use (or “fair learning”) when the use serves non-expressive purposes. But they also acknowledge that courts likely “won’t be so sympathetic to machine copying.” Their article lays out the case for their approach to fair (machine) learning.

The Dataset of Copyrighted Images Collected Without Licenses: The Complaint alleges that AI got access to the copyrighted works of the plaintiff class through web scraping: “The training data for all AI Image Products are collected via web scraping. For example, the training data for Stable Diffusion—a database of billions of captioned images—was collected via web scraping.” Defendant “Stability paid LAION to create LAION-5B, a new dataset of 5.85 billion Training Images—more than 14 times bigger than LAION-400M…. The LAION-Aesthetics dataset is heavily reliant on scraping and copying images from commercial image-hosting services: according to one study, 47% of the images in the dataset were scraped from only 100 web domains. The sources of some of the copies and scrapes are stock-image sites, including Getty Images, Shutterstock, and Adobe Stock, as well as shopping sites (like Shopify, Pinterest, Wix, and Squarespace). Significantly, websites featuring user-generated content were a huge source of images, including sites like Smugmug, Flickr, Wikimedia, Tumblr, and DeviantArt.”

Federal courts have recognized that unauthorized copying of works to create an internal database used for a transformative purpose, such as Google’s image search, Google’s cache system, Google Book Search, and Turnitin’s plagiarism detection software, constitutes permissible fair use. One might analogize the training dataset used by AI image generators to the databases in these cases.

Justice Breyer’s opinion in Google v. Oracle, which recognized that Google’s use of Oracle’s declaring code (which helps organize programs in Java) had a transformative purpose under the first fair use factor, is also helpful here:

“[S]ince virtually any unauthorized use of a copyrighted computer program (say, for teaching or research) would do the same, to stop here would severely limit the scope of fair use in the functional context of computer programs. Rather, in determining whether a use is ‘transformative,’ we must go further and examine the copying’s more specifically described ‘purpose[s]’ and ‘character.’ 17 U. S. C. §107(1).

“Here Google’s use of the Sun Java API seeks to create new products. It seeks to expand the use and usefulness of Android-based smartphones. Its new product offers programmers a highly creative and innovative tool for a smartphone environment. To the extent that Google used parts of the Sun Java API to create a new platform that could be readily used by programmers, its use was consistent with that creative ‘progress’ that is the basic constitutional objective of copyright itself. Cf. Feist, 499 U. S., at 349–350 (“The primary objective of copyright is not to reward the labor of authors, but ‘[t]o promote the Progress of Science and useful Arts’” (quoting U. S. Const., Art. I, §8, cl. 8)).”

One might argue that AI image generators provide “a highly creative and innovative tool,” or a new platform that helps users, aided by AI, create new works. On the other hand, one might argue that the AI dataset differs from the databases used for image search, caching, book search, and plagiarism detection because the AI generators produce new derivative works based on the copyrighted works, which leads to the second issue, alleged user infringement, discussed below. (Whether the AI-generated works are substantially similar to the existing copyrighted works the AI drew on from its dataset is a factual question that must be decided for each alleged infringement. The Complaint doesn’t compare specific works.) The Supreme Court is revisiting the test for transformativeness in Andy Warhol Foundation v. Goldsmith. With Justice Breyer’s retirement, the Court may take a narrower approach to the issue than it did in Google v. Oracle.

Vicarious infringement claim

Count II, Vicarious Infringement Claim: Users of the AI generators allegedly created derivative works based on the plaintiffs’ copyrighted works, including works “in the style of artists” such as the plaintiffs: “Individuals have used AI Image Products to create works using the names of Plaintiffs and the Class in prompts and passed those works off as original works by the artist whose name was used in the prompt. Such individuals are referred to herein as ‘Imposters.’ By using a particular artist’s name, Imposters can cause the AI Image Product to rely more heavily on that artist’s prior works to create images that can pass as original works by that artist. These output images are referred to herein as ‘Fakes.’”

We will have to wait for the specific works the plaintiffs allege were infringed to see how similar they are to the allegedly infringing AI-generated works. But case law does recognize that artistic style can be an element of protected expression. See Malden Mills, Inc. v. Regency Mills, Inc., 626 F.2d 1112 (2d Cir. 1980) (“The two designs are of such likeness with regard to subject matter, style of representation, shading, composition, relative size and placement of components, and mood as to [be] obviously substantially similar.”); Steinberg v. Columbia Pictures Indus., 663 F. Supp. 706 (S.D.N.Y. 1987) (“style is one ingredient of ‘expression’”). Of course, some styles are commonplace, lack originality, or are unprotected ideas. See Jewelry 10, Inc. v. Elegance Trading Co., 1991 WL 144151 (S.D.N.Y. 1991) (“a detailed copying which takes not only the stylistic idea but the manner or details of execution will be found to infringe.”). And even if a style is protected, the scope of protection will be key.

If the defendants enable their users to write text commands to create works “in the style of Sarah Andersen” (or another artist in the plaintiff class), and the resulting works are substantially similar in style, then there is likely at least a prima facie claim of infringement. In a New York Times article, Sarah Andersen provided an example of an image she created with an AI image generator, and it does seem quite similar in style to her own work. But ultimately we need to compare the plaintiffs’ works to the alleged derivative works generated by the defendants’ platforms.

*Nothing in this article is legal advice.

Chat GPT Is Eating the World