On Dec. 21, 2023, the Stanford Internet Observatory published a study reporting that it had found more than 3,200 images of suspected child sexual abuse material (CSAM) in LAION-5B, an open dataset of 5.85 billion image-text pairs scraped from the Internet.
LAION (Large-scale Artificial Intelligence Open Network), a nonprofit, said: “In an abundance of caution, we have taken down the LAION datasets to ensure they are safe before republishing them.”
The dataset has been used to train AI image generators, including Stability AI's Stable Diffusion. Stability AI said in a statement that it uses filters to “remove unsafe content from reaching the models. By removing that content before it ever reaches the model, we can help to prevent the model from generating unsafe content.”