Fascinating research paper refuting the notion that scaling large language models yields diminishing returns in performance.
While that might be true for simple “single-step” tasks, it is not for more complex “long-horizon” tasks, according to researchers Akshit Sinha, Arvindh Arun, Shashwat Goel, Steffen Staab, and Jonas Geiping, who posted a preprint version of their article:


If you look at the far-left graph in the bottom row, you can see that task length increases in a near-linear fashion with model size (billions of parameters).
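To see why long-horizon performance can keep improving even when single-step gains look marginal, here is a minimal sketch of the compounding argument. It assumes the simplest model: a task of H steps succeeds only if every step does, and each step succeeds independently with per-step accuracy p. Under that assumption, the longest task a model completes at least half the time is H = ln(0.5) / ln(p). (The function name and the 50% threshold are my choices for illustration, not the paper’s exact setup.)

```python
import math

def horizon_length(step_accuracy: float, success_threshold: float = 0.5) -> float:
    """Longest task (in steps) completed with probability >= success_threshold,
    assuming each step succeeds independently with probability step_accuracy."""
    return math.log(success_threshold) / math.log(step_accuracy)

# Small single-step gains compound into large horizon gains:
for p in (0.90, 0.99, 0.999):
    print(f"per-step accuracy {p:.3f} -> horizon of roughly {horizon_length(p):.0f} steps")
# per-step accuracy 0.900 -> horizon of roughly 7 steps
# per-step accuracy 0.990 -> horizon of roughly 69 steps
# per-step accuracy 0.999 -> horizon of roughly 693 steps
```

On this toy model, moving from 99% to 99.9% per-step accuracy, a gain that looks negligible on a single-step benchmark, multiplies the achievable task length roughly tenfold.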
I’m fascinated by scaling and wrote about it in my forthcoming law review article on fair use. Intuitively, scaling makes a lot of sense: the more data, the better the model. Just as a human learns more from more information, so can a model. But on another level, it’s still hard to believe a model can figure it all out on its own.
