
DeepSeek's new paper: the model uses images of text as vision tokens instead of text tokens. Optical compression might cut costs and computation

DeepSeek, based in China, is innovating again.

In a fascinating paper posted on GitHub, DeepSeek-OCR: Contexts Optical Compression, DeepSeek introduces a new method of “compressing long contexts via optical 2D mapping.” Instead of text tokens, the model uses image, or vision, tokens representing the text. Basically, it's like taking a screenshot of a page of text.

As Rohan Paul tweeted: “Instead of feeding an LLM thousands of text tokens, it turns long text into an image, encodes that image into a small set of vision tokens, then lets a decoder reconstruct the text.”
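To make that pipeline concrete, here is a minimal sketch in Python of the first stage only: rendering long text onto an image, the way a screenshot would. The vision encoder and the decoder are DeepSeek's models and are not reproduced here. The function name render_page, the page dimensions, and the token counts at the end are illustrative assumptions, not values from the paper.

```python
# A minimal sketch of the optical-compression idea (not DeepSeek's code):
# render long text into a page image, which a vision encoder would then
# compress into a small, fixed number of vision tokens.
from PIL import Image, ImageDraw

def render_page(text: str, width: int = 1024, height: int = 1024,
                margin: int = 32, line_height: int = 18) -> Image.Image:
    """Draw plain text onto a white canvas, like a screenshot of a page."""
    page = Image.new("RGB", (width, height), "white")
    draw = ImageDraw.Draw(page)
    # Naive word wrap: pack words until the line is roughly full.
    line, lines = [], []
    for word in text.split():
        if sum(len(w) + 1 for w in line) + len(word) > 110:
            lines.append(" ".join(line))
            line = []
        line.append(word)
    lines.append(" ".join(line))
    y = margin
    for row in lines:
        draw.text((margin, y), row, fill="black")
        y += line_height
        if y > height - margin:
            break
    return page

long_text = "some document text " * 500   # stand-in for a long context
page = render_page(long_text)
page.save("page.png")

# The claimed win, paraphrased: the vision encoder represents this page
# with far fewer tokens than a text tokenizer would need. Rough numbers:
text_tokens = len(long_text.split()) * 1.3   # crude tokens-per-word estimate
vision_tokens = 256                          # illustrative per-page budget
print(f"~{text_tokens:.0f} text tokens vs ~{vision_tokens} vision tokens")
```

The point of the sketch is the arithmetic at the end: a fixed budget of a few hundred vision tokens per page, versus thousands of text tokens for the same content, is where the claimed compression would come from.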
