DeepSeek, based in China, is innovating again.
In a fascinating paper posted on GitHub, DeepSeek-OCR: Contexts Optical Compression, DeepSeek introduces a new method of “compressing long contexts via optical 2D mapping.” Instead of text tokens, the model uses vision tokens derived from an image of the text. Basically, it's like taking a screenshot of a page of text.
As Rohan Paul tweeted: “Instead of feeding an LLM thousands of text tokens, it turns long text into an image, encodes that image into a small set of vision tokens, then lets a decoder reconstruct the text.”
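To see why this can save tokens, here is a minimal back-of-the-envelope sketch of the arithmetic. All numbers (4 chars per text token, a ViT-style 16x16 patch size, a 256x256 render) are illustrative assumptions for this sketch, not DeepSeek-OCR's actual configuration:

```python
# Hedged sketch: the token arithmetic behind "optical compression".
# Assumed numbers, not DeepSeek-OCR's real settings.

def text_token_count(text: str, chars_per_token: float = 4.0) -> int:
    """Rough text-token estimate (~4 chars/token for English BPE)."""
    return max(1, round(len(text) / chars_per_token))

def vision_token_count(img_w: int, img_h: int, patch: int = 16) -> int:
    """A patch-based vision encoder emits one token per patch of the rendered page."""
    return (img_w // patch) * (img_h // patch)

page = "word " * 600  # ~600 words, roughly one dense page of text
print(text_token_count(page))        # ~750 text tokens
print(vision_token_count(256, 256))  # 256 vision tokens for a 256x256 render
```

Under these toy assumptions, a page that costs ~750 text tokens fits in 256 vision tokens, roughly a 3x reduction; the paper's decoder then reconstructs the original text from those vision tokens.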
Excerpt
Comments from Rohan Paul: