This is a significant leap. Claude can now READ images, graphs and other visual elements inside a document.
Current document based AI systems break down PDFs into pure text using OCR, then chunk them for RAG. But real documents are more visual and not only texty. There is also a certain relationship between the text and tables, charts, layouts, and figures that carry crucial meaning.
Most likely, they are using ColPali algorithm (https://lnkd.in/gDvEx_A4) which processes documents by dividing them into small visual patches, preserving both textual and visual information without losing their relationships. Something similar to how a human reads.
This is quite good for real-world documents like financial reports, papers, or technical docs. And they have it available in their APIs as well. So, anyone can plug it into their application.
Claude can now view images within a PDF, in addition to text. Enable the feature preview to get started: https://claude.ai/new?fp=1. This helps Claude 3.5 Sonnet more accurately understand complex documents, such as those laden with charts or graphics.
The Anthropic API now also supports PDF inputs in beta: https://lnkd.in/emvau9Ez