Jack Brzezinski’s Post

Senior AI Architect, Generative AI @UCSD

Multimodal document parsing is a huge step forward in the #GenAI space. This capability extends to mathematical formulas, tables, and graphs. It is hard to overstate the importance of this innovation.

LlamaIndex

🔥 Introducing GPT-4o + LlamaParse 🔥

GPT-4o is the state-of-the-art model for multimodal understanding, meaning it also has state-of-the-art document parsing capabilities. LlamaParse is the platform for enabling LLM-powered parsing: it uses LLMs to extract documents from any file type in a performant, reliable fashion, offering state-of-the-art response quality for advanced document RAG.

We’re excited to offer GPT-4o as an explicit option in LlamaParse, which will use GPT-4o for extraction per page into markdown, instead of using our default parsers/models.

Why:
- GPT-4o is very good at parsing very complex documents into well-formatted markdown. Oftentimes it outperforms our default approaches.
- This means that it can turn documents with very complex tables / charts into clean, indexable data for your RAG pipeline: higher response quality, fewer hallucinations 📈

Tradeoffs / Caveats ⚠️:
- It’s expensive 💵: Due to the cost of inference, using GPT-4o is currently $0.60 USD per page (while by default LlamaParse is $0.003 per page). This cost can spike quickly - beware!
- You can specify your OpenAI key, in which case the marginal cost goes down to 0.3c per page.
- This is a beta feature. Given the cost and latency, use this with caution!

If you want to give this a shot, sign up for an account and check out our UI: https://lnkd.in/gbkxQAQd
Notebook: https://lnkd.in/grwUVr-G
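To see how quickly the per-page prices quoted above add up, here is a rough cost sketch. The mode labels (`default`, `gpt4o`, `gpt4o_own_key`) and the `estimate_cost` helper are hypothetical names made up for this illustration, not part of LlamaParse. It also assumes the 0.3c figure covers only the LlamaParse charge, with OpenAI usage billed separately to your own key.

```python
from decimal import Decimal

# Per-page prices in USD, taken from the announcement above.
# The dictionary keys are illustrative labels, not LlamaParse option names.
PRICE_PER_PAGE = {
    "default": Decimal("0.003"),
    "gpt4o": Decimal("0.60"),
    # 0.3 cents per page; assumed to exclude what OpenAI bills to your own key.
    "gpt4o_own_key": Decimal("0.003"),
}


def estimate_cost(pages: int, mode: str) -> Decimal:
    """Return the estimated parsing charge in USD for `pages` pages."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return PRICE_PER_PAGE[mode] * pages


if __name__ == "__main__":
    # Compare the three pricing options for a 1,000-page corpus.
    for mode in PRICE_PER_PAGE:
        print(f"{mode:>14}: ${estimate_cost(1000, mode):,.2f}")
```

At these prices, GPT-4o mode costs 200 times the default per page, so it may be worth trying it on a small sample of difficult pages before running a whole corpus through it.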
