LlamaIndex’s Post


225,820 followers

7mo

🔥 Introducing GPT-4o + LlamaParse 🔥

GPT-4o is the state-of-the-art model for multimodal understanding, meaning it also has state-of-the-art document parsing capabilities.

LlamaParse is the platform for enabling LLM-powered parsing - it uses LLMs to extract content from any file type in a performant, reliable fashion, offering state-of-the-art response quality for advanced document RAG.

We’re excited to offer GPT-4o as an explicit option in LlamaParse, which will use GPT-4o for extraction per page into markdown, instead of using our default parsers/models.

Why:
- GPT-4o is very good at parsing very complex documents into well-formatted markdown. Oftentimes it outperforms our default approaches.
- This means that it can turn documents with very complex tables / charts into clean, indexable data for your RAG pipeline - higher response quality, fewer hallucinations 📈

Tradeoffs / Caveats ⚠️:
- It’s expensive 💵: Due to the cost of inference, using GPT-4o is currently $0.60 USD per page (while by default LlamaParse is $0.003 per page). This cost can spike quickly - beware!
- You can specify your OpenAI key, in which case the marginal cost per page goes down to 0.3c per page.
- This is a beta feature. Given the cost and latency, use this with caution!

If you want to give this a shot, sign up for an account and check out our UI: https://lnkd.in/gbkxQAQd
Notebook: https://lnkd.in/grwUVr-G
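To make the "this cost can spike quickly" caveat concrete, here is a minimal sketch that multiplies a page count by the per-page prices quoted above. The function name, the mode labels, and the reading of "0.3c" as 0.3 cents ($0.003) are assumptions made for illustration; none of this is part of LlamaParse, and whatever OpenAI bills to your own key is not included.

```python
# Per-page prices in USD, taken from the figures quoted in the post.
# Assumption: "0.3c" is read as 0.3 cents, i.e. $0.003 per page.
PRICE_PER_PAGE_USD = {
    "default": 0.003,
    "gpt4o": 0.60,
    "gpt4o_own_key": 0.003,
}


def estimate_cost(pages: int, mode: str) -> float:
    """Return the estimated parsing charge in USD for `pages` pages.

    `mode` is one of the hypothetical labels in PRICE_PER_PAGE_USD.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    if mode not in PRICE_PER_PAGE_USD:
        raise KeyError(f"unknown mode: {mode}")
    return round(pages * PRICE_PER_PAGE_USD[mode], 2)


if __name__ == "__main__":
    for mode in PRICE_PER_PAGE_USD:
        print(mode, estimate_cost(1000, mode))
```

At the quoted prices, GPT-4o mode costs 200 times the default per page, so a 1,000-page batch is worth estimating before it is submitted.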

24 Comments

Ranjeet Rustogi

co-f @awaydayai, cto @suketv | 5x exit tech nut | ai/web3 consultant | crypto-native | startup advisor, mentor, investor

7mo

Using own key should bring the cost down to 0.3c or $0.3? Because 0.3c is $0.003, which is then the same as the cost of the default LlamaParse 🤔 LlamaIndex

1 Reaction

Ibrahim Akhtar

Data Scientist | AI Developer | Game Developer

7mo

Can't wait to experiment around with this model. Once the price goes down a bit ofc.

4 Reactions

Jonny H.

Software Engineer

7mo

Could the two be used in combination to improve accuracy? For example, 4o reviews the llamaparsed output and input page image to check for errors.

Cohorte

7mo

Interesting. Should give it a try.

Matthew Combatti

Simulanics Technologies - AI & ML Systems Engineer - Master Software Developer & Systems Security Expert

7mo

You're welcome for the idea this morning 💡 🙂 🙏

1 Reaction

Jhonnatan Betancourt

Data, AI & Software for Business ⚙️📊 Data engineering & Analytics

7mo

I'm waiting impatiently for GPT 4o🤓

Piyush Sar

MLE I at infocusp innovations | NLP + Multimodal

7mo

Tirthkumar Patel

1 Reaction

Daniel Nicusor Naicu

7mo

woow


More Relevant Posts

Pierre-Loic Doulcet
7mo
This is a glimpse into the future of document/PDF parsing. GPT-4o is really good at understanding the content of documents and parsing them into a structured form. I think that in a few years we will stop using custom-made parsers for documents and only feed them into an LLM / multimodal model, and get perfect results. No more OCR, no more custom-made parsers... And it's already working pretty well today, although a bit pricey / slow. With this release of LlamaParse we let you try the future of document parsing, you should try it! It can handle complex charts and tables quite well. We are of course still working hard on our 'traditional' parser and will continue to improve it so you can use it until large models catch up in cost and efficiency!
Jack Brzezinski

Senior AI Architect, Generative AI @UCSD
7mo
Multimodal document parsing is a huge step forward in the #GenAI space. This capability extends to mathematical formulas, tables, and graphs. It is hard to overestimate this innovation.
Alexander Chukovski

Building Niche Job Boards (Web3 & AI) | HR Tech Consultant | Expertise in Job Sites SEO, Google Jobs, NLP & AI Solutions, Job Scraping | HR Tech Blogger
4mo
A couple of thoughts on the new GPT-4o-mini regarding zero-shot classification and extraction tasks.

Up until now, there was a relatively simple strategy. If you have a high volume of transactions, you try to fine-tune GPT-3.5 to keep the cost under control. Fine-tuning is a time-intensive process, so it makes sense to look into it only for large volumes, and a fine-tuned GPT-3.5 is still more than 50% cheaper than GPT-4o. Also, fine-tuned models usually work well with short prompts, which reduces costs, especially across many transactions.

For low volume, GPT-4o or even GPT-4 is better: you can start immediately, and the models will most likely be good enough for zero-shot classification. The definition of high/low volume depends on your budget, how long you intend to run the process, and the cost of the team that would do the fine-tuning.

So, we now have a new model significantly cheaper than any of the options above. Well, you cannot fine-tune it, but at this cost, there are a few options:
1. You can break down complex tasks running on the expensive or fine-tuned models and run them on the new model.
2. The new model allows you to use few-shot examples; even though the prompts become longer, the cost will still be far lower.
3. You can try the cheaper model, and if your validation fails, always escalate to the more expensive models.

If you use OpenAI, I would evaluate existing processes and test the new model using the options above. If it is good enough for your use case, you will likely see a 5-10x decrease in your current costs.
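Option 3 above (try the cheaper model first, escalate when validation fails) can be sketched roughly as follows. This is only an illustration under stated assumptions: each "model" is a plain callable standing in for an API call, and `classify_with_fallback`, the stub models, and the label set are hypothetical names, not part of any OpenAI SDK.

```python
from typing import Callable, Sequence, Tuple

# A "model" is any callable that takes a prompt and returns text.
# In practice it would wrap an API call; that wiring is left out on purpose.
Model = Callable[[str], str]


def classify_with_fallback(
    prompt: str,
    models: Sequence[Tuple[str, Model]],
    is_valid: Callable[[str], bool],
) -> Tuple[str, str]:
    """Try models in the given order (cheapest first) and return
    (model_name, answer) for the first answer that passes validation."""
    for name, model in models:
        answer = model(prompt).strip()
        if is_valid(answer):
            return name, answer
    raise ValueError("no model produced a valid answer")


if __name__ == "__main__":
    allowed = {"engineering", "sales", "other"}

    def cheap(prompt: str) -> str:
        return "not sure"  # stub standing in for the cheaper model

    def expensive(prompt: str) -> str:
        return "engineering"  # stub standing in for the pricier model

    print(classify_with_fallback(
        "Classify this job title: 'Senior Backend Developer'",
        [("cheap", cheap), ("expensive", expensive)],
        lambda answer: answer in allowed,
    ))
```

The validator is the important design choice here: the cascade only saves money if a cheap check (a closed label set, a schema, a regex) can reliably tell a usable answer from an unusable one.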
Ivan Neshkov

Managing Director @ UMELLE | Custom Insurtech Solutions
2mo
It's important to know that AI is not a "one size fits all" solution. As models continue to develop, they will continue to specialize. It is likely that in the future we will get a model that raises the bar overall, but we will take many steps to get there. Solutions like Hugging Face pretrained models, which allow for hyper-specialization and draw on the strengths of different models, will continue to be the way to build powerful custom AI solutions. Thanks to Tobias Zwingmann
Tobias Zwingmann

Helping business leaders build and run AI roadmaps for growth. | O'Reilly Author | LinkedIn Learning Instructor | Keynote Speaker | Managing Partner RAPYD.AI
2mo

GPT-o1 vs. GPT4o: It's important to understand that OpenAI's new o1 model is not necessarily better than GPT4o, but designed for different purposes. Let me explain:

GPT-o1 - with its "advanced reasoning capabilities" - is a major breakthrough for tackling tasks that LLMs weren't so good at before: math, temporal understanding, logic - complex things where you need to think through something slowly, in steps.

However, GPT-o1 isn't necessarily better across all fronts. Interestingly, it can perform worse in areas where LLMs were typically quite strong. Code completion, for example: As you can see on this benchmark table, o1 ranks behind Claude-3.5 Sonnet, and even behind GPT4o! (Same for creative writing, but that's just my anecdotal evidence).

At the same time, GPT-o1 is ALWAYS more expensive than anything else because it generates more output - called reasoning tokens - before arriving at an answer. Clearly, this isn't something you want to have an easy conversation with. Most chatbot use cases won't work.

Instead, GPT-o1 seems to work EXTREMELY well in "here's everything I know, process this carefully, and whenever you're ready get back to me with an answer–oh also tell me why" scenarios.

As AI becomes more advanced, knowing which model to use for different business applications will become a key skill. GPT-o1 may not be the right tool for every job, but in the right situation, it could be a game changer for use cases that were previously pretty much impossible.
Martin Bogdanov

Fintech | Banking | Digital Transformation | Innovation | Efficiency | Lending | Investor | Marketing Sales | eCommerce
2mo Edited
Latest version of GPT-o1 is here!
Val Andrei Fajardo

Founding Software/AI Engineer at LlamaIndex | PhD Probability & Stats | CIPT
7mo Edited
Use GPT-4o for Data Extraction from Images! 🏞️🗂️

OpenAI just released GPT-4o, where the "o" stands for omni, alluding to the fact that this latest and greatest model is capable of working in text, vision, and audio modalities. I recently shared a post on how one could use LlamaIndex's PydanticProgram to perform data extraction into structured outputs (i.e., Pydantic BaseModels) from text data. With the release of GPT-4o, I wondered how it would fare in a similar data extraction task, but this time over images.

TASK: Data Extraction From PaperCards 💃
===============================
Those who have followed my posts are likely familiar with the PaperCards that I release from time to time. In any case, PaperCards are merely visualizations that organize a particular summarization of the research papers that I have read. More specifically, a PaperCard provides:
- Main Contribution
- Insights (or main motivation of the research paper)
- Main Results
- Tech Bits (some illustration of how the main algorithm works)
The title, authors, year, and arXiv ID are also displayed in these PaperCards.

RESULTS 🌟
=========
1. GPT-4o is faster (~180% faster on average than turbo) and fails less often (0 times out of 35) than GPT-4v (14 failures) and GPT-4turbo (1 failure)
2. GPT-4o yields better data extraction results than GPT-4v and GPT-4turbo
3. GPT-4o was very good at extracting facts from the PaperCard: Title, Author, Year, and headline statements of the Main Results section
4. GPT-4v and GPT-4turbo often hallucinated the main results and sometimes the authors
5. Results with GPT-4o can probably be improved with better prompting, especially for extracting data from the Insights section, but also for describing the Tech Bits section.

LINKS
====
📓 Notebook: https://lnkd.in/e3UavGgq
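For readers who want a feel for what "extraction into structured outputs" means for a PaperCard, here is a rough sketch of a target schema and a parser for a model's JSON reply. It is not the code from the linked notebook: it uses a standard-library dataclass as a stand-in for a Pydantic BaseModel, and the class name, field names, field types, and `parse_paper_card` helper are assumptions based only on the fields listed in the post.

```python
import json
from dataclasses import dataclass, fields
from typing import List


@dataclass
class PaperCard:
    """Hypothetical schema mirroring the PaperCard sections named in the post."""
    title: str
    authors: List[str]
    year: int
    arxiv_id: str
    main_contribution: str
    insights: str
    main_results: List[str]
    tech_bits: str


def parse_paper_card(raw_json: str) -> PaperCard:
    """Parse a model's JSON reply into a PaperCard.

    Only checks that every expected field is present; unlike Pydantic,
    a dataclass does not validate or coerce field types.
    """
    data = json.loads(raw_json)
    expected = [f.name for f in fields(PaperCard)]
    missing = [name for name in expected if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return PaperCard(**{name: data[name] for name in expected})
```

Counting how often a model's reply fails a check like this is one simple way to arrive at failure tallies of the kind reported above, though the notebook may define failure differently.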
Karen Ryan

Co-Founder and President, Digital^Shift
2mo
This is an excellent presentation of why we shouldn't be brand or hype driven in GenAI. I have a client that is using 3 versions of GPT, because the lowest cost option does an adequate job for one of their use cases. Cost is a big issue in AI in business and we shouldn't ignore it. The right tool for the right job is even more important. Thanks for a very timely post Tobias.
Manutej Mulaveesala

Generative AI Specialist and Educator | Prompt Engineer | AI Strategist | | Writer | Speaker
2mo
There is a lot of hype this week about OpenAI's GPT-o1 release, but it is important to understand that this model works well for DIFFERENT tasks, such as reasoning, math problems, and complex logic. It reduces the need to specify Chain of Thought directly, as it is a fine-tuned version of a GPT model. Use it with care, and be wary of presuming it is "the newest and best" model. More details and a breakdown in Tobias Zwingmann's post.
Tom Hanley (AWS Leader, Ex Disney, Ex Verizon)

AWS Solutions Architecture | 6 x AWS Certified | Executive Mentor | Speaker | Data Architecture | Portuguese-American Dual Citizen
2mo
Moral of the story - use the right model for the right job... and then invest in prompt engineering, fine tuning, and RAG to improve results.
