Jack Brzezinski’s Post

Senior AI Architect, Generative AI @UCSD

9mo

Multimodal document parsing is a huge step forward in the #GenAI space. This capability extends to mathematical formulas, tables, and graphs. It is hard to overestimate this innovation.

LlamaIndex

241,420 followers

9mo

🔥 Introducing GPT-4o + LlamaParse 🔥

GPT-4o is the state-of-the-art model for multimodal understanding, meaning it also has state-of-the-art document parsing capabilities. LlamaParse is the platform for enabling LLM-powered parsing: it uses LLMs to extract documents from any file type in a performant, reliable fashion, offering state-of-the-art response quality for advanced document RAG.

We’re excited to offer GPT-4o as an explicit option in LlamaParse, which will use GPT-4o for extraction per page into markdown, instead of using our default parsers/models.

Why:
- GPT-4o is very good at parsing very complex documents into well-formatted markdown. Oftentimes it outperforms our default approaches.
- This means that it can turn documents with very complex tables / charts into clean, indexable data for your RAG pipeline: higher response quality, fewer hallucinations 📈

Tradeoffs / Caveats ⚠️:
- It’s expensive 💵: Due to the cost of inference, using GPT-4o is currently $0.60 USD per page (while by default LlamaParse is $0.003 per page). This cost can spike quickly - beware!
- You can specify your own OpenAI key, in which case the marginal cost per page goes down to 0.3c ($0.003) per page.
- This is a beta feature. Given the cost and latency, use this with caution!

If you want to give this a shot, sign up for an account and check out our UI: https://lnkd.in/gbkxQAQd
Notebook: https://lnkd.in/grwUVr-G
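
To make the pricing caveat concrete, here is a rough sketch of the per-document arithmetic. It uses only the per-page rates quoted in the post, which may have changed since; the function name `parse_cost` and the 200-page document are illustrative assumptions, and no LlamaParse API is called.

```python
from decimal import Decimal

# Per-page rates quoted in the post (USD); they may have changed since.
GPT4O_MODE_RATE = Decimal("0.60")
DEFAULT_RATE = Decimal("0.003")  # also the quoted 0.3c rate when you supply your own OpenAI key


def parse_cost(pages: int, rate: Decimal) -> Decimal:
    """Return the parsing cost in USD for `pages` pages at `rate` per page."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * rate


pages = 200  # hypothetical document size
gpt4o_cost = parse_cost(pages, GPT4O_MODE_RATE)
default_cost = parse_cost(pages, DEFAULT_RATE)

print(f"GPT-4o mode: ${gpt4o_cost}")
print(f"Default:     ${default_cost}")
print(f"GPT-4o mode costs {int(gpt4o_cost / default_cost)}x the default rate")
```

At the quoted rates the ratio is the same for any page count, so the gap only matters in absolute terms once documents get long. With your own key, the 0.3c figure is what the post quotes as LlamaParse's marginal cost; presumably OpenAI usage is billed to that key separately.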


More Relevant Posts

Pierre-Loic Doulcet
9mo
This is a glimpse into the future of document/PDF parsing. GPT-4o is really good at understanding the content of documents and parsing it into a structured form. I think that in a few years we will stop using custom-made parsers for documents and simply feed them into an LLM / multimodal model, and get perfect results. No more OCR, no more custom-made parsers... And it's already working pretty well today, although it is a bit pricey and slow. With this release of LlamaParse we let you try the future of document parsing, and you should try it! It can handle complex charts and tables quite well. We are of course still working hard on our 'traditional' parser and will continue to improve it so you can use it until large models catch up in cost and efficiency!
Krzysztof Karaszewski

AI & Process Automation for Business 🤖 Beyond the Hype and Buzzwords
5mo
While very impressive, the o1 model still has several limitations that you should know about, so before jumping on the OpenAI o1 hype train, please read the points below.

1. It's a model designed for short but complex reasoning tasks. Prompts need to be concise and direct.
2. The o1 model is not well-suited for RAG or long-context tasks, limiting its usability in real-world business applications like agents or chatbots.
3. While the benchmarks are impressive, it has been shown that it may still struggle to answer how many "r"s are in the word "Strawberry." Note that o1 was compared mainly with GPT-4o, and the MMLU result wasn't significantly better (88 vs 92).
4. It's quite expensive by today's standards ($15 input/$60 output per million tokens), but that will probably change quickly ;)
5. As Andrew Ng demonstrated, providing more examples to GPT-3.5 and using chain-of-thought (CoT) methods allows that model to perform on par with GPT-4. What OpenAI has done is automate that process by adding the "thinking box" section. Thus, with some clever prompt engineering, you can achieve o1-level answers for your specific use case.
6. If you read between the lines of the "Hiding the Chains of Thought" section in the o1 press release, OpenAI suggests that the model responsible for reasoning is uncensored, which is why they are hiding the thinking section. This may explain why the model seems to perform better.
7. o1 was benchmarked primarily against plain-prompted GPT-4o, which isn't a fair comparison. I bet GPT-4o would perform much better if even a simple generic CoT instruction were added to the prompt. I'm looking forward to more benchmarks, especially those comparing o1 with other SOTA models using CoT.
8. o1 is text-only, so it cannot analyze images, videos, or voice. So why the "o"? It's not "omni", so is it for OpenAI this time?
9. As stated in the press release, "... the enhanced reasoning capabilities may be particularly useful if you’re tackling complex problems in science, coding, math, and similar fields." I agree; however, what concerns me is the short effective context window. It might help solve a problem in science for which it already has data in the model, but it may not be as useful for analyzing new research papers or drawing conclusions from fresh data.
10. Considering OpenAI's history of making false promises, such as with video/voice mode, and hiding certain technical facts - like GPT-4o and GPT-4o-mini vision models being the same - I would advise restraint. There might be something they're not telling us :)

While perhaps not as revolutionary as many hoped or claimed, OpenAI's o1 release marks another important milestone in AI history. Even if other companies catch up quickly, OpenAI has set the bar very high, and any new model will now be compared to o1.

#OpenAI #LLM #GPT4o #o1 #AgenciAI
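
For a feel of what the quoted price in point 4 means per request, here is a minimal sketch. The two rates come from the post; the function name, the token counts, and the choice to bill hidden reasoning tokens at the output rate are assumptions for illustration, so check current pricing before relying on the numbers.

```python
from decimal import Decimal

# Prices quoted in the post, in USD per million tokens.
O1_INPUT_PER_M = Decimal("15")
O1_OUTPUT_PER_M = Decimal("60")
MILLION = Decimal(1_000_000)


def request_cost(input_tokens: int, visible_output_tokens: int,
                 reasoning_tokens: int = 0) -> Decimal:
    """Estimate one request's cost in USD.

    Assumption: hidden reasoning tokens are billed at the output rate.
    """
    billed_output = visible_output_tokens + reasoning_tokens
    total = input_tokens * O1_INPUT_PER_M + billed_output * O1_OUTPUT_PER_M
    return total / MILLION


# Hypothetical request: 2,000 prompt tokens, 500 visible answer tokens,
# and 3,000 reasoning tokens that never appear in the answer.
with_reasoning = request_cost(2_000, 500, 3_000)
without_reasoning = request_cost(2_000, 500)
print(f"with reasoning tokens:    ${with_reasoning}")
print(f"without reasoning tokens: ${without_reasoning}")
```

In this made-up request the reasoning tokens account for most of the bill, which is why a short visible answer can still be expensive.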
Val Andrei Fajardo

Applied ML Scientist at Vector Institute | Ex Founding Engineer at LlamaIndex | PhD | CIPT
9mo Edited
Use GPT-4o for Data Extraction from Images! 🏞️🗂️

OpenAI just released GPT-4o, where the "o" stands for omni, alluding to the fact that this latest and greatest model is capable of working in text, vision, and audio modalities. I recently shared a post on how one could use LlamaIndex's PydanticProgram to perform data extraction into structured outputs (i.e., Pydantic BaseModels) from text data. With the release of GPT-4o, I wondered how it would fare in a similar data extraction task, but this time over images.

TASK: Data Extraction From PaperCards 💃
===============================
Those who have followed my posts are likely familiar with the PaperCards that I release from time to time. In any case, PaperCards are merely visualizations that organize a particular summarization of the research papers that I have read. More specifically, a PaperCard provides:
- Main Contribution
- Insights (or main motivation of the research paper)
- Main Results
- Tech Bits (some illustration of how the main algorithm works)

The title, authors, year, and arXiv ID are also displayed in these PaperCards.

RESULTS 🌟
=========
1. GPT-4o is faster (~180% faster on average than turbo) and fails less (0 times out of 35) than GPT-4v (14 failures) and GPT-4turbo (1 failure)
2. GPT-4o yields better data extraction results than GPT-4v and GPT-4turbo
3. GPT-4o was very good at extracting facts from the PaperCard: Title, Author, Year, and headline statements of the Main Results section
4. GPT-4v and GPT-4turbo often hallucinated the main results and sometimes the authors
5. Results with GPT-4o can probably be improved using better prompting, especially for extracting data from the Insights section, but also for describing the Tech Bits section.

LINKS
====
📓 Notebook: https://lnkd.in/e3UavGgq
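
The PaperCard fields listed in the task description map naturally onto a structured extraction target. The sketch below is not the notebook's code: the post uses LlamaIndex's PydanticProgram with Pydantic BaseModels, whereas this stand-in uses a standard-library dataclass so it runs without dependencies. The field names, the `parse_paper_card` helper, and the sample values are all hypothetical, and no model is called.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class PaperCard:
    """Hypothetical extraction target mirroring the sections named in the post."""
    title: str
    authors: list[str]
    year: int
    arxiv_id: str
    main_contribution: str
    insights: str
    main_results: list[str]
    tech_bits: str


def parse_paper_card(raw_json: str) -> PaperCard:
    """Build a PaperCard from a model's JSON reply, rejecting incomplete output."""
    data = json.loads(raw_json)
    expected = {f.name for f in fields(PaperCard)}
    missing = expected - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return PaperCard(**{name: data[name] for name in expected})


# Made-up reply standing in for what a multimodal model might return.
sample_reply = json.dumps({
    "title": "An Example Paper",
    "authors": ["A. Author", "B. Author"],
    "year": 2024,
    "arxiv_id": "0000.00000",
    "main_contribution": "Placeholder contribution.",
    "insights": "Placeholder motivation.",
    "main_results": ["Placeholder result 1", "Placeholder result 2"],
    "tech_bits": "Placeholder description of the algorithm.",
})
card = parse_paper_card(sample_reply)
print(card.title, card.year, len(card.main_results))
```

Rejecting replies with missing fields gives a simple way to count failures per model, in the spirit of the failure counts reported in the results, though how the notebook actually defines a failure is not stated in the post.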
Karen Ryan

Co-Founder and President, Digital^Shift
5mo
This is an excellent presentation of why we shouldn't be brand or hype driven in GenAI. I have a client that is using 3 versions of GPT, because the lowest cost option does an adequate job for one of their use cases. Cost is a big issue in AI in business and we shouldn't ignore it. The right tool for the right job is even more important. Thanks for a very timely post Tobias.
Tobias Zwingmann

Managing Partner | O'Reilly Author | LinkedIn Instructor | I find & realize profitable AI opportunities in your business. Sharing updates & learnings along the way.
5mo

o1 vs. GPT-4o: It's important to understand that OpenAI's new o1 model is not necessarily better than GPT-4o, but designed for different purposes. Let me explain:

o1 - with its "advanced reasoning capabilities" - is a major breakthrough for tackling tasks that LLMs weren't so good at before: math, temporal understanding, logic - complex things where you need to think through something slowly, in steps.

However, o1 isn't necessarily better across all fronts. Interestingly, it can perform worse in areas where LLMs were typically quite strong. Code completion, for example: As you can see on this benchmark table, o1 ranks behind Claude-3.5 Sonnet, and even behind GPT-4o! (Same for creative writing, but that's just my anecdotal evidence.)

At the same time, o1 is ALWAYS more expensive than anything else because it generates more output - called reasoning tokens - before arriving at an answer. Clearly, this isn't something you want to have an easy conversation with. Most chatbot use cases won't work. Instead, o1 seems to work EXTREMELY well in "here's everything I know, process this carefully, and whenever you're ready get back to me with an answer, oh, and also tell me why" scenarios.

As AI becomes more advanced, knowing which model to use for different business applications will become a key skill. o1 may not be the right tool for every job, but in the right situation, it could be a game changer for use cases that were previously pretty much impossible.
Alexander Chukovski

Building Niche Job Boards (Web3 & AI) | HR Tech Consultant | Expertise in Job Sites SEO, Google Jobs, NLP & AI Solutions, Job Scraping | HR Tech Blogger
7mo
A couple of thoughts on the new GPT-4o-mini regarding zero-shot classification and extraction tasks.

Up until now, there was a relatively simple strategy. If you have a high volume of transactions, you try to fine-tune GPT-3.5 to keep the cost under control. Fine-tuning is a time-intensive process, so it only makes sense to look into it for large volumes, and a fine-tuned GPT-3.5 is still more than 50% cheaper than GPT-4o. Also, fine-tuned models usually work well with short prompts, which reduces costs, especially at high transaction volumes. For low volume, GPT-4o or even GPT-4 is better: you can start immediately, and the models will most likely be good enough for zero-shot classification. The definition of high/low volume depends on your budget, how long you intend to run the process, and the cost of the team that would do the fine-tuning.

So, we now have a new model significantly cheaper than any of the options above. Well, you cannot fine-tune it, but at this cost, there are a few options:
1. You can break down complex tasks running on the expensive or fine-tuned models and run them on the new model.
2. The new model allows you to use few-shot examples; even though the prompts become longer, the cost will still be much lower.
3. You can try the cheaper model, and if your validation fails, always upscale to the more expensive models.

If you use OpenAI, I would evaluate existing processes and test the new model using the options above. If it is good enough for your use case, you will likely be looking at a 5-10x decrease in your current costs.
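
Option 3 above (try the cheaper model, escalate when validation fails) can be sketched as a small cascade. Everything in this example is hypothetical: the two "models" are plain functions standing in for API calls, the label set is invented, and validation is reduced to checking that the label is in the allowed set.

```python
from typing import Callable, Sequence

Classifier = Callable[[str], str]

ALLOWED_LABELS = {"engineering", "sales"}  # hypothetical label set


def cheap_model(text: str) -> str:
    # Stand-in for a call to the cheaper model; gives up on unfamiliar input.
    return "engineering" if "developer" in text.lower() else "unknown"


def expensive_model(text: str) -> str:
    # Stand-in for a call to the more expensive model.
    return "sales" if "account" in text.lower() else "engineering"


def is_valid(label: str) -> bool:
    return label in ALLOWED_LABELS


def classify_with_fallback(text: str,
                           models: Sequence[tuple[str, Classifier]],
                           validate: Callable[[str], bool]) -> tuple[str, str]:
    """Try models in order (cheapest first); return (model_name, label) for the first valid label."""
    for name, model in models:
        label = model(text)
        if validate(label):
            return name, label
    raise ValueError("no model produced a valid label")


MODELS = [("cheap", cheap_model), ("expensive", expensive_model)]

for title in ["Senior Python Developer", "Account Executive"]:
    used, label = classify_with_fallback(title, MODELS, is_valid)
    print(f"{title!r}: {label} (answered by the {used} model)")
```

The savings depend on how often the cheap model passes validation, so it is worth logging which tier answered each request before committing to this setup.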
Ivan Neshkov

Managing Director @ UMELLE | Custom Insurtech Solutions
5mo
It's important to know that AI is not a "one size fits all" solution. As models continue to develop, they will continue to specialize. It is likely that in the future we will get a model that raises the bar overall, but we will take many steps to get there. Solutions like Hugging Face pretrained models, which allow for hyper-specialization by combining the strengths of different models, will continue to be the way to build powerful custom AI solutions. Thanks to Tobias Zwingmann.
Manutej Mulaveesala

GenAI Transformation Leader | Author | Enterprise AI Consultant | Innovation Strategist
5mo
There is a lot of hype this week about OpenAI's o1 release, but it is important to understand that this model works well for DIFFERENT tasks, such as reasoning, math problems, and complex logic. It reduces the need to specify chain of thought directly, as it is a fine-tuned version of a GPT model. Use it with care, and be wary of presumptions about its functionality as "the newest and best" model. More details and a breakdown in Tobias Zwingmann's post.
Martin Bogdanov

Fintech | Banking | Digital Transformation | Innovation | Efficiency | Lending | Investor | Marketing Sales | eCommerce
5mo Edited
OpenAI's latest model, o1, is here!
Matt Wuertz
1mo
The OpenAI GPT I created (based on GPT-4o) has over 15,000 characters of instructions. The instructions are mostly a set of rules along with sample queries. But sometimes things can get a bit odd.

I’m currently working in a real-estate company, and when I ask questions, I don’t want to see an ID for a property. I want to see a name we use publicly for the building. Even though I’ve made various rules for this, sometimes when I add a new data route, the GPT defaults to the ID. One time, it followed the rule to show a building name, but not by fetching it from the database. Instead, it made up the name based on the ID. ID 123 was turned into the name 123 Estates. Nice try, GPT. Nice try.

One of the more frustrating aspects is seeing a wrong behavior that the GPT is aware of when asked. “Do you see instructions about looking for the name of the property instead of showing the ID?” “You’re right! I should have done that. I’ll adjust and do that now.” But given that these adjustments are limited to the current session, that doesn’t help anyone else using the GPT in the future.

I will then ask how I could improve the instructions to ensure the correct rules are followed. It’s an odd approach, asking code to improve code, and the suggestions don’t always work. There’s a bit of trial and error in the process for me so far, at least around instructions, but it is a powerful tool overall.