A couple of thoughts on the new GPT-4o-mini regarding zero-shot classification and extraction tasks.

Until now, there was a relatively simple strategy. If you have a high volume of transactions, you try to fine-tune GPT-3.5 to keep costs under control. Fine-tuning is a time-intensive process, so it only makes sense for large volumes, but a fine-tuned GPT-3.5 is still more than 50% cheaper than GPT-4o. Fine-tuned models also tend to work well with short prompts, which reduces costs further when you run many transactions. For low volume, GPT-4o or even GPT-4 is the better choice: you can start immediately, and the models will most likely be good enough for zero-shot classification. Where the line between high and low volume sits depends on your budget, how long you intend to run the process, and the cost of the team that would do the fine-tuning.

So, we now have a new model that is significantly cheaper than any of the options above. You cannot fine-tune it, but at this price, there are a few options:

1. You can break down complex tasks currently running on the expensive or fine-tuned models and run the pieces on the new model.
2. The new model makes few-shot examples affordable: even though the prompts become longer, the total cost will still be far lower.
3. You can try the cheaper model first and, if your validation fails, escalate to the more expensive models.

If you use OpenAI, I would evaluate existing processes and test the new model using the options above. It may well be enough for your use case, and you would likely be looking at a 5-10x decrease in current costs.
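Option 3 above can be sketched in a few lines. This is a minimal illustration of the escalation idea, not a real client: `call_model` is a placeholder for whatever LLM API call you use, and the model names are just the cheap-to-expensive ordering discussed in the post.

```python
def call_model(model: str, prompt: str) -> str:
    """Placeholder for an actual LLM API call (e.g. a chat completion)."""
    raise NotImplementedError

def classify_with_fallback(prompt, validate,
                           models=("gpt-4o-mini", "gpt-4o"),
                           call=call_model):
    """Try each model in order of cost; return the first answer that
    passes validation, falling back to the last model's answer."""
    answer = None
    for model in models:
        answer = call(model, prompt)
        if validate(answer):
            return model, answer
    return models[-1], answer
```

The `validate` callback is where your existing checks go (allowed label set, schema check, etc.); most transactions then never touch the expensive model.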
Alexander Chukovski’s Post
-
Gretel Navigator's synthetic data generation outperformed OpenAI's GPT-4 by 25.6%, surpassed Llama3-70b by 48.1%, and exceeded human expert-curated data by 73.6%. 🤩 Here's how to use Navigator to create high-quality synthetic data for fine-tuning LLMs. https://lnkd.in/eg2tSFes
-
We compared OpenAI o1 and GPT-4o on price, speed & performance 👀. Here is the TL;DR of our results and observations:

Our TL;DR: unless you're working on a really hard problem that needs the extra reasoning, you're better off using GPT-4o for similar tasks; it's 30 times faster and 3 times cheaper.

Some other observations:
1/ Productionizing with o1 will be hard: there are lots of hidden tokens, so you can't measure how long a task will take, and you can't debug to learn how it was solved.
2/ Prompting will be different: you shouldn't add extra chain-of-thought in your prompts; that can hurt performance.
3/ o1 isn't useful for many frequent use cases. You can't use streaming, tool use, or temperature with this model, so a bunch of your existing work might not carry over.
4/ o1 will be useful for many new use cases. Think agentic workflows, where o1 does the planning and faster models execute the plan.

We also tested the model on:
1/ Ten of the hardest SAT math problems: o1 got 6/10 right, whereas other models like GPT-4o and Claude 3.5 Sonnet can't solve more than 2/10.
2/ Customer ticket classification: on 100 tickets, o1 scored 12% better than GPT-4o.
3/ Reasoning riddles: on this set of riddles, o1 showed only a small improvement, getting just one more example correct than GPT-4o.

Here's the report if you wanna read more: https://lnkd.in/dycQcz4G
Analysis: OpenAI o1 vs GPT-4o
vellum.ai
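The "o1 plans, faster models execute" pattern from point 4 can be sketched roughly like this. Everything here is illustrative: `ask` stands in for your LLM call, and the model names and prompts are assumptions, not a prescribed API.

```python
def plan_then_execute(task, ask):
    """Use a reasoning model once to produce a step list, then run each
    step with a cheaper, faster model. `ask(model, prompt)` is a
    placeholder for a real LLM call."""
    # One expensive call to the reasoning model to get a plan.
    plan = ask("o1-preview", f"Break this task into numbered steps:\n{task}")
    steps = [line for line in plan.splitlines() if line.strip()]

    # Many cheap calls to the fast model, one per step.
    results = []
    for step in steps:
        results.append(ask("gpt-4o", f"Carry out this step:\n{step}"))
    return results
```

The appeal is cost shaping: the slow, expensive reasoning happens once, while the per-step work runs on the fast model that supports streaming and tool use.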
-
Give Gretel.ai Navigator a try yourself. In a few minutes, you can create your first data set from scratch with a simple prompt, and design and iterate on that data set with our models from there. It's a great way to test out your LLM idea without overcoming the data access and quality hurdles up front. Synthetic data is your fastest path to an MVP. Use the MVP as your business case's justification to unlock the other resources you need (data, investment, etc.) to get to production!
How to Create High Quality Synthetic Data for Fine-Tuning LLMs
gretel.ai
-
OpenAI just published the System Card for their latest model GPT-4o. Summary: https://lnkd.in/epumDW9G PDF: https://lnkd.in/ed6u5V84 The System Card outlines the safety measures OpenAI took before releasing the model. It covers the following areas: 🔍 External Red Teaming OpenAI worked with more than 100 external red-teamers. They were asked to carry out exploratory capability discovery, assess novel potential risks, and stress test mitigations. 📋 Evaluation methodology In addition to the data from red-teaming, they converted a range of evaluation datasets to evaluations for speech-to-speech models using text-to-speech systems. 🚨 Observed safety challenges, evaluations & mitigations For a number of observed safety challenges (e.g. unauthorized voice generation), they provide a description of the risk, the mitigations applied, and the results of relevant evaluations where applicable. ☣️ Preparedness framework evaluations OpenAI evaluated GPT-4o in accordance with their Preparedness Framework, the safety framework they use to mitigate catastrophic risks (https://lnkd.in/ewdWzvHW). 🔍 Third party assessments OpenAI worked with METR and Apollo Research to evaluate GPT-4o's autonomous capabilities and the associated risks. 🌎 Societal impacts Finally, they discuss a range of possible societal impacts (e.g. disinformation, environmental harms, and loss of control).
-
This comment from the OpenAI keynote today is really moving the goalposts: https://lnkd.in/gAq7_5Mx "16:13 Opening question to Sam: how close are we to AGI? Sam says they're trying to avoid the term now because it has become so overloaded. Instead they think about their new five-step framework." In less than a year, the enthusiasm around the idea that AGI is around the corner at OpenAI has pretty much evaporated. It was always pure fantasy and speculation. He is also correct that, much like "AI" before it, the term has lost all meaning. That said, LLM models still blow my mind with how neat and useful they can be. Expect more hot air to be let out of the "AI is going to take over the world" balloon with each passing announcement, while AI-driven tools and applications continue to grow in use cases and revenue. https://lnkd.in/g4N4CrAk https://lnkd.in/gzUHzkSp https://lnkd.in/gr4tg9Uq
OpenAI DevDay 2024 live blog
simonwillison.net
-
OpenAI o1 is much better than GPT-4o on many tasks, but it's not a replacement. If you are wondering why OpenAI didn't call this model GPT-5, it's because they don't expect o1 to take GPT-4o's place. (I doubt OpenAI will ever release something called GPT-5. More about that below.)

• GPT-4o specializes in System 1 thinking.
• OpenAI o1 specializes in System 2 thinking.

We need both, but more importantly, we need a model that knows when to use each system. OpenAI's next major model will incorporate 4o and o1 under a single model. That new model will decide when to use 4o's capabilities and when to go with o1's CoT process. If we are lucky, OpenAI will even give us a way to control which "mode" to use on any given request. This model will not happen in 2024. Instead, I think we can expect the following two updates sometime this year:

• The final o1 model: we only have access to a "mini" and a "preview."
• o1 goes multi-modal: apparently, the model is multi-modal, but these capabilities are disabled.

No convergence of o1 and 4o in 2024, and no GPT-5. And speaking of GPT-5, will there ever be a GPT-5 model? I'll stick my neck out and say we will never see one:

1. The GPT-X family of models is not the future. o1 is a new paradigm, and the goal now will be to combine both approaches.
2. It's good to keep the community excited about something big ("Imagine how good GPT-5 will be!") even if it never materializes. This is also great for fundraising.

Time will tell, but I think GPT-5 is dead before being born. OpenAI's marketing team will need to keep coming up with new names. (By the way, who cares about names?)
-
I tried OpenAI o1, aka Strawberry, on building financial statements from 210 transactions... I think I killed it. 🤣 It "thought" for about 5 minutes, then gave up and stopped working... I redid it 3 times, and the same thing happened each time. Background for those who don't understand what I'm talking about: AI has not been good enough at math so far to take a set of transactions and create a proper set of financial statements. GPT-4o would create one, but it wouldn't be close to correct. The problem: GPT-4o can't think about the project and check its work. On Friday, OpenAI released its latest model, OpenAI o1 (code-named Strawberry). This new model uses reasoning and computational "thought" to produce more complex and complete answers. I figured, let's take it for a spin. Yeah, it thinks very nicely, and you can see all the work it is doing in the background trying to get a proper set of books... until it doesn't anymore and just gives up. It's like, "Uh, yeah. Good luck with this. Want to know how many 'r's are in the word strawberry?" "AI will replace accountants"... AI seems to think it has better things to do with its time 😅.
-
🚀Ekohe Principal Data Scientist Keira Liu shares the 3rd article in our LLM series! 📊 Key points for success include: ✅Choosing the right model ✅Understanding the components of the GPT app you're building ✅Prompt tuning ✅Building an external knowledge base and retrieving relevant information, etc. 👉Read on to learn more: https://lnkd.in/grfsV2gk #ekohe #ai #gpt #llm #llmevaluation
Moving GPT from Cool to USEFUL — Part 3: From Playgrounds to Production
medium.com
-
1. The $1m (and up) autonomous business orchestrated by one individual will rapidly become possible with the right tools on the front end. 2. Any business that already has a #platform approach to mid- and high-level #professionalservices dealing in billable 5-hour and 5-day tasks (#law, #accountancy, #consulting) will gain an immediate advantage by reducing cost to serve: Canva is augmenting creative work with #genai, while Upwork and Fiverr will very quickly move to similar models.
This week, Leopold Aschenbrenner, who was previously fired from OpenAI’s safety team for allegedly leaking material, published a 165-page blog post on AI and the decade ahead. The post was received with both excitement and dismissal by various factions. Aschenbrenner says: "Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. [...] it is strikingly plausible that by 2027, models will be able to do the work of an AI researcher/engineer."

Aschenbrenner’s view is in the same ballpark as Eric Schmidt’s and roughly lines up with what Microsoft’s CTO, Kevin Scott, said. Scott talked about GPT-5 performing at the level of a PhD student. Let’s assume that scaling laws hold and that performance leaps continue for the next several years. What kind of real-world impact would such a leap in performance have?

I find one of Sam Altman’s frameworks quite clarifying. He talks about “the five-second tasks, the five-minute tasks, the five-hour tasks, maybe even the five-day tasks” that AI could do. We could map this onto our experience with LLMs: GPT-3 and GPT-3.5 were good at five-second tasks; GPT-4 is good at five-minute tasks; and in Sam’s telling of the story, we could expect GPT-5 to handle five-hour tasks (or at least 50-minute ones!). This is congruent with Eric Schmidt’s argument that in the next five years, these machines will be able to undertake tasks that have 1,000 discrete steps. This is pretty substantial.

The question I’ll pose to all of you is this: what does it mean to have a piece of software that can conduct a task a well-trained human (say MSc-level, although Kevin Scott says PhD-level) would take 5 hours to do, and in any domain? That piece of software doesn’t cost much… Perhaps it’s free, $20 a month, or a little more. And once you access a GPT-5-class model, you can use dozens or more of those PhD-level software assistants. A business could potentially run hundreds of thousands or millions. A state, billions. https://lnkd.in/eHyZAShb
What to expect when you’re expecting GPT-5
exponentialview.co
-
When scoping tasks and roles poised for augmentation or disruption by accelerating #GenAI, the unit economics are critical lenses and levers. In the article below, you can find out about the hierarchy of 5-second, 5-minute, 50-minute, and 5-day units and the expected timelines as a starting point. #AI #transformation #waysofworking
What to expect when you’re expecting GPT-5
exponentialview.co