A couple of thoughts on the new GPT-4o-mini regarding zero-shot classification and extraction tasks.
Up until now, there was a relatively simple strategy.
If you have a high volume of transactions, you fine-tune GPT-3.5 to keep costs under control. Fine-tuning is time-intensive, so it only makes sense for large volumes, but a fine-tuned GPT-3.5 is still more than 50% cheaper than GPT-4o. Also, fine-tuned models usually work well with short prompts, which reduces costs further, especially across many transactions.
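For reference, kicking off a GPT-3.5 fine-tuning job is just two calls with the OpenAI Python SDK. This is a minimal sketch; the JSONL file name and its contents are placeholders you would replace with your own labeled examples:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload a JSONL file of chat-formatted training examples (placeholder file name).
training_file = client.files.create(
    file=open("classification_examples.jsonl", "rb"),
    purpose="fine-tune",
)

# Start the fine-tuning job on GPT-3.5.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print(job.id, job.status)
```

The time-intensive part is not these calls but curating and validating the training examples, which is why this path only pays off at volume.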
For low volume, GPT-4o or even GPT-4 is better—you can start immediately, and the models will most likely be good enough for zero-shot classification.
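Zero-shot here means the prompt contains only the instruction and the label set, no examples. A minimal sketch, assuming a hypothetical support-ticket use case with made-up labels:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

LABELS = ["billing", "technical issue", "feature request", "other"]  # hypothetical label set

def classify(ticket: str, model: str = "gpt-4o") -> str:
    """Zero-shot classification: no examples, just the instruction and the label set."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {
                "role": "system",
                "content": f"Classify the support ticket into exactly one of: {', '.join(LABELS)}. "
                           "Reply with the label only.",
            },
            {"role": "user", "content": ticket},
        ],
    )
    return response.choices[0].message.content.strip()

print(classify("I was charged twice for my subscription this month."))
```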
The definition of high/low volume depends on your budget, how long you intend to run the process, and the cost of the team that would do the fine-tuning.
So, we now have a new model that is significantly cheaper than any of the options above. You cannot fine-tune it, but at this price point, there are a few options:
1. You can break down complex tasks currently running on the expensive or fine-tuned models into smaller steps and run those on the new model.
2. The new model makes few-shot examples affordable: even though the prompts become longer, the cost will still be dramatically lower.
3. You can try the cheaper model first and, whenever your validation fails, escalate to the more expensive models (see the sketch after this list).
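Options 2 and 3 combine naturally: send a few-shot prompt to the cheap model, validate the answer, and only escalate when validation fails. A sketch under the same hypothetical ticket-classification assumptions as above:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

LABELS = {"billing", "technical issue", "feature request", "other"}  # hypothetical label set

FEW_SHOT = [  # hypothetical examples; at gpt-4o-mini prices the longer prompt stays cheap
    {"role": "user", "content": "Ticket: The app crashes when I upload a photo."},
    {"role": "assistant", "content": "technical issue"},
    {"role": "user", "content": "Ticket: Please add a dark mode."},
    {"role": "assistant", "content": "feature request"},
]

def classify_with_fallback(ticket: str) -> str:
    """Try gpt-4o-mini first; only escalate to gpt-4o if the answer fails validation."""
    for model in ("gpt-4o-mini", "gpt-4o"):
        response = client.chat.completions.create(
            model=model,
            temperature=0,
            messages=[
                {"role": "system", "content": "Classify the support ticket. Reply with the label only."},
                *FEW_SHOT,
                {"role": "user", "content": f"Ticket: {ticket}"},
            ],
        )
        label = response.choices[0].message.content.strip().lower()
        if label in LABELS:  # validation step: stop here if the cheap model already produced a valid label
            return label
    return "other"  # both models failed validation; placeholder default

print(classify_with_fallback("I was billed twice this month."))
```

Here the validation is just a label-set check; in a real pipeline it could be a schema check, a confidence heuristic, or a second-pass review.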
If you use OpenAI, I would evaluate existing processes and test the new model using the options above. If it is good enough for your use case, you are likely looking at a 5-10x decrease in your current costs.