My 7 h̶a̶l̶l̶u̶c̶i̶n̶a̶t̶i̶o̶n̶s̶ predictions for AI in 2025

1. Agents will still be a thing, and keep going
The analogy is here to stay, and the industry will mature on AI agents. We won't completely crack the challenge of handling agents in 2025 yet, but we will approach better solutions. Think of all the JavaScript frameworks there were before we arrived at React, and all the object-oriented and FP patterns before we arrived at modern code; the same will happen with agents. Tooling will improve, practices will improve, and LLMOps will be needed more than ever.

2. Video and other data sources will play a major role
As predicted by Ilya, AI is running out of the free lunch of massive data, but not just yet: there is still a LOT of information contained in sources other than text, especially videos, which hold an enormous amount of information and relationships (think beyond transcriptions) that can still be harnessed with more multimodal innovations to keep pushing foundational models.

3. Google and China take the lead
As we saw at the end of this year, Google is on a roll. From the outside, all the internal struggles finally seem to be resolved, and Google is picking up pace more and more. Building on the previous point, Google has YouTube and many other products still to leverage. The same goes for Chinese models: as the launches of Qwen 2.5 and DeepSeek v3 show, there is so much innovation coming from there, with the possibility of leveraging data the West has no idea about. OpenAI will still launch innovations like the o1 family but will struggle to remain at the top; consumer-wise, though, they will remain top of mind with ChatGPT through 2025.

4. Really good local tiny models, really cheap
At the end of this year we saw, multiple times, smaller models beating way larger previous-generation models. We've seen that with Llama 3.2, and DeepSeek v3 with its MoE shows it all over again. Costs keep going down and portability keeps going up; together with continued innovations in hardware, this might finally be the year when bringing your own model to your application or local development becomes commonplace.

5. Heavy models and test-time compute keep pushing the boundaries, distilled one-shot for the real world
Much like what we saw with Claude 3.5 Opus not being launched and probably just being used to train Sonnet, models like the o1 family will not be used by wrappers and daily tasks. Even so, billions of dollars will keep being poured into training them, making them bulkier and heavier, to push the state of the art and help distill them into smarter, one-shot models. A clearer line of use cases for each side will be drawn.

Numbers 6 and 7 in the comments due to character limit 🙊
Interesting takes! I think test-time compute scaling is going to be a huge focus and source of gains in 2025. There is a lot of low-hanging fruit for optimization there, and I don't think smaller single-shot models will ever match the performance that deep-CoT models like o3 seem to have, even if trained on synthetic data from those models. With dedicated hardware and inference engines for TTC, representative encodings to lower token usage instead of using natural language for the CoT (e.g. Hidden CoT), quantization, speculative decoding for CoT trees, etc., we could see a 10,000x speed increase and cost reduction for CoT models, making them almost as accessible as single-shot models for most applications. I think this will accelerate heavily after o3's public release and will be the main focus unless a major breakthrough beyond transformers is achieved.
Wow, great post. Do you think we'll reach the point of wide-scale on-edge adoption? I've run local LLMs on a Pi and recently on Android via Termux, but for industry applications you want that latency and quality. You guys are well-positioned, I think.
We've got you covered for 1, 2 and 4!
Good stuff Rogerio! Especially point 6: I'm also curious to see how the process of reliably bringing GenAI prototypes to production will flesh out further as these lines become clearer, but I'm sure you guys at LangWatch will provide some guidance in that respect.
Co-Founder @ LangWatch - Measure the quality and continuously improve your LLM apps
6. Practitioners vs Hobbyists
Right now the line is still a bit fuzzy between those doing "shiny prototypes that never see the light of day" and those building truly robust applications of AI, where the industry is still very early. Those "shiny tricks" may be dismissed as gimmicks and unreliable, but the truth is that being incredibly fast to prototype is very valuable in itself, as the success of apps like Lovable shows. In 2025 both camps will grow enormously: low-code will be at an all-time high, while reliability and testing practices will be much more mature for next-generation AI apps, with a pragmatic approach. The line between those two sides will be much clearer, with each side learning from the other.