LangWatch’s Post


Rogério Chaves

Co-Founder @ LangWatch - Measure the quality and continuously improve your LLM apps

🎅 12 days of OpenAI

Yesterday, OpenAI started their own "advent calendar" of AI releases, kicking off with the launch of o1 and o1-pro. But is it any good?

Their launch video shows impressive capabilities: solving a tough physics problem from a hand-drawn sketch, something gpt-4o would always fail at before. It's jaw-dropping. However, looking into their own research paper, you can find several places where o1-pro falls short, for example on relatively common code tasks (image below), and it still lags behind Anthropic's Claude 3.5 on many other common tasks.

This is why at LangWatch we firmly believe you must be the owner of your own benchmarks and evaluations. The answer to whether a model is good or not is *it depends*: you never know in advance which model is better suited for your use case. All you can do is experiment, and experiment fast, optimizing the prompt to maximize metrics for each model so you are sure to compare apples to apples.

Excited for the next 11 days of OpenAI, let's see what's coming up!
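To make "compare apples to apples" concrete, here is a minimal Python sketch of the idea: score each model on your own small dataset, pick the best prompt per model, and only then compare. Everything in it is an illustrative assumption: `model_a` and `model_b` are hypothetical stand-ins for real model calls, and the dataset, prompt templates, and exact-match metric are toy placeholders, not LangWatch's API or OpenAI's benchmarks.

```python
from typing import Callable, Dict, Tuple

# Toy benchmark: (question, expected answer). Replace with your own data.
DATASET = [("2 + 2", "4"), ("3 * 3", "9"), ("10 - 7", "3")]

# Candidate prompt templates to try for every model.
PROMPTS = ["{question}", "Answer with a single number only: {question}"]

ANSWERS = dict(DATASET)


def _find_answer(prompt: str) -> str:
    """Helper for the fake models: look up the question inside the prompt."""
    for question, answer in ANSWERS.items():
        if question in prompt:
            return answer
    return ""


def model_a(prompt: str) -> str:
    """Hypothetical model: always right, but verbose unless told otherwise."""
    answer = _find_answer(prompt)
    return answer if "number only" in prompt else f"The answer is {answer}."


def model_b(prompt: str) -> str:
    """Hypothetical model: always concise, but wrong on one question."""
    return "0" if "10 - 7" in prompt else _find_answer(prompt)


def accuracy(model: Callable[[str], str], template: str) -> float:
    """Exact-match accuracy of one model with one prompt template."""
    hits = sum(
        model(template.format(question=q)).strip() == expected
        for q, expected in DATASET
    )
    return hits / len(DATASET)


def best_prompt(model: Callable[[str], str]) -> Tuple[str, float]:
    """Return the best-scoring template for this model and its score."""
    scores: Dict[str, float] = {t: accuracy(model, t) for t in PROMPTS}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Compare models at their own best prompt, not at a shared default prompt.
results = {
    name: best_prompt(model)
    for name, model in {"model_a": model_a, "model_b": model_b}.items()
}

for name, (template, score) in results.items():
    print(f"{name}: best accuracy {score:.2f} with prompt {template!r}")
```

With the bare prompt alone, `model_a` would look useless in this toy setup because its verbose answers never match exactly; once each model gets its best prompt, the ranking flips. That is the reason to optimize the prompt per model before drawing conclusions.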

  • [Image: no description available]
