OpenPipe

Software Development

Automatically convert unreliable LLM prompts into high-quality, fast fine-tuned models.

About us

Fine-tune models to replace your LLM prompts.

Website: https://openpipe.ai
Industry: Software Development
Company size: 2-10 employees
Type: Privately Held

Updates

  • SUPER PUMPED to announce that Gemini fine-tuning is available to all OpenPipe users! Gemini Flash offers the lowest-cost fine-tuning of any model in its quality class: comparable in quality to gpt-4o-mini, but with 4x cheaper inference and FREE fine-tuning! (For a quick sense of what calling a fine-tuned model looks like, see the inference sketch after these updates.)

  • OpenPipe reposted this

    Reid Mayo

    Founding AI Engineer @ OpenPipe (YC23) | The End-to-End LLM Fine-tuning Platform for Developers

    Was great to be back in Austin with Kyle Corbitt and Saumya G. for MLOps World '24. We had a lot of great conversations with engineering founders, leaders, and expert ICs who are building cutting-edge LLMOps infrastructure tooling and teaching the latest best practices in this exciting new field.

    Speaking of LLMOps: OpenPipe helps engineering teams and product owners take advantage of their product's human- and AI-generated feedback data to create a data flywheel. We can help you continuously plug that data into a reinforcement learning pipeline that dramatically improves your LLMs' performance on your proprietary use case, and keeps improving it over time. Interested in building (and growing) a sustainable competitive advantage backed by your data? Message me and let's set up a time to chat!

    Some special shout-outs to folks we met on the conference trail 🤠 Skyler Saucedo, Marty Dytrych, Juan Diego Balbi, Martin Picovsky, Ramon Serrallonga, Stephan Broquie, Rahul Sheth, Jared Zoneraich, Aaron Cheng, Ph.D, Beatrice Lovely, Nitin Gupta, Stefan Krawczyk, Adam Probst. Some incredibly smart folks we learned a thing or two from (and hopefully taught a thing or two back)!

  • RLHF-curious? I’ve put together a very practical guide to building a task-specific reward model! It includes lots of tips on choosing the right metric and data, and all the code is included (a minimal sketch of the core pairwise-loss idea also follows these updates). Hope it’s helpful. 🙂 If your application has human feedback (regenerations, user choices, etc.), please DM me and I’d love to chat about how we can use RLHF to significantly improve your response quality with minimal marginal effort!

    Using Reinforcement Learning and $4.80 of GPU Time to Find the Best HN Post Ever (RLHF Part 1) - OpenPipe

    openpipe.ai

  • Yesterday two new major model families became available for fine-tuning: Llama 3.1, which comes in 8B, 70B and 405B(!) variants, and GPT-4o mini. We’ve added them to the OpenPipe platform and ran all of them (except Llama 3.1 405B) through our evaluation harness.

    The good news: all 3 models are extremely high quality. The bad news: they saturate most of the standard evals we ran, which makes comparing them difficult! In fact, both Llama 3.1 variants we tried saturate all 3 of the standard evals, and GPT-4o mini saturated 2 of the 3.

    What do we mean by saturate? For any given input, you can imagine there is a potential “perfect” output (or set of outputs) that cannot be improved upon. The more complex the task, the harder it is for a model to generate a perfect output. But once a model is strong enough to consistently generate a perfect output for a task, we consider the task saturated for that model. In our LLM-as-judge evals, this usually shows up as a cluster of models all scoring about the same on the task, with no model significantly outperforming (see the win-rate sketch after these updates).

    And that’s exactly what we see in the evaluations below. All 3 fine-tuned models do about as well as each other (win rates within 6%) on both the “Resume Summarization” and “Data Extraction” tasks. On “Chatbot Responses”, however, both Llama 3.1 variants significantly outperform GPT-4o mini. So “Chatbot Responses” isn’t saturated for GPT-4o mini, but every other task-model pair is.

    This is significant: we chose these tasks explicitly because older models on our platform, like Mistral 7B and Llama 3 8B, did not saturate them! There are two main reasons we’re seeing saturation now:
    - The new models we’re testing are stronger than the previous generation available on-platform.
    - Our benchmark models are now all trained on datasets relabeled with Mixture of Agents, which substantially improves the quality of the dataset and thus the fine-tuned model.

    We’re working on higher-difficulty benchmarks, and once we have them we’ll analyze Llama 3.1 405B as well. And again, you can try all of these on OpenPipe today to run your own evaluations!

  • OpenPipe reposted this

    One week away from All About Fine-Tuning LLMs 🛠 Join us next Tuesday, June 25th at 11 AM PDT on Zoom!

    We're excited to announce two new panelists: 🤩
    - Sophia Yang, Ph.D.: Head of Developer Relations at Mistral AI
    - Aditya Jain: Applied Research Scientist at Meta

    They'll be joining alongside:
    - Kyle Corbitt: Co-founder, OpenPipe
    - Wing Lian: Founder, Axolotl AI
    - Benjamin Hamm: Senior Principal Product Manager at OctoAI

    And our host, Naomi Chetrit Band of The GenAI Collective!

    Don't miss this deep dive into fine-tuning models for optimal performance. Level up your tuning knowledge and gain the strategies to tailor open-source models to your needs. Sign up here 👇 https://lnkd.in/e_3dasMu

    🧠 GenAI Collective x OctoAI 🐙 All About Fine-Tuning LLMs · Zoom · Luma

    lu.ma
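
Below is a minimal sketch of what calling a fine-tuned model (like the Gemini Flash fine-tunes announced above) can look like, assuming OpenPipe exposes an OpenAI-compatible chat completions endpoint. The base URL, API key, and the model ID "openpipe:my-gemini-flash-ft" are illustrative placeholders, not confirmed values; check the OpenPipe dashboard and docs for the real ones.

```python
# Hypothetical inference call against a fine-tuned model hosted on
# OpenPipe, via the OpenAI Python SDK pointed at an assumed
# OpenAI-compatible endpoint. Base URL, key, and model ID are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.openpipe.ai/api/v1",  # assumed endpoint
    api_key="opk-...",                          # placeholder API key
)

response = client.chat.completions.create(
    model="openpipe:my-gemini-flash-ft",  # hypothetical fine-tuned model ID
    messages=[
        {"role": "system", "content": "You are a concise support-ticket classifier."},
        {"role": "user", "content": "Refund my order, it arrived broken."},
    ],
)
print(response.choices[0].message.content)
```

Under this assumption, moving a prompt from gpt-4o-mini to a fine-tuned Gemini Flash model is just a model-name change, which is what makes the 4x inference-cost difference easy to capture.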
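
For the reward-model guide mentioned in the RLHF post, here is a toy sketch of the core training step: score a "chosen" and a "rejected" response and minimize the Bradley-Terry pairwise loss. This is the standard RLHF reward-modeling objective in general, not OpenPipe's actual implementation; the random embeddings, sizes, and hyperparameters are stand-ins so the example runs on its own.

```python
# Toy reward model: a small head maps a response embedding to a scalar
# reward, trained so chosen responses outscore rejected ones
# (Bradley-Terry pairwise loss). Embeddings are random stand-ins here;
# a real setup would use features from an LLM encoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        return self.head(emb).squeeze(-1)  # (batch,) scalar rewards

model = RewardModel()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Stand-ins for (chosen, rejected) response embeddings derived from
# human feedback such as regenerations or user choices.
chosen = torch.randn(32, 128)
rejected = torch.randn(32, 128)

for step in range(100):
    # Maximize P(chosen beats rejected) = sigmoid(r_chosen - r_rejected).
    loss = -F.logsigmoid(model(chosen) - model(rejected)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```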
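
And for the saturation discussion: a small sketch of how saturation shows up in LLM-as-judge win rates. Aggregate each model's head-to-head outcomes and flag a task as saturated when every model's win rate sits within a narrow band. The verdicts below are hard-coded stand-ins, not real eval data, and the 6% threshold is taken from the post rather than from any fixed rule.

```python
# Sketch: compute per-model win rates from pairwise LLM-as-judge
# verdicts and flag a task as "saturated" when the spread is small.
# verdicts[(a, b)] lists outcomes per input: 1.0 = a wins, 0.0 = b wins,
# 0.5 = tie. These numbers are illustrative stand-ins.
verdicts = {
    ("llama-3.1-8b", "llama-3.1-70b"): [0.5, 0.5, 1.0, 0.0, 0.5],
    ("llama-3.1-8b", "gpt-4o-mini"):   [0.5, 1.0, 0.5, 0.5, 0.5],
    ("llama-3.1-70b", "gpt-4o-mini"):  [0.5, 0.5, 0.5, 1.0, 0.5],
}

models = sorted({m for pair in verdicts for m in pair})
wins = {m: [] for m in models}
for (a, b), outcomes in verdicts.items():
    wins[a].extend(outcomes)
    wins[b].extend(1.0 - o for o in outcomes)  # b's outcome mirrors a's

win_rate = {m: sum(v) / len(v) for m, v in wins.items()}
spread = max(win_rate.values()) - min(win_rate.values())
print(win_rate)
print("saturated" if spread < 0.06 else "not saturated")
```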

Funding

OpenPipe: 2 total rounds

Last round: Seed, US$6.7M

See more info on Crunchbase