Weights & Biases

Software Development

San Francisco, California 76,406 followers

The AI developer platform.

About us

Weights & Biases: the AI developer platform. Build better models faster, fine-tune LLMs, and develop GenAI applications with confidence, all in one system of record that developers are excited to use. W&B Models is the MLOps solution used by foundation model builders and enterprises who are training, fine-tuning, and deploying models into production. W&B Weave is the LLMOps solution for software developers who want a lightweight but powerful toolset to track and evaluate LLM applications. Weights & Biases is trusted by more than 1,000 companies to productionize AI at scale, including teams at OpenAI, Meta, NVIDIA, Cohere, Toyota, Square, Salesforce, and Microsoft. Sign up for a 30-day free trial today at http://wandb.me/trial.

Website
https://wandb.ai/site
Industry
Software Development
Company size
201-500 employees
Headquarters
San Francisco, California
Type
Privately Held
Founded
2017
Specialties
deep learning, developer tools, machine learning, MLOps, GenAI, LLMOps, large language models, and LLMs


Updates

  • Our first episode of Gradient Dissent in 2025 is here! 🎙️ Lukas Biewald is joined by Akshay Agrawal, Co-Founder of marimo, to discuss the future of collaborative AI development. They dive into how marimo enables developers and researchers to collaborate on AI projects, the challenges of scaling AI tools, and the importance of fostering open ecosystems for innovation. Akshay also shares insights into building a platform that helps teams iterate faster and solve complex AI challenges together. You won't want to miss this episode. Tune in here:
    • YouTube: https://lnkd.in/grXa5q8s
    • Apple Podcasts: https://lnkd.in/e4RJ4ia
    • Spotify: http://wandb.me/spotify

  • Our latest course, LLM Apps: Evaluation, is now LIVE! 🎉 In this code-first course, you'll learn:
    • Best practices for evaluation metrics, datasets, and human annotations
    • Lessons on building and aligning LLM judges
    • Industry expertise from Weights & Biases, Google, and All Hands AI instructors
    📚 Course Highlights:
    • Evaluation fundamentals & metrics
    • Programmatic evaluation implementation
    • LLM Judges: design & alignment
    • Google Case Study: Imagen, Veo 2, and tool use
    • OpenHands Case Study: evaluating agents
    Instructors:
    • Ayush Thakur, AI Engineer, Weights & Biases
    • Anish Shah, AI Engineer, Weights & Biases
    • Paige Bailey, AI Developer Relations Lead, Google
    • Graham Neubig, Co-Founder, All Hands AI
    🎓 Start learning now: https://lnkd.in/gCHffA24

  • Wondering how NVIDIA's NIM Blueprint and our W&B Weave can supercharge generative AI applications? 🚀 NIM Blueprint provides generative AI reference architectures with:
    • Reference code
    • Helm charts for deployment
    • Documentation
    This makes it easier to build RAG-powered applications for personalization, summarization, and sentiment analysis. W&B Weave integrates with NVIDIA NIM Blueprints to add:
    • Traceability 🛤️
    • Observability 👁️
    • Evaluation tools 📊
    You can monitor and improve generative AI assistant responses even during live calls. Together, NVIDIA AI and W&B give developers tools for continuous evaluation and iteration, helping them deliver robust, scalable AI applications. Explore the future of AI-driven innovation here: https://lnkd.in/ggwnNbXR

  • We launched new tools in the W&B Playground! 🛠️
    ⚓️ Trials has shipped to Playground (details below).
    ➕ Amazon Web Services (AWS) Nova models and xAI's Grok beta LLM are now available.
    Choosing the best model output often takes iteration. Playground trials save you time by letting you compare multiple results side by side before committing to one. This feature really shines when the temperature is turned up 🔥, letting you explore a model's creativity by generating multiple outputs at once and comparing the diverse responses at a glance. The magic doesn't stop there. 🪄 Once you pick the best output, you can continue exploring as if that output were the model's original answer, and see how the conversation unfolds from your chosen result!


  • Evaluating LLMs: A Conversation with Joseph Gonzalez
    Our CEO and cofounder, Lukas Biewald, recently sat down with Joseph Gonzalez, EECS Professor at UC Berkeley and Co-Founder at RunLLM, to discuss the research he and his team have done on evaluating LLMs. Here are some highlights from the conversation:
    🔹 Vibes-Based Model Evaluation
    Joseph introduced the concept of "vibes": evaluating not just a model's accuracy but also the style of its responses, such as whether they are friendly, concise, or narrative-driven. This approach is transforming how LLMs are refined for human interaction.
    👉 “Correctness is only part of the story: how a model communicates is just as critical. Llama is funnier and friendlier; OpenAI tends to be more formal and tends towards longer responses.” – Joseph Gonzalez
    🔹 Chatbot Arena: A Global Benchmark for LLMs
    Chatbot Arena (lmarena.ai) lets users compare LLMs side by side, producing a community-driven leaderboard of open-source and commercial models. It uses the Bradley-Terry model to analyze these pairwise comparisons and breaks performance down by task, such as creative writing, coding, or instruction following, helping developers optimize workflows for their specific applications.
    👉 “We want to democratize LLM evaluation: helping developers and the community improve models collaboratively.” – Joseph Gonzalez
    🔹 Collaborative AI Evaluation and Development
    Joseph shared insights on how LLM evaluation is evolving to incorporate human feedback and community input, offering a deeper understanding of model strengths and weaknesses. This participatory approach helps ensure that LLMs meet user needs across diverse use cases and applications.
    👉 “Human preference is about much more than accuracy: it's about trust, interaction, and experience.” – Joseph Gonzalez
    🎥 Check out the full episode to explore Joseph's insights on advancing LLM evaluation, fostering community collaboration, and refining AI-human interactions. https://lnkd.in/exD3xSui
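Chatbot Arena is described as using the Bradley-Terry approach to analyze pairwise comparisons. As a rough illustration of what that means, here is a minimal Python sketch that fits Bradley-Terry strengths from (winner, loser) votes using the classic minorization-maximization (MM) update, under which P(a beats b) = p_a / (p_a + p_b). This is not Chatbot Arena's actual implementation: the function names fit_bradley_terry and win_probability are hypothetical, ties are ignored, and it assumes every model has at least one win and one loss so the estimate exists.

```python
from collections import defaultdict


def fit_bradley_terry(comparisons, iters=500, tol=1e-12):
    # comparisons: iterable of (winner, loser) pairs; ties are not modeled.
    # Assumes every item has at least one win and one loss.
    wins = defaultdict(int)    # W_i: total wins for item i
    games = defaultdict(int)   # n_ij: number of comparisons between i and j
    items = set()
    for winner, loser in comparisons:
        wins[winner] += 1
        games[frozenset((winner, loser))] += 1
        items.update((winner, loser))

    strength = {i: 1.0 for i in items}  # start with equal strengths
    for _ in range(iters):
        new = {}
        for i in items:
            # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j), using old strengths
            denom = 0.0
            for pair, n in games.items():
                if i in pair:
                    (j,) = pair - {i}
                    denom += n / (strength[i] + strength[j])
            new[i] = wins[i] / denom
        total = sum(new.values())
        # Strengths are only identified up to scale, so normalize them to sum to 1.
        new = {i: s / total for i, s in new.items()}
        delta = max(abs(new[i] - strength[i]) for i in items)
        strength = new
        if delta < tol:
            break
    return strength


def win_probability(strength, a, b):
    # Bradley-Terry: P(a beats b) = p_a / (p_a + p_b)
    return strength[a] / (strength[a] + strength[b])


if __name__ == "__main__":
    votes = [("A", "B"), ("A", "B"), ("B", "A")]
    s = fit_bradley_terry(votes)
    print(round(win_probability(s, "A", "B"), 3))  # 0.667
```

With A winning two of three head-to-head votes against B, the fitted model predicts A beats B with probability 2/3. Fitting strengths jointly, rather than just counting wins, lets a leaderboard compare models that were never directly matched against each other.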
