We launched new tools in the W&B Playground! 🛠️
⚓️ Trials have shipped to the Playground! (see more info below)
➕ Amazon Web Services (AWS)'s Nova models & xAI's Grok beta LLM are now available.
Choosing the best model output often means iteration. Playground trials will save you time by letting you compare multiple results side-by-side before committing to one.
This feature really shines when the temperature is turned up 🔥, letting you explore a model's creativity by generating multiple outputs at once and comparing the diverse responses, all at a glance.
The magic doesn't stop there. 🪄 Once you pick the best output, you can continue your exploration as if that output were how the model had answered. See how the conversation unfolds from your chosen result!
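For a concrete feel of the workflow, here is a minimal sketch of the same idea against a plain chat-completions API rather than the Playground UI. The model name, prompts, and choice index are placeholders, not part of the announcement: generate several candidates at a higher temperature, compare them, then continue the conversation from the one you pick.

```python
# Rough sketch of the "trials" idea using a plain chat-completions API.
# The Playground handles this in the UI; model and prompts here are placeholders.
from openai import OpenAI

client = OpenAI()
messages = [{"role": "user", "content": "Write a tagline for an ML experiment tracker."}]

# Generate several candidates at a higher temperature to get diverse outputs.
resp = client.chat.completions.create(
    model="gpt-4o-mini",   # placeholder model
    messages=messages,
    n=3,                   # three trials, side by side
    temperature=1.1,
)
for i, choice in enumerate(resp.choices):
    print(f"--- trial {i} ---\n{choice.message.content}\n")

# Pick a favorite and continue as if it were the model's original answer.
chosen = resp.choices[1].message.content
messages += [
    {"role": "assistant", "content": chosen},
    {"role": "user", "content": "Great, now make it shorter and punchier."},
]
followup = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(followup.choices[0].message.content)
```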
Weights & Biases
Software Development
San Francisco, California · 75,881 followers
The AI developer platform.
About us
Weights & Biases: the AI developer platform. Build better models faster, fine-tune LLMs, and develop GenAI applications with confidence, all in one system of record developers are excited to use. W&B Models is the MLOps solution used by foundation model builders and enterprises who are training, fine-tuning, and deploying models into production. W&B Weave is the LLMOps solution for software developers who want a lightweight but powerful toolset to help them track and evaluate LLM applications. Weights & Biases is trusted by over 1,000 companies to productionize AI at scale, including teams at OpenAI, Meta, NVIDIA, Cohere, Toyota, Square, Salesforce, and Microsoft. Sign up for a 30-day free trial today at http://wandb.me/trial.
- Website: https://wandb.ai/site
- Industry: Software Development
- Company size: 201-500 employees
- Headquarters: San Francisco, California
- Type: Privately Held
- Founded: 2017
- Specialties: deep learning, developer tools, machine learning, MLOps, GenAI, LLMOps, large language models, and LLMs
Products
Weights & Biases
Machine Learning Software
Weights & Biases helps AI developers build better models faster. Quickly track experiments, version and iterate on datasets, evaluate model performance, reproduce models, and manage your ML workflows end-to-end.
Locations
- Primary: 400 Alabama St, San Francisco, California 94110, US
Updates
-
Evaluating LLMs: A Conversation with Joseph Gonzalez
Our CEO and cofounder, Lukas Biewald, recently sat down with Joseph Gonzalez, EECS Professor at UC Berkeley and Co-Founder at RunLLM, to discuss the research he and his team have done on evaluating LLMs. Here are some of the highlights from this conversation:
🔹 Vibes-Based Model Evaluation
Joseph introduced the concept of "vibes," which evaluates not just accuracy but also the style of a model’s response—whether it’s friendly, concise, or narrative-driven. This approach is transforming how LLMs are refined for human interaction.
👉 “Correctness is only part of the story—how a model communicates is just as critical. Llama is funnier and friendlier; OpenAI tends to be more formal and tends towards longer responses.” – Joseph Gonzalez
🔹 Chatbot Arena: A Global Benchmark for LLMs
Chatbot Arena (lmarena.ai) lets users compare LLMs side-by-side, creating a community-driven leaderboard for open-source and commercial models. Using the Bradley-Terry approach to analyze pairwise comparisons, this initiative segments performance by tasks like creative writing, coding, or instruction following, helping developers optimize workflows for their specific application.
👉 “We want to democratize LLM evaluation—helping developers and the community improve models collaboratively.” – Joseph Gonzalez
🔹 Collaborative AI Evaluation and Development
Joseph shared insights on how LLM evaluation is evolving to incorporate human feedback and community input, offering a deeper understanding of model strengths and weaknesses. This participatory approach ensures that LLMs meet user needs across diverse use cases and applications.
👉 “Human preference is about much more than accuracy—it’s about trust, interaction, and experience.” – Joseph Gonzalez
🎥 Check out the full episode to explore Joseph’s insights on advancing LLM evaluation, fostering community collaboration, and refining AI-human interactions. https://lnkd.in/exD3xSui
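For readers curious about the Bradley-Terry approach mentioned above, here is a toy fit over made-up pairwise results. The battle data and the simple MM iteration below are purely illustrative and are not Chatbot Arena's actual pipeline.

```python
# Toy Bradley-Terry fit: given pairwise "winner beat loser" outcomes,
# estimate a strength score per model via a simple MM iteration.
from collections import Counter

# Hypothetical battle results: (winner, loser)
battles = [("A", "B"), ("A", "B"), ("B", "A"), ("A", "C"),
           ("C", "B"), ("A", "C"), ("B", "C"), ("A", "B")]

models = sorted({m for pair in battles for m in pair})
wins = Counter(w for w, _ in battles)                      # total wins per model
games = Counter(tuple(sorted(pair)) for pair in battles)   # comparisons per pair

# MM update: p_i <- wins_i / sum_j n_ij / (p_i + p_j), then renormalize.
p = {m: 1.0 for m in models}
for _ in range(100):
    new_p = {}
    for i in models:
        denom = sum(games[tuple(sorted((i, j)))] / (p[i] + p[j])
                    for j in models if j != i)
        new_p[i] = wins[i] / denom if denom else p[i]
    total = sum(new_p.values())
    p = {m: v / total for m, v in new_p.items()}

for m in sorted(models, key=p.get, reverse=True):
    print(f"{m}: strength {p[m]:.3f}")
```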
-
Tomorrow on Gradient Dissent: Joseph Gonzalez, EECS Professor at UC Berkeley and Co-Founder at RunLLM, joins Lukas Biewald to discuss vibes-based evaluation, democratizing LLM benchmarks, and advancing AI-human interactions. Subscribe: https://lnkd.in/gtGQWBk6
-
LLaVA-o1: Understanding images step-by-step
Explore how LLaVA-o1 advances vision-language AI with structured reasoning. Learn how its dataset, multistage reasoning framework, and W&B Weave analysis set a new benchmark in multimodal understanding. Read more: https://lnkd.in/gKrmUg5H
-
Learn how to build an LLM router with W&B Weave and Not Diamond to optimize cost and performance. This guide covers routing queries to the best LLM for every use case, improving accuracy by up to 25% while reducing costs and latency tenfold. Explore the full tutorial here: https://lnkd.in/gc2ubitR
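If you want a feel for the routing pattern before opening the tutorial, below is a minimal hand-rolled sketch traced with W&B Weave. The keyword heuristic, model names, and project name are placeholders, and the real guide uses Not Diamond's learned router rather than this toy rule.

```python
# Hand-rolled stand-in for the routing idea: send easy queries to a cheaper
# model and hard ones to a stronger one, with each call traced by W&B Weave.
# The heuristic and model names are placeholders, not the tutorial's router.
import weave
from openai import OpenAI

weave.init("llm-router-demo")   # hypothetical project name
client = OpenAI()

@weave.op()
def pick_model(query: str) -> str:
    """Crude complexity heuristic; a learned router would replace this rule."""
    hard = any(k in query.lower() for k in ("prove", "debug", "derive", "multi-step"))
    return "gpt-4o" if hard else "gpt-4o-mini"

@weave.op()
def route_and_answer(query: str) -> str:
    model = pick_model(query)
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": query}],
    )
    return resp.choices[0].message.content

print(route_and_answer("Summarize what an LLM router does in one sentence."))
```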
-
As AWS re:Invent 2024 wraps up, don’t miss our final sessions at the Weights & Biases booth! Join us Thursday at Booth 1520:
• 10:30 AM PST: "Design, Evaluate, Iterate: Rapid Data Experimentation with Gretel and W&B"
• 11:30 AM PST: "Provectus enhances Houzz's recommendation engine search with Weights & Biases" – Colin Murphy, Head of Alliances & Business Development
• 2:30 PM PST: "Fuzer: Advancing Image Generation with Seamless Blending and Full Control" – Saliou Kane (カン) and Ibrahima Kane, cofounders of Fotographer AI
Stop by to connect with our team. 🔗 Learn more: https://wandb.me/aws-24-li
-
Weights & Biases reposted this
Standing room only at the Weights & Biases booth as Kimberly Madia presents on Weave, the best lightweight tool to help developers track and evaluate LLM apps. Come by booth 1520 to see Weave in action! #awsreinvent24 #reinvent #wandb #awsreinvent
-
Day 3 at #reInvent is here, and we’ve got another exciting lineup at the Weights & Biases booth! Stop by for expert talks and live demos showcasing the latest in AI workflows. 📍 Booth 1520
Wednesday Highlights:
• 1:30 PM PST: Generative Engineering: 5x Your Engineering Team with GenAI, presented by BCG – Matthew Kropp, CTO
• 3:30 PM PST: Kick Start GenAI Development and Deployment with NVIDIA and Weights & Biases – Judy Lee, Sr. Manager, Product Marketing, NVIDIA
• 4:30 PM PST: Trace Your Agentic Application: Combining AI21 Labs Jamba LLM and W&B, presented by AI21 – Chen Wang, Lead Solution Architect, AI21
• 5:30 PM PST: Distributed Computing in the GenAI Age, presented by Anyscale – Matt Connor, Growth and Product Lead, Anyscale
• 6:30 PM PST: Beyond Demos: Deploying GenAI into Production, presented by Weights & Biases
Stop by to connect with our team. 🔗 Learn more: https://wandb.me/aws-24-li