Baseten

Software Development

San Francisco, CA · 6,713 followers

Fast, scalable inference in our cloud or yours

About us

At Baseten, we provide all the infrastructure you need to deploy and serve ML models performantly, scalably, and cost-efficiently. Get started in minutes and avoid getting tangled in complex deployment processes. You can deploy best-in-class open-source models and take advantage of optimized serving for your own models. Our horizontally scalable services take you from prototype to production, with light-speed inference on infra that autoscales with your traffic. Best in class doesn't mean breaking the bank: run your models on the best infrastructure without running up costs by taking advantage of our scale-to-zero feature.

Website
https://www.baseten.co/
Industry
Software Development
Company size
51-200 employees
Headquarters
San Francisco, CA
Type
Privately Held
Specialties
developer tools and software engineering

Updates

  • 🚀 New Generally Available Whisper drop: the fastest, most accurate, and most cost-effective transcription, with an over 1000x real-time factor for production AI workloads. Our customers power user-facing applications with critical requirements for speed, accuracy, and cost-efficiency. At Baseten, we’re relentlessly focused on building the best infrastructure to power our customers’ mission-critical workloads: the highest throughput, the lowest latencies, and elastic autoscaling, all wrapped in a second-to-none DevEx. Our engineers apply cutting-edge research to achieve best-in-class model performance in production. Our new Generally Available Whisper implementation delivers:

    • Over 1000x real-time factor
    • The lowest word error rate (WER)
    • Custom scaling and hardware per model (or processing step)

    Best of all, you can fully customize your inference on Baseten, including deployment type (dedicated, self-hosted, or hybrid), autoscaling, and number of GPUs. A minimal sketch of calling a deployed model follows this post. Check out our blog on how our engineers turbo-charged Whisper transcription accuracy and speed: https://lnkd.in/eBbMRtDw

    Huge shoutout to William Gao, Derrick Y., and Tianshu Cheng for their work here.
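    As a rough sketch, here is what invoking a transcription model deployed on Baseten can look like from Python. The model ID, environment variable, and request/response schema are illustrative placeholders; the exact input format depends on the specific deployment:

        # Hypothetical example: the model ID and input/output schema below are
        # placeholders; check your own deployment for the real request format.
        import os
        import requests

        MODEL_ID = "abc123xyz"  # placeholder for your deployed model's ID
        API_KEY = os.environ["BASETEN_API_KEY"]

        resp = requests.post(
            f"https://model-{MODEL_ID}.api.baseten.co/production/predict",
            headers={"Authorization": f"Api-Key {API_KEY}"},
            json={"audio_url": "https://example.com/meeting.wav"},  # assumed input schema
            timeout=600,
        )
        resp.raise_for_status()
        print(resp.json())  # e.g. a dict containing the transcript text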

  • How can you run DeepSeek-R1 on H100s when it doesn’t fit on a single node? Multi-node inference uses high-bandwidth interconnects and model parallelism optimizations to split DeepSeek-R1 across two 8xH100 nodes: 16 GPUs working together to run the world’s most powerful open-source LLM. We use multi-node inference to offer cost-efficient DeepSeek-R1 deployments at scale. That said, building production-ready multi-node inference is no small feat: splitting an LLM across multiple nodes introduces additional compute orchestration and performance challenges that you need to address. A toy sketch of the core tensor-parallel idea follows this post. 👉 Learn how multi-node inference works in the new blog post from our co-founder Philip Howes and Philip Kiely: https://lnkd.in/e4ukh_48
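    To make the model-parallelism idea concrete, here is a toy numpy simulation of a column-parallel linear layer: each of 16 simulated devices holds a slice of the weight matrix and computes a partial output, and gathering the partials reproduces the single-device result. This illustrates the concept only; real multi-node serving shards every layer and synchronizes over NVLink/InfiniBand rather than concatenating arrays on a CPU:

        # Toy simulation of tensor (column) parallelism: split one weight
        # matrix across 16 "devices" and recombine the partial outputs.
        import numpy as np

        WORLD_SIZE = 16             # 2 nodes x 8 GPUs in the setup described above
        D_MODEL, D_FF = 1024, 4096  # toy dimensions; DeepSeek-R1's are far larger

        rng = np.random.default_rng(0)
        x = rng.standard_normal((1, D_MODEL))
        W = rng.standard_normal((D_MODEL, D_FF))

        # Each device holds a column slice of W and computes a partial output.
        shards = np.split(W, WORLD_SIZE, axis=1)
        partials = [x @ shard for shard in shards]  # one matmul per device

        # An all-gather (here: a concatenate) reconstructs the full activation.
        y = np.concatenate(partials, axis=1)
        assert np.allclose(x @ W, y)  # matches the single-device result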

  • Back by popular demand: Join us for the next NYC Tech Breakfast with Morgan Barrett on Wednesday, February 19th! The last one was so good we couldn't wait to put this back on the calendar. If you're an ML Engineering Leader in NYC, come through for technical discussions over pastries, lattes, and eggs at our beloved joint in SoHo. 👉 Seats are limited, save yours: https://lu.ma/tunu2u73

  • "There are big implications from DeepSeek for highly regulated industries. Companies that have strict data compliance requirements will be able to more freely experiment and innovate knowing they can completely control how data is used and where it’s sent." - Tuhin Srivastava Baseten CEO Tuhin shared his thoughts with Greylock on GenAI economics, the future of open-source vs. closed models, and how DeepSeek is affecting the future of AI development. 👉 Learn more here: https://lnkd.in/gh8Z2ZHU

  • "We've been working closely with the DeepSeek AI and SGLang teams for months to get these models running well." - Tuhin Srivastava Thank you Emma Cosgrove and the Business Insider team for sitting down with Tuhin again to talk about getting fast DeepSeek inference at scale! Learn how Baseten optimizes DeepSeek performance on H200s and H100s (using multi-node inference) in Tuhin and Amir's on-demand webinar: https://lnkd.in/eaiDeTNU

  • 🚀 We’re thrilled to announce that Baseten Chains is now GA for production compound AI! 🚀 In 5 years, every app will have multiple models embedded in it. What matters is a great app experience. To achieve the ultra-low latencies necessary for a competitive UX, AI builders fight against monolithic model deployments, unnecessary costs from data egress and hardware bottlenecks, and complex, manual model orchestration. We built Baseten Chains to solve these challenges.

    Chains is an SDK for deploying compound AI systems in production that eliminates performance bottlenecks, spaghetti code, and wasted money from idle GPUs.

    📽️ Join Marius Killinger, Tyron Jung, and Rachel Rapp in a live webinar on March 6th and see Chains in action: https://lnkd.in/dW_YqmKJ

    At Baseten, we exist to empower our customers with the most performant, reliable, and cost-effective inference in production. Working closely with our customers, we designed Chains to let you:

    • Call a series of models and processing steps without incurring excess latency
    • Make complex workflows modular (think: hardware, autoscaling) yet cohesive
    • Abstract complex model orchestration

    Deploy any compound AI system with Chains and gain the optimized model performance and elastic horizontal scaling we specialize in. Building complex, multi-model workflows is as simple as calling local, type-safe Python functions (see the sketch after this post). Chains lets you define unique hardware and autoscaling per step in your workflow, which is critical for keeping latency low and deployments cost-efficient. We’ve seen processing times halved and GPU utilization improve 6x. Chains also powers our fastest Whisper transcription, achieving a 1000x real-time factor with extreme cost-efficiency.

    Now with additional performance and DevEx improvements since our beta launch, we’re thrilled to announce the general availability of Chains for production AI!

    👉🏻 Learn more in our launch blog: https://lnkd.in/d9qv_Chj

    Huge shoutout to Marius Killinger, Tyron Jung, and Sidharth Shanker for their work on our Chains GA release!
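    As a rough illustration of the "local, type-safe Python functions" idea, here is a minimal two-Chainlet sketch in the style of the truss-chains SDK. Treat the class names and placeholder logic as illustrative assumptions, not a canonical reference; see the launch blog above for real examples:

        # Illustrative sketch in the style of the truss-chains SDK: two chainlets,
        # each independently deployable with its own hardware/autoscaling settings.
        # Class names and placeholder logic are hypothetical, not canonical API usage.
        import truss_chains as chains


        class Transcribe(chains.ChainletBase):
            # In a real chain, this step might run Whisper on its own GPU pool.
            def run_remote(self, audio_url: str) -> str:
                return f"transcript of {audio_url}"  # placeholder logic


        @chains.mark_entrypoint
        class Pipeline(chains.ChainletBase):
            # chains.depends wires Transcribe in as a typed dependency; calling it
            # reads like a local function call but executes as a separate service.
            def __init__(self, transcribe: Transcribe = chains.depends(Transcribe)):
                self._transcribe = transcribe

            def run_remote(self, audio_url: str) -> str:
                transcript = self._transcribe.run_remote(audio_url)
                return transcript.upper()  # stand-in for a downstream model step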
