Baseten

Software Development

San Francisco, CA · 6,713 followers

Fast, scalable inference in our cloud or yours

About us

At Baseten, we provide all the infrastructure you need to deploy and serve ML models performantly, scalably, and cost-efficiently. Get started in minutes and avoid getting tangled in complex deployment processes. You can deploy best-in-class open-source models and take advantage of optimized serving for your own models. Our horizontally scalable services take you from prototype to production, with light-speed inference on infra that autoscales with your traffic. Best in class doesn't mean breaking the bank: run your models on the best infrastructure without running up costs by taking advantage of our scale-to-zero feature.

Website
https://www.baseten.co/
Industry
Software Development
Company size
51-200 employees
Headquarters
San Francisco, CA
Type
Privately Held
Specialties
developer tools and software engineering

Updates

  • 🚀 New Generally Available Whisper drop: the fastest, most accurate, and most cost-effective transcription, with an over 1000x real-time factor for production AI workloads. Our customers power user-facing applications with critical requirements for speed, accuracy, and cost-efficiency. At Baseten, we’re relentlessly focused on building the best infrastructure to power our customers’ mission-critical workloads: the highest throughput, the lowest latencies, and elastic autoscaling, all wrapped in a second-to-none DevEx. Our engineers apply cutting-edge research to achieve best-in-class model performance in production. Our new Generally Available Whisper implementation delivers:

    • Over 1000x real-time factor
    • The lowest word error rate (WER)
    • Custom scaling and hardware per model (or processing step)

    Best of all, you can fully customize your inference on Baseten, including deployment type (dedicated, self-hosted, or hybrid), autoscaling, and number of GPUs. A minimal sketch of calling a deployed model follows this post. Check out our blog on how our engineers turbo-charged Whisper transcription accuracy and speed: https://lnkd.in/eBbMRtDw

    Huge shoutout to William Gao, Derrick Y., and Tianshu Cheng for their work here.
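    As a rough sketch, here is what invoking a transcription model deployed on Baseten can look like from Python. The model ID, environment variable, and request/response schema are illustrative placeholders; the exact input format depends on the specific deployment:

        # Hypothetical example: the model ID and input/output schema below are
        # placeholders; check your own deployment for the real request format.
        import os
        import requests

        MODEL_ID = "abc123xyz"  # placeholder for your deployed model's ID
        API_KEY = os.environ["BASETEN_API_KEY"]

        resp = requests.post(
            f"https://model-{MODEL_ID}.api.baseten.co/production/predict",
            headers={"Authorization": f"Api-Key {API_KEY}"},
            json={"audio_url": "https://example.com/meeting.wav"},  # assumed input schema
            timeout=600,
        )
        resp.raise_for_status()
        print(resp.json())  # e.g. a dict containing the transcript text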

  • How can you run DeepSeek-R1 on H100s when it doesn’t fit on a single node? Multi-node inference uses high-bandwidth interconnects and model parallelism optimizations to split DeepSeek-R1 across two 8xH100 nodes: 16 GPUs working together to run the world’s most powerful open-source LLM. We use multi-node inference to offer cost-efficient DeepSeek-R1 deployments at scale. That said, building production-ready multi-node inference is no small feat: splitting an LLM across multiple nodes introduces additional compute orchestration and performance challenges that you need to address. A toy sketch of the core tensor-parallel idea follows this post. 👉 Learn how multi-node inference works in the new blog post from our co-founder Philip Howes and Philip Kiely: https://lnkd.in/e4ukh_48
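    To make the model-parallelism idea concrete, here is a toy numpy simulation of a column-parallel linear layer: each of 16 simulated devices holds a slice of the weight matrix and computes a partial output, and gathering the partials reproduces the single-device result. This illustrates the concept only; real multi-node serving shards every layer and synchronizes over NVLink/InfiniBand rather than concatenating arrays on a CPU:

        # Toy simulation of tensor (column) parallelism: split one weight
        # matrix across 16 "devices" and recombine the partial outputs.
        import numpy as np

        WORLD_SIZE = 16             # 2 nodes x 8 GPUs in the setup described above
        D_MODEL, D_FF = 1024, 4096  # toy dimensions; DeepSeek-R1's are far larger

        rng = np.random.default_rng(0)
        x = rng.standard_normal((1, D_MODEL))
        W = rng.standard_normal((D_MODEL, D_FF))

        # Each device holds a column slice of W and computes a partial output.
        shards = np.split(W, WORLD_SIZE, axis=1)
        partials = [x @ shard for shard in shards]  # one matmul per device

        # An all-gather (here: a concatenate) reconstructs the full activation.
        y = np.concatenate(partials, axis=1)
        assert np.allclose(x @ W, y)  # matches the single-device result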

  • Back by popular demand: Join us for the next NYC Tech Breakfast with Morgan Barrett on Wednesday, February 19th! The last one was so good we couldn't wait to put this back on the calendar. If you're an ML Engineering Leader in NYC, come through for technical discussions over pastries, lattes, and eggs at our beloved joint in SoHo. 👉 Seats are limited, save yours: https://lu.ma/tunu2u73

  • "There are big implications from DeepSeek for highly regulated industries. Companies that have strict data compliance requirements will be able to more freely experiment and innovate knowing they can completely control how data is used and where it’s sent." - Tuhin Srivastava Baseten CEO Tuhin shared his thoughts with Greylock on GenAI economics, the future of open-source vs. closed models, and how DeepSeek is affecting the future of AI development. 👉 Learn more here: https://lnkd.in/gh8Z2ZHU

  • "We've been working closely with the DeepSeek AI and SGLang teams for months to get these models running well." - Tuhin Srivastava Thank you Emma Cosgrove and the Business Insider team for sitting down with Tuhin again to talk about getting fast DeepSeek inference at scale! Learn how Baseten optimizes DeepSeek performance on H200s and H100s (using multi-node inference) in Tuhin and Amir's on-demand webinar: https://lnkd.in/eaiDeTNU

  • 🚀 We’re thrilled to announce that Baseten Chains is now GA for production compound AI! 🚀 In 5 years, every app will have multiple models embedded in it. What matters is a great app experience. To achieve the ultra-low latencies necessary for a competitive UX, AI builders fight against monolithic model deployments, unnecessary costs from data egress and hardware bottlenecks, and complex, manual model orchestration. We built Baseten Chains to solve these challenges.

    Chains is an SDK for deploying compound AI systems in production that eliminates performance bottlenecks, spaghetti code, and wasted money from idle GPUs.

    📽️ Join Marius Killinger, Tyron Jung, and Rachel Rapp in a live webinar on March 6th and see Chains in action: https://lnkd.in/dW_YqmKJ

    At Baseten, we exist to empower our customers with the most performant, reliable, and cost-effective inference in production. Working closely with our customers, we designed Chains to let you:

    • Call a series of models and processing steps without incurring excess latency
    • Make complex workflows modular (think: hardware, autoscaling) yet cohesive
    • Abstract complex model orchestration

    Deploy any compound AI system with Chains and gain the optimized model performance and elastic horizontal scaling we specialize in. Building complex, multi-model workflows is as simple as calling local, type-safe Python functions (see the sketch after this post). Chains lets you define unique hardware and autoscaling per step in your workflow, which is critical for keeping latency low and deployments cost-efficient. We’ve seen processing times halved and GPU utilization improve 6x. Chains also powers our fastest Whisper transcription, achieving a 1000x real-time factor with extreme cost-efficiency.

    Now with additional performance and DevEx improvements since our beta launch, we’re thrilled to announce the general availability of Chains for production AI!

    👉🏻 Learn more in our launch blog: https://lnkd.in/d9qv_Chj

    Huge shoutout to Marius Killinger, Tyron Jung, and Sidharth Shanker for their work on our Chains GA release!
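    As a rough illustration of the "local, type-safe Python functions" idea, here is a minimal two-Chainlet sketch in the style of the truss-chains SDK. Treat the class names and placeholder logic as illustrative assumptions, not a canonical reference; see the launch blog above for real examples:

        # Illustrative sketch in the style of the truss-chains SDK: two chainlets,
        # each independently deployable with its own hardware/autoscaling settings.
        # Class names and placeholder logic are hypothetical, not canonical API usage.
        import truss_chains as chains


        class Transcribe(chains.ChainletBase):
            # In a real chain, this step might run Whisper on its own GPU pool.
            def run_remote(self, audio_url: str) -> str:
                return f"transcript of {audio_url}"  # placeholder logic


        @chains.mark_entrypoint
        class Pipeline(chains.ChainletBase):
            # chains.depends wires Transcribe in as a typed dependency; calling it
            # reads like a local function call but executes as a separate service.
            def __init__(self, transcribe: Transcribe = chains.depends(Transcribe)):
                self._transcribe = transcribe

            def run_remote(self, audio_url: str) -> str:
                transcript = self._transcribe.run_remote(audio_url)
                return transcript.upper()  # stand-in for a downstream model step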
