Data lakehouse Onehouse nabs $35M to capitalize on GenAI revolution

Onehouse founder and CEO Vinoth Chandar
Image Credits: Onehouse

You can barely go an hour these days without reading about generative AI. While we are still in the embryonic phase of what some have dubbed the “steam engine” of the fourth industrial revolution, there’s little doubt that “GenAI” is shaping up to transform just about every industry — from finance and healthcare to law and beyond.

Cool user-facing applications might attract most of the fanfare, but the companies powering this revolution are currently benefiting the most. Just this month, chipmaker Nvidia briefly became the world’s most valuable company, a $3.3 trillion juggernaut driven largely by the demand for AI computing power.

But in addition to GPUs (graphics processing units), businesses also need infrastructure to manage the flow of data — for storing, processing, training, analyzing and, ultimately, unlocking the full potential of AI.

One company looking to capitalize on this is Onehouse, a three-year-old Californian startup founded by Vinoth Chandar, who created the open source Apache Hudi project while serving as a data architect at Uber. Hudi brings the benefits of data warehouses to data lakes, creating what has become known as a “data lakehouse,” enabling support for actions like indexing and performing real-time queries on large datasets, be that structured, unstructured or semi-structured data.

For example, an e-commerce company that continuously collects customer data spanning orders, feedback and related digital interactions will need a system to ingest all that data and ensure it’s kept up-to-date, which might help it recommend products based on a user’s activity. Hudi enables data to be ingested from various sources with minimal latency, with support for deletes as well as combined updates and inserts (“upserts”), which is vital for such real-time data use cases.
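To make the upsert idea concrete, here is a minimal sketch in plain Python. It is not Hudi's API: the `Record` type, the `apply_batch` function and the "latest timestamp wins" rule are all assumptions made for illustration, and a real lakehouse applies the same idea to files in cloud storage rather than to an in-memory dictionary.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical row: a unique key, a payload and an event timestamp."""
    key: str
    payload: dict
    ts: int


def apply_batch(table, upserts=(), deletes=()):
    """Apply one incoming batch to a table keyed by record key.

    Upsert: insert the record if its key is new, otherwise overwrite the
    existing record unless the incoming one is older (assumed rule).
    Delete: drop the record for each listed key, if present.
    """
    for rec in upserts:
        current = table.get(rec.key)
        if current is None or rec.ts >= current.ts:
            table[rec.key] = rec
    for key in deletes:
        table.pop(key, None)
    return table


# Illustrative data only: an order is placed, then updated, then another is removed.
orders = {}
apply_batch(orders, upserts=[Record("order-1", {"status": "placed"}, ts=1)])
apply_batch(orders, upserts=[
    Record("order-1", {"status": "shipped"}, ts=2),  # update to an existing key
    Record("order-2", {"status": "placed"}, ts=2),   # insert of a new key
])
apply_batch(orders, deletes=["order-2"])
```

After these three batches, the table holds a single, current row for `order-1`, which is the kind of up-to-date view a recommendation system would want to read.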

Onehouse builds on this with a fully managed data lakehouse that helps companies deploy Hudi. Or, as Chandar puts it, it “jumpstarts ingestion and data standardization into open data formats” that can be used with nearly all the major tools in the data science, AI and machine learning ecosystems.

“Onehouse abstracts away low-level data infrastructure build-out, helping AI companies focus on their models,” Chandar told TechCrunch.

Today, Onehouse announced it has raised $35 million in a Series B round of funding as it brings two new products to market to improve Hudi’s performance and reduce cloud storage and processing costs.

Down at the (data) lakehouse

Onehouse ad on London billboard.
Image Credits: Onehouse

Chandar created Hudi as an internal project within Uber back in 2016, and since the ride-hailing company donated the project to the Apache Software Foundation in 2019, Hudi has been adopted by the likes of Amazon, Disney and Walmart.

Chandar left Uber in 2019, and, after a brief stint at Confluent, founded Onehouse. The startup emerged out of stealth in 2022 with $8 million in seed funding, and followed that shortly after with a $25 million Series A round. Both rounds were co-led by Greylock Partners and Addition.

These VC firms have joined forces again for the Series B follow-up, though this time, David Sacks’ Craft Ventures is leading the round.

“The data lakehouse is quickly becoming the standard architecture for organizations that want to centralize their data to power new services like real-time analytics, predictive ML and GenAI,” Craft Ventures partner Michael Robinson said in a statement.

For context, data warehouses and data lakes are similar in the way they serve as a central repository for pooling data. But they do so in different ways: A data warehouse is ideal for processing and querying historical, structured data, whereas data lakes have emerged as a more flexible alternative for storing vast amounts of raw data in its original format, with support for multiple types of data and high-performance querying.

This makes data lakes ideal for AI and machine learning workloads: It’s cheaper to store raw data that has not yet been transformed, and because that data is kept in its original form, it can still support more complex queries.

However, the trade-off is a whole new set of data management complexities, which risks worsening the data quality given the vast array of data types and formats. This is partly what Hudi sets out to solve by bringing some key features of data warehouses to data lakes, such as ACID transactions to support data integrity and reliability, as well as improving metadata management for more diverse datasets.

Configuring data pipelines in Onehouse.
Image Credits: Onehouse

Because it is an open source project, any company can deploy Hudi. A quick peek at the logos on Onehouse’s website reveals some impressive users: AWS, Google, Tencent, Disney, Walmart, ByteDance, Uber and Huawei, to name a handful. But the fact that such big-name companies leverage Hudi internally is indicative of the effort and resources required to build it as part of an on-premises data lakehouse setup.

“While Hudi provides rich functionality to ingest, manage and transform data, companies still have to integrate about half-a-dozen open source tools to achieve their goals of a production-quality data lakehouse,” Chandar said.

This is why Onehouse offers a fully managed, cloud-native platform that ingests, transforms and optimizes the data in a fraction of the time.

“Users can get an open data lakehouse up-and-running in under an hour, with broad interoperability with all major cloud-native services, warehouses and data lake engines,” Chandar said.

The company was coy about naming its commercial customers, aside from the couple listed in case studies, such as Indian unicorn Apna.

“As a young company, we don’t share the entire list of commercial customers of Onehouse publicly at this time,” Chandar said.

With a fresh $35 million in the bank, Onehouse is now expanding its platform with a free tool called Onehouse LakeView, which provides observability into lakehouse functionality for insights on table stats, trends, file sizes, timeline history and more. This builds on existing observability metrics provided by the core Hudi project, giving extra context on workloads.

“Without LakeView, users need to spend a lot of time interpreting metrics and deeply understand the entire stack to root-cause performance issues or inefficiencies in the pipeline configuration,” Chandar said. “LakeView automates this and provides email alerts on good or bad trends, flagging data management needs to improve query performance.”

Onehouse is also debuting a new product called Table Optimizer, a managed cloud service that optimizes existing tables to expedite data ingestion and transformation.

‘Open and interoperable’

There’s no ignoring the myriad other big-name players in the space. The likes of Databricks and Snowflake are increasingly embracing the lakehouse paradigm: Earlier this month, Databricks reportedly doled out $1 billion to acquire a company called Tabular, with a view toward creating a common lakehouse standard.

Onehouse has entered a hot space for sure, but it’s hoping that its focus on an “open and interoperable” system that makes it easier to avoid vendor lock-in will help it stand the test of time. It is essentially promising the ability to make a single copy of data universally accessible from just about anywhere, including Databricks, Snowflake, Cloudera and AWS native services, without having to build separate data silos on each.

As with Nvidia in the GPU realm, there’s no ignoring the opportunities that await any company in the data management space. Data is the cornerstone of AI development, and a shortage of good-quality data is a major reason why many AI projects fail. But even when the data is there in bucketloads, companies still need the infrastructure to ingest, transform and standardize it to make it useful. That bodes well for Onehouse and its ilk.

“From a data management and processing side, I believe that quality data delivered by a solid data infrastructure foundation is going to play a crucial role in getting these AI projects into real-world production use cases — to avoid garbage-in/garbage-out data problems,” Chandar said. “We are beginning to see such demand in data lakehouse users, as they struggle to scale data processing and query needs for building these newer AI applications on enterprise scale data.”
