Snowflake is not a data warehouse

I heard so many times people are saying that Snowflake is a data warehouse. That is categorically incorrect. Snowflake is not a data warehouse. Not until you build a data warehouse on Snowflake. So you need to create the data warehouse, YOUR data warehouse on Snowflake platform. Yes, Snowflake is a platform. It is a data platform.

Likewise, Databricks, Redshift, Fabric and BigQuery is not a data warehouse. They are data platform, on which you can build your data warehouse.

As a data platform Snowflake has several abilities:

1. Enable us to build a data warehouse

2. Enable us to build a data lake

3. Enable us to build a data sharing platform

4. Enable us to build data science / AI applications

5. Enable us to build data applications

6. Enable us to do data engineering

Likewise, Databricks, Redshift, Fabric and BigQuery enable us to build all those things too.

Cloud data platform

I’ve been working with databases for over 20 years. In the past, the database was installed in our in-house servers. We had a server room in the building, where racks and cabinets were full of blade servers, switches, storage, etc. A data warehouse project means that I had to setup and install database servers and storage in that server room, configuring 2 servers so that when 1 failed, the other one automatically took over within seconds. On top of that we had to setup user authentication, disaster recovery, partitioning, indexing, and so on.

Today, with cloud data platform there is a big difference. When doing a data warehouse project, we do not install anything into any servers. We don’t install a database server. We don’t install load balancer (for that automatic failover). We just login to Snowflake using our browser, and everything is available to us out of the box. Database, user authentication, time travel, high availability, disaster recovery, row level security, governance, data privacy.

Likewise, Databricks, Redshift, Fabric and BigQuery also do the same thing. There is no install. You just login and use them. You get everything out of the box.

Understandably, people from the “old school” like me often think the old way. For example:

  • Database backup. What backup? There is no backup! You don’t backup your database. And you don’t restore them. So what if you accidentally dropped a table? Just type “UNDROP TABLE”. And your table will be restored as it was. Voila! What if you wanted to query a table as it was last week? Easy. Just add “AT {DATE}” on your SELECT statement. And you’ll get the content of that table as it was last week.
  • Partitioning big tables? There is no partitioning! You don’t need to partition your tables to get good query performance like in the old days. The tables are already partitioned (called “micro partition”) and automatically managed by Snowflake. You don’t need to do anything! In the old days, I spent a lot of time managing the partitions on SQL Server. It’s called “partition switching” and “sliding window”. Not today. It’s all automatic.
  • Clustering. In the old days, to get high availability, I had to setup a “cluster”. Meaning a group of 2 database servers (or 3, or 4). One of them is active, and one is on standby. When the active one crashed, the standby one became active automatically (called “failover”). With Snowflake you don’t need to do that. You get high availability out of the box. There is no down time. There is no crash. Behind the scene, your database is replicated across multiple data centers. If one data center becomes unavailable, it automatically failover to another data center. Amazing isn’t it?
  • Compression: On SQL Server, I used to do data compression in order to save space and improve query performance. Using something called columnstore, and page compression. Meaning that the data is stored per column not per row. In Snowflake, I don’t need to do that. Out of the box, Snowflake is a columnar database. It stores data per column. And it automatically compress all the data in your database to improve query performance and reduce storage cost.
  • Indexing: I used to spend a lot of time doing indexing. As you can imagine, in data warehousing the tables are pretty big. So indexing is key to get good query performance. And load performance too. And I had to update the statistics regularly (like every day) on big tables. Well not with Snowflake. There is no index. Snowflake distributes tables into micro-partitions and for each partition it collects the statistics on the data range for each column, loading only the partitions required to satisfy the query.

Likewise, Databricks, Redshift, Fabric and BigQuery also do the same thing. NOT! They are not the same. Some of them are still using partitioning, still using indexing, clustering and database backup too! Yup. Need to be careful when choosing a “cloud data platform”. Because not everyone does everything above automatically out of the box.

But yes with all of those “cloud data platform” you don’t install anything. You don’t install any hardware. You don’t install any software. You don’t install any patches either. In the old days, every year you had to “patch” your database server. A few times a year even. It’s called “updates”. There were database updates, there were antivirus updates, there were operating system updates, etc. Lots of updates. And then there were “upgrades”, when you installed the next version of Oracle (or SQL Server, etc.) Or the next version of Windows server. And guess what? You got down time! You had to make the databases not available for a few hours while you upgrade the server. You had to work weekends to avoid impact to the users.

With the cloud data platform, there are no patches, updates, or upgrades. It is all automatic. They install all the updates and upgrades for you without any down time. NOT! Again, not all of them are like that. Some platform you still have to do patches, updates or upgrades. So be careful when choosing a data platform. Not everyone of them is maintenance free.

Conclusion

Snowflake is not a data warehouse. It is a data platform. And it is an AI platform too (but this is for another article, another day). With cloud data platforms like Snowflake you don’t need to do database backup, table partitioning, clustering, compression, indexing. No need to install anything, no need to do patches, updates and upgrades. There are many cloud data platforms, like Databricks, Redshift, Fabric and BigQuery. But not everyone of them do everything automatically.

Andy Webster

Data Cloud Platform, Data Analytics Engineer, Certified DBA MSc, BSc

2mo

Was this meant to be a joke? If so I apologize profusely

Like
Reply
Andy Webster

Data Cloud Platform, Data Analytics Engineer, Certified DBA MSc, BSc

2mo

I apologize if I'm teaching anyone to suck eggs...

Like
Reply
Andy Webster

Data Cloud Platform, Data Analytics Engineer, Certified DBA MSc, BSc

2mo

Databricks is a similar solution also subtly different as a Data Lakehouse, its shares common ideas but takes a different approach - they compete head to head, one has better Spark, one has better data sharing, one has a DWH metaphor the other kinda different. Again ask AI if you really don't know what you are talking about here too - sorry, please ignore me. Snowflake and Databricks experts, help me out here please?

Andy Webster

Data Cloud Platform, Data Analytics Engineer, Certified DBA MSc, BSc

2mo

copilot what is Snowflake? Copilot Snowflake is a cloud-based data platform that enables organizations to store, manage, and analyze large amounts of data1. It offers a fully managed service that allows for data sharing, collaboration, and analytics across multiple cloud environments2. Snowflake supports structured, semi-structured, and unstructured data, making it versatile for various use cases Some key features of Snowflake include: Flexible Architecture: Supports data on- ... blah blah... Copilot, not a cloud data warehouse then? It absolutely is a cloud data warehouse! Snowflake is actually one of the the-best cloud data warehouse solutions out there. Its architecture allows for the separation of storage and compute, making it highly scalable and cost-efficient. It enables organizations to store vast amounts of data and perform complex queries across multiple cloud providers. So yes, Snowflake is indeed a cloud data warehouse with some impressive capabilities. Blah b;ah blah - if you can't be authentic, be a generalist perhaps? If you don't believe me ask ChatGPT, Google Gemini yourself or better still do proper research online. and Yes I know DWH, Data Mart and data modeling.

Like
Reply
Andy Webster

Data Cloud Platform, Data Analytics Engineer, Certified DBA MSc, BSc

2mo

Really? not a Cloud Data Warehouse? Illuminate me, what exactly is it then, seriously I love to learn..

Like
Reply

To view or add a comment, sign in

More articles by Vincent Rainardi

  • A good company

    A good company

    I've been working for 15 companies in my career. I’m a contractor.

    1 Comment
  • Advice for computer science students

    Advice for computer science students

    Someone who teaches in university just asked me a question: his students who completed CSE undergraduate degree…

  • A few quick tips on dbt.

    A few quick tips on dbt.

    1. dbt build error The line number on dbt build error is 2 lines too many.

  • Every data scientist needs to read this book

    Every data scientist needs to read this book

    Every data scientist needs to read this book: Big book of data science https://www.databricks.

  • Writing using Gen AI

    Writing using Gen AI

    First I typed on ChatGPT, Copilot or Gemini what I wanted to write, for example: "tips for data engineer". ChatGPT gave…

    1 Comment
  • 3 Tips for Data Engineers

    3 Tips for Data Engineers

    It is impossible to master all the tech in data engineering. In AWS alone there are over 40 things, from S3, Glue…

    2 Comments
  • Python Notebook

    Python Notebook

    A Notebook in Snowflake can contain both SQL and Python. And they are very simple to create.

    3 Comments
  • Ingesting data into Databricks SQL

    Ingesting data into Databricks SQL

    In the previous article I showed Databricks SQL, a UI where we can type SQL queries just like in SQL Server Management…

    2 Comments
  • Quick tips on Databricks SQL

    Quick tips on Databricks SQL

    Before we start, if you haven’t used Databricks before, you can try it for free. Just head to www.

    4 Comments
  • dbt - Opening a model

    dbt - Opening a model

    Just a quick one on dbt. When you have hundreds of models, it could take a long time to navigate to a particular model.

    1 Comment

Insights from the community

Others also viewed

Explore topics