5 Use Cases for Integrating Big Data Tools with a Data Warehouse
These five use cases are good examples of how you can use big data tools to stage data for your data warehouse.

All of us like to think we are above average. We like to think that our big data is bigger than average, and when we hear about big data tools, we flatter ourselves into thinking that our organization is a perfect candidate for the most advanced tools and architectures.

Is that really the case?

When You Actually Need Big Data Tools

Think of big data tools such as Hadoop, Spark, NoSQL, and massively parallel databases as the freight trains of the data world. A freight train is amazingly powerful and efficient, but it is slow to start up, limited in its routes, and frustrating to reschedule or change. If you are looking for a quick, agile vehicle to deliver packages, a freight train might be your least effective option. But if you are moving a mountain of materials every day, the train could be your best, and perhaps only, option.

In general, you will know you have a big data scenario when your:

  • Data velocity increases 100 times -- from thousands of transactions per hour to hundreds of thousands
  • Data volume increases 100 times -- from millions of rows to hundreds of millions
  • Data variety increases 100 times -- from dozens of data sources to hundreds

At those points of data throughput, your frustration with -- and the limitations of -- traditional SQL databases will grow to a tipping point. You will feel you are moving a growing mountain of data with a hand shovel and only getting further behind. If you aren't absolutely drowning in data, you can probably deal with your data in SQL tools with some tuning and good architecture.

The Best Tool Depends on the Job

Traditional SQL ETL and reporting tools, used within the data warehouse architecture, are best suited for primary business outcomes, such as sales, payments (or other transactions), account sign-ups, and unsubscribes.

Big data tools, used in a data lake architecture, are ideally suited for secondary business events that track the detailed (but often meaningless or redundant) steps on a customer journey or repeated (and often meaningless) messages from an Internet-connected device, such as:

  • Browsing history
  • Mobile app in-app actions
  • Device-activity monitoring
  • GPS location tracking

A data warehouse is the ideal destination for summarized trends from such secondary business events, aggregated into models that reflect the business processes.

Using Big Data with the Data Warehouse

One example of how big data tools can complement a data warehouse is an alarm company with Internet-connected sensors in homes across the country. There would be little value (and huge expense) in storing each sensor response in a SQL data warehouse, but that data could be retained in cheap storage in a data lake environment and then aggregated for use in the data warehouse. For instance, the company could define combinations of sensor device events that constitute a person locking up a home and departing. That aggregated event could be stored in the data warehouse in a fact table that records arrivals and departures.
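To make the alarm-company idea concrete, here is a minimal Python sketch of turning raw sensor events into departure rows. The event names ("door_closed", "alarm_armed"), the five-minute window, and the function name are all assumptions made for illustration; a real company would define its own event combinations.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SensorEvent:
    home_id: str
    sensor: str   # hypothetical event names, e.g. "door_closed", "alarm_armed"
    at: datetime


# Assumed rule: a door closing followed by the alarm being armed within
# five minutes counts as one departure.
DEPARTURE_WINDOW = timedelta(minutes=5)


def find_departures(events):
    """Return (home_id, departed_at) rows suitable for a departures fact table."""
    departures = []
    last_door_closed = {}  # home_id -> time of the most recent door close
    for ev in sorted(events, key=lambda e: e.at):
        if ev.sensor == "door_closed":
            last_door_closed[ev.home_id] = ev.at
        elif ev.sensor == "alarm_armed":
            closed_at = last_door_closed.pop(ev.home_id, None)
            if closed_at is not None and ev.at - closed_at <= DEPARTURE_WINDOW:
                departures.append((ev.home_id, ev.at))
        # Every other sensor event stays in the data lake and is ignored here.
    return departures
```

Only the rows returned by find_departures would be loaded into the warehouse; the individual sensor responses remain in cheap storage.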

Here are four more use cases for using big data tools to stage data for a data warehouse.

1. Summarize and filter IoT data into fact tables. A large national bed manufacturer is now including biometric sensors in its high-end mattresses. The individual sensor readings could be kept in a data lake (using storage such as the Hadoop Distributed File System in Apache Hadoop). Using a tool such as Apache Spark to aggregate and filter the signals, the company could populate the data warehouse with aggregated data to create time-trended reports and log alerts when boundary metrics are exceeded.
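The aggregation step might look roughly like the sketch below. It is plain Python rather than Spark so that it runs anywhere, and it only illustrates the logic; the heart-rate metric, the limit of 100, and the column names are assumptions, not details from the manufacturer.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

HEART_RATE_LIMIT = 100  # hypothetical boundary metric


def summarize_hourly(readings):
    """Roll raw readings up to one row per device per hour.

    readings: iterable of (device_id, timestamp, heart_rate) tuples.
    """
    buckets = defaultdict(list)
    for device_id, ts, value in readings:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[(device_id, hour)].append(value)

    rows = []
    for (device_id, hour), values in sorted(buckets.items()):
        rows.append({
            "device_id": device_id,
            "hour": hour,
            "reading_count": len(values),
            "avg_heart_rate": mean(values),
            "max_heart_rate": max(values),
            # Flag the hour if any reading crossed the assumed boundary.
            "alert": max(values) > HEART_RATE_LIMIT,
        })
    return rows
```

At real volumes the same group-and-aggregate logic would run as a parallel job, with only the hourly rows reaching the fact table.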

2. Merge live data with historical data. Financial institutions need real-time access to market data such as interest rates, but they also need to store that market data and show it in the context of historical trends. A tool such as Apache Kafka or Amazon Kinesis could facilitate this integration between the two sets of data. There is no scheduled batch process to delay the information, and data is streamed directly to the visualization tool.
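As a rough illustration, the sketch below pairs each live rate with a trailing average of stored history. The live_rates iterable is only a stand-in for a stream consumer (for example, one reading from a Kafka topic), and the function name, window size, and output fields are assumptions.

```python
from collections import deque
from statistics import mean


def with_history(live_rates, history, window=30):
    """Yield each live rate alongside its trailing historical average.

    live_rates: stand-in for a stream of incoming rates.
    history: stored series, oldest first, loaded from the warehouse.
    """
    recent = deque(history, maxlen=window)  # keeps only the last `window` values
    for rate in live_rates:
        baseline = mean(recent) if recent else None
        yield {
            "rate": rate,
            "baseline": baseline,
            "change": None if baseline is None else rate - baseline,
        }
        recent.append(rate)  # the live value becomes history for the next tick
```

Because the function is a generator, each record is available to the visualization layer as soon as the rate arrives, with no batch step in between.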

3. ETL based on continuous training of data science models (i.e., machine learning ETL). Internet retailers continue to refine their models for segmenting and targeting customers. These techniques can be applied in Web analytics tools (such as Adobe's Marketing Cloud) to drive Web content. They can also be captured in the data warehouse to shape reporting and forecasting. Data warehouse dimensions can contain hierarchies and attributes that are built dynamically from statistical models, and those values can be modified over time.
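One way to picture a model-driven dimension attribute is the sketch below. The score_segment function is a deliberately trivial stand-in for a trained model, and the thresholds, segment labels, and column names are all hypothetical.

```python
from datetime import date


def score_segment(total_spend, thresholds=(100.0, 1000.0)):
    """Stand-in for a trained model: maps total spend to a segment label."""
    low, high = thresholds
    if total_spend >= high:
        return "high_value"
    if total_spend >= low:
        return "regular"
    return "occasional"


def refresh_customer_dim(dim_rows, spend_by_customer, as_of):
    """Re-score every customer and update the segment attribute in place.

    Returns the ids of customers whose segment changed in this run.
    """
    changed = []
    for row in dim_rows:
        spend = spend_by_customer.get(row["customer_id"], 0.0)
        new_segment = score_segment(spend)
        if new_segment != row["segment"]:
            row["segment"] = new_segment
            row["segment_updated"] = as_of  # records when the value was modified
            changed.append(row["customer_id"])
    return changed
```

Each time the model is retrained, rerunning the refresh lets the dimension follow the model, so reports always group customers by the current segmentation.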

4. Sessionization of clickstream, GPS tracking, or device monitoring data. A trucking or delivery company may collect GPS and delivery event data 24 hours a day, but most of that data is of little value as individual data points. The goal is to group those events into trips to show the overall statistics for distance traveled, timeliness of delivery, and other key metrics. Grouping all the event data into distinct trips is a difficult and resource-intensive process that, at high volumes, requires parallel processing in a tool such as Spark. The final trip metrics can then be loaded into a fact table in the data warehouse.
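A common way to sessionize is to start a new trip whenever the gap between consecutive events from the same vehicle exceeds a threshold. The sketch below shows that idea in plain Python; the 30-minute gap, the odometer field, and the output columns are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta

TRIP_GAP = timedelta(minutes=30)  # assumed idle time that ends a trip


def sessionize(events, gap=TRIP_GAP):
    """Group (vehicle_id, timestamp, odometer_km) events into trips."""
    trips = []
    open_trips = {}  # vehicle_id -> trip currently being built
    for vehicle_id, ts, odometer_km in sorted(events, key=lambda e: (e[0], e[1])):
        trip = open_trips.get(vehicle_id)
        if trip is None or ts - trip["end"] > gap:
            # First event for this vehicle, or it was idle too long: new trip.
            trip = {"vehicle_id": vehicle_id, "start": ts, "end": ts,
                    "start_km": odometer_km, "end_km": odometer_km, "events": 0}
            open_trips[vehicle_id] = trip
            trips.append(trip)
        trip["end"] = ts
        trip["end_km"] = odometer_km
        trip["events"] += 1
    for trip in trips:
        trip["distance_km"] = trip["end_km"] - trip["start_km"]
    return trips
```

The sort by vehicle and time is the expensive part at high volumes, which is why this kind of grouping is usually partitioned by vehicle and run in parallel. Each returned trip corresponds to one row in the trip fact table.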

A Final Word

Your goal is to have the best of both sides of the data pipeline: collect as much raw data as possible about any customer or business activity, but select and organize the final business outcomes into a data warehouse designed for business decision making. By using the right tool for the right job, you won't bog down your database server with an unmanageable mountain of staged raw data, and you won't impede reporting and business decisions by trying to drive them with a freight train. A data lake that leverages big data tools and feeds into a SQL-based data warehouse is a smart way to keep all of your stakeholders happy.

More articles by Noam Zeigerson
