How to Tune a Teradata System
Tera-Tom here! Teradata was started back in 1979 with the dream of designing a parallel processing system that could house a Terabyte of data. The experts of the day said that would be nearly impossible and they often reported these facts.
Think of it like this. If you lived for a million seconds (megabyte) you would live for 11.5 days. If you lived for a trillion seconds (terabyte) you would live for 32,688 years.
Teradata sold their first system, named the DBC 1012, which stands for Data Base Computer 10 to the 12th power (in other words, one trillion).
Teradata’s success has been well-documented. They have 1,400 customers in 77 countries. They also have:
- 18 of the top 20 global commercial and savings banks
- 19 of the top 20 telecommunications companies
- All of the top six airlines
- 11 of the top 20 healthcare companies
- 15 of the top 20 global retailers
- 14 of the top 20 travel/transportation companies
- 13 of the top 20 manufacturing companies
Teradata took the time to ensure that all queries use parallel processing. They came up with a brilliant way to spread the data and retrieve it quickly. They built the system for continued growth and there has always been less of a need for DBAs because Teradata was designed to let the system do most of the work itself.
Teradata systems are also great at loading massive amounts of data. Systems became so large that Teradata was smart to focus on load utilities, so they could load millions of rows extremely quickly. Those load utilities are now part of what Teradata calls TPT. That stands for Teradata Parallel Transport. These load utilities are versatile and exactly what the doctor ordered, but they are also difficult to create unless you are an expert.
What else is brilliant about Teradata is that they use nearly every technique possible for queries to be tuned. There is always good news and bad news. The good news is that Teradata systems can be tuned using dozens of techniques. The bad news is that you need experienced Teradata experts to physically utilize tuning techniques. These systems are not a “load and go” type system like Netezza or Amazon Redshift where there is little tuning needed.
I am going to get you started with the most important fundamentals you need to know to tune a Teradata system. When I say tuning, I mean the many options Teradata provides for creating tables and indexes, sorting and storage in order to make queries run faster. Here is a list of techniques to tune your Teradata system.
The Teradata Architecture
Teradata was born to be parallel, and with each query, a single step is performed in parallel by each AMP. A Teradata system consists of a series of AMPs that will work in parallel to store and process your data. AMPs are told exactly what to do by the Parsing Engine’s (PE) plan. There will be multiple PE’s in a Teradata system with each PE commanding all of the AMPs. The PE is the Teradata Optimizer and it takes user requests and parses the SQL to get the best performance tuning and performance optimization. The PE and AMPs communicate amongst themselves via either BYNET 0 or BYNET 1 and the AMPs write and read data from their disks via the BYNET. The PE and the AMPs work together to keep user’s SQL in check because each user is assigned spool space limits so that if a query goes over its allocated limit the PE will abort the query. This check is vital in case a user accidentally performs a product join, which can result in a runaway query.
Almost every MPP system including Greenplum, Amazon Redshift, Azure SQL Data Warehouse all have these components. They don’t call them the PE and AMPs, but you will hear of a Host instead of a PE and instead of AMPs you have Slices (Greenplum and Amazon), Distributions (Azure SQL Data Warehouse) and SPUs for Netezza. What is important to know is that fundamentally each table created and loaded on a Teradata system is designed so all AMPs own a portion of the data. The more AMPs you have the more powerful the system.
A Table can have only one Primary Index (Distribution Key)
Primary Index – A table will designate at least one column to be the Primary Index. This is called the Distribution Key in other systems. This is different than a Primary Key. A Primary Key is used to logically model the tables that join together using a Primary Key/Foreign Key relationship. [Read More]