How to Tune a Teradata System

Tom Coffing

Published Mar 22, 2019

Tera-Tom here! Teradata was started back in 1979 with the dream of designing a parallel processing system that could house a Terabyte of data. The experts of the day said that would be nearly impossible and they often reported these facts.

Think of it like this. If you lived for a million seconds (megabyte) you would live for 11.5 days. If you lived for a trillion seconds (terabyte) you would live for 32,688 years.

Teradata sold their first system, named the DBC 1012, which stands for Data Base Computer 10 to the 12th power (in other words, one trillion).

Teradata’s success has been well-documented. They have 1,400 customers in 77 countries. They also have:

18 of the top 20 global commercial and savings banks
19 of the top 20 telecommunications companies
All of the top six airlines
11 of the top 20 healthcare companies
15 of the top 20 global retailers
14 of the top 20 travel/transportation companies
13 of the top 20 manufacturing companies

Teradata took the time to ensure that all queries use parallel processing. They came up with a brilliant way to spread the data and retrieve it quickly. They built the system for continued growth and there has always been less of a need for DBAs because Teradata was designed to let the system do most of the work itself.

Teradata systems are also great at loading massive amounts of data. Systems became so large that Teradata was smart to focus on load utilities, so they could load millions of rows extremely quickly. Those load utilities are now part of what Teradata calls TPT. That stands for Teradata Parallel Transport. These load utilities are versatile and exactly what the doctor ordered, but they are also difficult to create unless you are an expert.

What else is brilliant about Teradata is that they use nearly every technique possible for queries to be tuned. There is always good news and bad news. The good news is that Teradata systems can be tuned using dozens of techniques. The bad news is that you need experienced Teradata experts to physically utilize tuning techniques. These systems are not a “load and go” type system like Netezza or Amazon Redshift where there is little tuning needed.

I am going to get you started with the most important fundamentals you need to know to tune a Teradata system. When I say tuning, I mean the many options Teradata provides for creating tables and indexes, sorting and storage in order to make queries run faster. Here is a list of techniques to tune your Teradata system.

The Teradata Architecture

Teradata was born to be parallel, and with each query, a single step is performed in parallel by each AMP. A Teradata system consists of a series of AMPs that will work in parallel to store and process your data. AMPs are told exactly what to do by the Parsing Engine’s (PE) plan. There will be multiple PE’s in a Teradata system with each PE commanding all of the AMPs. The PE is the Teradata Optimizer and it takes user requests and parses the SQL to get the best performance tuning and performance optimization. The PE and AMPs communicate amongst themselves via either BYNET 0 or BYNET 1 and the AMPs write and read data from their disks via the BYNET. The PE and the AMPs work together to keep user’s SQL in check because each user is assigned spool space limits so that if a query goes over its allocated limit the PE will abort the query. This check is vital in case a user accidentally performs a product join, which can result in a runaway query.

Almost every MPP system including Greenplum, Amazon Redshift, Azure SQL Data Warehouse all have these components. They don’t call them the PE and AMPs, but you will hear of a Host instead of a PE and instead of AMPs you have Slices (Greenplum and Amazon), Distributions (Azure SQL Data Warehouse) and SPUs for Netezza. What is important to know is that fundamentally each table created and loaded on a Teradata system is designed so all AMPs own a portion of the data. The more AMPs you have the more powerful the system.

A Table can have only one Primary Index (Distribution Key)

Primary Index – A table will designate at least one column to be the Primary Index. This is called the Distribution Key in other systems. This is different than a Primary Key. A Primary Key is used to logically model the tables that join together using a Primary Key/Foreign Key relationship. [Read More]

How to Tune a Teradata System

Tom Coffing

The Teradata Architecture

More articles by this author

Insights from the community

Others also viewed

The Power of Vector Search with Exadata & Oracle DB 23c AI: TCO, ROI, and Use Cases

The Future is Open

Unlocking Data Movement and Integration: A Comprehensive Guide on ADF

Spanner: Google’s Globally-Distributed Database

Big Software Market – Major Technology Giants in Buzz Again| Oracle, Microsoft, Broadcom

How DynamoDB Makes Any Query Super Efficient At Any Scale

Insert, Update and Delete Tricks

4 mistakes to avoid when building lakehouse based solutions on AWS

Time-to-Live (TTL) Best Practices in DynamoDB

What the Heck is Apache SeaTunnel?

Explore topics

The Teradata Architecture

Best Book on Snowflake Now Available

Jun 29, 2022

Mission Impossible: The Information Highway!

Jun 13, 2022

Nexus - 1,000 Features on 1,000 Databases - Pivot

Jan 25, 2020

Snowflake Analytics - Part 2 - Rank

Jan 19, 2020

Am I Crazy? ETL for Everyone!

Jan 16, 2020

Snowflake Analytics - Part 1

Jan 15, 2020

Dell – Why I Took the Leap of Faith

Dec 19, 2019

Start Moving Data Within the Hour

Dec 18, 2019

How to Use the Nexus Cache Tree (and Why Search Capabilities Are Priceless)

Apr 12, 2019

Amazon Redshift Tutorial: Learning Through Puzzles and Examples (Part One)

Apr 10, 2019

Insights from the community

Others also viewed

The Power of Vector Search with Exadata & Oracle DB 23c AI: TCO, ROI, and Use Cases

The Future is Open

Unlocking Data Movement and Integration: A Comprehensive Guide on ADF

Spanner: Google’s Globally-Distributed Database

Big Software Market – Major Technology Giants in Buzz Again| Oracle, Microsoft, Broadcom

How DynamoDB Makes Any Query Super Efficient At Any Scale

Insert, Update and Delete Tricks

4 mistakes to avoid when building lakehouse based solutions on AWS

Time-to-Live (TTL) Best Practices in DynamoDB

What the Heck is Apache SeaTunnel?

Explore topics