Architecture Talk-2: Teradata - Vantage Architecture (MPP - Massively Parallel Processing)

Karthikeyan Thanikachalam

Aspiring Head of Data & AI Platform | Author | Generative AI Evangelist| Senior Data Architect | Cloud Migration Specialist | Cloud Certified Professional - 5x | Teradata Vantage | GCP | Azure | AWS | GenAI | AI & ML

Published Oct 24, 2023

Teradata’s architecture is built on a Massively Parallel Processing (MPP), shared-nothing design, which enables high-performance data processing and analytics. The MPP architecture distributes the workload across multiple virtual processors (vprocs). The vproc where query processing takes place is called an Access Module Processor (AMP). Each AMP is isolated from the other AMPs and processes queries in parallel, allowing Teradata to process large volumes of data rapidly.
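To make the shared-nothing idea concrete, here is a minimal Python sketch. It assumes rows are assigned to AMPs by hashing a key column; the AMP count, the crc32 hash, and the function names are illustrative assumptions and not Teradata's actual hashing scheme. Each toy AMP holds and counts only its own rows, and the per-AMP results are combined at the end.

```python
from zlib import crc32

NUM_AMPS = 4  # hypothetical AMP count, chosen only for illustration


def amp_for(key: str, num_amps: int = NUM_AMPS) -> int:
    """Map a key to an AMP number. crc32 is a stand-in hash (assumption)."""
    return crc32(key.encode("utf-8")) % num_amps


def distribute(rows, key_field, num_amps=NUM_AMPS):
    """Place every row on exactly one AMP; no AMP sees another AMP's rows."""
    amps = [[] for _ in range(num_amps)]
    for row in rows:
        amps[amp_for(str(row[key_field]), num_amps)].append(row)
    return amps


# 100 toy rows keyed by order_id.
rows = [{"order_id": i, "amount": i * 10} for i in range(1, 101)]
amps = distribute(rows, "order_id")

# Each AMP works on its own slice; a real system would run these in parallel.
local_counts = [len(amp_rows) for amp_rows in amps]
total = sum(local_counts)
print(local_counts, total)
```

Because the same key always maps to the same AMP, a lookup by key only needs to involve one AMP, while a full scan can be split across all of them.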

The major architectural components of the Teradata Vantage engine include the Parsing Engines (PEs), BYNET, Access Module Processors (AMPs), and Virtual Disks (Vdisks). Vdisks are assigned to AMPs in enterprise platforms, and to the Primary Cluster in the case of VantageCloud Lake environments.

Teradata Vantage Engine Architecture Components

The Teradata Vantage engine consists of the components below:

Parsing Engines (PE)

When a SQL query is run in Teradata, it first reaches the Parsing Engine. The functions of the Parsing Engine are:

Manage individual user sessions (up to 120 per Parsing Engine).
Check if the objects used in the SQL query exist.
Check if the user has required privileges against the objects used in the SQL query.
Parse and optimize the SQL queries.
Prepare the execution plan for the SQL query and pass it to the corresponding AMPs.
Receive the response from the AMPs and send it back to the requesting client.
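The order of these checks can be sketched as a toy model in Python. Everything here is hypothetical (the ToyParsingEngine class, the catalog layout, the plan steps); it only illustrates the sequence of session handling, object check, privilege check, and plan preparation described above, not how Teradata implements it.

```python
from dataclasses import dataclass, field
from itertools import count

MAX_SESSIONS_PER_PE = 120  # session limit quoted in the list above


@dataclass
class ToyParsingEngine:
    # Hypothetical catalog: table name -> set of users allowed to query it.
    catalog: dict
    sessions: dict = field(default_factory=dict)  # session id -> user
    _ids: count = field(default_factory=count)

    def logon(self, user: str) -> int:
        """Open a session if this PE still has a free slot."""
        if len(self.sessions) >= MAX_SESSIONS_PER_PE:
            raise RuntimeError("this PE has no free session slots")
        session_id = next(self._ids)
        self.sessions[session_id] = user
        return session_id

    def plan(self, session_id: int, table: str) -> list:
        """Check object existence and privileges, then return plan steps."""
        user = self.sessions[session_id]
        if table not in self.catalog:
            raise LookupError(f"object {table!r} does not exist")
        if user not in self.catalog[table]:
            raise PermissionError(f"{user!r} lacks privileges on {table!r}")
        # A real optimizer chooses among many plans; this toy always scans.
        return [f"all-AMP scan of {table}", "return rows to the PE"]


pe = ToyParsingEngine(catalog={"sales": {"alice"}})
sid = pe.logon("alice")
steps = pe.plan(sid, "sales")
print(steps)
```

Failing early on a missing object or a missing privilege means no work is dispatched to the AMPs for a request that cannot succeed.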

BYNET

BYNET is the interconnect that lets the components communicate. It provides high-speed, bi-directional broadcast, multicast, and point-to-point communication, as well as merge functions. Its key functions are coordinating multi-AMP queries, reading data from multiple AMPs, regulating message flow to prevent congestion, and processing platform throughput. These functions make Vantage highly scalable and enable its Massively Parallel Processing (MPP) capabilities.
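The three delivery modes and the merge function can be illustrated with a small Python sketch. ToyInterconnect and its methods are hypothetical names, and the in-memory inboxes are a stand-in for real messaging; this is not the BYNET protocol. The merge step assumes each AMP returns its rows already sorted, so the combined result can be produced without a full re-sort.

```python
import heapq


class ToyInterconnect:
    """Delivers messages to per-AMP inboxes; a stand-in, not BYNET itself."""

    def __init__(self, num_amps: int):
        self.inboxes = {amp: [] for amp in range(num_amps)}

    def point_to_point(self, amp: int, message: str) -> None:
        # One sender, one receiving AMP.
        self.inboxes[amp].append(message)

    def multicast(self, amps, message: str) -> None:
        # One sender, a chosen subset of AMPs.
        for amp in amps:
            self.inboxes[amp].append(message)

    def broadcast(self, message: str) -> None:
        # One sender, every AMP.
        self.multicast(self.inboxes.keys(), message)

    @staticmethod
    def merge(sorted_results):
        # Each input list is already sorted, so a k-way merge keeps the
        # global order without sorting the combined rows again.
        return list(heapq.merge(*sorted_results))


net = ToyInterconnect(num_amps=3)
net.broadcast("step 1: scan")
net.point_to_point(2, "step 2: fetch row")
merged = net.merge([[1, 4, 9], [2, 3, 10], [5]])
print(merged)
```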

Parallel Database Extension (PDE)

Parallel Database Extension (PDE) is an intermediary software layer positioned between the operating system and the Teradata Vantage database. PDE enables MPP systems to use features such as BYNET and shared disks. It facilitates the parallelism that is responsible for the speed and linear scalability of the Teradata Vantage database.

Access Module Processor (AMP)

AMPs are responsible for data storage and retrieval. Each AMP has its own set of Virtual Disks (Vdisks) where its data is stored; in line with the shared-nothing architecture, no other AMP can access that data. The functions of an AMP are:

Access storage using Vantage’s Block File System Software
Lock management
Sorting rows
Aggregating columns
Join processing
Output conversion
Disk space management
Accounting
Recovery processing
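As a rough illustration of how work such as aggregation can be done per AMP, the Python sketch below has each toy AMP compute a partial (sum, count) over its own rows, after which the partials are combined into a global average. The data, the function names, and the two-phase approach are assumptions made for illustration, not a description of Teradata's internal algorithm.

```python
from collections import defaultdict


def local_aggregate(rows):
    """Run on one AMP: per-region [sum, count] over that AMP's rows only."""
    partial = defaultdict(lambda: [0, 0])
    for region, amount in rows:
        partial[region][0] += amount
        partial[region][1] += 1
    return dict(partial)


def combine(partials):
    """Combine per-AMP partials into a global average per region."""
    totals = defaultdict(lambda: [0, 0])
    for partial in partials:
        for region, (total, n) in partial.items():
            totals[region][0] += total
            totals[region][1] += n
    return {region: total / n for region, (total, n) in totals.items()}


# Hypothetical rows held by three AMPs: (region, amount).
amp_rows = [
    [("east", 10), ("west", 20)],
    [("east", 30)],
    [("west", 40), ("west", 60)],
]
averages = combine(local_aggregate(rows) for rows in amp_rows)
print(averages)
```

Carrying the count along with the sum matters: averaging the per-AMP averages directly would give the wrong answer whenever AMPs hold different numbers of rows for a group.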

Node

In Teradata systems, a node is an individual server that serves as the hardware platform for the database software. It is a processing unit where database operations run under the control of a single operating system. When Teradata is deployed in the cloud, it follows the same MPP, shared-nothing architecture, but the physical nodes are replaced with virtual machines (VMs).

Virtual Disks (Vdisks)

Virtual Disks are units of storage space, each owned by an AMP. They hold user data (rows within tables) and map to physical space on a disk.
