What is high performance computing?

High performance computing (HPC) is the practice of aggregating computing resources to gain performance greater than that of a single workstation, server, or computer. HPC can take the form of custom-built supercomputers or groups of individual computers called clusters. HPC can be run on-premises, in the cloud, or as a hybrid of both. Each computer in a cluster is often called a node, with each node responsible for a different task. Controller nodes run essential services and coordinate work between nodes, interactive nodes or login nodes act as the hosts that users log in to, either by graphical user interface or command line, and compute or worker nodes execute the computations. Algorithms and software are run in parallel on each node of the cluster to help perform its given task. HPC typically has three main components: compute, storage, and networking.

HPC allows companies and researchers to aggregate computing resources to solve problems that are either too large for standard computers to handle individually or would take too long to process. For this reason, it is also sometimes referred to as supercomputing. 

HPC is used to solve problems in academic research, science, design, simulation, and business intelligence. HPC’s ability to quickly process massive amounts of data powers some of the most fundamental aspects of today’s society, such as the capability for banks to verify fraud on millions of credit card transactions at once, for automakers to test your car’s design for crash safety, or to know what the weather is going to be like tomorrow.

Learn more about HPC at Google.

Types of HPC clusters

High performance computing has three main components:

  • Compute
  • Network
  • Storage

In basic terms, the nodes (compute) of the HPC system are connected to other nodes to run algorithms and software simultaneously, and are then connected (network) with data servers (storage) to capture the output. As HPC projects tend to be large and complex, the nodes of the system usually have to exchange the results of their computation with each other, which means they need fast disks, high-speed memory, and low-latency, high-bandwidth networking between the nodes and storage systems. 

HPC can typically be broken down into two general design types: cluster computing and distributed computing.

Cluster computing

Parallel computing is done with a collection of computers (clusters) working together, such as a connected group of servers placed closely to one another both physically and in network topology, to minimize the latency between nodes. 

Distributed computing

The distributed computing model connects the computing power of multiple computers in a network that is either in a single location (often on-premises) or distributed across several locations, which may include on-premises hardware and cloud resources.

In addition, HPC clusters can be distinguished between homogeneous vs heterogeneous hardware models. In homogenous clusters, all machines have similar performance and configuration, and are often treated as the same and interchangeable. In heterogeneous clusters, there is a collection of hardware with different characteristics (high CPU core-count, GPU-accelerated, and more), and the system is best utilized when nodes are assigned tasks to best leverage their distinct advantages.

How do HPC jobs work?

Workloads in an HPC environment typically come in two different types: loosely coupled and tightly coupled.

Loosely coupled workloads (often called parallel or high throughput jobs) consist of independent tasks that can be run at the same time across the system. The tasks may share common storage, but they are not context-dependent and thus do not need to communicate results with each other as they are completed. An example of a loosely coupled workload would be rendering Computer Generated Imagery (CGI) in a feature film, where each frame of the video is rendered independently of the other frames, despite them sharing the same input data like backgrounds and 3D models.

Tightly coupled workloads consist of many small processes, each handled by different nodes in a cluster, which are dependent on each other to complete the overall task. Tightly coupled workloads usually require low-latency networking between nodes and fast access to shared memory and storage. Interprocess communication for these workloads is handled by a Message Passing Interface (MPI), using software such as OpenMPI and Intel MPI. An example of a tightly coupled workload would be weather forecasting, which involves physics-based simulation of dynamic and interdependent systems involving temperature, wind, pressure, precipitation, and more. Here, each cluster node may compute partial solutions to different weather factors, contributing to the overall forecast.

HPC in the cloud

HPC can be performed on-premises with dedicated equipment, in the cloud, or a hybrid of each.

HPC in the cloud offers the benefit of flexibility and scalability without having to purchase and maintain expensive dedicated supercomputers. HPC in the cloud provides all the necessary infrastructure needed to perform large, complex tasks such as data storage, networking solutions, specialized compute resources, security, and artificial intelligence applications. Workloads can be performed on demand, which means that organizations can save money on equipment and time on computing cycles, only using the resources they need, when they need them. 

Some common considerations when choosing to run HPC in the cloud include:

Latency and bandwidth: With the amount of data running in HPC workloads, cloud providers need to provide robust networking capabilities (>100 GB/s) with low latency.

Performance: HPC in the cloud works best with providers that constantly update systems to optimize performance, especially in computer processors, storage solutions, and networking capabilities.

Sustainability: HPC is a resource intensive form of computing, requiring much more electricity than normal workloads. On-premises high performance computers can cost millions of dollars a year in energy. Public clouds that prioritize renewable energy—such as Google Cloud—can mitigate the energy impacts of HPC. 

Storage: Given the size of most HPC tasks, scalable data storage is an important consideration when running HPC workloads. Cloud providers that can easily store and manage large amounts of data (such as through Cloud Storage Filestore High Scale, or DDN EXAScaler) have an advantage in HPC.

Security: A cloud provider with a privately managed global infrastructure ensures that data and applications are the least exposed to the public internet. Virtual private cloud (VPC) networks enable connectivity between nodes and can configure firewalls for HPC applications. Confidential Computing features allow encryption in use, as well as encryption at rest and in flight.

Benefits of HPC in the cloud

Speed and performance

High performance computing can process data and tasks much faster than a single server or computer. Tasks that could take weeks or months on a regular computing system can take hours in HPC.

Flexibility and efficiency

With HPC in the cloud, workloads can be scaled up or down depending on need. With a robust internet connection, HPC can be accessed from anywhere on the globe. 

Cost savings

Because of the speed, flexibility, and efficiency of HPC in the cloud, organizations can save time and money on computing resources and labor hours.

Fault tolerance

If one node of an HPC cluster fails, the system is resilient enough that the rest of the system does not come crashing down. Given the large and complex tasks performed by HPC, fault tolerance is a big advantage. 

Accelerated R&D

HPC provides an advantage to companies performing research and development by speeding up the results of data-intensive projects, such as pharmaceutical modeling, designing new machines and parts, or simulating experiments to reduce physical testing.

Initial cost

On-premises HPC clusters and supercomputers have high initial costs. On-premises HPC would be out of reach for most organizations after factoring in the cost of equipment, labor, software, and configuration.

Energy consumption

The energy costs of on-premises supercomputer installations can be large. For environmentally and cost conscious companies, HPC energy consumption can be sustainable by running HPC on the world’s cleanest cloud

Maintenance 

HPC runs best when on the latest generation of hardware and optimized software. Keeping an on-premises HPC cluster or supercomputer up to date to ensure optimal performance can quickly become a large and ongoing expense.

Why is HPC important?

The modern ecosystem is flooded with data and compute-intensive tools to analyze it. HPC allows businesses and organizations to process all of that data in a timely manner, fueling new insights, innovations, and scientific discoveries. This allows companies to forecast business scenarios, predict market fluctuations, and make recommendations. The medical field, meanwhile, is being transformed with easy access to HPC in the cloud, helping to model potential outbreaks, decode the genome of cancerous cells, and understand how diseases evolve.

In short, HPC is accelerating the rate at which scientific, technological, and business progress is made, helping humanity craft a more prosperous future.

Solve your business challenges with Google Cloud

New customers get $300 in free credits to spend on Google Cloud.
Talk to a Google Cloud sales specialist to discuss your unique challenge in more detail.

Uses for HPC

Here are some of the use cases for high performance computing.

Research

HPC is used in academic and scientific research to perform analysis and computation of large datasets, such as astronomical data from satellites and telescopes, creating new materials, discovering new drugs, or modeling proteins.

Simulation

HPC is used to simulate physical scenarios such as automobile collisions, airflow over airplane wings or inside of engines, or how potential new drugs interact with human cells.

Design

Manufacturers often use HPC and artificial intelligence to design new machines such as planes and automobiles in software before building physical prototypes. Without the computing power of HPC, designing and rendering potential models would take much longer and slow down the manufacturing process. Computer chip manufacturers use HPC to model new chip designs before prototyping them in the foundry.

Optimization

HPC can help optimize large and difficult datasets, such as financial portfolios or the most efficient routes for shipping and logistics.

Forecasting

HPC can take large, complex datasets and make predictions in a timely manner. Many aerospace companies use HPC to predict when their machines will require maintenance. Most weather forecasting is performed with high performance computing, allowing meteorologists to predict the paths of storms, or model climate change. 

Data analysis

HPC can analyze the largest of datasets. When applied with machine learning and artificial intelligence applications, HPC can help make recommendations or perform fraud detection for credit cards. HPC has greatly increased the speed at which genomes and can be sequenced.

Take the next step

Start building on Google Cloud with $300 in free credits and 20+ always free products.

Google Cloud
  翻译: