One Petaflop
In recent discussions around big compute I was asked what a Petaflop (PFlop) would look like on Azure. Now, I had heard the term before and knew it was used to describe a considerable capacity of compute. So what is a petaflop (PFlop) and how do we convert this into something that we can all relate to?
What is a Petaflop (PFlop)
A petaflop is a measure of a computer’s processing speed and can be expressed as:
- A quadrillion (thousand trillion) floating point operations per second (FLOPS)
- A thousand teraflops
- 10 to the 15th power FLOPS
- Roughly 2 to the 50th power FLOPS (2^50 is about 1.126 x 10^15, so this is an approximation rather than an exact equivalent)
The Formula
Source: How to calculate peak theoretical performance of a CPU-based HPC system
GFlops = (CPU speed in GHz) x (number of CPU cores) x (CPU instructions per cycle) x (number of CPUs per node)
TFlops = GFlops / 1000
PFlops = TFlops / 1000
Where do you find the number of CPU instructions per cycle?
- Intel X5600 series CPUs and AMD 6100/6200/6300 series CPUs have 4 instructions per cycle
- Intel E5-2600v1 and E5-2600v2 series CPUs have 8 instructions per cycle
- Intel E5-2600v3 series CPUs have 8 instructions per cycle
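The formula above can be sketched in Python as follows. This is a minimal illustration, not code from the cited source: the function names (`peak_gflops`, `gflops_to_pflops`) are made up for this example, and the sample inputs are the A9 figures this post uses later (16 cores at 2.6 GHz, 8 instructions per cycle, treated as a single CPU).

```python
def peak_gflops(ghz, cores_per_cpu, instructions_per_cycle, cpus_per_node=1):
    """Theoretical peak GFlops for one node, per the formula above."""
    return ghz * cores_per_cpu * instructions_per_cycle * cpus_per_node


def gflops_to_pflops(gflops):
    """Convert GFlops to PFlops (GFlops -> TFlops -> PFlops)."""
    tflops = gflops / 1000
    return tflops / 1000


# Assumed sample inputs: the A9 figures used in this post.
a9_gflops = peak_gflops(ghz=2.6, cores_per_cpu=16, instructions_per_cycle=8)
print(f"{a9_gflops:.1f} GFlops = {gflops_to_pflops(a9_gflops):.7f} PFlops")
```

Keeping the unit conversion in its own function makes it easy to reuse for any per-node GFlops figure, whatever hardware it came from.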
What Does 1 PFlop Represent on Azure
As of February 2016, Azure has a few machine configurations. Microsoft has created the Azure Compute Unit (ACU) to provide a way of comparing compute (CPU) performance across Azure SKUs. This helps us easily identify which SKU is most likely to satisfy our performance needs.
SKU Family | ACU/Core |
Standard_A0 (Extra Small) | 50 |
Standard_A1-4 (Small – Large) | 100 |
Standard_A5-7 | 100 |
A8-A11 | 225 * |
D1-14 | 160 |
D1-14v2 | 210 – 250 * |
DS1-14 | 160 |
G1-5 | 180 – 240 * |
GS1-5 | 180 – 240 * |
ACUs marked with a * use Intel® Turbo technology to increase CPU frequency and provide a performance boost. The amount of the boost can vary based on the VM size, workload, and other workloads running on the same host.
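Because ACU is a per-core figure, one rough way to use the table is to compare SKU families against each other. The sketch below is illustrative only: the `ACU_PER_CORE` dictionary copies values from the table above, and for families listed as a range it takes the lower bound, which is an assumption made here to stay conservative. The `relative_per_core` helper is a hypothetical name, not part of any Azure tooling.

```python
# ACU per core, copied from the table above. For ranges (e.g. 210 - 250),
# the lower bound is used; that choice is an assumption of this sketch.
ACU_PER_CORE = {
    "Standard_A0": 50,
    "Standard_A1-4": 100,
    "Standard_A5-7": 100,
    "A8-A11": 225,
    "D1-14": 160,
    "D1-14v2": 210,
    "DS1-14": 160,
    "G1-5": 180,
    "GS1-5": 180,
}


def relative_per_core(sku, baseline="Standard_A1-4"):
    """Per-core ACU of `sku` as a multiple of the baseline family."""
    return ACU_PER_CORE[sku] / ACU_PER_CORE[baseline]


# Rank families from highest to lowest ACU per core.
for sku in sorted(ACU_PER_CORE, key=ACU_PER_CORE.get, reverse=True):
    print(f"{sku:<14} {relative_per_core(sku):.2f}x Standard_A1-4")
```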
Let’s do The Math
These calculations are theoretical maximums. Real-world “FLOPS” will vary based on parallelization, network communication, the use of technologies like RDMA and GPUs, and other factors. The details used for this post have been pulled from various sources on the Internet; some may not be exact, so please verify them. Furthermore, please note that Intel® Turbo technology was not taken into consideration.
For this exercise, let’s focus on SKUs A9, D14 V2, G5 and N21.
- An A9 is an Intel Xeon E5-2670 and has 16 cores @ 2.6 GHz (supports RDMA)
- A D14 V2 is an Intel Xeon E5-2673v3 and has 16 cores @ 2.4 GHz
- A G5 is 2 Intel Xeon E5-2698Bv3 with 16 cores each @ 2.0 GHz
- An N21 is 2 Intel E5-2690v3 with 12 cores each @ 2.6 GHz (supports RDMA)
Metric | A9 | D14 V2 | G5 | N21 |
GFlops | 332.8 | 307.2 | 512 | 665.6 |
PFlops | 0.0003328 | 0.0003072 | 0.000512 | 0.0006656 |
Approx. 1 PFlop | 3,005 VMs | 3,256 VMs | 1,954 VMs | 1,503 VMs |
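The last row of the table can be reproduced with a short calculation: 1 PFlop is 1,000,000 GFlops, so the VM count is that figure divided by the per-VM GFlops, rounded up. A minimal sketch, taking the GFlops values from the table as given (the `vms_for_pflops` helper is a name invented for this example):

```python
import math

# Theoretical peak GFlops per VM, as listed in the table above.
GFLOPS_PER_VM = {"A9": 332.8, "D14 V2": 307.2, "G5": 512.0, "N21": 665.6}

GFLOPS_PER_PFLOP = 1_000_000  # 1 PFlop = 1,000 TFlops = 1,000,000 GFlops


def vms_for_pflops(gflops_per_vm, target_pflops=1.0):
    """Smallest whole number of VMs whose combined peak reaches the target."""
    return math.ceil(target_pflops * GFLOPS_PER_PFLOP / gflops_per_vm)


for sku, gflops in GFLOPS_PER_VM.items():
    print(f"{sku}: {vms_for_pflops(gflops):,} VMs")
```

Passing a different `target_pflops` scales the estimate. These remain theoretical peaks, so any comparison against measured benchmark results is only a rough guide.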
Gaining more Perspective
Have a look at the top 10 of the TOP500 Supercomputer Sites list published in June 2015. Once we start measuring compute in petaflops, we’re entering a whole new world of compute.
For the fifth consecutive time, Tianhe-2, a supercomputer developed by China’s National University of Defense Technology, has retained its position as the world’s No. 1 system, according to the 45th edition of the twice-yearly TOP500 list of the world’s most powerful supercomputers. Tianhe-2, which means Milky Way-2, led the list with a performance of 33.86 petaflop/s (quadrillions of calculations per second) on the Linpack benchmark…