CPU vs GPU vs TPU vs DPU vs QPU

8 minutes 24 seconds

🇬🇧 English

S1

Speaker 1

00:00

In faraway lands, slaves dig the earth for beautiful gems called quartz, which contain silicon dioxide. Alchemists, or chemical engineers, then refine and cook them into silicon substrate, a material that can be doped to act as both a conductor and insulator. Shamans, also known as electrical engineers, then inscribe billions of microscopic symbols on them that can't be seen with the naked eye. When lightning passes through them, they can speak the incomprehensible language of binary.

S1

Speaker 1

00:25

Highly trained wizards called software engineers can learn this language to build powerful machines that create illusions. These illusions can then control the way people think and act in the real world. In today's illusion, I will harness this magic to pull back the veil on the almighty computer by looking at four different ways computers actually compute things at the hardware level. Because to compute, a computer needs a PU, like a CPU, GPU, TPU, or DPU.

S1

Speaker 1

00:48

The last 100 years have been crazy. The first truly programmable computer was the Z1, created by Konrad Zuse in 1936 in his mom's basement. It was entirely mechanical, with over 20,000 parts, but it was blown up in 1943 during the bombardment of Berlin.

S1

Speaker 1

01:01

It represented binary data with sliding metal sheets. It could do things like Boolean algebra and floating-point arithmetic, and had a clock rate of 1 hertz, which means it could execute one instruction per second. To put that in perspective, modern CPUs are measured in gigahertz, or billions of cycles per second. Over the next 10 years, people thought really hard about how computers should actually work.

S1

Speaker 1

01:20

And in 1945, we got the von Neumann architecture, which is still used in modern chips today. It's the foundational design that describes how data and instructions are stored in the same memory space, then handled by a processing unit. A couple of years later, there was a huge breakthrough with the invention of the transistor, a semiconductor device that can amplify or switch electrical signals. A transistor can represent a 1 if current passes through it, or a 0 if it doesn't.

S1

Speaker 1

01:44

This was a huge leap forward. Then, in 1958, the integrated circuit was developed, allowing multiple transistors to be placed on a single silicon chip. Finally, in 1971, Intel released the first commercially available microprocessor, which had all the features you know and love in a modern CPU. It was a 4-bit processor, meaning it could handle 4 bits of data at a time, with approximately 2,300 transistors. Its clock speed was 740 kHz, which was extremely fast at the time.

S1

Speaker 1

02:10

CPUs are pretty complicated, and if you really want to learn how they work, I'd highly recommend reading CPU.land, which does an amazing job of breaking down how they actually execute programs. It's totally free and was written by high schooler Lexi Mattick along with Hack Club. But what I want to focus on is what CPUs are actually used for, so we can compare them to the other PUs. Like its name implies, the central processing unit is the brain of a computer. It runs the operating system, executes programs, and manages hardware.

S1

Speaker 1

02:36

It has access to the system's RAM, and it includes a hierarchy of caches on the chip itself for faster data retrieval. A CPU is optimized for sequential computations that require extensive branching and logic. Imagine some navigation software that needs to run an algorithm to compute the shortest possible route between two points. The algorithm may have a lot of conditional logic, like if-else statements, that can only be computed one by one, or sequentially, as in the sketch below.
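
As a rough illustration of that kind of branch-heavy, sequential work, here is a minimal sketch in Python. The road data, costs, and conditions are made up for illustration and not taken from any real navigation software:

```python
# Branch-heavy, sequential routing logic: each step depends on conditional checks
# that have to run one after another, which is exactly what a CPU core is good at.
def pick_next_road(current, roads, traffic, avoid_tolls):
    best = None
    for road in roads[current]:
        if avoid_tolls and road["toll"]:
            continue                      # skip toll roads if the user opted out
        if traffic.get(road["name"], 0.0) > 0.8:
            continue                      # skip heavily congested roads
        cost = road["distance"] * (1 + traffic.get(road["name"], 0.0))
        if best is None or cost < best[1]:
            best = (road, cost)
    return best

roads = {"A": [{"name": "A-B", "toll": False, "distance": 4.0},
               {"name": "A-C", "toll": True,  "distance": 2.5}]}
print(pick_next_road("A", roads, traffic={"A-B": 0.3}, avoid_tolls=True))
```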

S1

Speaker 1

02:59

A CPU is optimized for this type of work. Modern CPUs also have multiple cores, which lets them do work in parallel, so you can use multiple applications on your PC at the same time. And programmers can write multithreaded code that utilizes the cores on your machine to run code in parallel. Check out the video on my second channel if you want to learn how to do that in JavaScript.
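
That video covers the JavaScript version; as a rough sketch of the same idea in Python, independent chunks of work can be fanned out across cores with the standard multiprocessing module. The prime-counting workload here is made up purely for illustration:

```python
# Sketch: spreading independent, CPU-bound work across all available cores.
from multiprocessing import Pool, cpu_count

def count_primes(limit):
    # Deliberately naive prime counting, just to keep a core busy.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    chunks = [50_000, 60_000, 70_000, 80_000]
    with Pool(processes=cpu_count()) as pool:
        # Each chunk is handed to a separate process, one per core.
        print(pool.map(count_primes, chunks))
```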

S1

Speaker 1

03:18

Now, to make a computer faster, one might think we could just add more and more CPU cores. The reality, though, is that CPU cores are expensive. As the cores scale up, so do the power consumption and heat dissipation requirements. It becomes a matter of diminishing returns, and the extra complexity is just not worth it.

S1

Speaker 1

03:33

At the time of this video, 24 cores is typically the upper limit for higher-end chips, like Apple's M2 Ultra and Intel's i9. But there are massive chips like the 128-core AMD EPYC designed for data centers. Now, when it comes to CPUs, there are multiple different architectures out there, and that's a big deal if you're doing low-level systems programming, but every developer should be familiar with ARM and x86. 64-bit x86 is what you'll find on most modern desktop computers, while ARM is what you'll find on mobile devices, because it has a simpler instruction set and better power efficiency, which means better battery life.

S1

Speaker 1

04:05

However, this distinction has been changing over the last few years thanks to the Apple Silicon chips, which have proven that the ARM architecture can also work for high-performance computing on laptops and desktops. Even Microsoft is investing in running Windows on ARM. In addition, ARM is becoming more and more popular with cloud providers, with chips like ARM's Neoverse or Amazon's Graviton3, which let the cloud compute more stuff with less power consumption, one of the biggest expenses in a data center. But at some point, we've all hit the limitations of a CPU. Like when I try to run pirated Nintendo 64 games on my Raspberry Pi, it lags like crazy.

S1

Speaker 1

04:36

That's because a lot of computation is required to calculate the appearance of all the lights and shadows in a game on demand. Well, that's where the GPU comes in. A graphics processing unit, or graphics card, is highly optimized for parallel computing. Unlike a CPU with a measly 16 cores, NVIDIA's RTX 4080 has nearly 10,000 cores.

S1

Speaker 1

04:54

Each one of these cores can handle a floating-point or integer computation per cycle, and that allows games to perform tons of linear algebra in parallel to render graphics instantly every time you push a button on your controller. GPUs are also essential for training deep learning models that perform tons of matrix multiplication on large data sets. This has led to massive demand in the GPU market, and NVIDIA's stock price recently landed on the moon. So he says, okay, give me $200.

S1

Speaker 1

05:18

I gave him $200, and for $200, I bought 15%, I think it was 20%, of NVIDIA. If GPUs have so many cores, why not just use a GPU over a CPU for everything? The short answer is that not all cores are created equal. A single CPU core is far faster than a single GPU core, and its architecture can handle complex logic and branching, whereas a GPU core is only designed for simple computations.
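
To make the contrast concrete, here's a minimal sketch of the kind of data-parallel math a GPU is built for, written with NumPy. NumPy runs this on the CPU; a library like CUDA or PyTorch would dispatch the same work across thousands of GPU cores. The frame size and lighting math are made up for illustration:

```python
# Sketch: millions of identical, independent computations, one per pixel.
import numpy as np

# Pretend this is a 1080p frame: ~2 million pixels, each with a simple 3-component value.
pixels = np.random.rand(1920 * 1080, 3).astype(np.float32)
light_dir = np.array([0.3, 0.5, 0.8], dtype=np.float32)

# Every pixel's brightness is independent of every other pixel's,
# so the work maps naturally onto thousands of simple parallel cores.
brightness = pixels @ light_dir
print(brightness.shape)  # (2073600,)
```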

S1

Speaker 1

05:43

Most of the code out in the world can't take advantage of parallel computing, and needs to run sequentially with a single thread. A CPU is like a Toyota Camry. It's extremely versatile but can't take you to the moon. A GPU is more like a rocket ship.

S1

Speaker 1

05:54

It's really fast when you want to go in a straight line but not really ideal for going to pick up your groceries. As the name implies, GPUs were originally designed for graphics but nowadays everybody wants them to train an AI that can overthrow the government. But there's actually hardware designed for that use case, called the TPU, or Tensor Processing Unit. These chips are very similar to GPUs, but designed specifically for tensor operations, like the matrix multiplication required for deep learning.
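
As a minimal sketch of that core operation (assuming TensorFlow is installed; this is illustrative, not code from the video), a deep learning layer boils down to exactly this kind of matrix multiply:

```python
# Sketch: the matrix multiplication that tensor hardware is built to accelerate.
import tensorflow as tf

activations = tf.random.normal([256, 512])  # a batch of layer inputs
weights = tf.random.normal([512, 128])      # a layer's weight matrix

# On a TPU, this single op is pushed through a grid of multiply-accumulate units.
outputs = tf.matmul(activations, weights)
print(outputs.shape)  # (256, 128)
```

On an actual TPU, for example in Colab or on Google Cloud, training code is typically wrapped in tf.distribute.TPUStrategy so the whole model runs on the TPU cores; that's an assumption about typical usage rather than something stated in the video.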

S1

Speaker 1

06:18

They were developed by Google in 2016 to integrate directly with its TensorFlow software. A TPU contains thousands of these things called multiply-accumulators, which allow the hardware to perform matrix multiplication without the need to access registers or shared memory like a GPU would. And if you have a neural network that's going to take weeks or months to train, a TPU could save you millions of dollars. That's pretty cool, but that brings us to the newest type of PU: the DPU, or Data Processing Unit.

S1

Speaker 1

06:45

The CEO of NVIDIA described it as the third major pillar of computing going forward. But you'll likely never use one in your own computer, because they're designed specifically for big data centers. They're most like a CPU and typically based on the ARM architecture, but are highly optimized for moving data around. They handle networking functions like packet processing, routing, and security, and also deal with data storage, like compression and encryption.

S1

Speaker 1

07:06

The main goal is to relieve the CPU from any data processing jobs, so it can focus on living its best life by doing general-purpose computing. And with that, we've looked at four different ways a computer computes. But there's one more wildcard that we might get to experience in our lifetime, and that's the QPU, or quantum processing unit. All the chips we've looked at so far deal in bits: ones and zeros.

S1

Speaker 1

07:27

But quantum computers deal in qubits, or quantum bits, which can exist in a superposition of both states simultaneously. A qubit can represent multiple possibilities at once, but when measured, it collapses into one of the possible states based on probability. Qubits are also subject to quantum entanglement, which means the state of one is directly related to the state of another, no matter the distance between them. These properties are used together to create quantum gates, which are like logic gates in regular computers, but work in entirely different ways that I'm too stupid to understand.
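
In slightly more precise textbook notation (not from the video), a single qubit's state and what measuring it does look like this:

```latex
% A qubit is a superposition of the basis states |0> and |1>:
|\psi\rangle = \alpha\,|0\rangle + \beta\,|1\rangle, \qquad |\alpha|^2 + |\beta|^2 = 1

% Measurement collapses it to a single classical bit, with probabilities:
P(0) = |\alpha|^2, \qquad P(1) = |\beta|^2
```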

S1

Speaker 1

07:55

What I do understand is that if this technology ever gets good, it will completely change the world. Currently, cryptographic systems like RSA are underpinned by the fact that the classical algorithms used for factorization would take billions of years to crack by brute force, even with the best computers of today. But quantum computers will be able to run different algorithms, like Shor's algorithm, which is exponentially faster at factorization and thus poses a major threat to modern encryption and security. Luckily, there's no quantum computer today that can run this algorithm, and even if there were, they sure as hell wouldn't be telling you and me about it.
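
For context, these are the standard complexity estimates behind that claim (textbook figures, not from the video): RSA's security rests on factoring a large modulus N = p * q, where the best known classical method, the general number field sieve, is sub-exponential, while Shor's algorithm is roughly polynomial in the bit length of N:

```latex
% Best known classical factoring (general number field sieve), heuristic running time:
T_{\text{classical}}(N) \approx \exp\!\left(c \,(\ln N)^{1/3} (\ln\ln N)^{2/3}\right)

% Shor's algorithm on a fault-tolerant quantum computer, roughly:
T_{\text{Shor}}(N) = O\!\left((\log N)^3\right)
```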