How Tesla is using a supercomputer to train its self-driving tech

Tesla's approach to autonomy is controversial: It relies on just cameras to see and understand the roads.
The supercomputer cluster has 5,760 GPUs, the processing power Tesla needs to support its self-driving aspirations. Credit: Tesla

You can’t buy a fully self-driving car today, but automakers around the globe are racing to be the first to put one on dealer lots. They are not all taking the same technological path to get there, either. Some use remote sensing methods like Light Detection and Ranging (LiDAR), while others rely on radar-based sensors to help pick out hard-to-see obstacles in the roadway. Typically, firms working on autonomous tech use a combination of LiDAR, radar, and cameras.

Then there’s Tesla, which believes vision-based image recognition using only cameras is the key to affordable and reliable autonomy.

But there’s a catch to Tesla’s method: perfecting vision-based autonomy is difficult. It requires a continuously improving system that can quickly adapt to new and changing road conditions and then share what it learns with other vehicles on the road. That kind of learning takes far more processing power than is available in a single vehicle. It takes a supercomputer.


During a talk at the Conference on Computer Vision and Pattern Recognition (CVPR) earlier this month, Tesla’s senior director of AI, Andrej Karpathy, revealed that the automaker has been working on a project to do exactly that.

Tesla’s new supercomputer hasn’t been named, at least not publicly. The cluster consists of 720 individual computers called nodes. Each node has eight Nvidia A100 80GB graphics processing units (GPUs), each capable of high-intensity floating-point calculations with nearly 500 times the power of a standard desktop processor.

In total, the cluster has 5,760 GPUs, enough hardware to reach a theoretical peak of 1.8 exaflops of processing power. Karpathy believes this makes Tesla’s supercomputer roughly the fifth most powerful supercomputer in the world, at least on paper.
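The headline numbers can be checked with simple arithmetic. The sketch below assumes Nvidia's published peak of 312 teraflops per A100 for dense 16-bit (FP16/BF16) tensor-core math; that per-GPU figure is an assumption brought in from Nvidia's specifications, not something stated in Karpathy's talk as reported here.

```python
# Back-of-the-envelope check of the cluster figures quoted above.
# ASSUMPTION: 312 TFLOPS is Nvidia's published dense FP16/BF16 tensor-core
# peak for one A100; it is not a figure reported in this article.

NODES = 720
GPUS_PER_NODE = 8
A100_PEAK_FP16_TFLOPS = 312  # assumed per-GPU theoretical peak


def cluster_gpus(nodes: int, gpus_per_node: int) -> int:
    """Total GPU count for a cluster built from identical nodes."""
    return nodes * gpus_per_node


def cluster_peak_exaflops(gpus: int, tflops_per_gpu: float) -> float:
    """Aggregate theoretical peak in exaflops (1 exaflop = 1,000,000 teraflops)."""
    return gpus * tflops_per_gpu / 1_000_000


gpus = cluster_gpus(NODES, GPUS_PER_NODE)
peak = cluster_peak_exaflops(gpus, A100_PEAK_FP16_TFLOPS)
print(f"{gpus} GPUs, ~{peak:.2f} exaflops theoretical peak")
```

Under that assumption, 5,760 GPUs at 312 teraflops each works out to about 1.797 exaflops, which matches the quoted 1.8. Note that this is a peak for reduced-precision math; public supercomputer rankings such as the TOP500 list are based on 64-bit Linpack runs, which is likely one reason the "fifth most powerful" comparison holds only on paper.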

Modern Teslas use an advanced driver assistance system called Autopilot. This suite of features relies on eight exterior-facing cameras to gather data about the vehicle’s surroundings and, when engaged and where applicable, handles lateral (steering) and longitudinal (acceleration and braking) control under driver supervision. While this shouldn’t be confused with Waymo’s advanced self-driving, it is an interim step that uses partial automation to bridge the gap between manual driving and fully autonomous control.


Autopilot uses information gathered from Tesla’s entire fleet on the road to improve its driving decisions. As a Tesla drives down the street, its exterior cameras constantly gather data on the surrounding environment. Computers within the car study this data and predict how to behave in any given scenario, without actually sending those controls to the vehicle.
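One plausible way to decide which of these passive predictions are worth a closer look is to compare, moment by moment, what the network would have done with what the human driver actually did. The sketch below assumes that approach; the `Frame` type, the steering-angle threshold, and the `frames_to_upload` function are hypothetical illustrations, not Tesla’s actual software.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """One hypothetical time step of a drive."""
    timestamp_s: float
    predicted_steering_deg: float  # the network's suggestion; never sent to the car
    driver_steering_deg: float     # what the human driver actually did


def frames_to_upload(frames: list[Frame], threshold_deg: float = 5.0) -> list[float]:
    """Return timestamps where the passive prediction disagrees with the driver
    by more than threshold_deg (an assumed, illustrative cutoff)."""
    return [
        f.timestamp_s
        for f in frames
        if abs(f.predicted_steering_deg - f.driver_steering_deg) > threshold_deg
    ]


log = [
    Frame(0.0, 1.0, 1.5),
    Frame(0.1, 2.0, 9.0),  # large disagreement: a candidate for review
    Frame(0.2, -3.0, -2.0),
]
print(frames_to_upload(log))  # [0.1]
```

Because the prediction is only recorded and never acted on, a disagreement costs nothing on the road; it simply marks a moment that may be worth checking later.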

These predictions come from a machine learning model called a neural network. The predictions are recorded and sent back to Tesla, where they are checked to determine whether the decision was correct or whether any objects were misidentified. If there was an error, the data is run through the supercomputer again and again, tweaking the network’s behavior until it handles the scenario without a mistake. This process effectively trains Tesla’s ever-improving Autopilot model.
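At its simplest, the cycle described above (run the data through, correct the model wherever it is wrong, repeat until it makes no mistakes) is how supervised training works. The sketch below illustrates the idea with a single-neuron perceptron, a deliberately tiny stand-in: Tesla’s networks are vastly larger and are trained with gradient-based methods, and the toy data and function names here are hypothetical.

```python
def train_perceptron(samples, labels, lr=0.1, max_epochs=100):
    """Classic perceptron rule: loop over the data, nudge the weights on every
    misclassified example, and stop once a full pass produces no mistakes."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x, y in zip(samples, labels):  # each label y is +1 or -1
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            prediction = 1 if activation > 0 else -1
            if prediction != y:
                # Shift the decision boundary toward the correct answer.
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
                mistakes += 1
        if mistakes == 0:
            return weights, bias, epoch  # a clean pass: training is done
    return weights, bias, max_epochs  # gave up: data may not be separable


def predict(weights, bias, x):
    """Classify one example with the trained weights."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else -1


# Hypothetical toy data: two clearly separated groups of 2-D points.
samples = [(2.0, 1.0), (3.0, 2.0), (-1.0, -2.0), (-2.0, -1.0)]
labels = [1, 1, -1, -1]
weights, bias, epochs = train_perceptron(samples, labels)
print(epochs, [predict(weights, bias, x) for x in samples])  # 2 [1, 1, -1, -1]
```

The `max_epochs` cap matters: a perceptron is only guaranteed to reach zero mistakes when the data can be split by a straight line, so real training loops also stop on a budget rather than waiting for perfection.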


This method not only consumes a large amount of processing power but also requires significant storage for the one million 10-second clips that make up Tesla’s proprietary Autopilot training dataset. These clips alone take up 1.5 petabytes, while the cluster itself can hold approximately 10 petabytes of data on ultra-fast NVMe flash storage.
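To put those storage figures in perspective, the sketch below derives a few averages from the numbers quoted above. It assumes decimal units (1 petabyte = 1,000,000 gigabytes); the averages are simple derived estimates, not figures Tesla has reported.

```python
# Rough storage arithmetic for the dataset figures quoted above.
# ASSUMPTION: decimal units, 1 PB = 1,000,000 GB and 1 GB = 1,000 MB.

CLIPS = 1_000_000
CLIP_SECONDS = 10
DATASET_PB = 1.5
CLUSTER_STORAGE_PB = 10
GB_PER_PB = 1_000_000

avg_gb_per_clip = DATASET_PB * GB_PER_PB / CLIPS            # average size of one clip
avg_mb_per_second = avg_gb_per_clip * 1_000 / CLIP_SECONDS  # average data per second of clip
total_hours = CLIPS * CLIP_SECONDS / 3_600                  # total footage in hours
share_of_storage = DATASET_PB / CLUSTER_STORAGE_PB          # fraction of cluster storage used

print(f"{avg_gb_per_clip:.1f} GB per clip, {avg_mb_per_second:.0f} MB per second of clip")
print(f"{total_hours:,.0f} hours of footage, {share_of_storage:.0%} of cluster storage")
```

By this estimate, each clip averages about 1.5 GB, the dataset amounts to roughly 2,800 hours of footage, and it occupies only about 15 percent of the cluster’s storage, leaving headroom for the dataset to grow.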

Relatedly, Tesla CEO Elon Musk has previously teased “Project Dojo,” a supercomputer built on proprietary Tesla silicon designed specifically for neural network training. In late 2020, Musk noted that establishing high-speed communication between components and efficient cooling remained challenges, though work on the project was continuing.

Because Karpathy’s cluster uses Nvidia GPUs, it doesn’t appear to be affiliated with Project Dojo. However, it still plays an important role in Tesla’s ultimate goal of becoming the first automaker to put a fully self-driving vehicle on public roads.

Tesla’s ambitious goal has drawn considerable skepticism from industry leaders and critics of vision-only autonomy, especially since the automaker has rejected high-precision LiDAR as part of its autonomy suite.


No Tesla vehicle on the road today uses LiDAR. In fact, Elon Musk called LiDAR a “crutch” in 2018, dismissing the technology in favor of Tesla’s own vision-based system, and earlier this year the company removed radar sensors from some of its vehicles. That decision alone cost Tesla some of its safety-feature endorsements from the National Highway Traffic Safety Administration (NHTSA).

Meanwhile, Volvo has chosen to implement LiDAR as a standard feature on the upcoming successor to its XC90 SUV. 

As for Tesla, its current-generation supercomputer will help train its Autopilot model, and the upcoming Project Dojo will likely do even more so. But only time will tell whether its vision-based technology will prevail over competitors’ approaches. It’s a gambit that could make or break Tesla’s position as a leader in autonomy.