How An Open-Source Cloud Storage Company Is Taking On Amazon and Google

The little guy, with big capacity

Backblaze B2 will now support files up to 10 TB.

Backblaze's first server was built in a plywood box in 2007. Now the company, which is dedicated to offering personal cloud backups, stores 200 petabytes in total. (That's 200 million gigabytes.)

But this cloud storage company doesn't look like the big names in the business, competitors like Amazon, Microsoft, and Google. Backblaze relies on its own server designs, called Storage Pods, which it builds itself at one-tenth the cost of buying servers. The designs are public, and instructions on how to build them are freely available on Backblaze's site.

In the last two years, Microsoft, Facebook, and most recently Google have joined the Open Compute Project, which is dedicated to open-sourcing these same kinds of designs. But Backblaze has been doing this since 2009, and it has created a community of businesses and tinkerers who rely on its specific builds. Its server designs have been used by everyone from small makers to Netflix.

Today, Backblaze is fully stepping into the ring with its largest competitors. Until now, the company has specialized in backing up personal computers, and its cloud storage for large businesses came with restrictive file size caps. The announcement today is that Backblaze will now support individual files up to 10 terabytes in size, as well as a host of APIs and plugins for enterprise customers. Previously, the largest file it would accept was 5 GB. The company calls it the Big File Beta.

The bigger companies, Amazon with its S3 service and Microsoft with Azure, have traditionally dominated larger enterprise storage, where companies need terabytes to back up their massive server files. But Backblaze is betting that by being cheaper and more robust in certain areas, it can snag business from these titans. While it costs anywhere from $0.20 to $0.75 per gigabyte to store a file on Google, Amazon, Verizon, or Rackspace, it costs $0.005 on Backblaze.

Right now, 10 terabytes is twice the maximum file size that Amazon and Google will store, and ten times what Microsoft allows. This means that when a company wants to back up its entire database as a single file, it can. Or it can keep humongous files on Backblaze's servers, reducing the load on its own.

Backblaze's CEO, Gleb Budman, explains how his company built the servers specifically to handle large files.

"You never can put a 10 TB file on one hard drive," Budman says. "But because we own the entire cloud file system, what we do on the back end is chop up the files into pieces and put them in a different drive in a different storage pod in a different rack."

This method of distributing one file in many pieces across hard drives is called erasure coding, and the method isn't unique to Backblaze. But the size is.

Budman says that the servers other companies purchase have a different architecture because they need to be able to operate individually, whether the customer buys one server or ten. Since Backblaze knows the servers it builds will just be added to its system, it can optimize for this kind of distributed storage.

When Backblaze receives any file, whether it's 10 kilobytes or 10 terabytes, the company cuts it into 20 pieces. Seventeen of those pieces are required to put the file back together. The remaining three contain redundant data that can be used in case some of the original 17 pieces are lost. Budman says this system ensures that even if a drive, pod, or even whole server rack is lost, every bit of data can be recovered.
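The 17-plus-3 split Budman describes can be illustrated with a small Python sketch. This is not Backblaze's code: the function names (encode, decode, interpolate) and the choice of arithmetic modulo the prime 257 are assumptions made for readability, and production erasure coders typically work in a 256-element field so that every stored symbol fits in one byte. The sketch treats each column of bytes as 17 points on a polynomial and stores three extra points on the same curve, so any 17 of the 20 pieces are enough to rebuild the file.

```python
P = 257  # smallest prime above 255, so every byte value is a valid field element


def interpolate(points, x):
    """Evaluate at x the unique polynomial through points, using arithmetic mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, k=17, n=20):
    """Cut data into k data pieces, then append n - k redundant pieces."""
    padded = data + bytes(-len(data) % k)  # zero-pad to a multiple of k
    size = len(padded) // k
    pieces = [list(padded[i * size:(i + 1) * size]) for i in range(k)]
    for x in range(k + 1, n + 1):
        # Redundant piece number x holds the polynomial's value at x, per column
        pieces.append([
            interpolate([(i + 1, pieces[i][col]) for i in range(k)], x)
            for col in range(size)
        ])
    return pieces


def decode(available, length, k=17):
    """Rebuild the data from a dict {piece number (1-based): piece} with any k pieces."""
    if len(available) < k:
        raise ValueError("too many pieces lost")
    chosen = sorted(available.items())[:k]
    size = len(chosen[0][1])
    out = bytearray()
    for x in range(1, k + 1):  # data piece x is the polynomial's value at x
        for col in range(size):
            out.append(interpolate([(xi, piece[col]) for xi, piece in chosen], x))
    return bytes(out[:length])


if __name__ == "__main__":
    data = b"every bit of data can be recovered"
    pieces = encode(data)
    # Simulate losing three pieces, e.g. three drives in different pods
    survivors = {n: p for n, p in enumerate(pieces, 1) if n not in (2, 9, 19)}
    print(decode(survivors, len(data)) == data)
```

The price of that protection is storing 20 pieces for every 17 pieces of data, or roughly 18 percent extra space. Losing a fourth piece makes the file unrecoverable, which the sketch reports by raising a ValueError.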

But despite the ferocity of enterprise business, Backblaze is still going to publish all of its designs.

Budman sees the small open-source community as a large part of the reason why Backblaze is successful. Besides the goodwill generated by giving designs to the community, the company gets advice in return.

"Whenever we publish a new version of the Storage Pod design, we get lots and lots of comments from people talking about 'Hey, have you tried this?' or 'Have you looked at this component?'" Budman said. "Open sourcing the Storage Pods was a risk and a gamble, but it's actually been fantastic."

Recently, the community actually pointed out that one of the components that Backblaze was using, a power button, was more expensive than it should be. After hearing this, the company set out to find a new switch and ended up saving more than $10,000. This is chronicled in a Reddit thread.

Backblaze also relied exclusively on one company to build another part, called a backplane. A backplane is like a computer's motherboard, a hub that connects the machine's components. However, since Backblaze open-sourced its backplane design, another company has started to make it, giving Backblaze another potential vendor for the part.

While competitors are also hopping on the open-source bandwagon with the Open Compute Project, Budman says that the approaches are still different. For instance, the Backblaze servers are still cheaper to make, so the company can keep its costs down, but drives can't be swapped out while a Storage Pod is in use.

In Fairbanks, Alaska, the Geographic Information Network of Alaska (GINA) has built three Storage Pods. It needs to keep years of satellite images and large datasets available at all times for cartographers, scientists, professors and students, and because it is a small organization, money can be tight, says Dayne Broderson, GINA's technical services manager.

"If I need a couple of petabytes of spinning storage, [a Storage Pod] would probably be the most cost-effective way to go," Broderson said.

However, he says that the Storage Pod might not be the best solution for everyone. Since it isn't an out-of-the-box product, the setup and maintenance of the server can be taxing. Luckily, for the first build GINA could rely on student labor, although it took a long time. Broderson said that having one student and a faculty member build the server (albeit not every day) took four to five months, complete with accidentally frying components. He chalks it up to a learning experience.

"You fry something and learn from it, and that's great, but it also takes a week to order that little $8 part before you move on," Broderson said.