Bingodisk and Strongspace: What Happened?

We have had a fantastic beginning to the year at Joyent. Our revenue continues to grow quickly. We have been gaining new Accelerator customers at a record pace. The Facebook deal is dramatically increasing the size of the Joyent community and is already paying off handsomely as successful start-ups upgrade their Accelerators to serve millions of pages to millions of users. The biggest Facebook application running on Joyent Accelerators now serves over 700 million pages per month. Yes, 700 million.

While the commercial grade Accelerator products have been growing faster than ever, two of our smaller, “prosumer” products hit a serious road-bump.

Ten days of downtime

The past 10 days have not been the best days at Joyent. Bingodisk and Strongspace went off-line on Saturday, 12 January. Bingodisk service was restored eight days later on 19 January. Strongspace limped back into service late on 21 January, nearly ten days after it went off-line. Customers of these services are rightly outraged by the outage. While Strongspace and Bingodisk represent a very small fraction of Joyent’s entire infrastructure, we understand how critical they are to many of you, and we have been investing many, many hours to bring these services back on-line as quickly as possible. I apologize for the outages.

In this post I would like to report on what happened, how Joyent plans to compensate our customers, and what we plan to do in the future with Strongspace and Bingodisk.

Some Background: the Economics of Bingodisk and Strongspace

Strongspace was introduced in August, 2005 as an elegant multi-user storage solution using SFTP. It was initially deployed on EMC CLARiiON storage. The market for on-line storage was rapidly crowding and the price of on-line storage quickly dropped. We began looking for a new architecture and hardware platform in order to remain competitive. With the Zettabyte File System (“ZFS”) in OpenSolaris and the introduction of the Sun Fire X4500 (aka “Thumper”), we realized that we could build on-line storage solutions at costs that kept us more than competitive. Strongspace moved to ZFS in December, 2005 and onto a Thumper in October, 2006. We came out with Bingodisk, also based on ZFS and the Thumper, in September, 2006. Without ZFS and the Thumper, we probably would not have been able to continue Strongspace or introduce Bingodisk. The Thumper provided the raw storage-to-controller ratios, and ZFS the redundancy and data protection, that we required without having to spend, literally, hundreds of thousands of dollars.

I’m laying out this background to note that both Strongspace and Bingodisk were always designed to be (a) inexpensive, utility storage in the cloud, and (b) built on top of a filesystem and hardware platform that would ensure we would not lose data.

OK, enough preamble. What happened?

On 12 January, both Strongspace and Bingodisk went down because ZFS encountered what it thought was duplicate data on disk. The so-called “spacemap bug” (fixed in build 60 of OpenSolaris) apparently double-writes blocks. The problem arises when ZFS later realizes this and tries to free a block that is supposedly already free. ZFS concludes that something went wrong and that continuing may corrupt data, so it (correctly) panics. Once this loop gets going, it’s tough to break out of it.
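To make the failure mode concrete, here is a toy sketch of a free-space tracker that refuses to free a block twice. It is not ZFS code and does not model real space maps; the class and method names (ToySpaceMap, allocate, release, DoubleFreeError) are hypothetical and exist only for illustration. It shows why halting is the safe response: once the bookkeeping says a block is already free, the tracker can no longer tell which of its records is wrong.

```python
class DoubleFreeError(Exception):
    """Raised when a block that is already marked free is freed again."""


class ToySpaceMap:
    """Toy free-space tracker (hypothetical; not how ZFS is implemented)."""

    def __init__(self, n_blocks):
        # Every block starts out free.
        self.free_blocks = set(range(n_blocks))

    def allocate(self):
        # Hand out the lowest-numbered free block so the demo is deterministic.
        block = min(self.free_blocks)
        self.free_blocks.remove(block)
        return block

    def release(self, block):
        # Freeing an already-free block means the bookkeeping is inconsistent.
        # Stop rather than risk handing the same block to two owners later.
        if block in self.free_blocks:
            raise DoubleFreeError(f"block {block} is already free")
        self.free_blocks.add(block)


space_map = ToySpaceMap(n_blocks=4)
block = space_map.allocate()

# Imagine the same block was recorded twice, so it ends up being freed twice.
space_map.release(block)
try:
    space_map.release(block)
    second_free_rejected = False
except DoubleFreeError:
    second_free_rejected = True

print("second free rejected:", second_free_rejected)
```

In this toy model the exception plays the role of the panic: the first release succeeds, and the second is rejected instead of silently corrupting the free list.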

We were conservative and methodical in how we moved forward from here.

We updated to the latest Solaris build and ZFS code so that issues from the bug wouldn’t occur again, and then set out to find and repair the problem areas while getting the services separated and running. The operating system updates went fine. The dataset imports and updates took a great deal of time. One of the larger distractions early in the process was a bug in the NCQ driver that made the SATA drives appear to have “issues”. We corrected the NCQ driver. We also performed a complete hardware swap-out (just to be safe). Every piece of hardware in the original Thumper was replaced with parts from a standby Thumper. In the end, all 48 drives from the original Thumper ended up in a new Thumper. We had to do all this so that we could safely read the data off the original Thumper to bring the services back up on new Thumpers.

When it became clear that the data set under Bingodisk was fine and likely not where the issue lay, we moved all that data to new storage. We didn’t trust a block-level restore, so we had to read and write files, and writing that much metadata takes an enormous amount of time: about 1TB every 10 hours. We were able to get Bingodisk operational first.

Strongspace took longer because that was the dataset with the problematic area(s), and those areas took about 5-10 hours to expose themselves each time we tested. We are currently running Strongspace with ZFS set so that it won’t panic when it hits the problem area, but will instead run a recovery.

We’re quite fortunate these problems happened to us with ZFS. ZFS at the very least gave us the confidence that our data is there and valid. No data was lost.

Some have wondered why we didn’t upgrade the operating system earlier. Upgrading the operating system is not a trivial task on a production system with so much storage in play. Further, the version of ZFS we were running on Strongspace and Bingodisk was more mature than the code originally shipped in Solaris 10. This meant the code we had in production had gone through ZFS’s vaunted test bed. Finally, an operating system upgrade would most likely have exposed the “spacemap” data errors on disk sooner, bringing down the services nonetheless. Once bitten…

Was there a backup?

Yes, and no. In the traditional sense of us writing the data from Bingodisk and Strongspace to tape or some other Thumper, no, there was no backup. Data redundancy is built into the ZFS/Thumper software/hardware combination. The Thumper is both server and backup. Moreover, it’s hard to see how a backup of 18TB of data to another physical device would work in practice. Moving Bingodisk to another Thumper during this crisis took 30 hours (3TB of data). A large, multi-tenant service such as Bingodisk or Strongspace manages so much data that a meaningful backup is practically impossible. A single backup would take over a week. The backup process would kill end-user performance. A service like Strongspace, which people use to rsync their own backups, means the data turns over rapidly and an incremental backup would not make sense. ZFS has a facility, zfs send/receive, that runs on an idle thread. There is currently no way to give this functionality priority, so, again, practically speaking, it could not be used for backup.
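The “over a week” figure follows directly from the numbers in this post. The sketch below is a back-of-the-envelope calculation only: it assumes a full backup would run at the same file-level copy rate we saw when moving Bingodisk (3TB in 30 hours, which matches the roughly 1TB per 10 hours mentioned earlier) and that the rate would stay constant. The function and variable names are made up for the example.

```python
# Figures quoted in this post.
SAMPLE_TB = 3.0       # Bingodisk data moved to another Thumper
SAMPLE_HOURS = 30.0   # how long that move took
TOTAL_TB = 18.0       # data a full backup would have to copy


def full_copy_hours(total_tb, sample_tb, sample_hours):
    """Scale the observed copy time linearly to the full data size.

    Assumes the copy rate stays constant, which is optimistic for a
    live service that is also handling customer traffic.
    """
    return total_tb * sample_hours / sample_tb


hours = full_copy_hours(TOTAL_TB, SAMPLE_TB, SAMPLE_HOURS)
days = hours / 24

print(f"Estimated full backup: {hours:.0f} hours ({days:.1f} days)")
```

At that rate, a single pass over 18TB comes to 180 hours, or seven and a half days, before counting any slowdown from competing with live traffic.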

Joyent Accelerators and Connector are backed up for disaster recovery daily. The datasets for each of these are much smaller and therefore fit into a practical backup scheme.

So, Bingodisk and Strongspace were backed up based on the redundancy built into the Thumper itself and the capabilities of ZFS. Fully 6TB of storage on a Thumper is dedicated to redundancy. ZFS’s capabilities to ensure no data loss were proven in this instance. These Thumpers sit in a telco-level data center (the best) that is rated to withstand an earthquake of magnitude 9 on the Richter scale. The fire systems in the data center itself mean the chances of the Thumper being lost to fire are statistically negligible.

What is Joyent going to do for customers?

With the events of the past ten days, we’ve been doing some hard thinking about Strongspace and Bingodisk.

Here’s our plan for Strongspace. We’re not taking any more sign-ups for Strongspace. The current Strongspace will be replaced by a new service (not named Strongspace) that will not have the economic model of the current service. It will be expensive, distributed and bullet-proof. The replacement service will likely be introduced before October, 2008. We will retire the current Strongspace on 1 October 2008. There is Strongspace functionality in Joyent Connector today, and that will remain.

Customers currently on Strongspace will be allowed to continue to use the service for the next 9 months for free. If you bought Connector for Strongspace and only want Strongspace, please file a ticket. You’ll be allowed to remain on Strongspace for 9 months for free, but your Connector and Shared Hosting (or Shared Accelerator) accounts will be deleted. If you bought Connector for Strongspace and you want to keep your Connector and Shared Hosting (or Shared Accelerator) accounts, please file a ticket and you will get a coupon for 4 months of the new service for free. If you are a Mixed Grill (or similar) customer, we will be replacing the Strongspace component with the replacement product. Every current Strongspace customer will get a coupon for 2 free months (minimum) of the new service.

If you just feel like saying “screw it, I don’t want to have anything to do with these guys”, please file a ticket and we will refund you for two weeks of down time.

We will be open-sourcing the current Strongspace. This will allow anyone to run Strongspace private label on any infrastructure provider they choose. After Connector, Slingshot, and our DTrace probes for Ruby, this is Joyent’s fourth major contribution to open source. We will continue to provide some infrastructure for the FreeStrongspace community and a test bed for installations and demos.

Bingodisk is used widely by people who prefer HTTP over proprietary APIs for serving static assets for web sites. Due to the downtime, we are giving Bingodisk customers four months free. In addition, anyone signing up for Bingodisk between now and March 1st will not be charged for two months. If you feel you don’t want to have anything to do with Joyent, please file a ticket and we will refund your annual subscription, pro-rated, plus an additional two weeks. Bingodisk sign-ups are currently disabled, but we’ll be bringing that process back on-line this week. Bingodisk continues to have the same economic model of inexpensive storage and an industrial strength filesystem for data security. Over time it will be folded into Connector.

Bingodisk will also be open-sourced. Anyone will be able to run Bingodisk on any infrastructure provider they choose. This is our fifth major contribution to open source. As with Strongspace, we will continue to support the FreeBingodisk community by providing infrastructure and a test bed for installations and demos.

While these measures do not get back the eight and ten days of down time, I hope they do send the message that we value all of our customers. Again, I apologize for the down time.

———-

EDITED: Added end date on ‘2 months for free’ promotion for new Bingodisk signups.
