June 19, 2010

Announcement: We’re now selling storage à la carte via HTTPS

Update: The SpiderOak DIY service has been discontinued and is being replaced by our new Nimbus.io storage service, a new work based on everything we learned from DIY and our previous internal storage projects. It is also open source, with a fancy new ZeroMQ-based architecture. Please visit nimbus.io for more information and to request an invite to use that service. The information below is provided for historical purposes only.

This is an alpha release of the SpiderOak Do-It-Yourself (DIY) API for storing and accessing data directly on the SpiderOak storage network. It is similar to Amazon’s S3 and other cloud storage services, but designed specifically for the needs of long-term data archival.
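
For the curious, interacting with the service is plain HTTPS with your API key. Here’s a minimal sketch in Python of what that might look like – the host name, URL layout, and auth header below are illustrative placeholders rather than the real API (see the project homepage linked below for the actual details):

    # Illustrative sketch only: the host, paths, and auth header are
    # hypothetical placeholders, not the actual DIY API.
    import httplib

    API_KEY = "your-diy-api-key"        # retrieved from your billing page
    conn = httplib.HTTPSConnection("diy.example.com")

    # Store a file under a key.
    conn.request("PUT", "/data/backups1/README",
                 open("README", "rb").read(),
                 {"Authorization": API_KEY})
    resp = conn.getresponse()
    print resp.status                   # expect 200 on success
    resp.read()                         # drain before reusing the connection

    # Read it back.
    conn.request("GET", "/data/backups1/README",
                 headers={"Authorization": API_KEY})
    resp = conn.getresponse()
    print resp.status, len(resp.read())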

We’re happy that this service is open source, top to bottom (including the code we run on the storage servers). It’s also offered at the same very affordable prices as regular SpiderOak storage.

During the alpha, this is only available to SpiderOak customers. Every SpiderOak customer can retrieve an API key and get started immediately if they wish. At the beta release (which will be soon) we’ll enable general signup, and we’ll move out of beta shortly after that.

For details on the implementation, architecture, API, and the git repositories for server and client code, please visit the DIY API Project Homepage for more information.

Update 1: Several people have asked why they don’t see a DIY API key option on their billing page. This is because the DIY API is a paid service, so it’s not available with a 2 GB free SpiderOak account. Since the storage is so conveniently accessible over HTTPS, we think it would likely be abused if anyone could easily create 2 GB free accounts. However, we’ve set up a $1 upgrade you can use to test DIY when you don’t already have a paid account. Just email support and we’ll give you the upgrade code to use.

Comments
  1. "Your data is stored with a replication level of 3 in a world class data center running on nuclear power (yes, really.) With SpiderOak, your backup storage has a zero carbon footprint."

    You've got to be kidding. As if construction of nuclear power plants didn't release huge amounts of CO2, and as if nuclear waste was so much better than CO2 emission.

    Tasteless.

  2. Wow, relax. That was intended somewhat tongue-in-cheek anyway. The nuclear power plant was there decades before SpiderOak was. I'm not claiming it's zero impact, just that it doesn't have the same daily carbon consequences, for example, as a data center powered by burning coal (which also releases way more bad stuff into the atmosphere than just carbon.)

  3. And SpiderOak continues to amaze! Speaking as a starving CS student, I am thrilled about how SpiderOak operates. Nuclear powered, Python, open-source API, GNU/Linux client, technical details galore, student discount… what's not to love?</fanboy>

    Seriously, kudos on the API. I'm really looking forward to playing with it.

  4. I am a paid customer, but I don't see an API key on my billing page.

    I have already posted and asked about this on Twitter and sent an email to support, but it doesn't look like you care about your customers, because all I received was a generic "you are not a paying customer" reply.

  5. Tomaž – Apologies, you're right – we're correcting a problem with the billing page that was preventing the DIY option from showing up for some accounts. The fix should be live shortly.

  6. No problem.

    Now I have finally been able to retrieve the API key and play with it a bit.

    It looks like there are some serious problems with consistency.

    For example, suppose you upload a file and then delete it.

    You retrieve the list of all files (listmatch) and it looks like the file was successfully deleted (it is no longer listed), but when you perform listmatch again at some later stage, the file shows up as if it had never been deleted and is still there.

    Also, sometimes the same file is listed multiple times by the listmatch command (e.g. I uploaded a file README, and when performing the listmatch, the server returned ['README', 'README', 'README']).

    It looks like node coordination is somehow messed up (multiple nodes return a response instead of a single one).
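
    Roughly how I reproduce it (upload / delete / listmatch below are my own thin wrappers around the HTTPS calls, not official API names):

    import time

    upload("README", open("README", "rb").read())
    delete("README")

    # Right after the delete the key is gone, as expected.
    assert "README" not in listmatch("")

    # ...but some time later it can come back.
    time.sleep(60)
    keys = listmatch("")
    if "README" in keys:
        print "deleted key reappeared %d time(s)" % keys.count("README")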

  7. Here is another example where the same file / key is returned multiple times (in this case, twice):

    [....
    'backups1/duplicity-new-signatures.20100622T175449Z.to.20100622T175655Z.sigtar.gpg', 'backups1/duplicity-new-signatures.20100622T175449Z.to.20100622T175655Z.sigtar.gpg'
    ...
    ]

    The funny thing is that when I try to retrieve it, in *most cases* it returns a 404.

  8. Thanks for the early reports. We've found a few minor things to fix and should have a new revision ready for more testing shortly. One of the problems was that one of the gevent WSGI handlers was silently buffering requests before processing them. Stand by for updates…

  9. No problem, and I do hope you are going to sort it out, because it looks like it could be a pretty good and cheaper alternative to Amazon S3 for storing backups.

    Also, the API is currently pretty limited, but after you sort out all the problems I hope we will at least get an improved listmatch method and a method for bulk-deleting files (it would save a lot of requests when you need to delete multiple files).
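
    Right now deleting n files means n round trips, something like this (delete / listmatch are my wrapper functions, not official API names):

    # One HTTPS request per key; a bulk delete command would collapse
    # this whole loop into a single request.
    for key in listmatch("backups1/"):
        delete(key)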

    I have also created a SpiderOak DIY backend for the Duplicity backup tool (https://code.launchpad.net/~tomaz-muraus/duplicity/backend-spideroak-diy), but it is not yet meant to be used for anything other than testing because of the current state of the API and the lack of tests.

  10. Tomaž – Wow, a duplicity backend. That is neat. :)

    Yes, we will certainly add bulk delete, and would love feedback on what additional things are most important to add after that. We've tracked down most of the problems now. There's one lingering issue we're trying to resolve with RabbitMQ clustering which is causing most of the trouble. Hopefully we'll have that one resolved today.

    Our longer term plan has always been to switch to ZeroMQ anyway. RabbitMQ clustering is just so ridiculously convenient for building a first version. :)

  11. Glad to hear that you've tracked down most of the problems.

    I've just run the test suite 20 times and all the tests passed each time. It also appears to be a lot faster than the regular SpiderOak backup – I'm getting speeds of up to 15 Mb/s (still far from the speeds I get when uploading to the Amazon S3 servers in Europe, but like I said, a lot better than the SpiderOak backup speeds, which are just slow) :)

    ZeroMQ does look very nice (simple but powerful), but I don't have a lot of hands-on experience with it (a recent article on nichol.as inspired me to play with it a bit more, but that's about it).

    Another feature which is definitely needed is some kind of command that returns how much space you are using (and how much you have available).

    Also, it might not be a bad idea to add another command – or have LISTMATCH return a triple: file name / key, file size, and file md5 hash.

    If you provided the file md5 hash, we would not need to download the whole file just to perform an integrity check.
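
    To illustrate, with such a triple the integrity check could stay entirely local – something like this, where listmatch_with_meta and local_path_for are made-up names just for the example:

    import hashlib

    def md5_of(path):
        # md5 of a local file, read in chunks to keep memory flat.
        h = hashlib.md5()
        f = open(path, "rb")
        for chunk in iter(lambda: f.read(65536), ""):
            h.update(chunk)
        f.close()
        return h.hexdigest()

    # Hypothetical listmatch variant returning (key, size, md5) triples.
    for key, size, remote_md5 in listmatch_with_meta("backups1/"):
        if md5_of(local_path_for(key)) != remote_md5:
            print "integrity mismatch:", key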

  12. I forgot to add that better error handling / reporting is needed as well.

    Currently, even if you try to delete a nonexistent file, you get an "OK" response with status code "200 OK".
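
    For example (delete is my wrapper around the HTTPS DELETE call):

    # Deleting a key that never existed still reports success today;
    # a 404 here would make client-side error handling much easier.
    status = delete("this-key-does-not-exist")
    print status    # prints: 200 OK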

  13. And another thing which is definitely needed is renaming / moving files (keys).

    If you currently want to rename / move a key, you need to delete the existing key and upload it again under a new name.
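
    In client code that emulation looks something like this (get / upload / delete are my wrappers over the HTTPS calls):

    # Poor man's rename: re-upload under the new key first, and only
    # delete the old key once the new copy is safely stored.
    def rename(old_key, new_key):
        data = get(old_key)
        upload(new_key, data)
        delete(old_key)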