December 6, 2011

New Browser-Based Signup Process & Maintaining ‘Zero-Knowledge’ Privacy


One of the things that has always made SpiderOak unique is our ‘Zero-Knowledge’ privacy policy. ‘Zero-Knowledge’ means no one at SpiderOak has the ability to access your data – ever. Even if we wanted to access your data or received a subpoena to do so, we could never turn over plaintext data. This is accomplished by encrypting all data on your machine before it is sent to us, using encryption keys generated from your password.

With this new version of SpiderOak, we are changing our signup process to include password creation in the browser. But how can we do this and ensure ‘Zero-Knowledge’ privacy? Isn’t creating a password on the web (via a browser) in clear violation of how we maintain our security?

The short answer is that we hash your password before sending it to our servers. A hash is a one-way algorithm, so there is no way for us to reverse the hash and figure out your password. When you log in for the first time, we hash your password again in the client and compare it to the hash stored on our servers. If the two match, we know that you entered the correct password. We use a JavaScript implementation of bcrypt to do the hashing. This gives the convenience of a simplified signup process while maintaining your privacy. And if you don’t trust this process, we encourage you to disable JavaScript during signup, and you will not be prompted to create a password in the browser.
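As a rough illustration of the flow described above, here is a minimal Python sketch. It is not SpiderOak's code: the `Server` class, the `client_hash` helper, and the iteration count are hypothetical, and PBKDF2 from the standard library stands in for bcrypt, which the standard library does not provide.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not SpiderOak's setting


def client_hash(password: str, salt: bytes) -> bytes:
    """Runs in the client: a slow, salted, one-way hash of the password.

    PBKDF2 is an assumption standing in for bcrypt.
    """
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


class Server:
    """Hypothetical server that only ever receives hashes, never passwords."""

    def __init__(self) -> None:
        self.accounts = {}  # username -> (salt, hash)

    def signup(self, username: str, salt: bytes, hashed: bytes) -> None:
        self.accounts[username] = (salt, hashed)

    def salt_for(self, username: str) -> bytes:
        return self.accounts[username][0]

    def first_login(self, username: str, hashed: bytes) -> bool:
        _, stored = self.accounts[username]
        # Constant-time comparison of the submitted hash with the stored one.
        return hmac.compare_digest(stored, hashed)


server = Server()
salt = os.urandom(16)
server.signup("alice", salt, client_hash("correct horse", salt))

ok = server.first_login("alice", client_hash("correct horse", server.salt_for("alice")))
bad = server.first_login("alice", client_hash("wrong guess", server.salt_for("alice")))
```

In this sketch the server can confirm that the same password was typed twice, yet the password itself never leaves the client.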

Now to focus on our motivations for making this change. We used to have everyone sign up in the SpiderOak application, which was great from a security perspective; however, this process was awkward for customers who are used to signing up for services on a website instead of downloading an application first. It also didn’t work well with tracking behaviors – most notably our Refer-A-Friend program. Previously, when someone followed a Refer-A-Friend link to our website, we had no way to know when they signed up in the application. We had a system that was pretty good at guessing after the fact, but it was slow and often missed signups. It could take up to several weeks to get credit, and sometimes the user wouldn’t get credit at all.

We needed a better solution so we conceived a way to move a portion of the signup process to the web. Since password creation was still handled in the application, we needed a way for the user to identify him/herself when the application launched on their computer for the first time (otherwise anyone could steal the account before a password was created). We accomplished this connection through generating activation codes. This system solved the Refer-A-Friend problem but activation codes proved to be a bit clunky. People would lose them or not understand what they were for.

That brings us to today. The goal of any signup process is to make it as easy and seamless for the user as possible. In our case, we also always have to keep in mind our users’ privacy, which adds to the complication. With this new process in place and thanks to bcrypt, we have a much simplified process while maintaining our important ‘Zero-Knowledge’ privacy.

In the end, privacy isn’t just something we seek for additional challenge but rather a philosophical approach we believe in deeply; we have never been willing to abandon it for convenience. That said, we are always looking for ways to provide our high level of security in simpler and more usable ways. I believe that this change accomplishes our goals.

Comments
  1. I'd just like to be clear on the changes you made to password validation. Am I correct that the new user enters their password in their browser, JavaScript hashes the password and submits it to the server, where it will be stored unmodified in the database? And when users return and log in again, the over-the-wire transmitted hash is directly compared against the database value?

    This means that the actual password to my account — the hash, since that's what's being validated — is stored in the clear in your database. That's a huge security problem.

    Am I missing something about your new setup?

  2. The hash is only used to compare that the password is the same as what you entered before. You can't get the password back from the hash. The password on your local computer is what encrypts the data though. So the hash can be shared freely without worrying about anyone accessing your data.

    password -> hash -> submit to server for auth -> confirmed -> password used to encrypt data -> encrypted data sent to SO for storage

    At least that's my understanding. I'm sure someone can correct me if anything I've mentioned is incorrect.
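The separation this comment describes, one value sent for authentication and a different secret kept locally for encryption, can be sketched as follows. This is purely illustrative: the `derive` helper, the purpose labels, and the use of PBKDF2 are assumptions, and the post does not say how SpiderOak actually derives its keys.

```python
import hashlib


def derive(password: str, salt: bytes, purpose: bytes) -> bytes:
    """Hypothetical helper: derive a purpose-specific 32-byte value from a password."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt + purpose, 50_000)


# A fixed salt keeps the example deterministic; a real client would use os.urandom(16).
salt = bytes(16)

auth_value = derive("hunter2", salt, b"auth")    # the only value sent to the server
data_key = derive("hunter2", salt, b"encrypt")   # stays on the local machine
```

Because both values come out of a one-way function, someone holding only `auth_value` cannot compute `data_key` without first guessing the password.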

  3. EspadaV8 is correct. This is actually quite a common way to validate passwords. The hash is very different from your password. A hash is consistent, given the same data to start with, but a one-way hash effectively cannot be decrypted. You cannot use the hash to log in, you need the actual password.

  4. TL;DR This doesn't protect my *account* any better than storing the password in the DB in plaintext (though it does protect my *data*).

    A hash is a secure, zero-knowledge means to validate users _only if_ hashing the password is a part of the authentication process on the server (and not the client, as that can be easily bypassed).

    Hashing passwords is done to make the contents of the auth database unusable to anyone who has access to them. If you gain access to a company's table of password hashes and submit a hashed password to a server that will perform a hash before comparison, the comparison will fail. Thus the only way to authenticate is to actually know the password.

    In the design I interpret from SpiderOak's post, however, the server will not perform a hash as part of the authentication process- it just compares whatever the client sends it. If someone* manages to submit a hash of my password that they obtained from SpiderOak's database they get access to my account. Perhaps they can't decrypt my data if my password is used as part of the encryption key, but they can perform any account admin actions permitted by a successful authentication.

    In effect, the hash of my password _is_ my password for the purposes of anything this authentication process authorizes. Transmitting a hash of my password and comparing that (unmodified) to a stored hash is effectively the same as transmitting my actual password and comparing it to a copy of my password in the database.

    * This could be a hacker, a SpiderOak employee who would not normally be permitted to access user accounts, or a disgruntled former employee. The possibility of database compromise is real.
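The concern raised in this comment can be made concrete with a small sketch comparing two hypothetical server designs: one that compares the submitted value to the stored value as-is, and one that hashes the submitted value again before comparing. All names are illustrative, and PBKDF2 stands in for bcrypt.

```python
import hashlib
import hmac
import os


def slow_hash(data: bytes, salt: bytes) -> bytes:
    # PBKDF2 is an assumption standing in for bcrypt.
    return hashlib.pbkdf2_hmac("sha256", data, salt, 50_000)


def verify_direct(stored: bytes, submitted: bytes) -> bool:
    """Design A: the server compares the submitted value unmodified."""
    return hmac.compare_digest(stored, submitted)


def verify_rehash(stored: bytes, server_salt: bytes, submitted: bytes) -> bool:
    """Design B: the server hashes the submitted value before comparing."""
    return hmac.compare_digest(stored, slow_hash(submitted, server_salt))


client_salt = os.urandom(16)
server_salt = os.urandom(16)
wire_value = slow_hash(b"hunter2", client_salt)  # what an honest client sends

db_direct = wire_value                           # design A stores it unmodified
db_rehash = slow_hash(wire_value, server_salt)   # design B stores a hash of it

# An attacker who copied the database submits the stored value itself.
replay_direct = verify_direct(db_direct, db_direct)
replay_rehash = verify_rehash(db_rehash, server_salt, db_rehash)

# The honest client still authenticates under design B.
honest_rehash = verify_rehash(db_rehash, server_salt, wire_value)
```

Under design A the stolen database value is accepted, so it works as a password. Under design B the same replay fails while the honest client still succeeds.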

  5. Spiffytech,

    I'm not sure what operations are possible on an account other than simply verifying the password before attempting to encrypt or decrypt something. I believe all configuration data is encrypted and you can't reset your password or grant access to files. What action are you thinking someone will take on your account that does not involve encryption?

  6. Couple of things:

    1) This is actually increasing security in many ways. At the moment you can log into your data via their website, but that password is currently sent as 'plaintext' through SSL – not as a hashed password through SSL. So what they propose will fix this huge security issue. Granted, not everyone will log in via the website, but it's a great feature.

    2) I wonder whether there is a big step we are missing somewhere in the middle. Remember that not even SO can gain access to our data as it is, and they already have access to all the hashes in their database. So it cannot be as easy as getting the hash and then 'modifying' something in order to use that hash directly. Otherwise SO could simply change a very small part of the code in their app to transmit whatever hash they want, and therefore get hold of whoever's data they want. So there must be something else that we are missing.

    One question I have though – will this make it easier for other apps, such as DocumentsToGo, to access my SO files? Because that is something I would absolutely love to be able to do.

  7. You could just as well have separated account/login from encryption and used a user/password combination for the website and a different encryption key for the data, right? Just like Mozilla Weave does…

  8. I'm guessing here, but I would imagine that any 'valuable' (to the user) data is encrypted with your password, so having the hash wouldn't allow them to encrypt/decrypt any of your data or details.

    Any details that SO needs access to would (should?) be encrypted with some sort of public/private key (ideally a unique public/private key for each user). This way, having access to the database wouldn't do anything without also having access to SO's private key. If it's a unique key per user, then a lot of keys would need to be stolen to be of any use.

    @Chris – I don't see how this would allow easier access to third party sites/apps. They'd still need to implement the SO API (does one exist?).

  9. Yes, I would like to see a separate login username/password and encryption key. This is what Mozy does.

  10. Hello,

    Let me clarify a few of the excellent questions brought up above.

    1) The bcrypt password actually just replaces the temporary "activation code" that we used to give to new users when they created an account on the web. The "activation code" was a random string that we would require them to paste into the SpiderOak client when they first ran it, to associate the client back with their account. The client then asked for a password and locally generated encryption keys. All further authentication into the SpiderOak account is accomplished with a protocol for zero-knowledge password proof, as described on the Engineering Matters section of the website.

    With our new system, the user creates the password in their browser, and the server stores the bcrypt version of that password. However, this bcrypt password is _only used_ for the specific context of allowing the user to authenticate to the server exactly one time, when they create their first SpiderOak device (generally seconds after they create their account.) From that point, the SpiderOak client derives encryption keys from the password, and generates account keys like usual. (Also described on the Engineering Matters section.)

    Basically, the whole purpose of JavaScript bcrypt is to make this signup sequence more approachable to new users. Unsurprisingly, new users find remembering a 12-digit activation code difficult, even if it is written on the screen in big red letters. :) For most people, creating and remembering a username and password to start an account is familiar.

    Also, if you disable JavaScript in your browser, you'll get the old behavior, with an "activation code" generated for you like before. (Except we now call it a temporary password so the experience is consistent.)

    2) It's important to us for the user experience that we only ask people to remember ONE password. Creating and remembering a single password is hard enough. When the consequence is that "you cannot decrypt your data" if you can't remember your password, it's very important to keep this simple.

    Of course we could have different passwords for different purposes, and that makes things easy on the implementation, but hard for the user.

    As engineers, we try to remind ourselves that most customers are just using the service and don't understand how the cryptography is being used to protect their privacy, and they shouldn't have to.

    Hopefully that makes sense, and shows why we chose the route we have.

  11. I chose SpiderOak because of their security. After a short time, I was unable to open it, on either of my computers. At the direction of customer service, I tried to delete SpiderOak but am unable to empty it from the Trash on one computer. Customer service is beyond terrible—it takes days to receive a response by email, and the response is not helpful. I have requested a refund of my subscription.

  12. To spiffytech's point: Wouldn't it make more sense to hash the password in the client, and then re-encrypt the resulting hash for storage on the server side?

    If I understand this correctly, right now, anyone with access to the hash stored on your server can send that hash from a spoof version of the client as if it were the output of a legitimate client-side hashing operation, and access the account without mounting a brute-force attempt at all. So if an attacker gains access to your database, all accounts are broken.

    If, on the other hand, you re-encrypt the client-provided hash on the server before storing it (and before checking it on login, obviously), then an attacker can't spoof it into your login system – they'd have to guess the intermediary hash from which the server hash was generated, which would require a brute force attack that should be possible to mitigate before they guessed the input string.

    I'm all for client-side encryption but it should not replace server-side safety, and as it stands this is the reason I'm not signing up with SpiderOak.

  13. Js/Spiffytech: Not to worry — this is in fact what our implementation does. It's two layers of bcrypt. We save the salt from the first bcrypt so we can use it as a challenge to the client, but we bcrypt the actual result of the first bcrypt in a second layer of bcrypt, before storing it in our database. (Hopefully that makes sense. The best part about recursion is recursion!)

    Note also that bcrypt auth is only used until you have your initial account crypto keys created (which happens as soon as the first device is added, and before any data is backed up.) So effectively any potential weaknesses in our signup authentication arrangement could only be used to access accounts that have yet to store any data.
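A loose sketch of the arrangement described in this comment: the server keeps the first salt to use as a challenge, stores only a second-layer hash, and retires signup authentication once the first device exists. The class and method names are hypothetical, the separate `server_salt` is an assumption, and PBKDF2 stands in for bcrypt, so this should be read as an illustration rather than SpiderOak's implementation.

```python
import hashlib
import hmac
import os


def kdf(data: bytes, salt: bytes) -> bytes:
    # PBKDF2 is an assumption standing in for bcrypt.
    return hashlib.pbkdf2_hmac("sha256", data, salt, 50_000)


class SignupAuth:
    """Hypothetical server-side record for one newly created account."""

    def __init__(self, client_salt: bytes, first_layer: bytes) -> None:
        self.client_salt = client_salt      # kept so it can be sent back as a challenge
        self.server_salt = os.urandom(16)   # assumed salt for the second layer
        # Only the second layer is stored; the first-layer value is not kept.
        self.stored = kdf(first_layer, self.server_salt)
        self.device_created = False

    def challenge(self) -> bytes:
        return self.client_salt

    def authenticate(self, response: bytes) -> bool:
        if self.device_created:
            return False  # signup auth is retired once account keys exist
        ok = hmac.compare_digest(self.stored, kdf(response, self.server_salt))
        if ok:
            self.device_created = True
        return ok


salt = os.urandom(16)
record = SignupAuth(salt, kdf(b"my password", salt))

stolen_replay = record.authenticate(record.stored)  # database value replayed
first_device = record.authenticate(kdf(b"my password", record.challenge()))
second_try = record.authenticate(kdf(b"my password", record.challenge()))
```

In this sketch the replayed database value is rejected, the real password succeeds exactly once, and any later attempt through the signup path is refused.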

  14. You as engineers have a duty to educate users as to how cryptography is being used to protect their privacy and to not assume users are idiots or uncaring. Don't bury the details in lengthy forum discussions that never formally lay out the security architecture.

  15. When did you get the national security letter that dictated this change?

    Sorry, but the vague explanations given here don't give me the warm fuzzies about providing my password in the signup process. I guess it's time for someone to provide such a service with an open source client that can be audited.

  16. The new signup technique sounds good if we make one simple assumption: typing a password in a javascript-enabled web browser is just as safe as typing a password in the SpiderOak app.

    If that assumption is relaxed, then the new method of creating an account is less secure.

    I guess the smart way to go would be boot from a Live CD in order to create a new account, but that would definitely flunk the "ease of use" test.

  17. Why are people making this so hard? The change only applies to a single instance — when logging on for the FIRST time with a new account.

    That's it. Nothing nefarious or mysterious about it.

    -ASB

  18. I understand the official description provided above was meant for non-technical users, however, I found it simply did not say explicitly:

    1. If the requested password is only for authentication.
    2. If the requested password will also be used for local file encryption.
    3. If the client-side javascript hash sent over SSL will be re-hashed on the server as well.

    Since none of these questions were answered, I started to doubt SpiderOak's approach as a new potential customer. However, because I'm genuinely interested in your service, I took the time to read all of the comments and have come to understand that:

    1. This is only for the initial registration.
    2. You have additional layers of security on the server-side as well. Re-hashing the hash with salts.
    3. This hash will not be used for encryption, but only for the authentication until you have added your first device, at which time the encryption keys will be created.

    I'd prefer to have someone from spideroak.com officially make clear to me and others what I've stated above. Perhaps you could add a link to a technical overview of what you're saying.

    I was registering for my new account when I saw a link to this page. I like that you've modified the signup process (although I hadn't seen the previous one), but a more detailed explanation will clear any worries users have, as well as help us newcomers appreciate your efforts more.

    Thanks.

  19. I'm not a critic of the technical decisions that are outlined in this post, but take care of the damn spam. It's unprofessional.