What does i_ m__n __ __v_r _____ ___ ____ ____ ___c_?

We have been getting a lot of questions lately about our block-level
de-duplication: how it works and how it is applied through the SpiderOak
process. As I consider myself a layman, please allow me to explain this in
simpler terms, such that even I will be able to understand.

For the sake of this example, let us say you have created a document
entitled ‘Why peanut butter and jelly sandwiches are better when you place
salt & vinegar chips in the middle’. The size of this document is 10k.
After saving the initial version, you go back and make 9 additional edits.
Each time you make an edit, you save the document as a new version thus giving
you 10 complete versions. And with each version being exactly 10k, the
complete document takes up a total of 100k on disk (or 10 versions multiplied
by 10k).

SpiderOak, on the other hand, works much more efficiently when storing data,
creating many wonderful benefits for the user. As you can imagine, from the
first version of ‘Why peanut butter and jelly sandwiches are better when you
place salt & vinegar chips in the middle’ to the last, only small pieces
of the document have changed. One simple example is replacing the word
‘excitable’ with the word ‘volatile’ in the third paragraph. Instead of
storing (and uploading) a whole new version of the document each time a small
change is made, SpiderOak breaks each document into blocks of data and then
only backs up (or uploads) the change or delta between the new version and the
old. Using this process, the same 10 versions of the aforementioned document
on SpiderOak amount to only 15k on disk (as opposed to the 100k above).
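
For readers who think in code, here is a minimal sketch of the idea, not SpiderOak's actual implementation. It assumes fixed-size blocks of 1,024 bytes, SHA-256 digests as block identifiers, and edits that overwrite bytes in place without changing the document's length. The names `BLOCK_SIZE`, `BlockStore`, and `make_versions` are hypothetical and exist only for this illustration.

```python
import hashlib

BLOCK_SIZE = 1024  # hypothetical block size, chosen only for this illustration


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class BlockStore:
    """Keeps each distinct block once, keyed by its SHA-256 digest."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def add_version(self, data: bytes) -> list[str]:
        """Store one version and return the ordered digests that describe it."""
        recipe = []
        for block in split_blocks(data):
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # no-op if already stored
            recipe.append(digest)
        return recipe

    def restore(self, recipe: list[str]) -> bytes:
        """Rebuild a version from its list of digests."""
        return b"".join(self.blocks[d] for d in recipe)

    def stored_bytes(self) -> int:
        """Total size of the distinct blocks actually kept."""
        return sum(len(b) for b in self.blocks.values())


def make_versions() -> list[bytes]:
    """Build a 10-block document, then 9 edits that each touch one block."""
    doc = bytearray(b"".join(bytes([65 + i]) * BLOCK_SIZE for i in range(10)))
    versions = [bytes(doc)]
    for i in range(1, 10):
        doc[i * BLOCK_SIZE] = ord("!")  # same-length, in-place edit in block i
        versions.append(bytes(doc))
    return versions


versions = make_versions()
store = BlockStore()
recipes = [store.add_version(v) for v in versions]

print("stored whole:  ", sum(len(v) for v in versions), "bytes")
print("de-duplicated: ", store.stored_bytes(), "bytes")
```

Under these assumptions the ten versions take 102,400 bytes when stored whole and 19,456 bytes after de-duplication. The exact saving depends on the block size and on where the edits land, so it will not match the 15k figure above. Also note that an edit which changes the length of the text (such as swapping 'excitable' for 'volatile') shifts every later fixed-size block; a real system would need a way to cope with that, which this sketch leaves out.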

Although the visual example below uses only two versions of a document, it
further illustrates how the SpiderOak de-duplication process occurs.

This process saves our users a considerable amount of space as a user is
only billed for the de-duplicated amount. Furthermore, the upload can occur
with much greater speed because only the changed blocks of data are sent from
one version to the next. In the end, SpiderOak works extraordinarily hard to
never upload and/or store the same block of data twice – saving our users
money and time.
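
The upload side can be sketched the same way. This is an illustration under assumptions, not a description of SpiderOak's protocol: it assumes fixed-size blocks, SHA-256 digests, and a client that knows which digests are already stored. The function names `block_digests` and `blocks_to_upload` are hypothetical.

```python
import hashlib


def block_digests(data: bytes, size: int = 1024) -> list[str]:
    """Return the SHA-256 digest of each fixed-size block, in order."""
    return [
        hashlib.sha256(data[i:i + size]).hexdigest()
        for i in range(0, len(data), size)
    ]


def blocks_to_upload(new_version: bytes, known: set[str], size: int = 1024) -> list[int]:
    """Return the indexes of blocks whose digest has not been stored yet."""
    seen = set(known)  # copy, so the caller's set is left untouched
    pending = []
    for index, digest in enumerate(block_digests(new_version, size)):
        if digest not in seen:
            pending.append(index)
            seen.add(digest)  # an identical block later in the file is sent once
    return pending


# Three-block document where only the middle block changes between versions.
old = b"a" * 1024 + b"b" * 1024 + b"c" * 1024
new = b"a" * 1024 + b"X" * 1024 + b"c" * 1024

known = set(block_digests(old))
print(blocks_to_upload(new, known))
```

Here only the middle block is new, so it is the only one that would need to travel over the wire.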

Question: So perhaps now you may better understand the title and how it
relates to de-duplication?

Answer: What does it mean to never store the same data twice?