December 21st, 2010
This blog post is about the cloud sync solution we are working on. Finally, some might say. It has been about two years since we first thought about the option to create a cloud sync solution for Things. A lot has happened since then, and we learned many lessons. It was a long and winding road for us, but most of it happened behind closed doors. In the words of one user, our progress has been glacial – and as I sat down to write this post, it was this metaphor that came to mind. Why? Yes, a glacier is slow, but this object – naturally impressive and beautiful – carves its own path, and despite any obstacles, must arrive.
I am not sure whether “impressive” is the most appropriate term for the cloud sync technology we are working on, but it undoubtedly constitutes a most significant feature in all our products – existing or forthcoming. And to us, our cloud sync architecture is certainly beautiful. But why has it taken this long? The quickest way to answer this question is to quote Ken Arnold:
To stop worrying about it will require worrying about it a lot at first.
And that is exactly what we did. In a forthcoming series of blog posts, I will not only look back to give an idea of what it’s like when a software company is worrying, but also share some details about the technology that will underpin our solution and what it will mean for our users.
Before getting into those details, it will be useful to explain just what kind of problem a sync solution is trying to solve. Today’s blog post will therefore concentrate on that, but before closing I’ll also offer an overview of where we stand right now.
If we lived in a world that had perfect networks, the sync problem would not exist. A perfect network is one that never fails and has unrestricted speeds. Access to remote servers would be instant. With a perfect network, it wouldn’t be necessary to store data on our devices. All our computers and mobile devices could simply connect to a remote server to store and retrieve data. This way all of our devices would have the same data available to them all the time.
But, of course, the network is not perfect. In order to provide a great user experience, it’s still necessary to store data locally on the device. For this reason, all versions of Things come with their own database. Now, in order to make sure the same data is present on each device – no matter where it’s added or changed – the Things databases need to be synced.
Looking for a Common State
What exactly does this mean? Let’s look at an example. A user might have modified her Things database on the iPhone while it was not connected to the internet. Back at home, she might make changes on the Mac before the iPhone has had a chance to send its changes. Now we have a situation where the iPhone and Mac databases have diverged and no longer share a common state.
In order to remedy this situation, data has to travel both ways between the devices and undergo a process to make sure that the resulting databases ultimately reach a common state again. This process is called merging. In the example above, assume that the user reordered to-dos in a project on the iPhone and deleted or created to-dos in the very same project on the Mac. During the merge process the devices need to agree on a final state for the project, distribute all necessary data, and make sure the resulting changes are applied correctly.
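The reorder-versus-edit example above can be sketched as a small three-way merge over a project's ordered list of to-do IDs. This is only an illustration under stated assumptions, not a description of how Things actually merges: the function name `merge_todo_order`, the use of plain string IDs, and the policy of appending newly created to-dos at the end are all hypothetical.

```python
def merge_todo_order(base, phone, mac):
    """Three-way merge of a project's ordered to-do IDs (illustrative only).

    base  -- the last common state both devices agreed on
    phone -- the list as it now looks on the iPhone (e.g. reordered)
    mac   -- the list as it now looks on the Mac (e.g. items deleted/created)
    """
    base_set = set(base)

    # A to-do that was in the common state but is missing on either
    # device is treated as deleted (assumed policy).
    deleted = (base_set - set(phone)) | (base_set - set(mac))

    # Keep the iPhone's ordering for everything that survives.
    merged = [todo for todo in phone if todo not in deleted]
    seen = set(merged)

    # Append to-dos created on the Mac, in the Mac's order (assumed policy).
    for todo in mac:
        if todo not in base_set and todo not in seen:
            merged.append(todo)
            seen.add(todo)
    return merged


base = ["a", "b", "c"]
phone = ["c", "a", "b"]   # reordered on the iPhone
mac = ["a", "c", "d"]     # "b" deleted and "d" created on the Mac
result = merge_todo_order(base, phone, mac)
```

As long as every device runs the same function with the same three inputs in the same roles, each one arrives at the same final list, which is the common state the merge is meant to restore.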
The merge process gets more complicated in the presence of conflicts. Imagine a user checked off a to-do on one device but deleted it on another. In order to resolve the conflict, a decision has to be made. The deletion can be either ignored or applied; applying it means the completion of the to-do is effectively discarded.
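To make the decision concrete, here is a minimal sketch of a conflict rule for the completed-versus-deleted case. The `Policy` enum, the `resolve_todo` function, and the string labels are hypothetical names chosen for illustration; which policy Things uses is not stated here.

```python
from enum import Enum


class Policy(Enum):
    DELETE_WINS = "delete_wins"      # apply the deletion, drop the completion
    COMPLETE_WINS = "complete_wins"  # ignore the deletion, keep the completion


def resolve_todo(changes, policy):
    """Return the final state of one to-do given the changes made on
    different devices. `changes` holds "complete" and/or "delete"."""
    kinds = set(changes)
    if kinds == {"complete", "delete"}:
        # The conflict: both changes cannot be honored at once.
        if policy is Policy.DELETE_WINS:
            return "deleted"
        return "completed"
    if "delete" in kinds:
        return "deleted"
    if "complete" in kinds:
        return "completed"
    return "open"


outcome_a = resolve_todo(["complete", "delete"], Policy.DELETE_WINS)
outcome_b = resolve_todo(["delete", "complete"], Policy.COMPLETE_WINS)
```

Because the rule looks only at the set of changes and not at the order in which they arrive, every device that applies the same policy reaches the same result.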
It gets even worse when objects need to maintain or change their relationships with other objects. Things has to-dos, projects, areas, and tags: to-dos can be contained in projects, to-dos and projects can be contained in areas, and all three may have tags. Changing these complex relationships on multiple devices at the same time creates ample opportunity for conflicts. It is the responsibility of the merge algorithm to resolve these conflicts and to make sure all resulting changes are distributed consistently among all devices.
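One relationship conflict of this kind: a to-do is moved into a project on one device while that project is deleted on another. The sketch below shows one possible repair step after merging, under the assumption that such a to-do should be kept and simply detached from the missing project. The function name `repair_dangling` and the dictionary layout are hypothetical, and a real merge algorithm might choose a different policy, such as deleting the to-do along with its project.

```python
def repair_dangling(todo_to_project, surviving_projects):
    """Detach to-dos whose project no longer exists after a merge.

    todo_to_project    -- dict mapping to-do ID -> project ID (or None)
    surviving_projects -- set of project IDs that still exist
    """
    return {
        todo: (project if project in surviving_projects else None)
        for todo, project in todo_to_project.items()
    }


merged_links = {"t1": "p1", "t2": "p2", "t3": None}
repaired = repair_dangling(merged_links, {"p1"})  # "p2" was deleted elsewhere
```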
Simply Make It Work?
Providing a sync solution is a complex problem. As might be apparent from what I outlined above, sync bugs have the potential to seriously mess up a user’s data. It is therefore very important to have a solid foundation. But this still does not describe the entire problem. The most difficult part is not to simply make it work, but to create a fast implementation that can scale to millions of users without diminishing the user experience.
Why is a fast sync process so important? When you launch Things, you want your database to be up to date and contain all your to-dos, no matter where you entered them. Since Things cannot update itself in the background due to iOS restrictions, this means that syncing has to take place when the app is launched. A similar situation arises when you enter new to-dos on your device and quit Things immediately. Add to this the possibility that a network connection might drop at any time. Every conceivable troublesome scenario points to one thing: fast sync is necessary for a good user experience.
Finally, we must consider scalability. Creating a solution for a few thousand users is one thing – creating a solution for millions of users is a different beast entirely. We have all experienced what happens when a web service is accessed by more people than it was designed for; at first, the service becomes slow, then it fails entirely. It has been our primary goal to create an architecture where scalability was not an afterthought, but rather built-in from the beginning.
A True Cloud Sync Solution
This is the first of a series of articles about our cloud sync solution. In the following installments I will talk about the various approaches and technologies we tried while working toward our goal, and why we did not continue with them. I might also touch on a few popular approaches that others have taken, and show why the trade-offs involved were not acceptable to us. I will talk about the lessons we have learned and the final solution which, at last, satisfies all the requirements we feel a true cloud sync solution demands.
Before closing this article, I would like to offer a brief look at where we stand right now. We have created and deployed both server-side and client-side sync components. Both components are completely general and can be used for any application. They have been successfully tested using a special demo program. We are now in the process of integrating this technology into Things.
The final release of cloud sync as part of Things is still a few months off. But we plan to publish more details about what we are doing (and have been doing) every few weeks.
Let me end this post by expressing our sincere gratitude for your patience. Driven by ambitions that were almost too high, it has taken us much longer than we expected. On a path lined with unanticipated obstacles and letdowns, it felt at times as if we would never get there – but we kept believing that we would be able to create a fine solution; a foundation for many cool things to come.
It is now in reach.