January 14th, 2011
My last post received quite a bit of criticism. Some people even said I shouldn’t have written it at all. It has been claimed that the sync problem is solved already. Our experience has shown that this couldn’t be farther from the truth – in particular for Cocoa developers. Dave Peck observed this too, and made some wise suggestions. Here is a quote from his article:
If you’re a die-hard Cocoa developer, the thought of building a scalable and reliable web back-end might just be daunting — daunting enough to cut it out entirely. I’ve noticed a strong resistance in the Cocoa community to building such things.
The Mac community is a great ecosystem and we’ve benefitted a lot from it. Things could not exist without it. This is why creating a cloud sync solution has come to mean more to us than just an improvement of our existing WiFi-based sync. In fact, one of the motivations behind this series of posts is to start sharing our insights with the community of Cocoa developers.
The first thing you need to do when you want to solve a problem is to get a good understanding of the nature of the problem. The previous post attempted to explain the sync problem in the least technical terms possible – to make it accessible to a non-technical audience as well. Eventually, however, we will be making more technical documents available – directly addressing fellow developers.
Today, I want to talk about the technologies we’ve tried.
Back in the day, when Things for the Mac was still in beta, we added WiFi sync using Apple’s Bonjour technology. WiFi sync was Apple’s recommended way to realize sync, and it does have advantages. For example, there is no need to sign up for a web service. It doesn’t even require a pre-existing local network, as any Mac can create its own spontaneous network. The user’s data never leaves the local network.
However, there are notable disadvantages as well: you actually have to remember to sync, and you have to initiate it manually. Relying on the local network, of course, also eliminates the possibility of syncing your work Mac with your Mac at home.
Obviously, we had to provide a solution that would utilize the cloud instead of the local network. The most obvious choice is perhaps MobileMe.
There are quite a number of people using Apple’s MobileMe to sync, e.g., calendar or address book information. For such users, it is only reasonable to wish that Things would sync in the same way, using their existing MobileMe subscription. Unfortunately, the sync technology Apple uses for their own apps is not available to third party developers on iOS devices.
It is available on the Mac though, and we hoped it would eventually find its way onto iOS; this hasn’t happened yet. What some competitors are in fact referring to, when they talk about MobileMe sync, is the use of remote storage through the iDisk.
Dropbox, WebDAV et al.
Some users have suggested using Dropbox for sync. Like many similar offerings – often based on the so-called WebDAV protocol – Dropbox allows users to store files remotely on their servers. In this sense, it is similar to Apple’s iDisk (assuming iDisk sync is turned on) – what both services do is provide a folder on your computer that is mirrored on a remote server.
Neither Dropbox nor iDisk – nor any similar service for that matter – was conceived as a sync solution for apps like Things. They were designed for sharing files – like photos or PDFs. Remote file storage products do not offer merge facilities that come even close to what would be needed for Things.
What is possible, however – and others are in fact doing this – is to use a hack. Without going into detail, it basically means breaking up the database file into a large number of smaller files. Merging and conflict resolution can now be handled using these smaller fragments, which eases some of the pain.
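To make the idea concrete, here is a minimal sketch of such a fragment scheme in Python. It is an illustration only, not how any particular app does it: the one-JSON-file-per-task layout, the `modified` counter, and the function names are all assumptions made for this example. Each task lives in its own small file, and a merge compares fragments one by one, letting the more recently modified copy win.

```python
import json
import tempfile
from pathlib import Path


def write_fragments(folder: Path, tasks: dict) -> None:
    """Store every task as its own small JSON file instead of one database file."""
    folder.mkdir(parents=True, exist_ok=True)
    for task_id, task in tasks.items():
        (folder / f"{task_id}.json").write_text(json.dumps(task, sort_keys=True))


def read_fragments(folder: Path) -> dict:
    """Load all task fragments from a folder, keyed by task id (the file stem)."""
    return {p.stem: json.loads(p.read_text()) for p in sorted(folder.glob("*.json"))}


def merge_fragments(local: dict, remote: dict) -> dict:
    """Merge fragment by fragment; on conflict the newer 'modified' value wins."""
    merged = dict(local)
    for task_id, task in remote.items():
        if task_id not in merged or task["modified"] > merged[task_id]["modified"]:
            merged[task_id] = task
    return merged


def demo() -> dict:
    """Simulate two mirrored folders (hypothetical 'mac' and 'phone') and merge them."""
    with tempfile.TemporaryDirectory() as tmp:
        mac, phone = Path(tmp) / "mac", Path(tmp) / "phone"
        write_fragments(mac, {
            "t1": {"title": "Buy milk", "modified": 1},
            "t2": {"title": "Call Bob", "modified": 5},
        })
        write_fragments(phone, {
            "t1": {"title": "Buy oat milk", "modified": 3},
            "t3": {"title": "Book flight", "modified": 2},
        })
        return merge_fragments(read_fragments(mac), read_fragments(phone))


if __name__ == "__main__":
    for task_id, task in sorted(demo().items()):
        print(task_id, task["title"])
```

Even this toy version hints at the limits: a whole fragment is replaced when two devices edit different fields of the same task, and a deleted file cannot be told apart from one that has not arrived yet.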
This approach works to some extent, but it is slow and error-prone to begin with, and advanced options like push or anything involving user-to-user data exchange are impossible. Developers choosing this approach for apps with complex data models similar to Things paint themselves into a corner right from the start.
Database in the Cloud
Instead of using remote file storage, it is a much better idea to use a custom-designed web service. This is an approach taken by most web applications, and a whole industry has arisen developing, deploying, and maintaining applications that are written in this style. Everyone going this route has ample expertise and help available to them.
In taking advantage of this, we began working together with a great group of web development people. It was a great experience, and one we wouldn’t want to have missed – but eventually we abandoned this approach. Here is why:
If you are thinking in terms of a web application, you are basically considering placing the user’s entire database in the cloud, together with enough logic to safely manipulate that data. This makes the database in the cloud the authoritative version which can be used to determine how the data on every device should look. This sounds like a great thing to have, but it requires that all merging and conflict resolution be done on the server – and this turns out to be really slow.
Merging potentially requires accessing the database, and hence the hard drive, very often. Hard drive operations, however, are the most expensive in terms of performance – in particular when many databases are hosted on one server. Of course, there is a solution to this problem. It is called sharding, or in less technical terms: throwing huge amounts of hardware at the problem. But our users have made it very clear that they consider cloud sync essential and don’t want it to become an expensive service.
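For readers unfamiliar with the term, sharding can be sketched in a few lines of Python. This is a generic illustration under assumed names, not a description of any server setup of ours: the host names in `SHARDS` and the function `shard_for` are hypothetical. Each user's database is assigned to one of several servers by hashing the user id, so the disk load is spread out.

```python
import hashlib

# Hypothetical database hosts; every additional entry is another machine to pay for.
SHARDS = ["db-0.example.internal", "db-1.example.internal", "db-2.example.internal"]


def shard_for(user_id: str, shards=SHARDS) -> str:
    """Pick the server that holds this user's database.

    A stable hash of the user id is reduced modulo the number of shards,
    so the same user always lands on the same server.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


if __name__ == "__main__":
    for user in ("alice", "bob", "carol"):
        print(user, "->", shard_for(user))
```

The routing itself is trivial. The cost lies in the list: keeping merges fast means adding more servers as users grow, which is exactly the expense mentioned above.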
Doing it Right the Wrong Way
Teams of programmers collaboratively write code by making use of so-called source code management systems. In 2009 we switched our source code management system from Subversion to Git. Working with Git provided us with a lot of inspiration. Git uses a decentralized approach where contributors can work with their own local repositories. This setup is very similar to the syncing problem where data needs to travel between local databases.
We were so intrigued that we decided to develop a sync solution based on Git’s core ideas. Since these were general ideas anyway, we decided to create a solution that isn’t tied to the specific properties or needs of Things. Instead we wanted to create a general framework that could be integrated with any application no matter what the specific data model or sync policies of this application were.
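To illustrate the parallel between merging repositories and merging local databases, here is a minimal Python sketch of a three-way merge, a technique commonly associated with Git-style merging. It is not the framework described in this post; the record layout and the function name `three_way_merge` are assumptions made for the example. Two edited copies of a record are compared against their common ancestor, so a change made on only one device is carried over cleanly, and only fields changed differently on both sides are reported as conflicts.

```python
def three_way_merge(base: dict, mine: dict, theirs: dict):
    """Merge two edited copies of a record against their common ancestor.

    Returns (merged_record, conflicting_keys). A key that is missing from a
    copy is treated as deleted there; None is not supported as a field value.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(mine) | set(theirs)):
        b, m, t = base.get(key), mine.get(key), theirs.get(key)
        if m == t:
            value = m            # both sides agree (or both deleted the field)
        elif m == b:
            value = t            # only their side changed it
        elif t == b:
            value = m            # only my side changed it
        else:
            conflicts.append(key)
            value = m            # both changed it differently: keep mine, report it
        if value is not None:
            merged[key] = value
    return merged, conflicts


if __name__ == "__main__":
    base = {"title": "Buy milk", "due": "Mon", "notes": "2 litres"}
    mac = {"title": "Buy oat milk", "due": "Mon", "notes": "2 litres"}
    phone = {"title": "Buy milk", "due": "Tue"}  # notes removed on the phone
    print(three_way_merge(base, mac, phone))
```

In this example the title edited on the Mac and the due date edited on the phone both survive, and the removed notes stay removed, without either device having to be the authoritative one.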
We have tried many things, from underpowered technologies to over-engineered solutions. The approach we finally settled on is one that strikes the right balance, and in our next article we’ll be sharing more information about what that means.
Some people said we shouldn’t have pursued cloud sync with this level of ambition. But then, that wouldn’t be us. It is not how we developed Things. We know that people are coming to Cultured Code because we take this approach. They like companies that care, companies that try – and that is what we will keep doing.