Remember the early 2000’s world of Office Space? Java was still new, usenet passed for social media and applications were using proprietary Sun hardware with Oracle databases. Then the tech bubble burst and in that time of frugality, companies shifted to using commodity hardware with open source databases. In many cases, the database of choice was MySQL. The overall solution was cheaper, but you did have to give up some features. Due to the limits of both hardware and software, MySQL was often configured in such a way that it would lose data if there was a system failure.
ACID (Atomicity, Consistency, Isolation, and Durability) is a set of guarantees that traditional database systems provide applications. Applications should be able to depend on these guarantees during regular operations or if there is a failure.
Of particular importance is Durability, which means that when the database system acknowledges a transaction (a commit in database terminology) it should be able to survive permanently, even if there is a system crash or power failure.
As companies started using MySQL on cheaper hardware, they did not always configure it to provide full ACID guarantees. Around the same time, NoSQL systems such as MongoDB became popular and these systems initially did not even offer ACID transactions or durability guarantees. What you got in return was improved performance. This way of operating buffered a greater number of changes in memory and then batched the access to storage devices.
Being able to scale (“web-scale”) was the priority. Providing correctness when failures occurred (and in complex systems, failures always do occur) was a secondary consideration.
I remember this clearly from the early days of YouTube. Most of the team had previously worked at PayPal and when the team took shortcuts around durability, the refrain always was “it’s okay, it’s not money”. This eventually changed, as monetization became important and was tied to view counts.
In today’s world, there are better examples of the need for durability than financial transactions.
The traditional example of ACID demonstrates the potential failures that could occur when withdrawing $20 in Account A and depositing into Account B, and ensuring that the system never encounters a case where the money is lost, both accounts are credited, or the transaction is acknowledged as successful, only to later be reversed.
For today’s applications a better example would be:
While participating in a collaboration tool, like Slack, you might decide to create a private channel with a few colleagues. You realize that you have added the wrong person to this channel and you want to remove them. When you do so, you receive a confirmation that this update was successful. This change updates an entry in a database server. If the server crashes two seconds later, the expectation is that the user will have been removed and will no longer see the channel.
Similarly, when I hit send on a message to a colleague, I expect that the message has been sent. I do not consider that it might show as sent, but due to a system failure may not have been received by my colleague.
(Note: Slack actually does not have these problems because they use Vitess for storage at scale.)
These problems are created by asynchronous failures. The problem occurs because the database server has told the application that the operation was successful, the application has then informed the user of success, but then the database server later failed to deliver on its promises.
Vitess is an open source database scaling system for MySQL. We originally developed Vitess to scale YouTube, and it is now a CNCF graduated project (along with Kubernetes, Prometheus, and others).
Vitess prevents the asynchronous failure scenario that we talked about above in two different ways:
“Semi-sync” provides a great tradeoff between durability guarantees and performance. By contrast, some modern systems are solving this problem with a quorum, where the majority of nodes must receive the modification for it to be successful. While quorum can simplify some of the steps of failover, as the system grows (and adds more nodes) the performance overhead increases. This means that performance decreases.
Another advantage of the Vitess approach is that not every replica needs to be a member of the semi-sync group, so you can choose to design failure zones where at least one replica in a different availability zone/data center has a copy before the operation is considered successful.
Vitess has been the system of record for companies like YouTube, Slack, Square Cash, Pinterest, JD.com and Hubspot for many years. As of today, we do not know of any data loss incidents at these companies due to Vitess.
It is easy to forget just how much we integrate modern applications into our lives. As more and more parts of our lives depend on our applications, our expectations for durability and consistency have increased. The shortcuts that we took in the early 2000’s helped us scale systems when the technology was not always there. Now that we have the technology, we should be building systems that do not lose data in the face of ordinary failures and have a much lower chance of losing data in the face of catastrophic failures.