With the recent release of our new multi-cloud offering, the PlanetScale engineering team achieved a feat that has never been done before: enabling a database-as-a-service to replicate across the three major cloud providers. To back up this bold claim, I enlisted several of the engineers on the project to explain why this was a hard problem and the steps that we took to solve it.
Level 1: Operator Consensus with Peter Farr and Anthony Yeh
Abhi: Hi team! On the level of our Kubernetes operator, what do you think was the hardest challenge in making multi-cloud databases work?
Anthony: One hard challenge was deciding how to manage cooperation among different operators running in different Kubernetes clusters. Operators normally rely on the Kubernetes API server as the source of truth, but different clusters and operators can’t see each other’s API servers. So it came down to the question: do we give operators access to everyone else’s Kubernetes clusters or can we store the information in the topology server?
Q: So what did we end up deciding on?
Anthony: The latter. We ended up minimizing the need to coordinate across regions by having our custom resource, the PlanetScale Cluster, contain the full picture. Normally you would write a spec isolated to a certain region, but we just made one giant spec and gave all operators a view on what the others were responsible for. This allowed us to avoid having to give permissions across clusters.
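To make the single-spec idea concrete, here is a minimal Go sketch. It assumes a simplified spec in which every cell names the Kubernetes cluster that owns it. The type and field names (ClusterSpec, CellSpec, OwnedCells) are hypothetical and are not taken from the PlanetScale operator.

```go
package main

import "fmt"

// CellSpec describes what should run in one cell (for example, one region
// of one cloud). All names here are hypothetical, for illustration only.
type CellSpec struct {
	Name     string // cell name
	Cluster  string // Kubernetes cluster whose operator owns this cell
	Replicas int    // desired replica count in this cell
}

// ClusterSpec is the one spec that every operator receives in full.
type ClusterSpec struct {
	Cells []CellSpec
}

// OwnedCells returns the cells that the operator running in localCluster
// should reconcile. Every other cell is visible but left untouched.
func OwnedCells(spec ClusterSpec, localCluster string) []CellSpec {
	var owned []CellSpec
	for _, c := range spec.Cells {
		if c.Cluster == localCluster {
			owned = append(owned, c)
		}
	}
	return owned
}

func main() {
	spec := ClusterSpec{Cells: []CellSpec{
		{Name: "aws-cell", Cluster: "aws-k8s", Replicas: 3},
		{Name: "gcp-cell", Cluster: "gcp-k8s", Replicas: 2},
		{Name: "azure-cell", Cluster: "azure-k8s", Replicas: 2},
	}}
	// The operator in the GCP cluster sees the whole spec but acts on its part.
	for _, c := range OwnedCells(spec, "gcp-k8s") {
		fmt.Printf("reconcile %s with %d replicas\n", c.Name, c.Replicas)
	}
}
```

Because each operator filters the same document locally, no operator needs credentials for another cluster's API server.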
Q: So how did the operators form consensus around state?
Peter: It took us a while to design consensus in a way where it’s easy for a caller to use, for it to feel native. It’s an incredibly simple-to-use API. It doesn’t immediately feel like you’re doing a big election process and a lot of it is handled safely for you. The biggest thing was making the API design correct; it’s really user-friendly. Making the package usable throughout the rest of the codebase without thinking too much about the implementation is another win on top of requiring consensus for multi-cloud.
Q: So was making the API user friendly the biggest challenge?
Peter: Well, we also spent a lot of time figuring out how to best structure the data. In our current voting process, the sum of ballots is stored as a JSON object in our global topology server, but earlier we considered having each vote live at its own topology address. We quickly figured out that this pattern is really slow. Since each operator fetches the sum of all ballots and tries to update the cells it is responsible for, we would be making way more calls to the server if each vote lived at its own address. Instead, we have the operators concurrently fetch the one JSON object, update it, and retry until every operator has had its turn updating it. This design both limits calls to global topology and stops us from having to use expensive locks.
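One plausible reading of this fetch, update, and retry pattern is optimistic concurrency: each write succeeds only if the object has not changed since it was read. The Go sketch below illustrates that reading under stated assumptions. Store is a hypothetical in-memory stand-in for the global topology server, and CastVote, Ballots, and the version counter are invented for illustration, not the real voting package.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"sync"
)

// ErrVersionMismatch means the object changed between our read and our write.
var ErrVersionMismatch = errors.New("version mismatch")

// Store is a hypothetical stand-in for the global topology server. The mutex
// only makes this in-memory fake atomic; callers never hold a lock.
type Store struct {
	mu      sync.Mutex
	data    []byte
	version int
}

// Get returns a copy of the stored object and its current version.
func (s *Store) Get() ([]byte, int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return append([]byte(nil), s.data...), s.version
}

// CompareAndSet writes data only if the version is still the expected one.
func (s *Store) CompareAndSet(data []byte, expected int) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.version != expected {
		return ErrVersionMismatch
	}
	s.data = append([]byte(nil), data...)
	s.version++
	return nil
}

// Ballots is the single JSON object: cell name mapped to that cell's vote.
type Ballots map[string]string

// ReadBallots fetches and decodes the ballot object.
func ReadBallots(s *Store) (Ballots, error) {
	raw, _ := s.Get()
	ballots := Ballots{}
	if len(raw) == 0 {
		return ballots, nil
	}
	err := json.Unmarshal(raw, &ballots)
	return ballots, err
}

// CastVote fetches the object, sets this cell's vote, and writes it back,
// retrying from the fetch whenever another operator wrote first.
func CastVote(s *Store, cell, vote string, maxRetries int) error {
	for i := 0; i < maxRetries; i++ {
		raw, ver := s.Get()
		ballots := Ballots{}
		if len(raw) > 0 {
			if err := json.Unmarshal(raw, &ballots); err != nil {
				return err
			}
		}
		ballots[cell] = vote
		updated, err := json.Marshal(ballots)
		if err != nil {
			return err
		}
		err = s.CompareAndSet(updated, ver)
		if err == nil {
			return nil
		}
		if !errors.Is(err, ErrVersionMismatch) {
			return err
		}
	}
	return fmt.Errorf("gave up after %d attempts", maxRetries)
}

func main() {
	store := &Store{}
	var wg sync.WaitGroup
	for _, cell := range []string{"aws", "gcp", "azure"} {
		wg.Add(1)
		go func(cell string) {
			defer wg.Done()
			if err := CastVote(store, cell, "candidate-1", 10); err != nil {
				fmt.Println("vote failed:", err)
			}
		}(cell)
	}
	wg.Wait()
	ballots, err := ReadBallots(store)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Println(len(ballots), "ballots recorded")
}
```

In this sketch each vote costs one read and one write in the uncontended case, and a conflicting write costs only a retry rather than a lock held across regions.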
Level 2: Network Mesh with Anthony Yeh
Abhi: At a high level, what was the biggest challenge that we approached on the infrastructure level?
Anthony: We had two big problems. First off, if you’re designing your system to rely on Kubernetes, you’re probably relying on load balancers created through the Service object. However, this doesn’t work when talking across different Kubernetes clusters, so service discovery needed to be handled elsewhere.
Q: Okay, and the second?
Anthony: We needed network policies that work across clusters. Normally you think of network policies as blocking or allowing IP ranges, but the problem is that no one has a static IP in the Kubernetes world and things are moving around all the time. Kubernetes network policies instead let you write your rules in terms of labels: for example, defining that a pod with a certain set of labels can talk to pods with the same labels. Finding a network policy plugin to enforce these rules at the network level was the challenge.
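The label-based rule in that example can be sketched in a few lines of Go. This is a simplified model of label selection, not the Kubernetes or Cilium implementation. The Policy type and the Allowed function are hypothetical.

```go
package main

import "fmt"

// Labels is a set of key/value labels attached to a pod.
type Labels map[string]string

// Matches reports whether pod carries every key/value pair in selector.
// An empty selector matches every pod.
func Matches(selector, pod Labels) bool {
	for k, v := range selector {
		if got, ok := pod[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// Policy is a hypothetical, simplified ingress rule: pods selected by
// AppliesTo accept traffic only from pods selected by AllowFrom.
type Policy struct {
	AppliesTo Labels
	AllowFrom Labels
}

// Allowed decides whether src may talk to dst under a single policy.
// Pods the policy does not select are left unrestricted in this sketch.
func Allowed(p Policy, src, dst Labels) bool {
	if !Matches(p.AppliesTo, dst) {
		return true
	}
	return Matches(p.AllowFrom, src)
}

func main() {
	// "Pods with a certain set of labels can talk to pods with the same labels."
	p := Policy{
		AppliesTo: Labels{"app": "database"},
		AllowFrom: Labels{"app": "database"},
	}
	peer := Labels{"app": "database", "cell": "aws"}
	target := Labels{"app": "database", "cell": "gcp"}
	stranger := Labels{"app": "web"}

	fmt.Println("peer -> target allowed:", Allowed(p, peer, target))
	fmt.Println("stranger -> target allowed:", Allowed(p, stranger, target))
}
```

Nothing in the rule mentions an IP address, which is why it keeps working as pods are rescheduled and their addresses change.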
Q: Sounds like you’re setting this up to knock both down at the same time...
Anthony: Cilium was one of the big pieces that we decided to go with, and it solves both problems. Cilium supports Kubernetes API-based network policies across clusters and regions, along with service discovery through Cilium Cluster Mesh. It was a happy accident that one thing solved both of these issues for us. We evaluated several other options and decided Cilium was the best fit on both counts.
Q: Alright, let’s go one level deeper: how did we get the cloud providers talking to each other?
Anthony: At the lowest level we needed to connect networks across regions and cloud providers using VPC peering. Doing this with Kubernetes Pods as opposed to bare VMs brings up a lot of complexities. Even if you know how to set up bare VMs talking to each other across clouds, doing it in Kubernetes is a lot more nuanced. Dynamically assigned IPs make things very difficult.
Abhi: Yeah, sounds like I need to talk to Dan about this!
Level 3: VPC Peering with Dan Kozlowski
Abhi: Hey Dan! Peering the clouds seems really hard. How’d ya do it?
Koz: The main challenge with peering is that there is no documentation on how to peer the cloud providers with one another, even though they all support it. Beyond that, all three of them (AWS, GCP, and Azure) have different sets of capabilities, so each combination of peering is set up very differently, and with no documentation it’s up to you to figure it out.
Q: Let’s go through each case. How about peering AWS and GCP?
Koz: Because both AWS and GCP support HA VPN Gateways and BGP routing, peering AWS and GCP was fairly straightforward.
Q: I’m sure it gets more complicated though, right?
Koz: Yeah, Azure doesn’t support BGP, so we can’t use the HA VPN Gateways. As a result, everything must be done manually. We had to manually set up the routes, create firewall rules, and forward the traffic from a manually provisioned IP to the classic VPN.
Q: What are some other differences that you had to iron out?
Koz: Well, there’s no actual indication that the routes are passing traffic correctly, so we had to go through debugging to figure out why nothing was talking to anything else. Additionally, all the cloud providers handle security groups differently. Azure does it automatically, AWS does it at their custom security group level, and GCP does it at the global network level. So we ran into port forwarding issues and routing issues, and we didn’t know all the places where routes had to be defined manually. Even once we had the tunnels established, they didn’t automatically start forwarding traffic. So we had to dive deeper, looking at the flow logs to see why traffic was passing in one direction but not the other.
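The kind of check described here, finding traffic that passes in one direction but not the other, can be sketched in Go. The Flow record below is a hypothetical, heavily simplified stand-in. Real flow logs differ between AWS, GCP, and Azure and carry many more fields.

```go
package main

import "fmt"

// Flow is a hypothetical, simplified flow-log record.
type Flow struct {
	Src, Dst string // source and destination addresses
	Accepted bool   // whether the traffic was accepted
}

// OneWayPairs returns each (src, dst) pair that has accepted traffic from
// src to dst but no accepted traffic from dst back to src.
func OneWayPairs(flows []Flow) [][2]string {
	// First pass: record every direction that carried accepted traffic.
	accepted := make(map[[2]string]bool)
	for _, f := range flows {
		if f.Accepted {
			accepted[[2]string{f.Src, f.Dst}] = true
		}
	}
	// Second pass: report directions whose reverse never appeared,
	// in the order they first show up in the log.
	var oneWay [][2]string
	seen := make(map[[2]string]bool)
	for _, f := range flows {
		key := [2]string{f.Src, f.Dst}
		if !f.Accepted || seen[key] {
			continue
		}
		seen[key] = true
		if !accepted[[2]string{f.Dst, f.Src}] {
			oneWay = append(oneWay, key)
		}
	}
	return oneWay
}

func main() {
	flows := []Flow{
		{Src: "10.0.0.5", Dst: "10.1.0.7", Accepted: true},
		{Src: "10.1.0.7", Dst: "10.0.0.5", Accepted: false}, // return path rejected
		{Src: "10.0.0.5", Dst: "10.2.0.9", Accepted: true},
		{Src: "10.2.0.9", Dst: "10.0.0.5", Accepted: true},
	}
	for _, p := range OneWayPairs(flows) {
		fmt.Printf("traffic flows %s -> %s but not back\n", p[0], p[1])
	}
}
```

A pair reported by a check like this points at the side whose routes, firewall rules, or security groups still need a manual entry.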
Abhi: That’s crazy.
Koz: It wasn’t that bad. Setting up Cilium was the hard part, but it worked like magic.
[Note to the reader: The Cilium team was incredibly helpful and was instrumental in the delivery of this feature. They were very responsive and gave us guidance at all hours of the day.]
PlanetScaleDB’s multi-cloud capabilities have been enabled by the powerhouse that is Vitess, a cloud native, open source sharding framework for MySQL. Vitess’ cloud native attributes and bleeding-edge capabilities made it the ideal platform to develop this feature. If you are interested and want to learn more, join the Vitess Slack community, try it out for yourself with the quickstart guide, or check out our newly open-sourced Kubernetes operator!