In this episode, Deepthi Sigireddi of PlanetScale spoke with SE Radio host Nikhil Krishna about how Vitess scales MySQL. They discussed the design and architecture of Vitess; how Vitess impacts modern data problems; sharding and scale out; connection pooling; components of the Vitess system; configuration; and running Vitess on Kubernetes.
This transcript was automatically generated. To suggest improvements in the text, please contact firstname.lastname@example.org and include the episode number and URL.
Nikhil Krishna 00:00:19 Hi, my name is Nikhil and I’m a host for Software Engineering Radio. Today it is my pleasure to introduce Deepthi Sigireddi from Vitess. Deepthi is a Technical Lead for the Vitess project. She’s a software engineer at PlanetScale, where she leads the Open-Source engineering team. Prior to Vitess, Deepthi spent most of her career working on large-scale supply chain planning problems in the retail space. She has spoken more than once at open source and cloud native conferences about Vitess and is one of the experts in the technology. Welcome to the show, Deepthi.
Deepthi Sigireddi 00:01:00 Hi Nikhil, it’s great to be here.
Nikhil Krishna 00:01:01 So let’s get into it. So, what is Vitess?
Deepthi Sigireddi 00:01:06 Vitess is a project that was started at YouTube in 2010 to solve YouTube’s scaling problem. At that time, YouTube had grown so much that they were having outages almost every day because the infrastructure could not keep up with the kind of traffic they were getting. And this was primarily database infrastructure because YouTube had started with MySQL, and they were running many, many MySQL instances, and they all had to be managed. Some of the engineers, including Sougoumarane who’s currently the CTO at PlanetScale, got together and decided that they needed to solve this problem once and for all. Whatever temporary band-aids they were putting in place were not cutting it, and they were not going to work at all, looking at YouTube’s trajectory. So, they got together and they started trying to solve this whole issue of having maybe hundreds of MySQLs that you have manually sharded, where you’ve manually allocated different MySQLs to different applications.
Deepthi Sigireddi 00:02:10 And each application is talking to its own database or set of databases, and all these things have to work together in a coherent manner. So, that’s a little bit about the very beginnings of Vitess. It evolved over time to become a much more general-purpose scaling solution for MySQL databases. Or you can even think of it as a distributed database where you don’t really care about what’s behind the scenes. It just presents as a single relational distributed database. The team at YouTube donated Vitess to the Cloud Native Computing Foundation in early 2018. Even though Vitess was open-source from the very beginning, the copyright was owned by Google until it was donated to CNCF. And now it is owned by CNCF; the license is Apache 2, and there is a maintainer team consisting of 20-odd people working at various companies. We have hundreds of contributors, and the way we count contributions includes non-code contributions. So, documentation, filing issues, verifying issues, all those things count. Over the last two years, we’ve had 400+ contributors from more than 60 companies, and there is a vibrant community around it. We have a Slack workspace with around 2,700 members.
Nikhil Krishna 00:03:39 That’s a great introduction. What specifically is the problem that Vitess is targeting to solve? You said that it is involved in scaling databases, or it can be considered a distributed database. Could you go a little bit into what is that problem of scale you are trying to solve?
Deepthi Sigireddi 00:03:59 These days when people build applications, every application is essentially a web application. You have to have a web interface, and users interact with applications through the web. So, every application has to be scalable, reliable. You have to maintain availability. Users don’t like it if they are not able to connect to your application. What happens then is that these requirements — the scalability and availability requirements — that are necessary at the application level start percolating down the stack and you start requiring the same sort of scalability and availability from your database layer. Or, I want to say data layer because the data layer is not necessarily always relational, not always what we have conventionally thought of as databases. So, at the data layer, if you want to be able to scale — meaning, today I have a thousand users, tomorrow I may have 5,000 or next month I may have 10,000 — can I easily grow? Now what happens if something goes wrong? If there is a failure, what is the recovery mechanism? How automated is that? How much manual intervention is required? How much time do people have to spend on call, trying to figure out what went wrong? So, these are all problems at a business level or application level that start percolating down into the data level, and that is the problem that Vitess is solving.
Nikhil Krishna 00:05:28 And so you mentioned that it’s solving this data problem. We also have obviously the standard RDBMS databases like MySQL, MariaDB, Postgres etc., how is it that those databases are not able to do what Vitess can do? What is the problem with just using regular MySQL DB for all of these?
Deepthi Sigireddi 00:05:56 The thing with MySQL is that the traditional way of scaling it has been to put it on bigger and bigger and bigger machines. Over time, MySQL has built replication so you can get high availability. MySQL has a feature called Group Replication, where you establish a quorum before you write anything so that you get the durability. Even if one server goes down, there is another server that can accept writes. So your MySQL or the entire database doesn’t go down. So things have been evolving in that direction, in the RDBMS space as well. It is not that whatever Vitess is doing, other people are not trying to solve. If we want to talk about Postgres, there was a company called Citus Data, and there is a product called Citus, which was acquired by Microsoft, which does something very similar to what we are doing for MySQL in Vitess. The problem with vertical scaling, putting things on larger and larger machines, is that either you outgrow the most expensive hardware you can buy, or you can’t afford to buy the expensive hardware you need for your scale.
Deepthi Sigireddi 00:07:12 The other problem is that as you grow the database larger and larger, recovery times become longer if something fails. So if you take MySQL, you can grow it larger, you can replicate it. You can do the group replication so that you have a fallback. You can do all of those things, but you don’t natively have something like sharding where you can keep your individual MySQL databases small. And there is a layer that figures out how to combine data from different individual MySQL databases and present a unified view. And that’s what Vitess is doing. So we keep the databases small, you can run it on commodity hardware that keeps the costs down, and there is no practical limit to how big you can get, because you can just keep adding servers.
Nikhil Krishna 00:08:00 Is there anything special that needs to be done if I were to adopt Vitess as my data layer? So, in the application is there anything special that I need to do?
Deepthi Sigireddi 00:08:12 So it really depends on what the application is doing and how it’s written. It may be as simple as just changing the connection string to point to your new Vitess-backed database. Or maybe there are some features that are new in MySQL 8.0 that the application is using, which are not yet supported by Vitess. So, it really depends on the queries that the application is producing. Typically, the migration path we recommend is that you take your existing database, assuming it’s MySQL; if it’s not, then the migration looks different. And you put Vitess in front of it without sharding, and you start running your queries through Vitess. And then you can flip a switch into a mode where you are still really unsharded, still just one shard, but anything that would be an error if you were really sharded shows up as a warning, and you can work through those. And once you work through them, then you are ready to fully adopt Vitess and go into sharding and things like that.
Nikhil Krishna 00:09:26 So, one quick question out here, we mentioned that Vitess is a layer on top of MySQL and you pointed out that there are some features of MySQL, that are not yet supported. Can you kind of quickly elaborate as to what is the supported surface for the Vitess project right now?
Deepthi Sigireddi 00:09:47 So almost everything that MySQL 5.7 supports is supported. I think the only exception to that is that if you want to use views, then it doesn’t quite work in a sharded environment. It still works in an unsharded environment, and the same thing for stored procedures or functions. They have to be managed at the MySQL level, not at the Vitess level. So except for those couple of caveats, everything should work with 5.7. In 8.0, a lot of new syntax was introduced, and some of it we have added support for. So we are in the process of doing that compatibility work with MySQL 8.0. So, there are people running in production today with MySQL 8.0 with Vitess, no problems, because they don’t use common table expressions or window functions or some of the JSON functions we don’t yet support. We support a subset of the JSON functions, not all of them. And like I said, the compatibility work is ongoing. And when I check on it every once in a while, I can see how that list is getting smaller and smaller. We have tracking issues on GitHub, and I can see the check boxes of what we now support.
Nikhil Krishna 00:11:03 So MySQL itself has a couple of flavors, right? So, there is the official MySQL and then there are a couple of other projects like MariaDB and Percona and all that. What about those, are they also supported or is that kind of different?
Deepthi Sigireddi 00:11:21 Until fairly recently we supported MySQL Enterprise, MySQL Community, MariaDB, and Percona. We still fully support MySQL Enterprise, MySQL Community, and Percona. Percona is pretty much indistinguishable from MySQL, except they have patches and bug fixes that they keep carrying into their newer releases. MariaDB is different. So we had support for MariaDB. There were people who were running on MariaDB or trying to run on MariaDB, but they have run into problems because MariaDB has diverged quite a bit from MySQL. We actually have an open RFC proposing that we will officially drop support for MariaDB sometime next year when 10.2 goes to end of life. 10.4 is where compatibility starts breaking.
Nikhil Krishna 00:12:15 Right. So coming back to how Vitess scales the data layer, can you talk a little bit about the cluster topology? So how does Vitess kind of shard and how does it do the horizontal replication that it does?
Deepthi Sigireddi 00:12:37 Okay so there are two facets to the cluster management. One is availability. So we always run, or the recommended way of running Vitess is you always run it in a primary-replica configuration. There may be people who are running just primaries, which means that if the primary goes down, you have downtime; it’s an outage. But the recommended configuration is primary plus replicas, and the replicas are keeping up with the primaries so that if the primary has to be taken down for maintenance, you can do a planned failover, no disruption to client traffic. If there is an unplanned, I don’t want to call it downtime, unplanned failure. Let’s say the primary goes down. There is some disk failure or MySQL ran out of memory or something like that. Right? Then there are primitives in Vitess that let a human take an action, basically a push of a button, to fail over to one of the replicas, and then the system will start functioning again.
Deepthi Sigireddi 00:13:36 One of the projects that is in progress is to totally automate this: even in an emergency situation, Vitess should be able to detect and do an auto-failover without human intervention. And we are very close to making that GA in the next release, 14.0, which will be out in a few months, around June. That should be GA. So there is that availability aspect to it. Then there is the scalability aspect, which is where sharding comes in. So you have your whole database; when you shard, what you’re doing is you are saying, I store a subset of the data on each server, and together a group of servers will have all of the data. And what that means is that your data can keep growing and you can keep breaking it up across more servers. So maybe you have 250 gigabytes of data. It’s fine. MySQL will run fine, no problems. One shard with the primary and a couple of replicas is good, but let’s say you grow to 500 gig, one terabyte, two terabytes. The recommended size is 250 gigs. So you may say, okay, when I get to 300 or 350, I’m going to go to two shards. When I get to 600 or 700, I’ll go to four shards. And Vitess can transparently make this happen behind the scenes while applications are still connecting to the database.
Nikhil Krishna 00:15:04 So when you say transparently do it behind the scenes, is there some kind of hardware or infrastructure setup that needs to be done, or is it just changing a value in some kind of config? I mean, is there a config file that you need to modify to say, hey, this is the new server that’s going to be the new replica?
Deepthi Sigireddi 00:15:31 That’s a great question. So when I say transparently, it’s transparent to the client applications that are connecting to the database. So whoever’s running the Vitess system still needs to provision hardware. When you increase the number of shards, there is a hardware cost to it; whether that is bare metal or VMs or a cloud environment, somebody has to provision the additional hardware. And like you said, there is a configuration file where you specify whether things are sharded or not. And for each table, you’ll also specify the sharding scheme. So there is a config file that has to change when you first go from unsharded to sharded. But if you are already sharded and you want to split one of your shards, then there are commands that Vitess provides which will do that for you. So you can say, I want to re-shard and my source is X and my destinations are going to be this set Y, let’s say, right?
Deepthi Sigireddi 00:16:28 Or ABC then Vitess will figure out what the boundaries are for the sharding keys. And it will copy all of the data from the original shard to the new shards. And it will keep them up to date until an operator is ready to say, okay, I’m ready to cut over. Let’s stop using the old shard, let’s start using the new shards. So, there is a lot of human intervention or orchestration in this process, but that is somewhat by design because re-sharding is somewhat of a scary thing to do. And you want to be able to have these checkpoints where you can sort of pause and run some check sums, or we provide a Diff tool that can do a Diff between the source and destination, which takes a long time to run because you are comparing gigabytes of data or hundreds of gigabytes of data. And then when you’re comfortable, you can actually say, okay, I’m ready to switch. And when you switch you can say, can you by the way, keep the source in sync with the new shards so that if something goes wrong or we made a mistake, we can quickly fall back.
Nikhil Krishna 00:17:44 Right.
Deepthi Sigireddi 00:17:45 And then redo it.
Nikhil Krishna 00:17:48 Awesome. So it basically sounds like, other than the planning that you need to do to make sure that you have the necessary hardware, and deciding that these are the tables I’m going to be sharding, most of the other work Vitess basically handles, in the sense of making sure the data is moved over, that it is synced up, and that it keeps things maintained so that you can switch over smoothly. Right. OK. Awesome. Let’s kind of like go into maybe some of the basic concepts of what a Vitess database is like. I happened to be looking through the Vitess documentation, which is quite extensive, and there were certain terms that I thought might be good to discuss in the podcast. So let’s start with this term of a cell, right? So what is a cell and how does that work?
Deepthi Sigireddi 00:18:46 A cell is a failure domain. So it is the unit where if something fails, maybe everything fails. That’s a possibility, right? So it could be a cloud region, a cloud availability zone, or if you’re running on bare metal, it may be a rack or a server. So people can define what the cell looks like. And the purpose of having multiple cells is to be able to reason about failures. So people can say, okay, I’ve deployed Vitess in this availability zone from Amazon or this zone from Google; what happens if the whole thing goes down? It’s rare, but it happens, right? Then you can say, oh, then maybe I should create another cell in a different availability zone and replicate into that. So that even if one cell goes down, the other one is up. Defining cells in your Vitess topology allows you to plan for failures at the infrastructure level.
Nikhil Krishna 00:19:51 Okay, just a quick question over there. So can you actually define cells that are geographically separated? So can I have like one cell in America and another cell in Europe?
Deepthi Sigireddi 00:20:05 Yes, you can do that. And in fact, YouTube ran with replicas all over the world. Their primaries were located in North America, but they had replicas everywhere. And those were different cells.
Nikhil Krishna 00:20:19 Obviously, that’s kind of like a base-level infrastructure concept. On top of that, then, there is this concept of a keyspace. So, what is a keyspace and how does that work?
Deepthi Sigireddi 00:20:30 So a keyspace is basically a distributed database or distributed schema. You can think of it as a schema in MySQL terms. So, in MySQL, on a single database server you can have multiple schemas. In Vitess, in a single Vitess cluster you can have multiple keyspaces. And a keyspace is a logical database that can physically be backed by multiple servers; multiple replicas, shards, all of that is part of one keyspace.
Nikhil Krishna 00:21:02 Okay. The way to kind of think of it is, if I have, I don’t know, an eCommerce site, this would be the name of the logical set of tables, what we’d call a database in MySQL, okay? And so obviously that is the logical thing; it is distributed over many physical databases. The next concept would be the shard, because that would be one level down from the database. So, can you describe what a shard is from the perspective of Vitess?
Deepthi Sigireddi 00:21:36 A shard is a subset of the keyspace. So, let’s say your keyspace spans 10 tables, and let’s say one of them has a hundred rows, right? A hundred just because that’s a simple number to work with. Now, let’s say you want to have four shards. Then those hundred rows will be distributed across those four shards. In some fashion, they may not be 25, 25 each, maybe they’re 22, 28, 27, somewhere there, but each row in a keyspace lives in one shard and only one shard. And every row in a keyspace lives in some shard. So, in mathematical terms, if you think of your data as a set, then the shards form a partition of that set.
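The partition property Deepthi describes can be made concrete with a small sketch: hash each row's key to pick exactly one of four shards. This is only an illustration of the idea; Vitess uses configurable vindexes rather than the plain MD5 used here, and the data is invented.

```python
import hashlib

NUM_SHARDS = 4

def shard_for(row_id: int) -> int:
    # Map a row to exactly one shard by hashing its sharding key.
    digest = hashlib.md5(str(row_id).encode()).digest()
    return digest[0] % NUM_SHARDS

rows = range(100)
shards = {s: [r for r in rows if shard_for(r) == s] for s in range(NUM_SHARDS)}

# Every row lands in exactly one shard, and the shards together
# cover all rows: a partition of the data set. The per-shard
# counts are roughly, but not exactly, equal.
sizes = [len(v) for v in shards.values()]
print(sizes, sum(sizes))
```

Running this shows four uneven-but-similar shard sizes that always sum to 100, matching the "maybe 22, 28, 27" intuition above.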
Nikhil Krishna 00:22:19 So you said that a data row lives in exactly one shard? So don’t you think that’s kind of a problem? What happens if that shard dies? Does it mean that that data is no longer available?
Deepthi Sigireddi 00:22:39 So this is why you do the primary-replica configuration. So in each shard you have a primary and you have multiple replicas. So total shard failure is very rare, because it’s going to be very rare that all of your nodes in that shard go down at the same time, and you could distribute each shard across multiple cells. So every shard can live in every cell. And that way you get fault tolerance even to total zonal failure.
Nikhil Krishna 00:23:09 So we’ve got the cell; we’ve got the keyspace, that’s the logical grouping of the database; and then there’s a shard, which is logically one partition, but physically you have multiple copies of it. The next concept, I guess, would be how you manage all of this. Right? So I saw there is this idea of a tablet in Vitess. So what is the tablet? And what does that do?
Deepthi Sigireddi 00:23:33 A tablet is basically a management component over MySQL. All the data is stored in MySQL instances, but we need something that can say, well, this is the primary for this shard. And we need to let everybody else who’s involved in this distributed system know that this is the primary, or we may need to start and stop replication. So let’s say we are doing a failover from the current primary to a new one. There are some MySQL-level actions you need to take with the appropriate commands so that you can elect the new primary, and you can make the old primary now change itself into a replica and start replicating from the new primary. So, these are the sorts of management things that the tablet does. The tablet can watch the replication and make sure that it’s managing the replica, and if for any reason replication breaks, try to restart it.
Nikhil Krishna 00:24:34 So is a tablet basically running as a separate server component, or is it a client that connects to the cluster? Is it like a control-plane concept, as in Kubernetes?
Deepthi Sigireddi 00:24:47 It is a separate process. Typically, it runs on the same machine, physical or virtual, as MySQL, and it connects through the UNIX socket. So connecting through the UNIX socket means that there are a lot of security things you don’t have to worry about.
Nikhil Krishna 00:25:05 Right. So, for every MySQL or a node that you have in your cluster, there is a tablet that is running along with it?
Deepthi Sigireddi 00:25:13 Yeah. That’s basically like a thin layer sitting on top of the MySQL.
Nikhil Krishna 00:25:17 That makes sense. So the next thing, obviously, is: now you have a cluster of machines, this Vitess cluster, how do you actually connect to it? So there is a proxy, this concept of a VT gate proxy. So could you talk a little bit about that?
Deepthi Sigireddi 00:25:38 You’re exactly right. You have all of these many MySQL instances with VT tablets managing them. How does the client know who to talk to, okay? So, VT gate is the one that lets Vitess pretend to be a single database. So we give the illusion that it’s a single database: you have a single connection string that you can use to connect to this VT gate, basically a server address and a port. People typically run it on the standard MySQL port 3306. VT gate can speak the MySQL protocol, so any MySQL client can connect to it, including JDBC MySQL clients, Go MySQL clients, Python MySQL clients; even the Ruby built-in MySQL client works with VT gate. It can also support gRPC, so clients which implement the gRPC protocol can connect to VT gates using that protocol.
Deepthi Sigireddi 00:26:40 And the thing it does is that it routes queries to the right place. So let’s say we get a simple query, select X, Y, Z from some table where X equals 10. VT gate is the one that figures out where it should go look for this data. And if it is unsharded, it’s simple: it just sends it to the unsharded primary. If it is sharded, it has to figure out the routing. And for more complex queries, it may have to send the query to multiple shards, either all shards or a subset of shards, and it may have to consolidate the results. So maybe there are rows in, like, three different shards where X equals 10 is a match. Then it has to combine all of them and return the full result set to the client.
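The routing behavior Deepthi describes, sending a point query to one shard versus scattering to all shards and merging, can be sketched in a few lines. This is a toy model with invented data, not VT gate's actual implementation.

```python
# Toy model of VT gate routing. Each shard is a list of rows;
# the schema and rows are hypothetical.
shards = [
    [{"x": 10, "y": "a"}, {"x": 3, "y": "b"}],   # shard 0
    [{"x": 10, "y": "c"}],                        # shard 1
    [{"x": 7, "y": "d"}, {"x": 10, "y": "e"}],   # shard 2
]

def route_point(shard_id, pred):
    # A query whose predicate pins the sharding key goes to
    # exactly one shard.
    return [row for row in shards[shard_id] if pred(row)]

def scatter_gather(pred):
    # A query that cannot be pinned is sent to every shard and
    # the partial results are combined into one result set.
    out = []
    for shard in shards:
        out.extend(row for row in shard if pred(row))
    return out

result = scatter_gather(lambda r: r["x"] == 10)
print(len(result))  # matches in three different shards -> 3 rows
```

The scatter case is exactly the "rows in three different shards where X equals 10" example above: each shard contributes its matches, and the proxy returns the combined set.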
Nikhil Krishna 00:27:29 Then this particular proxy, depending on how complex the query is, how complex the cluster is, can be a significant machine or a node, right? It probably takes up a lot of your resources as well.
Deepthi Sigireddi 00:27:42 Correct.
Nikhil Krishna 00:27:45 Do you have replication for this, or what happens if your proxy goes down?
Deepthi Sigireddi 00:27:47 You can have any number of VT gates. So what people usually do is they benchmark, and they size the VT gates to their traffic. People will always run at least two, maybe three, but some installs of Vitess run hundreds or thousands of VT gates.
Nikhil Krishna 00:28:04 What kind of scenarios need that kind of . . .
Deepthi Sigireddi 00:28:08 There are some users of Vitess where they’re processing millions of queries a second. And they’re trying to keep each VT gate at maybe 50 to 100 thousand queries a second. So just like you can scale your backend as your data grows, you can scale the VT gates as your query volume grows.
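The numbers in this exchange lend themselves to a quick back-of-envelope calculation: total query volume divided by per-VT gate capacity, with some headroom for failover and spikes. The workload figure and headroom factor below are hypothetical.

```python
import math

total_qps = 2_000_000      # hypothetical workload, "millions of queries a second"
per_vtgate_qps = 75_000    # midpoint of the 50-100k range mentioned
headroom = 1.5             # hypothetical spare capacity for failover/spikes

# Number of VT gates needed to carry the load with headroom.
vtgates_needed = math.ceil(total_qps * headroom / per_vtgate_qps)
print(vtgates_needed)
```

At two million queries per second this works out to dozens of VT gates, which is why large installations end up running the "hundreds or thousands" Deepthi mentions once multiple regions and failure domains are factored in.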
Nikhil Krishna 00:28:29 Right. Does that mean that at some point, I mean, especially for that particular scenario that you mentioned, you probably want to have a proxy in front of the proxy to kind of figure out which proxy to go to?
Deepthi Sigireddi 00:28:44 Correct. So what people do is they use load balancers. So a load balancer will receive the query, and it’ll basically do some sort of round-robin across the VT gates. Or maybe you’ve deployed your application through a CDN in various parts of the world, and behind the CDN you have a small set of VT gates which will receive the traffic.
Nikhil Krishna 00:29:10 That makes a lot of sense. So there’s another particular term that I came across your documentation called the Topology Service. What is this topology service and what does it do?
Deepthi Sigireddi 00:29:23 What the topology service does is it stores the cluster state so that different components can discover each other. So really the component that really needs to discover everybody else is VT gate because it needs to know which tablets it can route to. So when a VT gate comes up, it’ll be able to read what key spaces exist, what shards exist, which tablets belong to each shard. The other piece of information we store there right now, which in theory you don’t have to, is which is the primary tablet for a shard. So let’s say you add a new replica. You decide that, oh, I have a primary and two replicas, but I want to add two more replicas for whatever reason. Those replicas have to discover, which is the primary tablet that they should start replicating from. And they do that by consulting the topology service. So metadata about the cluster is what is stored in the topology service.
Nikhil Krishna 00:30:22 Is it possible to then query that metadata to understand the cluster, kind of like a monitoring tool that you can build? Is that available with Vitess?
Deepthi Sigireddi 00:30:32 The metadata stores we support are etcd, ZooKeeper, and some people use Consul. All of them are well-known tools which come with their own APIs, so it is possible to query them directly. But we also have a client. So Vitess comes with a client that you can use to say, get me a list of the keyspaces, get me a list of the shards in the keyspace, get me a list of all the tablets that you know about. And what the client will do is it’ll talk to a server, a control-plane server, which will query the topology server. And it knows how to convert the binary data it receives from the topology server into structured data that the clients can consume.
Nikhil Krishna 00:31:21 Thanks. That kind of gives an overview of how Vitess is set up, kind of like an overview of the architecture. But obviously the main thing that Vitess does is use sharding to kind of scale horizontally. So, perhaps at least for the users, it might be useful to go a little bit into what is database sharding and how that works, and how does it help scale a database?
Deepthi Sigireddi 00:31:51 We talked a little bit about this already, so we’ll go a little deeper now. To recap, sharding is the process of splitting up your data into subsets and storing or hosting those subsets on different servers, physical or virtual. And the reason we do this is because smaller databases are faster. You can improve your latency, but you can also improve your throughput. You can serve more queries at the same time because you have more compute resources, and there’s less contention within the database when you split them up this way. And we can support more connections at the MySQL level. Usually people configure MySQL with some max connections number based on their workload. Let’s say that’s 10,000, or I’ve seen 15,000, but not more than that. But with VT gates and the way we do things, we can actually support hundreds of thousands of connections or millions of concurrent connections. As to how the sharding actually happens,
Deepthi Sigireddi 00:32:52 we talked about how there is some configuration that you have to set up and then the process will stop. The way it works is that Vitess will first create the necessary metadata. So let’s say we are splitting one shard into two, it will create those two shards in the metadata. And then the operator, the person who’s running this, has to provision the tablets for that shard and start them up and say that, okay, these are now the new tablets. Then what Vitess can do it, it will say, okay, I need to now start copying the data. And because we write only to primary in each of the destination shards, I’m going to start writing into the primaries. So in each of the destination shards, I’m going to start what is called the V replication. And that V replication stream will copy data from the source to the destination. And the source is given to it as a key space shard specification. So it consults the topology server to say, what tablets are available that I can stream from, and it will choose one of the available tablets and it will start a copy process.
Nikhil Krishna 00:34:05 OK. Just a fundamental thing: how granular can you make a shard? Is it kind of like at the level of a table? Can you go smaller than a table? Can you have a set of tables become a shard?
Deepthi Sigireddi 00:34:21 Sometimes people will split tables out into another keyspace. This is what we call vertical sharding, or MoveTables. So let’s say you have 10 tables. Two of them are very big and eight of them are small. You don’t have to horizontally shard all of them; maybe you just move those two large tables into their own keyspace first, and then you can shard that keyspace while keeping the smaller tables unsharded. So there is vertical sharding and there’s horizontal sharding. So a shard can contain a subset of tables, or it can contain a subset of the data in a subset of all of your tables.
Nikhil Krishna 00:35:00 Right. So is it possible for Vitess to have, like you mentioned, this huge single table, which is like my primary table with a lot of data in it, but also a lot of, kind of like, reference tables and master data tables with a few rows that you keep for configuration data, right? So is it possible to have those tables not sharded, and just this big one in its own sharded keyspace?
Deepthi Sigireddi 00:35:31 Yes, that’s definitely possible.
Nikhil Krishna 00:35:33 So if that is the case, then how does that work when you’re running a query which has joins in it, for example? So you would have to go to one shard for some of the data and another shard for the other data. Don’t you think that has a performance implication?
Deepthi Sigireddi 00:35:53 That's an excellent question. So Vitess supports cross-key space joins, so it can happen. But there is a feature in Vitess called Reference Tables. So what you can do is you can say that these are my reference tables, which are in this unsharded key space, but replicate them into the sharded key space. So then every shard in the sharded key space will have a local copy of the reference tables, which is kept up to date with the single source of truth, and joins become local.
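[Ed. note: as a toy sketch, not Vitess code, the benefit of reference tables can be illustrated like this. The table names and data are invented for the example; the point is that once each shard carries its own copy of the small reference table, a join can be answered entirely within the shard.]

```python
# Toy model: two shards, each holding its slice of a big "orders" table
# plus a full local copy of a small, replicated "currencies" reference table.
REFERENCE_CURRENCIES = {1: "USD", 2: "EUR"}  # the single source of truth

SHARDS = [
    {"orders": [{"id": 1, "currency_id": 1}], "currencies": dict(REFERENCE_CURRENCIES)},
    {"orders": [{"id": 2, "currency_id": 2}], "currencies": dict(REFERENCE_CURRENCIES)},
]

def local_join(shard):
    # Join orders to currencies using only data that lives on this shard:
    # no cross-shard traffic is needed.
    return [
        {"order_id": o["id"], "currency": shard["currencies"][o["currency_id"]]}
        for o in shard["orders"]
    ]

# Each shard answers its part of the join; the results are concatenated.
results = [row for shard in SHARDS for row in local_join(shard)]
```

Without the replicated copies, the same join would require fetching currency rows from the unsharded key space for every shard's order rows.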
Nikhil Krishna 00:36:25 Ah okay. And since those tables aren't very big, it's acceptable overhead?
Deepthi Sigireddi 00:36:30 Exactly.
Nikhil Krishna 00:36:31 Are there any particular types of joins which are, let's say, less optimized? Is there any kind of optimization you can do around your SQL queries to make your performance on Vitess better?
Deepthi Sigireddi 00:36:47 There is a tool that comes with Vitess called VTExplain, to which you can provide what your planned sharding scheme is and the number of shards, and it can simulate what your join will end up actually looking like. So the client is issuing one query, but behind the scenes, maybe we have to do a bunch of selects from a bunch of shards and then use those results and issue another bunch of selects from the same or different shards, and then combine all of them. Right. So it'll actually show you that plan. What does that plan look like? And people use this tool, VTExplain, to look at what their query plan will look like in Vitess. How it's being routed, how it's being combined, maybe there's an aggregation, and that can be used to then, if desired, rewrite the queries so they result in more efficient plans.
Deepthi Sigireddi 00:37:43 We do also do some optimizations during the query planning. So we build up an in-memory representation of the query that lets us basically do relational algebra on it. So maybe you've built up a tree representation of the query, and it's possible to take a filter which is at a higher level and push it down to a lower level. What that then means is that you're combining smaller sets of data together after filtering, versus combining two large subsets of data and then filtering on that. So we can do optimizations of that sort during the query planning.
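[Ed. note: the filter-pushdown idea described here can be sketched in a few lines of Python. This is a generic relational-algebra illustration, not the Vitess planner; the tables and column names are invented. Both plans return the same rows, but the pushed-down plan joins a much smaller input.]

```python
# 100 users, half in "DE"; 1000 orders, 10 per user.
users = [{"uid": i, "country": "DE" if i % 2 else "US"} for i in range(100)]
orders = [{"uid": i % 100, "total": i} for i in range(1000)]

def join(left, right, key):
    # Simple hash join on `key`.
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

# Naive plan: join everything, then filter the combined result.
naive = [row for row in join(users, orders, "uid") if row["country"] == "DE"]

# Pushed-down plan: filter users first, then join the smaller input.
de_users = [u for u in users if u["country"] == "DE"]
pushed = join(de_users, orders, "uid")

# Same answer either way; the pushed-down plan combined half the data.
assert sorted(r["total"] for r in naive) == sorted(r["total"] for r in pushed)
```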
Nikhil Krishna 00:38:21 Okay. And that would be, so is that something that happens like transparently and the client doesn’t care? Or is that something that can be helped or is that kind of like a hint that we can give?
Deepthi Sigireddi 00:38:34 So it happens transparently. It happens in VTGate during query planning. There are some query comments/hints that we support, but very few. And I don't know if there are any that actually affect the planning.
Nikhil Krishna 00:38:52 Okay. So the data is basically now written in multiple shards, and obviously in the configuration file you probably specify, okay, I want so many copies of the data, so the shard basically has so many copies created. How do you actually optimize that? Because you might be getting certain queries that happen a lot, and that affect only certain parts of the database, right? So you might have a large OLTP database. It's the primary database, always getting queried, but there may be some other user-related, user service data that's not queried quite so often. And maybe it's even like time series data, so it is time sensitive, right? They may be querying a lot on the recent few days versus a year ago. Are there any optimizations that Vitess does that help improve the performance from that perspective?
Deepthi Sigireddi 00:39:52 A lot of this is sort of Vitess cluster architecture that people design themselves. So, if you have tables which are less frequently used and they are not typically queried in joins with the more frequently used tables, then you may just put them in a key space that is not resourced so heavily. You run it on smaller machines. There are a couple of things Vitess does do for you in order to reduce the load on the system. One of them is what we call query consolidation. Some people call it query deduplication. So the VTTablet layer, which is in front of MySQL, receives the query that it is supposed to execute from VTGate and passes it on to the MySQL and then gets the results and sends them back. So it knows what all the in-flight queries are when it receives a new query. And if it so happens that there is a query that's already in flight and it has received 10 identical queries, same queries, same bind variables, same WHERE clause, same values, everything the same, then what VTTablet will do is it will not issue those additional 10 queries to the MySQL. It will queue them. And as soon as the first one returns, it can return all of these because they have the same result set. So if you have a hot row in terms of reads, a row that is being queried a lot, then this means we will not do the wasteful work of querying the same data over and over again.
Nikhil Krishna 00:41:23 Okay, so it has its own kind of cache of the data?
Deepthi Sigireddi 00:41:28 Right. Of the results. Yeah. But it’s a very short-lived cache because as soon as you start caching, you start getting into staleness problems.
Nikhil Krishna 00:41:36 Yeah.
Deepthi Sigireddi 00:41:37 So it’s extremely short-lived. There is a leader which is currently executing. There are followers that are waiting. As soon as the leader returns, all of the followers that are waiting return. Then the next one you get will become the leader. So, at that point effectively, you’ve cleared your cache and you have no staleness.
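[Ed. note: the leader/follower consolidation described above can be sketched as a toy model. This is illustrative pseudologic, not VTTablet internals: real code would use concurrency with followers blocking on the leader, whereas here a batch of identical queries is processed together and the in-flight table acts as the very short-lived cache.]

```python
backend_calls = 0

def slow_backend_query(sql):
    # Stands in for real work done on MySQL; we count how often it runs.
    global backend_calls
    backend_calls += 1
    return f"results for {sql}"

class Consolidator:
    def __init__(self):
        self.in_flight = {}  # sql -> result produced by the "leader"

    def execute(self, sql):
        if sql not in self.in_flight:
            # First arrival becomes the leader and actually hits the backend.
            self.in_flight[sql] = slow_backend_query(sql)
        # Identical followers reuse the leader's result set.
        return self.in_flight[sql]

c = Consolidator()
# Ten identical client queries arrive "at once"...
answers = [c.execute("SELECT * FROM t WHERE id = 5") for _ in range(10)]
# ...but only one backend execution happened.
```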
Nikhil Krishna 00:41:57 Right. OK, cool.
Deepthi Sigireddi 00:41:59 There's one other feature, which is, again, maybe there is a row that is being written to very frequently and that can cause contention at the database level. If many transactions are trying to operate on the same range of data, which we compute in some way, then we'll actually say, let's not create contention at the database level between all of these transactions; let us, at the VTTablet level, serialize them so that only one of them is hitting the database at any given time.
Nikhil Krishna 00:42:34 Okay. So, when you say serialized, you're talking about serializing at the tablet level, right? So at a particular shard level, you still have the replication happening independently and copies of the data are being kept in multiple tablets, correct?
Deepthi Sigireddi 00:42:56 Correct.
Nikhil Krishna 00:42:57 Okay, so is there any kind of restriction or constraint around this: can I set up Vitess in such a way that I say, hey, this data that I'm writing is important, I need to make sure that it is there and it is available? Can I control it so that the transaction commits only if it has been written to multiple key spaces or multiple shards, something like that?
Deepthi Sigireddi 00:43:25 Okay, so we should talk about durability and then we should talk about cross-shard transactions. So the default replication mode for MySQL is asynchronous. So you write to a primary, as soon as that gets written to disk, or however MySQL decides that the transaction is complete, it returns to the client and any replicas that are receiving binary logs from the primary, there is no acknowledgement. There’s no guarantee that anybody has received them. They are just following along at their own pace. But MySQL does have a semi-synchronous replication mode. This was originally developed at Google and then it became a part of standard MySQL. What happens in semi-synchronous replication is that the primary is not allowed to respond to a client with a success for a transaction until one of the replicas acknowledges that it has received that transaction.
Deepthi Sigireddi 00:44:28 It doesn't have to write it to its tables. It just has to have received it, because what receiving means is that the replica has written it to its disk in a file called the relay log. So, the primary has binary logs and sends them to the replica. The replica's relay log gets written when it receives the binary logs. And then once it's applied those relay logs to its copy of the database, then its binary log gets written. So, there is semi-synchronous replication, which, if you enable it and set the timeout to basically infinite, you don't let it time out, so that you are guaranteed that if the primary returns success for a transaction, then it has persisted on two disks, not just one disk. So that gives you durability. You don't control this at the client level. It is a server setting. There are other distributed databases that let you choose some of these settings at the client level. But in MySQL it's a server setting.
Nikhil Krishna 00:45:31 Right.
Deepthi Sigireddi 00:45:33 So that is the durability of a transaction that a client has been told has been accepted. So this way, even if the primary goes down, you’re guaranteed that you can find that transaction somewhere.
Nikhil Krishna 00:45:45 Now that we have an idea of how MySQL ensures that you have at least two copies, I guess the question would be, do you need to have semi-synchronous replication in order to have a distributed transaction? Or can you have this? And can you even set it to be a little bit more strict than just the two-way replication that semi-synchronous allows?
Deepthi Sigireddi 00:46:07 It is possible to set the number of acknowledgements you should receive before the transaction is completed. Most people set it to one, because two failures on two different disks are unlikely, but you can set it to two acknowledgements. Then it will be written to three places before it succeeds. But you sacrifice latency for higher durability at that point.
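[Ed. note: the durability/latency trade-off can be sketched as a toy model, not MySQL internals. The primary's commit is only acknowledged after N replica acks arrive, so the commit latency is set by the Nth-fastest replica, and each extra required ack buys one more durable copy.]

```python
def commit(replica_ack_delays_ms, wait_for_acks):
    """Return (durable_copies, commit_latency_ms) for one transaction."""
    acks = sorted(replica_ack_delays_ms)  # fastest replicas ack first
    if wait_for_acks > len(acks):
        raise RuntimeError("not enough replicas to satisfy the setting")
    # The primary's own write plus each awaited ack is a durable copy;
    # latency is the time until the slowest required ack arrives.
    latency = acks[wait_for_acks - 1] if wait_for_acks else 0
    return 1 + wait_for_acks, latency

# Three replicas with hypothetical ack times in milliseconds.
delays = [2, 5, 9]
two_copies = commit(delays, 1)    # (2 copies, 2 ms): the common setting
three_copies = commit(delays, 2)  # (3 copies, 5 ms): safer but slower
```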
Nikhil Krishna 00:46:33 OK, cool. So, one thought that occurred at that time was, does this work across availability regions, right? So, suppose you’ve configured your Vitess shard to be across multiple regions, can I then say, Hey, I want to do a distributed transaction where I want it to be in two availability regions?
Deepthi Sigireddi 00:46:59 That's another great question. So people do this. They will have a cell in one AZ, they'll have another cell in another AZ, and they set up replication between them and configure Vitess in such a way that unless you receive an acknowledgement from a different availability zone, the transaction doesn't complete. It introduces a little bit of latency. If you're in the same AWS region but different availability zones, people have measured this: the additional latency is about 150 milliseconds. So you are adding that much time to each of your transactions, but that's a tolerable additional latency.
Nikhil Krishna 00:47:41 Right. Moving on to another question, which is regarding the queries: you mentioned that Vitess has this internal query planner that figures out the best way to execute the query across shards, right? How does that actually improve? Is that something that's part of MySQL's roadmap, or is that something that Vitess creates and improves on its own? How does that actually get better?
Deepthi Sigireddi 00:48:13 OK. So the way it gets better is that we have a team working on it. Five years ago, the query planning was rewritten and we called it V3 and last year we rewrote it again and called it Gen4 and we are planning the Gen5. So this team that specializes in query serving and query planning, they are going out and reading the research on how you can build better query plans and applying it to our specific use case of: you have a query, it’ll be cross-shard, what’s the best way to execute it?
Nikhil Krishna 00:48:48 Okay.
Deepthi Sigireddi 00:48:49 So that’s how we get improvements.
Nikhil Krishna 00:48:51 And that's probably why you don't support that many hints from the client anyway, because hints can restrict the ways in which you can improve the query?
Deepthi Sigireddi 00:49:02 Correct. Sometimes this can happen, but in general it’s unlikely that the human has enough data to come up with the best hint, right? Which works under different circumstances. So maybe it works for today’s workload, but doesn’t work for tomorrow’s workload.
Nikhil Krishna 00:49:24 Cool. So, moving on to another question, we talked about how Vitess uses the VTGate server and the VTTablet concept to support so many database connections, right? MySQL server connections are pretty heavyweight. You can't really go beyond 10 or 15 thousand connections; it starts becoming a bottleneck for the database. If you have millions of connections on a VTGate, doesn't that need to get translated into MySQL connections at the end of the day? So how do you optimize that so that it doesn't affect the MySQL load?
Deepthi Sigireddi 00:50:09 The way you do it is through connection pooling. And connection pooling has become a pretty standard thing for people to do now. So for Postgres, there's a tool called PgBouncer. There are tools like HAProxy or ProxySQL. So there are many tools that have implemented this connection pooling concept, even frameworks. So, in Ruby on Rails, you say I want a connection pool, and you just use those pooled connections. So, the way this improves what you can do at the MySQL level, the way you can support hundreds of thousands or millions of connections at the VTGate level with, say, 10,000 connections at each back-end MySQL level, is that typically not all of those connections are active at any given point in time. If you look at an end user, what they are doing, let's say I go to a web application or even a desktop application.
Deepthi Sigireddi 00:51:02 I bring up Slack, I'm reading through messages. I don't need to be executing a query against the database every millisecond, right? Maybe the way the Slack app works, every second it fetches new messages and shows me. So, most of the time, it doesn't actually need a database connection or need to use the database connection. So, instead of a dedicated connection to the backend MySQL for each end user, you say we will give you a super lightweight connection at the VTGate level, which is just a session, a few bytes of data. And when you really need to access the backend MySQL, then we will take a connection from a pool and we will use that connection, fetch the data, and return the connection to the pool. Connection pools can also get exhausted, but you've now increased the number of connections you can support by 10X or 100X.
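[Ed. note: the borrow-and-return pattern described here can be sketched as a minimal pool. This is a generic illustration, not VTGate code: many lightweight client sessions take turns on a small fixed set of heavy backend connections.]

```python
import queue

class BackendConnection:
    """Stands in for an expensive MySQL server connection."""
    def __init__(self, conn_id):
        self.conn_id = conn_id

    def query(self, sql):
        return f"conn{self.conn_id}: {sql}"

class ConnectionPool:
    def __init__(self, size):
        self.free = queue.Queue()
        for i in range(size):
            self.free.put(BackendConnection(i))

    def execute(self, sql):
        conn = self.free.get()      # borrow (would block if pool exhausted)
        try:
            return conn.query(sql)
        finally:
            self.free.put(conn)     # always return the connection

# 1000 "client sessions" are served by just 5 backend connections.
pool = ConnectionPool(size=5)
results = [pool.execute(f"SELECT {i}") for i in range(1000)]
```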
Nikhil Krishna 00:51:59 Right. To discuss that a little bit more: one of the things I have noticed when I'm working with systems is this microservices architecture mode, right? And one of the usual things that happens with microservices architecture is that every microservice has its own database, but they put all the databases on the same physical machine. I'm kind of like, why are we doing this again? But one of the bottlenecks that ends up happening is that each microservice, like you said, using the Ruby framework or the Python framework, will create a connection pool of say 10 connections, and then very rapidly you'll run out of connections because every microservice is holding onto 10 different connections. Right? Obviously it sounds to me that Vitess is a nice way to handle that particular architecture's problem. But one thought on that is, okay, microservices by definition are independent, right? So if you have multiple microservices that are, for whatever reason, doing write transactions or doing work, you might actually have the situation where you have different connection pools that are all holding onto heavy connections. So that idea of having the lightweight thread does not necessarily always work, because you might have multiple processes or multiple clients (from the Vitess perspective, there'll be multiple clients) all trying to do heavy write work, maybe not necessarily to the same table, but to the same database.
Deepthi Sigireddi 00:53:41 Right, right. Like you said, if there are thousands of services and each of them has a connection pool of 10 or 20, then maybe you will run out of what you can support at the backend. And the way people have solved this problem. So what we are calling microservices, people have typically called them applications. So we have Vitess installs where they do have hundreds of applications because they’ve structured their system in such a way that it’s not monolithic. So what people tend to start doing then is to start splitting the data out into key spaces. Because if you have a separate key space, then you basically have a separate Vitess cluster with your own compute. It’s not going to be interfered with by some other key space. So maybe you group your microservices and say, okay, this group of microservices gets this key space. And this group of microservices, which is in no way connected to this other group at all, can have its own key space and they don’t need to talk to each other at all. So that’s what people have done.
Nikhil Krishna 00:54:46 So you can use the key space concept to kind of break that out into its own set. Okay, that’s pretty cool.
Deepthi Sigireddi 00:54:54 Right. So that you no longer have a monolithic database, which is a bottleneck at the back end, you have multiple smaller databases.
Nikhil Krishna 00:55:03 Okay. So moving to another question over here: obviously one of the things about RDBMSs and databases is ACID compliance, right? So how does Vitess support ACID compliance? Is it completely ACID compliant, or is it like a NoSQL thing where it is not fully ACID compliant?
Deepthi Sigireddi 00:55:30 If you are in unsharded mode, Vitess is fully ACID compliant. It's no different from MySQL. But when you go sharded, then you are a distributed system, a distributed database, and some of those guarantees start to break down, and we can take each of them one at a time. So the first one is atomicity. In Vitess there are three transaction modes. You can say single, in which case multi-shard transactions are forbidden and you'll get an error. And there are people who run it that way. The default is multi, which is like a best effort. So what you do when the transaction mode is multi is first you figure out which shards will be involved in this transaction, and you begin the transaction. So you can do it in three phases: begin, write, and commit. The begin and write can be combined into one phase.
Deepthi Sigireddi 00:56:23 So you basically open a transaction on each shard that is going to be involved and you write the data, but you don't commit it. And you do them in parallel. So you may write in parallel to three or four shards. So you've written the data, the transaction is still open, it's not been committed. So then what you do is you commit in sequence, one at a time, and if any commit fails, you basically say, okay, this is a failure, and you stop at that point. What that means is that a failed multi-shard transaction in Vitess is not atomic. Some data has been written, some data has not been written. It's possible for the application to repair it by reissuing the same write as long as it's idempotent. For example, if you're doing an update, no problem, right?
Deepthi Sigireddi 00:57:17 An update setting the same value is fine. Let's say you're doing an insert. Maybe the insert does an insert ignore or insert on duplicate key update, or something like that. Then you can reissue the transaction, and maybe this time it succeeds. But by default, in case of a shard-level commit failure, you don't get atomicity for these types of transactions. That is atomicity, the default behavior. We do have a two-phase commit protocol. So if you set the transaction mode to two-phase commit, then you get atomic transactions in the sense that it's all or nothing. So there is a coordinator process. We write the metadata; we go through the state transitions for the distributed transaction. There is prepare and commit and then complete or failed.
Deepthi Sigireddi 00:58:16 And at the end of it, either all of it has been written, or it has failed. And if something has failed, then we try to resolve it. So, if something has not succeeded after a certain time period since it started, then one of the VTTablets, which realizes that 'oh, this transaction is still in a failed state,' will try to resolve it. So we have two-phase commit transactions, but they come with a cost because they will be significantly slower than the best-effort multi transaction mode. So that's atomicity. Do you want to ask any follow-up questions before we go on to consistency?
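[Ed. note: the best-effort "multi" mode just described can be sketched as a toy model, not Vitess code. The shard names and rows are invented. Writes go to all shards with transactions left open, commits happen one shard at a time, and a commit failure partway through leaves the earlier commits in place, which is exactly the non-atomic outcome described above.]

```python
class Shard:
    def __init__(self, name, fail_commit=False):
        self.name, self.fail_commit = name, fail_commit
        self.committed, self.pending = [], []

    def begin_and_write(self, row):
        self.pending.append(row)    # open a transaction and write, no commit

    def commit(self):
        if self.fail_commit:
            raise RuntimeError(f"{self.name}: commit failed")
        self.committed += self.pending
        self.pending = []

def multi_commit(shards, rows_per_shard):
    for shard, row in zip(shards, rows_per_shard):
        shard.begin_and_write(row)  # conceptually done in parallel
    for shard in shards:            # commits are sequential...
        shard.commit()              # ...and a failure stops the sequence

shards = [Shard("-80"), Shard("80-", fail_commit=True)]
try:
    multi_commit(shards, ["row-a", "row-b"])
except RuntimeError:
    pass
# The first shard committed, the second did not: not atomic.
```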
Nikhil Krishna 00:58:56 No, I think we’re good. So we talked about two-phase commit; we talked about multi, so yeah, please go ahead.
Deepthi Sigireddi 00:59:04 Okay. So the next one is consistency. For a traditional RDBMS, all that is meant by consistency is that any database-level rules have to be respected when you write a transaction to the database. So this is uniqueness constraints. Maybe you've set some checks on particular values. Maybe you want to provide a default value. There's a Not Null check, or there is an auto increment. Then the system must make sure that the next value you write doesn't collide with any of the previous values. So these types of database-level constraints, that is what consistency means for a single database. In a distributed database, you sort of have to reimplement some of these things. So, in Vitess we may have four shards, and if somebody wants a column value to be unique, then we at the Vitess level have to ensure that that column value is unique across all of those shards. And we can do that if that column is the sharding key, because for a given value of the sharding column, we can make sure that it is unique. The other one is auto increment. We can't just have people doing auto increment at the MySQL level, because then different shards will end up with the same values; you'll start at 1 (1, 2, 3, 4) in each shard. So Vitess provides something called a sequence that you can use to do auto increment in such a way that it is consistent across all of the shards.
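[Ed. note: a toy sketch of the sequence idea, illustrative rather than Vitess's actual implementation. A single counter (in Vitess, a row in an unsharded key space) hands out blocks of IDs, so inserts landing on any shard get globally unique auto-increment values without coordinating on every insert. The class names and block size are invented for the example.]

```python
class Sequence:
    """The single source of truth handing out blocks of IDs."""
    def __init__(self, block_size=100):
        self.next_block_start = 1
        self.block_size = block_size

    def reserve_block(self):
        start = self.next_block_start
        self.next_block_start += self.block_size
        return start, start + self.block_size  # [start, limit)

class ShardTablet:
    """Caches a reserved block and serves IDs from it locally."""
    def __init__(self, sequence):
        self.sequence = sequence
        self.next_id = self.limit = 0

    def next_value(self):
        if self.next_id >= self.limit:  # block exhausted: fetch another
            self.next_id, self.limit = self.sequence.reserve_block()
        value = self.next_id
        self.next_id += 1
        return value

seq = Sequence(block_size=3)
shard_a, shard_b = ShardTablet(seq), ShardTablet(seq)

# Interleaved inserts on two shards never collide:
# shard_a serves from block 1..3, shard_b from block 4..6.
ids = [shard_a.next_value(), shard_b.next_value(),
       shard_a.next_value(), shard_b.next_value()]
```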
Nikhil Krishna 01:00:39 Okay. When you said that with the sharding scheme you can keep a unique column consistent if that column is the sharding key, does that mean that each shard would have a separate partition or a separate set of values for that column?
Deepthi Sigireddi 01:00:56 Yeah, pretty much. So, when you get the value, you have to figure out which shard to put it into, and you compute some sort of a function on that value and that tells you which shard it goes into.
Nikhil Krishna 01:01:08 How would that actually work? If I have got 100 rows and I've set four shards, does that mean that the first 0-25 will be in one shard, 25-50 will be in another, 50-75 will be in another, and the last shard will basically be anything above 75?
Deepthi Sigireddi 01:01:28 Well, it depends on how you define the sharding scheme. So Vitess has many different sharding schemes. The simplest one, which gives you good distribution, is hash. So if you have a numeric column and you hash it, then you'll get a good distribution. You won't get this sort of overloading of one shard. But there is a sharding scheme called numeric. You can do that too. Maybe your application is producing random numbers and numeric is a good way to shard them. There are like seven or eight built-in sharding schemes. For example, if you have a string column, then you can do a unicode MD5 type of algorithm on it. You can do xxhash. So there are a handful, I would say about 8 or 10 built-in functions that you can use to do sharding, or you can do custom sharding. You can say everything in this range goes to this shard.
Nikhil Krishna 01:02:27 Okay.
Deepthi Sigireddi 01:02:29 Or something like that, any type of custom sharding, any function you can build on top of those values you can do with Vitess; it’s extensible.
Nikhil Krishna 01:02:38 Right. Okay. Awesome.
Deepthi Sigireddi 01:02:40 I think let's talk about the rest of ACID, and then we can wrap up. We talked about atomicity, consistency, then isolation. So what is isolation? There are different levels of isolation that databases define: read uncommitted, read committed, repeatable read, serializable. There are all these things. But in general what isolation means is that if a transaction is in progress and I'm reading the data, either I should see all effects of the transaction or none of the effects of the transaction. That's what typically people want. So that's not read uncommitted; that's read committed. What happens in Vitess, if you are writing transactions in the multi mode, is that you don't get the read committed isolation. What you get is sort of like read uncommitted, because you can see intermediate states of the distributed transaction. People have started calling these fractured reads. So, maybe in one shard, you see what the transaction wrote.
Deepthi Sigireddi 01:03:41 And from another shard, you see the state before the transaction. And there are now papers on how you can provide better guarantees around reads when you have a distributed transaction. So, some of that work we will probably do in the future; we are researching what would be a good model to provide. What sort of guarantees do we want to provide optionally? Because all of these things will slow things down. That's isolation, and we'll quickly talk about durability. At a database level, durability basically means data is not going to get lost. If I told you that I accepted your data, then I cannot lose it. In the past, that meant writing to stable storage, to disk. Now we think that's not sufficient because disks can also be lost. If you have 10,000 nodes, maybe one of them goes out once a year, right? So that's where the semi-synchronous replication comes in. And we achieve durability through replication.
Nikhil Krishna 01:04:38 Right. Okay. So just moving on a little bit, I think it’s safe to kind of go through the, skip the considerations about the replication and stuff like that. I think we discussed that already, but there is one thing that I wanted kind of talk about, which is change data capture. So how does Vitess handle change data capture?
Deepthi Sigireddi 01:05:02 We have a feature in Vitess called VReplication, and that is the basis for our re-sharding as well. And what that allows us to do is very flexible in terms of what it can read. If you are doing re-sharding you want to copy all the data, so the query you give to VReplication is select star, right? But you can select a subset of the columns, or you can perform some simple aggregations on columns and extract that as a stream from Vitess, and then you can send it to any of your applications that want to process those changes, those events.
Nikhil Krishna 01:05:43 This stream, is that a continuous. . .
Deepthi Sigireddi 01:05:48 It doesn't have to be. So you can, say, start receiving the stream. You can stop and record what was the position that you got last. And then you can come back later and say, now, can you give me everything that changed after this position?
Nikhil Krishna 01:06:07 Ah, right. OK. But how do you actually get that position in a cluster? Because you might actually have data in different shards, right?
Deepthi Sigireddi 01:06:20 We have something called a VGTID, a Vitess Global Transaction ID, which contains that information. So it'll say, for this key space shard, this is the MySQL GTID. For this other key space shard, this is the MySQL GTID. So this is like a distributed Global Transaction ID.
Nikhil Krishna 01:06:37 Nice. Okay, cool. So then I can use that, to say that this is the position that I was at, I want to move forward from there.
Deepthi Sigireddi 01:06:45 Right, right. And if you send it back to Vitess, Vitess knows how to interpret that and then start sending you the changes from those positions.
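[Ed. note: the resume-from-position idea can be sketched as a toy model, not the real VStream protocol. The key space names and the use of plain sequence numbers in place of real MySQL GTIDs are invented for the example; the point is that the position is a per-shard map, so each shard's stream resumes independently.]

```python
# Pretend binlog events per key space/shard, each tagged with a sequence
# number standing in for that shard's MySQL GTID.
EVENT_LOG = {
    "commerce/-80": [(1, "insert a"), (2, "insert b"), (3, "insert c")],
    "commerce/80-": [(1, "insert x"), (2, "insert y")],
}

def stream_from(vgtid):
    """Yield events after the recorded per-shard positions, plus the
    updated position map the client should save for next time."""
    position = dict(vgtid)
    events = []
    for shard, log in EVENT_LOG.items():
        for seq, event in log:
            if seq > position.get(shard, 0):
                events.append((shard, event))
                position[shard] = seq
    return events, position

# First read everything from the beginning, save the position...
events, pos = stream_from({})
# ...then resume from that position: nothing already seen is replayed.
replay, _ = stream_from(pos)
```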
Nikhil Krishna 01:06:54 Right. So how does Vitess manage backups, logging, and the standard things that most SQL databases have to handle? Is there anything specific we have to do if it is a cluster?
Deepthi Sigireddi 01:07:11 Vitess has a built-in backup method where we just copy the files, but we also support Percona XtraBackup. And typically anyone who's running a Vitess cluster will take regular backups, because if a replica goes down and you lose the disk, the way to bring it back is to restore from a backup, point it at the current primary, and then start replicating the delta since the backup was taken. And binary logs become very big and start consuming a lot of disk space, so people purge them on a regular basis. And this allows you to recover failed replicas or add new replicas without storing all the binary logs from the beginning of time.
Nikhil Krishna 01:07:55 Right. In a reasonably large Vitess cluster, you probably have at least 20 or 30 nodes, right? So, does Vitess have, as part of your management topology, a client or a tool that we can use to know that, okay, I've completed the backups for X out of Y nodes, and I need to do the rest?
Deepthi Sigireddi 01:08:21 Okay. You can use the same Vitess client to list all the backups for a key space shard, or all the backups for a key space, and using that you can figure out when was the last time I took a backup for a particular shard. I don't think we do a great job of showing progress while a backup is in progress. That is kind of written just to the VTTablet log.
Nikhil Krishna 01:08:47 But you still know from the topology that X out of Y tablets have been backed up, and what was the last time each was backed up?
Deepthi Sigireddi 01:08:57 Correct. Yeah. It's possible to infer that. This is a great point; these things can be improved.
Nikhil Krishna 01:09:04 We talked about binary logs and how they can become really big. In some architectures, they try to centralize logging. They send logs to a different place and stuff like that, right? Is there something like that here, or is that still managed through standard MySQL?
Deepthi Sigireddi 01:09:22 Right now it is still up to the operator of the Vitess cluster to manage these things, like setting the binlog retention period, and things like that. There are some thoughts of building a Vitess-compatible binary log server so that all replicas can replicate from that, and that in turn replicates from the primary, which will reduce the amount of binary logs you have to keep. There are some thoughts around doing something like that, but we are not actually working on that right now.
Nikhil Krishna 01:09:55 So we talked a lot about the type of work and scaling that Vitess does. I'd also like to get your viewpoint on what kind of scenarios Vitess is not suited for. It's kind of a negative question, but obviously every architecture has its pros and cons; there are certain things it's not suited for. So, for what kind of architecture, what kind of solution, should I not be looking at Vitess but at something else?
Deepthi Sigireddi 01:10:28 So analytics, or OLAP workloads, is one thing that, in my opinion, relational databases, the row-based ones, are not very well suited for; column-based databases are much better suited for analytics workloads. So, it may not be a great idea to use Vitess if what you're trying to do is data warehousing.
Nikhil Krishna 01:10:48 Okay. Any final thoughts that you might want to mention that I missed in talking about Vitess? Anything you’d like to add, generally?
Deepthi Sigireddi 01:11:00 I think one thing that is pretty much unique about Vitess is that, a) your sharding scheme is flexible and different tables can have different sharding schemes. Other distributed databases do provide this, but in Vitess you can also go from unsharded to sharded and back from sharded to unsharded. So, you can merge shards, and you can even do M to N. So let’s say you have three shards and you want to go to eight, or you have eight shards and you want to combine them into three because you overprovisioned when you split up your keyspaces and this particular keyspace is not getting that much traffic, or for whatever reason, right? The other thing you can do is change your mind about your sharding key. There is a cost, which is that you have to provision additional hardware and copy everything over into your new sharding scheme, but you can say, well, I thought that I’m a multi-tenant system and tenant ID would be a great thing to shard on, but look, I have these huge tenants and these tiny tenants and that’s not a good data distribution. So I’m actually going to change my mind and shard by, I don’t know, user ID, or message ID, or some other transaction ID, right? That is possible. You can do that in Vitess. In most systems, once you’ve made your sharding decision, you cannot go back.
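A sketch of what an M-to-N reshard like the one Deepthi describes looks like with `vtctldclient` (subcommand and flag spellings vary somewhat across Vitess versions; the keyspace, workflow name, and shard key ranges here are placeholders):

```shell
# Split a 3-shard keyspace into 8 shards via a Reshard workflow.
# Source: thirds of the keyspace; target: eighths.
vtctldclient Reshard --target-keyspace customer --workflow split3to8 create \
  --source-shards '-55,55-aa,aa-' \
  --target-shards '-20,20-40,40-60,60-80,80-a0,a0-c0,c0-e0,e0-'

# Once the copy phase has caught up, move reads and writes to the new shards...
vtctldclient Reshard --target-keyspace customer --workflow split3to8 switchtraffic

# ...and finalize, cleaning up the workflow and the old shards' data.
vtctldclient Reshard --target-keyspace customer --workflow split3to8 complete
```

The same workflow run in the opposite direction (many source shards, fewer target shards) is how you merge shards back.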
Nikhil Krishna 01:12:20 Awesome. Thank you so much, Deepthi, for going above and beyond with me and diving so deep into Vitess. I’m sure our audience would be very interested to know how to contact you, where to find you and follow you.
Deepthi Sigireddi 01:12:36 I’m on LinkedIn, I’m on Twitter. Do join our Vitess Slack; I’m usually in there answering questions. Visit the Vitess website. We have some pretty decent examples to get people started off. Visit the Planet Scale website, and you can reach me on any of these social media spaces.
Nikhil Krishna 01:12:59 Awesome. And I’ll put your Twitter and your LinkedIn links in the show notes so that listeners can reach out to you. Thank you so much, Deepthi, have a nice day.
Deepthi Sigireddi 01:13:10 Thank you, Nikhil. This was really enjoyable, and I appreciate the opportunity.
[End of Audio]