Tianzhou: Hey Danny, I don't think SQLite is good for our long-term game. I started with it because it is easy to use for local development.
Danny: We're launching Bytebase publicly in a couple of weeks, so we should make a decision right now. I'm with you: SQLite is charming, but it's a database for localhost only, and I don't see it fitting into the architecture of Bytebase as a cloud product. Some users want to run the Bytebase container on Heroku, Render, and Google Cloud Serverless with persistent storage, but they can't do that today with SQLite.
Tianzhou: Right, and some features in the plan require time-series data, which SQLite doesn't support.
Danny: SQLite's schema migration support has limitations too. For example, ALTER TABLE can't modify column constraints, among other things. That will make it difficult to ship Bytebase version upgrades that require schema changes later.
Tianzhou: It will be difficult to migrate existing customers later.
Danny: Yes, let's do this! How about PostgreSQL? I know we both like it.
Tianzhou: That sounds good. By the way, I want to keep the same one-click installation experience we offer today, without requiring users to install and connect to PostgreSQL themselves.
Danny: We can either package PostgreSQL into our binary or build it into our Docker image. I prefer the former because not all users will use Docker. Hey, I also found another reason to use PostgreSQL instead of MySQL: the compressed PostgreSQL package for Linux is about 13 MB (27 MB for Darwin), while the MySQL package is 200+ MB. I just wrote a 36-line unit test to simulate what I just described, and it seems to work!
Tianzhou: Sweet. Oh, you used Go embed to stuff the package.
Both: We went down the street to stuff our stomachs first.
I wrote down the following lines in a note, because building a plan is so important for any engineering project. It's the exact recipe Google uses to migrate storage systems, except that luckily we didn't have to migrate existing data this time.
We started by packaging both the PostgreSQL Linux and Darwin distributions using Go embed. When Bytebase starts, we unzip the package, run initdb to set up the database if it hasn't been initialized yet, then start the PostgreSQL server (sketched below). There is a lot of optimization we did later, including:
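As a reference point before any of those optimizations, a minimal version of the embed, unzip, initdb, and start flow could look like the sketch below. The resource path, the bytebase superuser name, and the Install/Start helpers are illustrative assumptions rather than the actual Bytebase code, and it shells out to tar for brevity.

```go
package postgres

import (
	"embed"
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// The PostgreSQL distribution is assumed to be vendored next to this file;
// the real Bytebase layout and file names may differ.
//
//go:embed resources/postgres-linux-x86_64.txz
var resources embed.FS

// Install extracts the embedded PostgreSQL distribution into baseDir and
// runs initdb if the data directory has not been initialized yet.
func Install(baseDir string) (binDir, dataDir string, err error) {
	binDir = filepath.Join(baseDir, "pg", "bin")
	dataDir = filepath.Join(baseDir, "pgdata")

	// Extract the embedded tarball only once.
	if _, statErr := os.Stat(binDir); os.IsNotExist(statErr) {
		tarball, err := resources.ReadFile("resources/postgres-linux-x86_64.txz")
		if err != nil {
			return "", "", err
		}
		tmp := filepath.Join(baseDir, "postgres.txz")
		if err := os.WriteFile(tmp, tarball, 0o644); err != nil {
			return "", "", err
		}
		// Shell out to tar for brevity; untarring in-process is also an option.
		if err := exec.Command("tar", "-xJf", tmp, "-C", baseDir).Run(); err != nil {
			return "", "", err
		}
	}

	// Run initdb only if the data directory does not exist yet.
	if _, statErr := os.Stat(dataDir); os.IsNotExist(statErr) {
		initdb := exec.Command(filepath.Join(binDir, "initdb"), "-U", "bytebase", "-D", dataDir)
		initdb.Stdout, initdb.Stderr = os.Stdout, os.Stderr
		if err := initdb.Run(); err != nil {
			return "", "", err
		}
	}
	return binDir, dataDir, nil
}

// Start launches the PostgreSQL server against dataDir on the given port.
func Start(binDir, dataDir string, port int) error {
	cmd := exec.Command(filepath.Join(binDir, "pg_ctl"), "start",
		"-D", dataDir, "-o", fmt.Sprintf("-p %d", port))
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	return cmd.Run()
}
```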
We had a complex SQLite schema with 1K+ lines. To derive the equivalent PostgreSQL schema, we executed the SQLite DDLs against PostgreSQL and resolved the failures one by one (sketched below). Here are some of the noticeable differences between the two.
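The replay itself is mechanical; a rough sketch of it, assuming the lib/pq driver, a hypothetical sqlite_schema.sql dump, and a scratch PostgreSQL database, could look like this. The real work is in resolving each reported failure by hand.

```go
package main

import (
	"database/sql"
	"fmt"
	"log"
	"os"
	"strings"

	_ "github.com/lib/pq" // PostgreSQL driver; any driver works here.
)

func main() {
	// Hypothetical dump of the SQLite DDL statements.
	ddl, err := os.ReadFile("sqlite_schema.sql")
	if err != nil {
		log.Fatal(err)
	}

	// Scratch PostgreSQL database used only to surface dialect failures.
	db, err := sql.Open("postgres", "postgresql://localhost:5432/scratch?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Naive splitting on ";" is good enough for a plain schema file with no
	// procedural bodies; a real tool would use a SQL parser.
	for _, stmt := range strings.Split(string(ddl), ";") {
		stmt = strings.TrimSpace(stmt)
		if stmt == "" {
			continue
		}
		if _, err := db.Exec(stmt); err != nil {
			// Each failure points at a dialect difference to port by hand.
			fmt.Printf("needs manual porting: %v\n  %s\n", err, firstLine(stmt))
		}
	}
}

func firstLine(s string) string {
	if i := strings.IndexByte(s, '\n'); i >= 0 {
		return s[:i]
	}
	return s
}
```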
Then it was a lot of engineering work to update the existing SQL queries. The SQLite dialect (like MySQL's) uses question-mark placeholders, while PostgreSQL uses positional parameters for prepared statements (see the sketch below). Because we broke the migration into multiple smaller changes, the changes for table writes have to land in topological order when foreign keys are involved. For example, if the author table is referenced by the book table, we perform dual-writes to the author table first in one change, then add dual-writes to the book table in a follow-up change.
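To illustrate the placeholder difference, here is a tiny helper in the spirit of what such a rewrite needs. The name positionalize is hypothetical, and in practice the queries themselves were updated rather than rewritten blindly; this version does not special-case "?" inside string literals, which a real rewrite would have to handle.

```go
package store

import (
	"fmt"
	"strings"
)

// positionalize rewrites "?" placeholders (SQLite/MySQL style) into the
// "$1, $2, ..." positional style that PostgreSQL expects, e.g.
//
//	SELECT id, name FROM author WHERE id = ?    ->    SELECT id, name FROM author WHERE id = $1
func positionalize(query string) string {
	var b strings.Builder
	n := 0
	for _, r := range query {
		if r == '?' {
			n++
			fmt.Fprintf(&b, "$%d", n)
			continue
		}
		b.WriteRune(r)
	}
	return b.String()
}
```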
Mission complete! Real-world navigation feels different, though. After spending a couple of days in a hotel room, I decided to hang out with friends and visit some popular shops in downtown Shanghai. Yup, without a written-down plan, we got lost on the street constantly.
SQLite is still great for many on-device use cases, while PostgreSQL is more ubiquitous on the server. As an extra note, Bytebase later added support for an external database option via an open-source community contribution. This enables customers to run the Bytebase container in a k8s cluster while keeping the persistent database storage elsewhere.
As part of the effort, we closely followed the Life of a Feature process established at Bytebase, with influences from Google's engineering best practices. We'd like to highlight these points, which really helped make our database migration go smoothly.
Bytebase offers a web-based collaboration workspace to help DBAs and Developers manage the lifecycle of application database schemas. People might be curious how Bytebase manages its own database schema. We will chat about it next time.