Vitess for MySQL Horizontal Scaling
Vitess is a cloud-native database clustering system for MySQL that provides horizontal sharding, connection pooling, and query routing to scale MySQL to web-scale workloads. Originally built at YouTube and used by companies like Slack and GitHub, Vitess enables transparent sharding with minimal application changes through its VTGate query router and VTTablet sidecar architecture.
Prerequisites
- Kubernetes cluster (recommended) or bare Linux servers
- MySQL 8.0 installed on tablet nodes
- etcd cluster for topology storage
- vtctldclient CLI tool
- Minimum 3 nodes for a basic sharded setup
Vitess Architecture Overview
Key components in a Vitess deployment:
- VTTablet: A sidecar process running alongside each MySQL instance that manages replication, health checks, connection pooling, and query execution for that instance
- VTGate: A stateless proxy that routes queries to the correct shard based on the sharding key
- VTCtld: The topology management server that stores cluster state in etcd
- etcd: Stores topology information (which shards exist, which tablets serve them)
Installing Vitess
# Download Vitess binaries
VITESS_VERSION=v19.0.0
OS=linux
ARCH=amd64
wget https://github.com/vitessio/vitess/releases/download/${VITESS_VERSION}/vitess-${VITESS_VERSION}-${OS}-${ARCH}.tar.gz
tar xzf vitess-${VITESS_VERSION}-${OS}-${ARCH}.tar.gz
sudo mv vitess-${VITESS_VERSION}-${OS}-${ARCH}/bin/* /usr/local/bin/
# Verify installation
vtctldclient --version
vttablet --version
vtgate --version
# Install via Helm on Kubernetes (note: recent Vitess releases recommend
# the vitess-operator over the legacy Helm charts)
helm repo add vitess https://vitessio.github.io/vitess
helm repo update
# Install the Vitess chart
helm install vitess vitess/vitess \
--namespace vitess \
--create-namespace
# Verify the pods
kubectl -n vitess get pods
Set up etcd for topology storage:
# Install etcd (Ubuntu/Debian)
sudo apt-get install -y etcd
# Configure etcd for Vitess topology
sudo tee /etc/etcd/etcd.conf > /dev/null <<EOF
name: etcd-vitess-01
data-dir: /var/lib/etcd
listen-client-urls: http://0.0.0.0:2379
advertise-client-urls: http://10.0.0.11:2379
listen-peer-urls: http://0.0.0.0:2380
initial-advertise-peer-urls: http://10.0.0.11:2380
initial-cluster: etcd-vitess-01=http://10.0.0.11:2380,etcd-vitess-02=http://10.0.0.12:2380,etcd-vitess-03=http://10.0.0.13:2380
initial-cluster-state: new
EOF
sudo systemctl enable --now etcd
VTTablet and MySQL Setup
Configure VTTablet alongside MySQL on each database node:
# Create MySQL user for Vitess
mysql -u root -p <<EOF
CREATE USER 'vt_dba'@'localhost' IDENTIFIED BY 'vt-dba-password';
GRANT ALL PRIVILEGES ON *.* TO 'vt_dba'@'localhost' WITH GRANT OPTION;
CREATE USER 'vt_repl'@'%' IDENTIFIED BY 'vt-repl-password';
GRANT REPLICATION SLAVE ON *.* TO 'vt_repl'@'%';
CREATE USER 'vt_filtered'@'%' IDENTIFIED BY 'vt-filtered-password';
GRANT SELECT ON *.* TO 'vt_filtered'@'%';
CREATE USER 'vt_app'@'%' IDENTIFIED BY 'vt-app-password';
GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, INDEX, ALTER,
CREATE TEMPORARY TABLES, LOCK TABLES, EXECUTE, CREATE VIEW, SHOW VIEW,
CREATE ROUTINE, ALTER ROUTINE, EVENT, TRIGGER ON *.* TO 'vt_app'@'%';
FLUSH PRIVILEGES;
EOF
# Configure MySQL for Vitess (add to my.cnf)
sudo tee -a /etc/mysql/mysql.conf.d/mysqld.cnf > /dev/null <<EOF
# Vitess required settings
server-id = 1 # Unique per node
log-bin = mysql-bin
binlog-format = ROW
gtid_mode = ON
enforce_gtid_consistency = ON
log_replica_updates = ON # use log_slave_updates on MySQL < 8.0.26
binlog_expire_logs_seconds = 259200 # 3 days; replaces deprecated expire_logs_days
binlog_row_image = FULL
sync_binlog = 1
innodb_flush_log_at_trx_commit = 1
EOF
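Before restarting MySQL, it can help to sanity-check that the required settings actually landed in the config fragment. A minimal sketch (not part of Vitess; it checks a temp copy of the fragment, so adapt the path for your real my.cnf):

```shell
# Sketch: verify Vitess-required settings exist in a my.cnf fragment.
# Uses a temp file for illustration; point `cnf` at your real fragment.
cnf=$(mktemp)
cat > "$cnf" <<'FRAG'
log-bin = mysql-bin
binlog-format = ROW
gtid_mode = ON
enforce_gtid_consistency = ON
FRAG
missing=0
for setting in log-bin binlog-format gtid_mode enforce_gtid_consistency; do
  if grep -q "^${setting}" "$cnf"; then
    echo "ok: ${setting}"
  else
    echo "MISSING: ${setting}"
    missing=$((missing + 1))
  fi
done
rm -f "$cnf"
```

If anything prints as MISSING, fix the fragment before restarting mysqld, since a missing GTID or binlog setting will break VTTablet's replication management.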
sudo systemctl restart mysql
# Start VTTablet (keyspace=commerce, shard=0; tablets start as replica
# and one is later promoted to primary via a reparent)
mkdir -p /vt/vtdataroot
vttablet \
--topo_implementation=etcd2 \
--topo_global_server_address=etcd-01:2379,etcd-02:2379,etcd-03:2379 \
--topo_global_root=/vitess/global \
--tablet-path=zone1-0000000100 \
--init_keyspace=commerce \
--init_shard=0 \
--init_tablet_type=replica \
--port=15100 \
--grpc_port=16100 \
--service_map='grpc-queryservice,grpc-tabletmanager,grpc-updatestream' \
--vtctld_addr=http://vtctld:15000 \
--db_dba_user=vt_dba \
--db_dba_password=vt-dba-password \
--db_repl_user=vt_repl \
--db_repl_password=vt-repl-password \
--db_filtered_user=vt_filtered \
--db_filtered_password=vt-filtered-password \
--db_app_user=vt_app \
--db_app_password=vt-app-password \
--mycnf_mysql_port=3306 &
VTGate Configuration
VTGate is the SQL proxy that routes queries to the correct shard:
# Start VTGate
vtgate \
--topo_implementation=etcd2 \
--topo_global_server_address=etcd-01:2379,etcd-02:2379,etcd-03:2379 \
--topo_global_root=/vitess/global \
--gateway_initial_tablet_timeout=20s \
--port=15001 \
--grpc_port=15991 \
--mysql_server_port=3306 \
--mysql_auth_server_impl=none \
--cell=zone1 \
--cells_to_watch=zone1 \
--tablet_types_to_wait=PRIMARY,REPLICA \
--service_map='grpc-vtgateservice' \
--log_dir=/vt/vtdataroot &
# Applications connect to VTGate on port 3306
# VTGate presents as a MySQL server
mysql -h vtgate-host -P 3306 -u vt_app -pvt-app-password commerce
# Create a VSchema (Vitess schema that defines sharding)
vtctldclient --server vtctld:15999 \
ApplyVSchema \
--vschema='
{
"sharded": false,
"tables": {
"users": {},
"orders": {},
"products": {}
}
}' \
commerce
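Since ApplyVSchema takes the JSON inline, a malformed payload fails only at the server. It can help to lint the JSON locally first; a small sketch (any JSON validator works, `python3 -m json.tool` is just a commonly available one), using the same unsharded vschema as above:

```shell
# Lint the VSchema JSON locally before handing it to ApplyVSchema
vschema='{"sharded": false, "tables": {"users": {}, "orders": {}, "products": {}}}'
if printf '%s' "$vschema" | python3 -m json.tool > /dev/null; then
  echo "vschema JSON ok"
else
  echo "vschema JSON invalid" >&2
fi
```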
Sharding Strategies
Configure sharding for horizontal scale:
# Hash-based sharding (most common)
vtctldclient --server vtctld:15999 \
ApplyVSchema \
--vschema='
{
"sharded": true,
"vindexes": {
"hash": {
"type": "hash"
},
"unicode_loose_md5": {
"type": "unicode_loose_md5"
}
},
"tables": {
"users": {
"columnVindexes": [
{
"column": "user_id",
"name": "hash"
}
]
},
"orders": {
"columnVindexes": [
{
"column": "user_id",
"name": "hash"
}
],
"autoIncrement": {
"column": "order_id",
"sequence": "order_id_seq"
}
}
}
}' \
commerce
# Create sequence tables for auto-increment across shards
# (sequence tables must live in an unsharded keyspace; once commerce is
# sharded, put them in a separate unsharded keyspace instead)
mysql -h vtgate-host -P 3306 -u vt_app -pvt-app-password commerce <<EOF
CREATE TABLE order_id_seq (
id INT,
next_id BIGINT,
cache BIGINT,
increment BIGINT,
minimum BIGINT,
maximum BIGINT,
start BIGINT,
cycle TINYINT,
PRIMARY KEY (id)
) COMMENT 'vitess_sequence';
INSERT INTO order_id_seq (id, next_id, cache) VALUES (0, 1, 1000);
EOF
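The next_id and cache columns drive VTGate's block allocation: VTGate reserves `cache` ids per round trip by advancing next_id in the sequence row, then hands ids out from memory until the block is exhausted. A rough sketch of that bookkeeping (illustrative only, not Vitess code):

```shell
# Illustrative sketch of vitess_sequence block reservation (not Vitess code).
next_id=1      # value stored in the order_id_seq row
cache=1000     # ids reserved per round trip to MySQL
reserve_block() {
  block_start=$next_id
  next_id=$((next_id + cache))    # what gets written back to the row
  echo "reserved ids ${block_start}..$((next_id - 1)); row next_id is now ${next_id}"
}
reserve_block   # -> reserved ids 1..1000; row next_id is now 1001
reserve_block   # -> reserved ids 1001..2000; row next_id is now 2001
```

This is why a crashed VTGate can leave gaps in the id space: the unreturned part of a reserved block is simply skipped, which is acceptable for surrogate keys.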
# Range-based sharding (for time-series or ordered data)
# Shard -80 holds keyspace IDs from 0x00... up to (but not including) 0x80...
# Shard 80- holds keyspace IDs from 0x80... through 0xFF...
vtctldclient --server vtctld:15999 \
CreateShard commerce/-80
vtctldclient --server vtctld:15999 \
CreateShard commerce/80-
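To build intuition for the shard range boundaries above, here is a small sketch (not Vitess code) that maps a hex keyspace ID to one of the two shards by comparing its leading byte against the 0x80 boundary:

```shell
# Sketch (not Vitess code): route a hex keyspace ID to shard -80 or 80-.
# Real Vitess compares the keyspace id bytewise against the shard range.
shard_for() {
  first_byte=$((0x$(printf '%s' "$1" | cut -c1-2)))
  if [ "$first_byte" -lt $((0x80)) ]; then
    echo '-80'
  else
    echo '80-'
  fi
}
shard_for 1aadbeef   # -> -80
shard_for 80000000   # -> 80-
shard_for ffee       # -> 80-
```

The same comparison generalizes to finer splits (e.g. -40, 40-80, 80-c0, c0-): each shard owns a half-open range of keyspace IDs.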
Schema Management
Use Vitess's online schema changes to avoid downtime:
# Apply a schema change using Vitess's built-in online DDL (VReplication-based)
vtctldclient --server vtctld:15999 \
ApplySchema \
--ddl_strategy='vitess' \
--sql="ALTER TABLE orders ADD COLUMN shipped_at TIMESTAMP NULL" \
commerce
# Monitor schema migration progress
vtctldclient --server vtctld:15999 \
GetSchemaMigrations \
--recent=1h \
commerce
# Show details for a specific migration
vtctldclient --server vtctld:15999 \
GetSchemaMigrations \
--uuid=<migration-uuid> \
commerce
# Revert a migration (reverts run as online migrations themselves)
vtctldclient --server vtctld:15999 \
ApplySchema \
--ddl_strategy='vitess' \
--sql="REVERT VITESS_MIGRATION '<migration-uuid>'" \
commerce
# Apply schema directly on all shards ('direct' is a blocking DDL, not an online migration)
vtctldclient --server vtctld:15999 \
ApplySchema \
--ddl_strategy='direct' \
--sql="CREATE INDEX idx_orders_user_id ON orders(user_id)" \
commerce
Resharding Operations
Resharding splits an existing shard into two as data grows:
# Current: Single shard (0)
# Target: Two shards (-80 and 80-)
# Step 1: Create target shards and tablets
vtctldclient --server vtctld:15999 CreateShard commerce/-80
vtctldclient --server vtctld:15999 CreateShard commerce/80-
# Start tablet processes for new shards (repeat VTTablet start command
# with different --init_shard values: -80 and 80-)
# Step 2: Copy data from source to target shards
vtctldclient --server vtctld:15999 \
Reshard \
--workflow=reshard_commerce \
--target-keyspace=commerce \
create \
--source-shards='0' \
--target-shards='-80,80-'
# Step 3: Monitor copy progress
vtctldclient --server vtctld:15999 \
Reshard \
--workflow=reshard_commerce \
--target-keyspace=commerce \
status
# Step 4: Switch read-only (RDONLY) traffic to the new shards first
vtctldclient --server vtctld:15999 \
Reshard \
--workflow=reshard_commerce \
--target-keyspace=commerce \
switchtraffic \
--tablet-types=rdonly
# Verify on RDONLY tablets first, then switch replica read traffic
vtctldclient --server vtctld:15999 \
Reshard \
--workflow=reshard_commerce \
--target-keyspace=commerce \
switchtraffic \
--tablet-types=replica
# Final: Switch primary (write) traffic
vtctldclient --server vtctld:15999 \
Reshard \
--workflow=reshard_commerce \
--target-keyspace=commerce \
switchtraffic \
--tablet-types=primary
# Clean up the source shard after verification
vtctldclient --server vtctld:15999 \
Reshard \
--workflow=reshard_commerce \
--target-keyspace=commerce \
complete
Troubleshooting
VTGate can't find tablets:
# Check tablet registration in topology
vtctldclient --server vtctld:15999 GetTablet zone1-0000000100
# List all tablets
vtctldclient --server vtctld:15999 GetTablets --cell=zone1
# Check VTTablet health
curl http://vttablet-host:15100/healthz
# Verify etcd is accessible
etcdctl --endpoints=etcd-01:2379 get /vitess/global/keyspaces --prefix --keys-only
Cross-shard queries failing:
# Vitess may refuse scatter queries for safety
# Check vtgate logs
journalctl -u vtgate -n 50
# Scatter queries are allowed by default; if vtgate was started with
# --no_scatter, cross-shard scatter queries are rejected
# Target a tablet type explicitly with database names like commerce@replica
# SELECT * FROM orders WHERE user_id IN (1,2,3) -- may scatter across shards
# Prefer single-shard queries (filter on the sharding key) when possible
Resharding stuck:
# Check workflow status
vtctldclient --server vtctld:15999 \
Reshard --workflow=reshard_commerce --target-keyspace=commerce status
# List the tablets serving the keyspace and check their replication health
vtctldclient --server vtctld:15999 \
GetTablets --keyspace=commerce
# If stuck, cancel and restart
vtctldclient --server vtctld:15999 \
Reshard --workflow=reshard_commerce --target-keyspace=commerce cancel
Conclusion
Vitess provides a battle-tested path to horizontal MySQL scaling that maintains MySQL wire protocol compatibility, allowing most applications to connect without code changes. By routing queries through VTGate and managing shards with VTTablet, Vitess handles resharding, failover, and schema changes transparently. Start with a single shard to validate the setup, then progressively shard tables with high cardinality as data grows, using Vitess's built-in resharding workflows to split shards online with zero downtime.


