ScyllaDB Installation for High-Performance NoSQL
ScyllaDB is a Cassandra-compatible NoSQL database reimplemented in C++ with a shard-per-core architecture, eliminating JVM garbage-collection pauses and delivering consistently low latency at high throughput. This guide covers installing ScyllaDB on Linux, setting up a cluster, using the CQL interface, monitoring, and migrating from Apache Cassandra.
Prerequisites
- Ubuntu 20.04/22.04 or CentOS 8/Rocky Linux 8+
- Minimum 8 GB RAM (16+ GB recommended)
- Multi-core CPU (ScyllaDB scales linearly with cores)
- SSD storage required (NVMe preferred)
- For cluster: 3+ nodes with low-latency network
- Root or sudo access
Install ScyllaDB
Ubuntu/Debian:
# Add ScyllaDB repository
sudo apt install -y curl gnupg
curl -sSL https://downloads.scylladb.com/downloads/scylla/ubuntu/scylla.key | sudo gpg --dearmor -o /etc/apt/trusted.gpg.d/scylla.gpg
# For the latest stable release (check https://www.scylladb.com/download/ for current version)
SCYLLA_VERSION="6.0"
echo "deb [signed-by=/etc/apt/trusted.gpg.d/scylla.gpg] https://downloads.scylladb.com/downloads/scylla/ubuntu scylladb-${SCYLLA_VERSION}/ubuntu focal main" \
| sudo tee /etc/apt/sources.list.d/scylla.list
sudo apt update
sudo apt install -y scylla
# Run ScyllaDB setup script (IMPORTANT - optimizes the system)
sudo scylla_setup
# During setup, it will ask about:
# - Developer mode (say No for production)
# - NTP configuration
# - RAID setup (if using multiple disks)
# - Disk/CPU optimization
sudo systemctl enable scylla-server
sudo systemctl start scylla-server
CentOS/Rocky Linux:
sudo rpm --import https://downloads.scylladb.com/downloads/scylla/rpm/unstable/centos/scylladb.key
sudo tee /etc/yum.repos.d/scylla.repo > /dev/null << 'EOF'
[ScyllaDB]
name=ScyllaDB
baseurl=https://downloads.scylladb.com/downloads/scylla/rpm/centos/scylladb-6.0/x86_64/
enabled=1
gpgcheck=1
gpgkey=https://downloads.scylladb.com/downloads/scylla/rpm/unstable/centos/scylladb.key
EOF
sudo dnf install -y scylla
sudo scylla_setup
sudo systemctl enable --now scylla-server
Verify the installation:
# Wait for ScyllaDB to start (30-60 seconds)
sudo journalctl -u scylla-server -f
# Check node status
nodetool status
# Should show: UN (Up Normal) for the local node
# Connect with cqlsh
cqlsh localhost 9042
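For scripted installs, a quick reachability probe can confirm that something is listening on the CQL port before you invoke cqlsh. This is a minimal sketch (the helper name is our own); it only verifies TCP connectivity, not that the CQL layer is healthy.

```python
# Minimal TCP reachability check for the CQL port (9042). This only
# confirms something is listening; use cqlsh to verify the CQL layer.
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    print("CQL port open:", port_open("127.0.0.1", 9042))
```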
Initial Configuration and Tuning
ScyllaDB's main configuration is in /etc/scylla/scylla.yaml:
sudo nano /etc/scylla/scylla.yaml
Key settings:
# /etc/scylla/scylla.yaml
# Cluster name (must match across all nodes in the cluster)
cluster_name: 'MyScyllaCluster'
# Listen address (this node's IP)
listen_address: 192.168.1.10
# CQL bind address (0.0.0.0 = all interfaces) and the address advertised to clients
rpc_address: 0.0.0.0
broadcast_rpc_address: 192.168.1.10
# Seed nodes (IPs of seed nodes for cluster discovery)
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "192.168.1.10,192.168.1.11"
# Data storage
data_file_directories:
- /var/lib/scylla/data
commitlog_directory: /var/lib/scylla/commitlog
# Authentication
authenticator: PasswordAuthenticator
authorizer: CassandraAuthorizer
# Snitch for datacenter/rack topology
endpoint_snitch: GossipingPropertyFileSnitch
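Only a handful of these settings differ from node to node (addresses and, on the first cluster, the seed list); everything else should be identical across the cluster. As a sketch, a small helper like the hypothetical one below can render the per-node fragment so that node configs never drift:

```python
# Sketch: render the per-node scylla.yaml settings that vary across a
# cluster; every other key stays identical on all nodes. The helper name
# and template are illustrative, not part of any ScyllaDB tooling.
def render_node_config(cluster_name: str, node_ip: str, seeds: list[str]) -> str:
    seed_list = ",".join(seeds)
    return (
        f"cluster_name: '{cluster_name}'\n"
        f"listen_address: {node_ip}\n"
        "rpc_address: 0.0.0.0\n"
        f"broadcast_rpc_address: {node_ip}\n"
        "seed_provider:\n"
        "  - class_name: org.apache.cassandra.locator.SimpleSeedProvider\n"
        "    parameters:\n"
        f'      - seeds: "{seed_list}"\n'
    )

if __name__ == "__main__":
    print(render_node_config("MyScyllaCluster", "192.168.1.11",
                             ["192.168.1.10", "192.168.1.11"]))
```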
Run ScyllaDB tuning tools:
# Set up system tuning (run once on each node; scylla_setup invokes these
# scripts interactively, but they can also be run individually)
sudo scylla_io_setup          # Benchmarks the disks and writes I/O configuration
sudo scylla_cpuscaling_setup  # Sets the CPU frequency governor to performance
sudo scylla_ntp_setup         # Configures NTP time sync
# For a test machine where system tuning is not wanted:
sudo scylla_dev_mode_setup --developer-mode 1
Cluster Setup
# Node 1 (192.168.1.10) - first seed node
# scylla.yaml already configured above
# Node 2 (192.168.1.11)
sudo nano /etc/scylla/scylla.yaml
# Set:
# listen_address: 192.168.1.11
# broadcast_rpc_address: 192.168.1.11
# seeds: "192.168.1.10,192.168.1.11" (same seed list)
# Node 3 (192.168.1.12)
sudo nano /etc/scylla/scylla.yaml
# Set:
# listen_address: 192.168.1.12
# broadcast_rpc_address: 192.168.1.12
# seeds: "192.168.1.10,192.168.1.11"
# Start nodes one at a time, starting with the seeds
sudo systemctl start scylla-server # Start on node 1 first
# Wait for node 1 to be UP, then start node 2
nodetool status # Check from node 1
sudo systemctl start scylla-server # On node 2
# Then node 3
sudo systemctl start scylla-server # On node 3
# Verify cluster from any node
nodetool status
# Expected output:
# Datacenter: datacenter1
# Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
# UN 192.168.1.10 ...
# UN 192.168.1.11 ...
# UN 192.168.1.12 ...
# Check token distribution
nodetool ring
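Bootstrap automation usually needs to wait until every node reports UN before proceeding. As a hedged sketch (the function names are our own), the `nodetool status` output can be parsed like this:

```python
# Sketch: parse `nodetool status` output and confirm every node is UN
# (Up/Normal). Useful in bootstrap scripts that poll for cluster health.
import re

def node_states(status_output: str) -> dict[str, str]:
    """Map node IP -> two-letter state code (e.g. 'UN', 'UJ', 'DN')."""
    states = {}
    for line in status_output.splitlines():
        m = re.match(r"^([UD][NLJM])\s+(\d+\.\d+\.\d+\.\d+)", line.strip())
        if m:
            states[m.group(2)] = m.group(1)
    return states

def all_up_normal(status_output: str) -> bool:
    """True only when at least one node is listed and all are UN."""
    states = node_states(status_output)
    return bool(states) and all(s == "UN" for s in states.values())
```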
Open required ports:
# ScyllaDB requires these ports
sudo ufw allow 7000/tcp # Inter-node communication
sudo ufw allow 7001/tcp # TLS inter-node
sudo ufw allow 9042/tcp # CQL client port
sudo ufw allow 9160/tcp # Thrift (legacy; only needed for old Thrift clients)
sudo ufw allow 10000/tcp # REST API
sudo ufw allow 9180/tcp # Prometheus metrics
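When the same port list feeds firewall rules, security-group templates, and health checks, keeping it in one place avoids drift. A small sketch (the constant and helper are hypothetical, not ScyllaDB tooling):

```python
# Reference map of the ScyllaDB ports opened above, kept in one place so
# firewall rules and health checks are generated from the same source.
SCYLLA_PORTS = {
    7000: "inter-node communication",
    7001: "TLS inter-node communication",
    9042: "CQL clients",
    9160: "Thrift (legacy)",
    10000: "REST API",
    9180: "Prometheus metrics",
}

def ufw_rules(ports: dict[int, str]) -> list[str]:
    """Render one `ufw allow` command per port, purpose as a trailing comment."""
    return [f"sudo ufw allow {p}/tcp  # {desc}" for p, desc in sorted(ports.items())]

if __name__ == "__main__":
    print("\n".join(ufw_rules(SCYLLA_PORTS)))
```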
CQL Interface and Data Modeling
# Connect to ScyllaDB
cqlsh 192.168.1.10 9042 -u cassandra -p cassandra
-- Change the default password immediately
ALTER USER cassandra WITH PASSWORD 'newstrongpassword';
CREATE USER appuser WITH PASSWORD 'apppassword' NOSUPERUSER;
-- Create a keyspace with replication
CREATE KEYSPACE IF NOT EXISTS myapp
WITH replication = {
'class': 'NetworkTopologyStrategy',
'datacenter1': 3 -- 3 replicas across datacenter1
};
USE myapp;
-- Create a time-series table (optimized for ScyllaDB)
CREATE TABLE IF NOT EXISTS metrics (
host TEXT,
ts TIMESTAMP,
cpu_usage FLOAT,
memory_mb INT,
disk_io_mb FLOAT,
PRIMARY KEY ((host), ts) -- host is partition key, ts is clustering key
) WITH CLUSTERING ORDER BY (ts DESC)
AND compaction = {
'class': 'TimeWindowCompactionStrategy',
'compaction_window_unit': 'HOURS',
'compaction_window_size': '1'
};
-- Insert data
INSERT INTO metrics (host, ts, cpu_usage, memory_mb)
VALUES ('web01', toTimestamp(now()), 45.2, 2048);
-- Query with time range
SELECT * FROM metrics
WHERE host = 'web01'
AND ts > '2024-01-01 00:00:00'
AND ts < '2024-01-02 00:00:00'
LIMIT 100;
-- Create a wide-row table for IoT data
CREATE TABLE IF NOT EXISTS sensor_data (
device_id UUID,
reading_date DATE, -- partition by date for time bucketing
reading_time TIMESTAMP,
value DOUBLE,
unit TEXT,
PRIMARY KEY ((device_id, reading_date), reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC);
-- Create secondary index (use sparingly in ScyllaDB)
CREATE INDEX ON metrics (cpu_usage);
-- Materialized view for alternative access patterns
CREATE MATERIALIZED VIEW metrics_by_cpu AS
SELECT host, ts, cpu_usage
FROM metrics
WHERE cpu_usage IS NOT NULL AND ts IS NOT NULL AND host IS NOT NULL
PRIMARY KEY ((cpu_usage), ts, host)
WITH CLUSTERING ORDER BY (ts DESC);
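Because the sensor_data table above partitions by (device_id, reading_date), a query spanning several days must be fanned out, one SELECT per daily partition. A minimal sketch of computing those partition buckets (the helper name is our own):

```python
# Sketch: list the reading_date partition buckets a time-range query
# touches for the sensor_data table, which partitions by
# (device_id, reading_date). One SELECT is issued per bucket.
from datetime import date, timedelta

def date_buckets(start: date, end: date) -> list[date]:
    """All reading_date partitions touched by [start, end], inclusive."""
    days = (end - start).days
    return [start + timedelta(days=i) for i in range(days + 1)]

# Example: a 3-day window touches 3 partitions per device, so the client
# runs 3 queries like:
#   SELECT * FROM sensor_data WHERE device_id = ? AND reading_date = ?;
buckets = date_buckets(date(2024, 1, 1), date(2024, 1, 3))
```

Keeping partitions bounded this way is the point of the date bucket: it prevents a single device's history from growing into one unbounded partition.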
Shard-per-Core Architecture
ScyllaDB's shard-per-core model is its key differentiator from Cassandra:
# Each CPU core handles a dedicated shard with its own memory
# This eliminates cross-core coordination and GC pauses
# Check shard count (equals number of CPU cores ScyllaDB uses)
nodetool info | grep "Shards"
# Each shard handles a portion of the token range
# Shard-aware client drivers (ScyllaDB's forks of the DataStax drivers)
# route each request directly to the shard that owns the data
# View per-shard statistics (Prometheus endpoint, port 9180)
curl -s http://localhost:9180/metrics | grep "shard"
# Check CPU utilization per shard
curl -s http://localhost:9180/metrics | grep cpu_utilization | head -20
# Limit the number of shards (by default ScyllaDB uses all available cores)
# Set via SCYLLA_ARGS in /etc/default/scylla-server (or /etc/sysconfig/scylla-server),
# e.g. SCYLLA_ARGS="--smp 8"
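The token-to-shard mapping can be pictured as an even split of the signed 64-bit Murmur3 token range. The sketch below is a simplified model of that idea; ScyllaDB's production algorithm additionally applies an "ignore most-significant bits" bias, so treat this as an illustration, not the exact mapping.

```python
# Simplified illustration of shard-per-core routing: the signed 64-bit
# Murmur3 token range is split evenly across shards. ScyllaDB's real
# mapping also applies an "ignore MSB" bias, so this models the idea
# rather than reproducing the production algorithm.
def shard_for_token(token: int, shards: int) -> int:
    """Map a signed 64-bit token to a shard index in [0, shards)."""
    unsigned = token + 2**63            # shift into [0, 2^64)
    return (unsigned * shards) >> 64    # scale into [0, shards)

# Each shard owns one contiguous slice of the token range:
assert shard_for_token(-2**63, 8) == 0      # lowest token -> first shard
assert shard_for_token(2**63 - 1, 8) == 7   # highest token -> last shard
```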
Use a shard-aware driver in applications:
# pip install scylla-driver   # ScyllaDB's shard-aware fork; imports as `cassandra`
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
from cassandra.policies import TokenAwarePolicy, DCAwareRoundRobinPolicy
# TokenAwarePolicy routes requests to a replica node; with scylla-driver it
# is also shard-aware, targeting the owning shard's connection directly
auth_provider = PlainTextAuthProvider(username='appuser', password='apppassword')
cluster = Cluster(
contact_points=['192.168.1.10', '192.168.1.11', '192.168.1.12'],
port=9042,
auth_provider=auth_provider,
load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy(local_dc='datacenter1')),
connect_timeout=10,
protocol_version=4
)
session = cluster.connect('myapp')
rows = session.execute("SELECT * FROM metrics WHERE host = 'web01' LIMIT 10")
for row in rows:
print(row)
cluster.shutdown()
Monitoring ScyllaDB
# ScyllaDB exposes Prometheus metrics on port 9180
curl http://localhost:9180/metrics | grep -E "^scylla" | head -30
# Key metrics to watch:
# scylla_scheduler_runtime_ms - per-shard CPU usage
# scylla_storage_proxy_write_unavailable - write errors
# scylla_storage_proxy_read_unavailable - read errors
# scylla_io_queue_delay - disk I/O latency
# Nodetool commands for cluster health
nodetool status # Node up/down status
nodetool tpstats # Thread pool statistics
nodetool compactionstats # Active compactions
nodetool tablestats myapp # Per-table statistics
nodetool cfstats myapp.metrics # Specific table stats
# Check read/write latency
nodetool tablehistograms myapp metrics
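For alerting scripts, the :9180 endpoint's Prometheus text format is easy to consume directly. A hedged sketch (function name is ours) that handles the common `name{labels} value` lines:

```python
# Sketch: parse Prometheus text exposition (as served on :9180) into
# {metric_name: [(label_string, value), ...]} so scripts can alert on
# e.g. unavailable reads/writes. Handles the common line shapes only.
def parse_prometheus(text: str) -> dict[str, list[tuple[str, float]]]:
    metrics: dict[str, list[tuple[str, float]]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):    # skip HELP/TYPE and blanks
            continue
        name_part, _, value = line.rpartition(" ")
        if "{" in name_part:
            name, labels = name_part.split("{", 1)
            labels = "{" + labels
        else:
            name, labels = name_part, ""
        try:
            metrics.setdefault(name, []).append((labels, float(value)))
        except ValueError:                       # unparseable value; skip line
            continue
    return metrics
```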
Migrate from Cassandra
# Method 1: sstableloader (for offline migration)
# Export from Cassandra
nodetool snapshot myapp
# Snapshot stored at: /var/lib/cassandra/data/myapp/<table>/snapshots/<name>/
# Load into ScyllaDB
sstableloader -d 192.168.1.10 \
/var/lib/cassandra/data/myapp/metrics-abcdef123456/snapshots/snap1/
# Method 2: COPY command for smaller datasets
# Export from Cassandra
cqlsh cassandra-host -e "COPY myapp.metrics TO '/tmp/metrics.csv' WITH HEADER=TRUE;"
# Import into ScyllaDB
cqlsh scylla-host -e "COPY myapp.metrics FROM '/tmp/metrics.csv' WITH HEADER=TRUE;"
# Method 3: Dual-write migration
# 1. Write to both Cassandra and ScyllaDB
# 2. Backfill historical data with sstableloader
# 3. Switch reads to ScyllaDB
# 4. Stop writing to Cassandra
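Step 1 of the dual-write migration can be as thin as a wrapper that mirrors every write to both clusters. This is a sketch under stated assumptions: the session objects only need an `execute(stmt, params)` method (as the DataStax/Scylla drivers provide), and error handling here is deliberately minimal.

```python
# Sketch of the dual-write phase: every write goes to Cassandra (still the
# source of truth) and is mirrored to ScyllaDB. A failed ScyllaDB write is
# logged for later repair/backfill rather than failing the request.
class DualWriter:
    def __init__(self, primary, secondary):
        self.primary = primary      # Cassandra session (source of truth)
        self.secondary = secondary  # ScyllaDB session (being populated)

    def execute(self, stmt, params=None):
        result = self.primary.execute(stmt, params)   # must succeed
        try:
            self.secondary.execute(stmt, params)      # best-effort mirror
        except Exception as exc:
            print(f"scylla write failed, needs repair: {exc}")
        return result
```

Once backfill completes and the clusters are verified consistent, reads flip to the ScyllaDB session and the wrapper is removed.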
Troubleshooting
Node won't join cluster (stuck in Joining state, shown as UJ in nodetool status):
# Check logs
sudo journalctl -u scylla-server -n 100 | grep ERROR
# Verify seeds are reachable
nc -zv 192.168.1.10 7000
# Check listen_address is correct (not 127.0.0.1)
grep listen_address /etc/scylla/scylla.yaml
# Clear a failed join (use only if necessary): from a healthy node, remove
# the stuck node by Host ID, then wipe its data directories before retrying
nodetool removenode <node-id>   # Host ID comes from `nodetool status` on a healthy node
High read/write latency:
# Check if disk is the bottleneck
iostat -x 1 5
# Ensure ScyllaDB has I/O scheduler configured correctly
cat /sys/block/sda/queue/scheduler # Should be "none" or "noop" for SSDs
# Check for compaction pressure
nodetool compactionstats
# ScyllaDB sizes per-shard concurrency automatically; the Cassandra-era
# concurrent_reads/concurrent_writes options in scylla.yaml have little
# effect here. If CPU bound, add cores or nodes rather than tuning these.
CQL authentication error:
# If locked out of default account
sudo systemctl stop scylla-server
# Disable auth temporarily
# In scylla.yaml: authenticator: AllowAllAuthenticator
sudo systemctl start scylla-server
cqlsh localhost -e "ALTER USER cassandra WITH PASSWORD 'newpassword';"
# Re-enable auth
# authenticator: PasswordAuthenticator
sudo systemctl restart scylla-server
Conclusion
ScyllaDB's shard-per-core architecture delivers consistent low latency and high throughput that scales linearly with CPU cores, making it an excellent choice for high-performance NoSQL workloads on modern multi-core servers. Its Cassandra compatibility means existing CQL schemas and drivers work without modification. For production deployments, run the scylla_setup scripts on each node, use NVMe storage, and enable shard-aware routing in your client driver for optimal performance.


