Apache Superset Data Visualization Installation

Apache Superset is an open-source data exploration and visualization platform that connects to dozens of databases and lets you build interactive dashboards without writing code. This guide covers installing Superset on Linux using Docker Compose, configuring database connections, creating charts and dashboards, using SQL Lab, and managing user roles.

Prerequisites

  • Ubuntu 20.04+ or CentOS 8+ / Rocky Linux 8+
  • Docker and Docker Compose, or Python 3.9+
  • 4 GB RAM minimum (8 GB recommended for production)
  • A target database to visualize (PostgreSQL, MySQL, Trino, etc.)

Installing Superset with Docker Compose

The Docker Compose approach is the quickest way to stand up the complete stack (upstream recommends it chiefly for testing and development, so harden the configuration before exposing it in production):

# Clone the Superset repository
git clone https://github.com/apache/superset.git
cd superset

# Check out the latest stable release tag
git checkout $(git tag | grep -E '^[0-9]+\.[0-9]+\.[0-9]+$' | sort -V | tail -1)

# Copy example environment file
cp docker/.env-non-dev docker/.env

# Edit the environment file - set a strong SECRET_KEY
nano docker/.env
# Change: SECRET_KEY=your_very_long_random_secret_key_here
# Generate one with: openssl rand -base64 42

# Start Superset (first start downloads images and runs migrations - takes ~5 min)
docker compose -f docker-compose-non-dev.yml up -d

# Check status
docker compose -f docker-compose-non-dev.yml ps

# Default admin credentials: admin / admin - change these immediately
# Access at http://your-server:8088
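If openssl is unavailable, the SECRET_KEY can also be generated with Python's standard library; a minimal sketch:

```python
# Generate a Superset-compatible SECRET_KEY without openssl.
# secrets.token_urlsafe(n) derives a URL-safe string from n random bytes,
# so 42 bytes gives roughly the same entropy as `openssl rand -base64 42`.
import secrets

def generate_secret_key(num_bytes: int = 42) -> str:
    """Return a random URL-safe string suitable for SECRET_KEY."""
    return secrets.token_urlsafe(num_bytes)

if __name__ == "__main__":
    print(f"SECRET_KEY={generate_secret_key()}")
```

Paste the printed value into docker/.env in place of the SECRET_KEY placeholder.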

Installing Superset with pip

For a non-Docker installation on Ubuntu:

# Install system dependencies
sudo apt-get update
sudo apt-get install -y build-essential libssl-dev libffi-dev python3-dev \
    python3-pip libsasl2-dev libldap2-dev default-libmysqlclient-dev

# Create virtual environment
python3 -m venv /opt/superset-venv
source /opt/superset-venv/bin/activate

# Install Superset
pip install apache-superset

# Install database drivers (add what you need)
pip install psycopg2-binary  # PostgreSQL
pip install mysqlclient       # MySQL
pip install pydruid           # Druid

# Set environment variables (Superset reads SUPERSET_SECRET_KEY from the environment)
export FLASK_APP=superset
export SUPERSET_SECRET_KEY=$(openssl rand -base64 42)

# Initialize the database
superset db upgrade

# Create admin user
superset fab create-admin \
    --username admin \
    --firstname Admin \
    --lastname User \
    --email [email protected] \
    --password adminpassword

# Load example data (optional)
superset load_examples

# Initialize default roles and permissions
superset init

# Start the development server (use gunicorn for production)
superset run -p 8088 --with-threads --reload --debugger

For production, run with Gunicorn:

# Install gunicorn and celery
pip install gunicorn celery redis

# Start with gunicorn
gunicorn \
    --bind 0.0.0.0:8088 \
    --workers 4 \
    --timeout 120 \
    --limit-request-line 0 \
    --limit-request-field_size 0 \
    "superset.app:create_app()"

Connecting Databases

Superset supports 40+ databases through SQLAlchemy. To add a database:

  1. Go to Settings → Database Connections → + Database
  2. Select your database type from the dropdown
  3. Enter the SQLAlchemy URI or use the form fields

Common connection strings:

# PostgreSQL
postgresql+psycopg2://user:password@host:5432/dbname

# MySQL (matches the mysqlclient driver installed above)
mysql+mysqldb://user:password@host:3306/dbname

# SQLite (for testing)
sqlite:////path/to/database.db

# Amazon Redshift
redshift+psycopg2://user:password@host:5439/dbname

# BigQuery (requires the sqlalchemy-bigquery pip package)
bigquery://project-id

Enable Allow DML and Expose in SQL Lab as needed, then click Test Connection before saving.
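Passwords containing characters like @, :, or / will break URI parsing unless they are percent-encoded. A small stdlib sketch (the host and database names are placeholders):

```python
# Special characters in passwords (@, :, /, %) must be percent-encoded
# before being embedded in a SQLAlchemy URI, or the URI fails to parse.
from urllib.parse import quote_plus

def build_postgres_uri(user: str, password: str, host: str,
                       port: int, dbname: str) -> str:
    """Assemble a postgresql+psycopg2 URI with credentials safely encoded."""
    return (f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
            f"@{host}:{port}/{dbname}")

uri = build_postgres_uri("analyst", "p@ss:word/1", "db.internal", 5432, "sales")
print(uri)  # postgresql+psycopg2://analyst:p%40ss%3Aword%2F1@db.internal:5432/sales
```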

Creating Charts and Dashboards

Create a chart:

  1. Go to Charts → + Chart
  2. Choose your dataset (table or saved SQL query)
  3. Select a chart type (Bar Chart, Line Chart, Table, Map, etc.)
  4. Configure the chart in the Data tab:
    • Set Dimensions (X-axis / group by)
    • Set Metrics (COUNT, SUM, AVG, etc.)
    • Add Filters as needed
  5. Customize in the Customize tab (colors, labels, legends)
  6. Click Save

Create a dashboard:

  1. Go to Dashboards → + Dashboard
  2. Name your dashboard
  3. Click Edit dashboard
  4. Drag charts from the right panel onto the canvas
  5. Resize and rearrange cards
  6. Add Filters using the filter icon to link charts
  7. Click Save

For cross-filtering (click one chart to filter others), enable it under Dashboard properties → Cross-filtering.

Using SQL Lab

SQL Lab is Superset's SQL IDE for ad-hoc analysis:

  1. Go to SQL → SQL Lab
  2. Select a database and schema from the dropdowns
  3. Write your query:
-- Example: cohort analysis
SELECT
    date_trunc('week', first_order_date)::date AS cohort_week,
    count(DISTINCT customer_id) AS cohort_size,
    sum(revenue) AS total_revenue
FROM (
    SELECT
        customer_id,
        min(created_at) AS first_order_date,
        sum(amount) AS revenue
    FROM orders
    WHERE created_at >= '2024-01-01'
    GROUP BY customer_id
) sub
GROUP BY 1
ORDER BY 1;
  4. Press Ctrl+Enter or click Run
  5. Click Save to save as a query or Explore to create a chart from results
  6. Use Create dataset to make results available as a reusable dataset

Query history is saved automatically. Use Search to find previous queries.
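The cohort query above can be exercised end to end with an in-memory SQLite database. SQLite has no date_trunc, so strftime('%Y-%W', ...) stands in for week truncation; the orders table and its columns mirror the hypothetical schema in the example:

```python
# A runnable miniature of the cohort query, using SQLite in memory.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, created_at TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2024-01-02", 10.0),
        (1, "2024-02-15", 25.0),   # later order; customer 1's cohort is still January
        (2, "2024-01-03", 40.0),
        (3, "2024-03-10", 7.5),
    ],
)
rows = conn.execute("""
    SELECT strftime('%Y-%W', first_order_date) AS cohort_week,
           COUNT(DISTINCT customer_id)         AS cohort_size,
           SUM(revenue)                        AS total_revenue
    FROM (
        SELECT customer_id,
               MIN(created_at) AS first_order_date,
               SUM(amount)     AS revenue
        FROM orders
        WHERE created_at >= '2024-01-01'
        GROUP BY customer_id
    )
    GROUP BY 1
    ORDER BY 1
""").fetchall()
print(rows)
```

Customers 1 and 2 land in the same January cohort (combined revenue 75.0); customer 3 forms a one-person March cohort.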

Caching Configuration

Configure Redis caching to avoid re-running expensive queries:

# superset_config.py (mount into container or set in config)
from cachelib.redis import RedisCache

CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_DEFAULT_TIMEOUT": 300,  # 5 minutes
    "CACHE_KEY_PREFIX": "superset_",
    "CACHE_REDIS_URL": "redis://redis:6379/0",
}

DATA_CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_DEFAULT_TIMEOUT": 3600,  # 1 hour for query results
    "CACHE_KEY_PREFIX": "superset_data_",
    "CACHE_REDIS_URL": "redis://redis:6379/1",
}

# Async query execution via Celery
RESULTS_BACKEND = RedisCache(
    host="redis",
    port=6379,
    key_prefix="superset_results_"
)

Set chart-level cache timeout in chart settings under the Data tab.
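To make the timeout semantics concrete, here is a stdlib-only sketch of timeout-based caching. This is an illustration of what CACHE_DEFAULT_TIMEOUT means, not Superset's internals (Superset delegates to cachelib and Redis):

```python
# Entries are served until their age exceeds the timeout, after which
# the lookup behaves like a miss and the query would be re-executed.
import time

class TTLCache:
    def __init__(self, default_timeout=300.0):
        self.default_timeout = default_timeout
        self._store = {}  # key -> (expiry_timestamp, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expiry, value = entry
        if time.monotonic() >= expiry:   # stale: evict and report a miss
            del self._store[key]
            return None
        return value

    def set(self, key, value, timeout=None):
        ttl = self.default_timeout if timeout is None else timeout
        self._store[key] = (time.monotonic() + ttl, value)

cache = TTLCache(default_timeout=0.05)   # 50 ms, just for demonstration
cache.set("superset_data_q1", [("2024-01", 75.0)])
print(cache.get("superset_data_q1"))     # hit: cached rows come back
time.sleep(0.06)
print(cache.get("superset_data_q1"))     # None: entry has expired
```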

User Roles and Row-Level Security

Superset uses Flask-AppBuilder roles:

Role      Access Level
------    ------------
Admin     Full access
Alpha     Can create charts/dashboards, manage own data
Gamma     View-only, sees what's explicitly granted
Public    Anonymous access (if enabled)
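The built-in roles nest into coarse capability tiers, which the following illustrative mapping captures. This is not Superset's internal model (real enforcement uses fine-grained Flask-AppBuilder permissions); it only sketches how the tiers relate:

```python
# Illustrative capability tiers for the built-in roles; the capability
# names here are made up for the example, not Superset permission strings.
ROLE_CAPABILITIES = {
    "Admin":  {"view", "create", "manage_users", "manage_security"},
    "Alpha":  {"view", "create"},
    "Gamma":  {"view"},
    "Public": set(),   # only what is explicitly granted to anonymous users
}

def can(role: str, capability: str) -> bool:
    """Return True if the role's tier includes the capability."""
    return capability in ROLE_CAPABILITIES.get(role, set())

print(can("Alpha", "create"))   # True
print(can("Gamma", "create"))   # False
```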

Create a custom role:

  1. Settings → List Roles → +
  2. Name the role and add specific permissions

Row-Level Security (RLS) restricts which rows users see:

  1. Security → Row Level Security
  2. Click + and configure:
    • Table: the dataset to restrict
    • Roles: who the filter applies to
    • Group Key: optional grouping
    • Clause: SQL WHERE clause fragment
-- Example RLS clause: users only see their department's data
department = '{{ current_username() }}'

-- Or use a lookup table
region IN (SELECT region FROM user_regions WHERE username = '{{ current_username() }}')
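Superset renders the clause's Jinja template before ANDing it onto the dataset's query. A simplified sketch of that effect (real Superset uses Jinja2; the str.replace stand-in here only handles the one macro, and the subquery wrapping is illustrative):

```python
def render_rls_clause(clause: str, username: str) -> str:
    # Stand-in for Jinja2 rendering: substitute the current_username() macro.
    return clause.replace("{{ current_username() }}", username)

def apply_rls(base_query: str, clause: str, username: str) -> str:
    """Append the rendered RLS clause as a WHERE condition on the base query."""
    rendered = render_rls_clause(clause, username)
    return f"SELECT * FROM ({base_query}) AS src WHERE {rendered}"

sql = apply_rls(
    "SELECT * FROM sales",
    "department = '{{ current_username() }}'",
    "jdoe",
)
print(sql)  # SELECT * FROM (SELECT * FROM sales) AS src WHERE department = 'jdoe'
```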

Troubleshooting

Superset container keeps restarting:

docker compose -f docker-compose-non-dev.yml logs superset_app | tail -30
# Common cause: wrong SECRET_KEY format or missing DB migration

Database connection fails:

# Test the SQLAlchemy URI directly
docker exec -it superset_app python3 -c "
from sqlalchemy import create_engine
e = create_engine('postgresql+psycopg2://user:pass@host/db')
print(e.connect())
"

Charts load slowly:

  • Enable Redis caching (see above)
  • Set an appropriate Cache Timeout on the dataset
  • Add indexes to your database on GROUP BY and WHERE columns
  • Use Async execution for long queries

"Unknown database" error after adding driver:

# Add the driver to docker/requirements-local.txt, then rebuild the image
echo "pydruid" >> ./docker/requirements-local.txt
docker compose -f docker-compose-non-dev.yml build --no-cache superset

Celery workers not processing async queries:

docker compose -f docker-compose-non-dev.yml logs superset_worker
# Ensure Redis is running and BROKER_URL is correct

Conclusion

Apache Superset delivers a full-featured self-hosted BI platform with SQL Lab for ad-hoc analysis, a rich chart library, and granular role-based access control. Docker Compose is the fastest path to a working deployment, and Redis caching keeps dashboards responsive even against large datasets. With row-level security, you can safely expose the same dashboard to users who should only see their own slice of the data.