Sonic Lightweight Search Engine Installation

Sonic is a fast, schema-less search backend written in Rust that stores a search index on disk and exposes a simple line-based protocol for indexing and querying text. This guide covers installing Sonic on Linux, using the channel protocol, ingesting data, searching and suggesting terms, handling language detection, and integrating Sonic with your application.

Prerequisites

  • Ubuntu 20.04+ / Debian 11+ or CentOS 8+ / Rocky Linux 8+
  • 256 MB RAM minimum (very lightweight)
  • Disk space for the search index (proportional to content volume)
  • Root or sudo access

Installing Sonic

# Method 1: Download a pre-built binary from GitHub releases
SONIC_VERSION="1.4.9"
wget https://github.com/valeriansaliou/sonic/releases/download/v${SONIC_VERSION}/sonic-v${SONIC_VERSION}-x86_64-unknown-linux-gnu.tar.gz

tar xzf sonic-v${SONIC_VERSION}-x86_64-unknown-linux-gnu.tar.gz
sudo mv sonic/sonic /usr/local/bin/  # the tarball extracts to a sonic/ directory
sonic --version

# Method 2: Build from source (requires Rust toolchain)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source ~/.cargo/env

git clone https://github.com/valeriansaliou/sonic.git
cd sonic
cargo build --release
sudo cp target/release/sonic /usr/local/bin/

# Create directories
sudo mkdir -p /var/lib/sonic/store
sudo useradd -r -s /sbin/nologin sonic
sudo chown -R sonic:sonic /var/lib/sonic

Configuration

Download and edit the example configuration:

# Download the example config
wget https://raw.githubusercontent.com/valeriansaliou/sonic/master/config.cfg
sudo mkdir -p /etc/sonic
sudo mv config.cfg /etc/sonic/config.cfg
sudo nano /etc/sonic/config.cfg

Key configuration sections:

# /etc/sonic/config.cfg

[server]
log_level = "error"       # error, warn, info, debug, trace

[channel]
inet = "127.0.0.1:1491"   # Listen address (keep on localhost, use Nginx for external)
tcp_timeout = 300          # Seconds before idle connection is closed
auth_password = "your-strong-password-here"  # Required for all connections

[store]
[store.kv]
path = "/var/lib/sonic/store/kv/"
retain_word_objects = 1000  # Max objects kept per indexed word

[store.kv.pool]
inactive_after = 1800       # Close idle KV stores after 30 minutes

[store.fst]
path = "/var/lib/sonic/store/fst/"  # FST files for autocomplete

[store.fst.pool]
inactive_after = 300        # Close idle FST stores after 5 minutes

[store.fst.graph]
consolidate_after = 180     # Rebuild FST graph after this many seconds of inactivity
max_size = 2048             # Max FST graph size in KB

Running Sonic as a Service

# Create systemd service
sudo tee /etc/systemd/system/sonic.service > /dev/null << 'EOF'
[Unit]
Description=Sonic Search Backend
After=network.target

[Service]
User=sonic
Group=sonic
ExecStart=/usr/local/bin/sonic -c /etc/sonic/config.cfg
Restart=on-failure
RestartSec=5
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now sonic
sudo systemctl status sonic

# Verify Sonic is listening
ss -tlnp | grep 1491

The Sonic Protocol: Channels

Sonic uses a raw TCP text protocol with three operation modes (channels):

  • SEARCH - read-only: search and suggest queries
  • INGEST - write: push, pop, flush, count operations
  • CONTROL - admin: trigger, info, ping operations

To interact manually using nc (netcat):

nc localhost 1491

# Server responds:
# CONNECTED <sonic-server v1.4.9>

# Start a channel
START ingest your-strong-password-here
# Server: STARTED ingest protocol(1) buffer(20000)

# Push a text object
PUSH messages inbox:user1 conversation:42 "Hello world this is a test message"
# Server: OK

# End the session
QUIT

Ingesting Data

The PUSH command syntax:

PUSH <collection> <bucket> <object> "<text>"
  • collection: logical grouping (e.g., messages, products, articles)
  • bucket: sub-grouping within a collection (e.g., user ID, category)
  • object: unique identifier (e.g., document ID, message ID)
  • text: the text content to index
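Because the protocol is line-based, the quoted text must be escaped and kept free of raw newlines before it is sent. A minimal Python sketch of building a well-formed PUSH line (the helper name is illustrative, not part of any Sonic client library), including the optional LANG modifier described later in this guide:

```python
def build_push(collection, bucket, obj, text, lang=None):
    """Build a Sonic PUSH command line.

    Newlines are replaced with spaces (the protocol is line-based),
    and backslashes/double quotes are escaped.
    """
    clean = text.replace("\r", " ").replace("\n", " ")
    clean = clean.replace("\\", "\\\\").replace('"', '\\"')
    cmd = f'PUSH {collection} {bucket} {obj} "{clean}"'
    if lang:
        cmd += f" LANG({lang})"  # ISO 639-3 code, e.g. "eng"
    return cmd

print(build_push("products", "store:main", "item:1", 'A "quoted" word'))
# PUSH products store:main item:1 "A \"quoted\" word"
```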

# Index products using nc
{
echo "START ingest your-strong-password-here"
echo 'PUSH products store:main item:1 "Mechanical keyboard with Cherry MX switches and RGB lighting"'
echo 'PUSH products store:main item:2 "Ergonomic vertical mouse for reduced wrist strain"'
echo 'PUSH products store:main item:3 "USB-C hub with 7 ports including HDMI and SD card reader"'
echo 'PUSH products store:electronics item:1 "Noise cancelling wireless headphones"'
echo "QUIT"
} | nc localhost 1491

# Count indexed objects in a collection/bucket
{
echo "START ingest your-strong-password-here"
echo "COUNT products store:main"
echo "QUIT"
} | nc localhost 1491
# Server: RESULT 3

# Remove a specific object from the index
{
echo "START ingest your-strong-password-here"
echo "POP products store:main item:2"
echo "QUIT"
} | nc localhost 1491

# Flush all objects in a bucket
{
echo "START ingest your-strong-password-here"
echo "FLUSHB products store:old-data"
echo "QUIT"
} | nc localhost 1491

# Flush an entire collection
{
echo "START ingest your-strong-password-here"
echo "FLUSHC products"
echo "QUIT"
} | nc localhost 1491

For bulk ingestion, use a script:

#!/bin/bash
# bulk_ingest.sh - Index from a TSV file: id\ttext

PASSWORD="your-strong-password-here"
COLLECTION="articles"
BUCKET="main"

(
  echo "START ingest $PASSWORD"
  while IFS=$'\t' read -r id text; do
    # Escape backslashes first, then double quotes
    escaped="${text//\\/\\\\}"
    escaped="${escaped//\"/\\\"}"
    echo "PUSH $COLLECTION $BUCKET article:${id} \"${escaped}\""
  done < articles.tsv
  echo "QUIT"
) | nc localhost 1491
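The same bulk flow can be sketched in Python. This illustrative helper (not from any Sonic client library) turns TSV rows into the full command sequence for one ingest session, which you would then write to the TCP socket:

```python
def tsv_to_commands(tsv_lines, collection, bucket, password):
    """Convert id<TAB>text rows into a complete Sonic ingest session."""
    cmds = [f"START ingest {password}"]
    for line in tsv_lines:
        doc_id, _, text = line.rstrip("\n").partition("\t")
        # Escape backslashes first, then double quotes
        text = text.replace("\\", "\\\\").replace('"', '\\"')
        cmds.append(f'PUSH {collection} {bucket} article:{doc_id} "{text}"')
    cmds.append("QUIT")
    return cmds

for cmd in tsv_to_commands(["1\tHello world"], "articles", "main", "secret"):
    print(cmd)
```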

Searching and Suggesting

# Search channel - search and suggest
{
echo "START search your-strong-password-here"
echo "QUERY products store:main \"mechanical keyboard\" LIMIT(5)"
echo "QUIT"
} | nc localhost 1491
# Server: PENDING <id>
# Server: EVENT QUERY <id> item:1  (returns object IDs)

# Search another bucket in the same collection (Sonic has no
# cross-bucket wildcard; query each bucket separately)
{
echo "START search your-strong-password-here"
echo 'QUERY products store:electronics "wireless" LIMIT(10) OFFSET(0)'
echo "QUIT"
} | nc localhost 1491

# Suggest - autocomplete from partial word
{
echo "START search your-strong-password-here"
echo "SUGGEST products store:main \"mech\" LIMIT(5)"
echo "QUIT"
} | nc localhost 1491
# Returns words that start with "mech": mechanical, mechanic, etc.

Important: Sonic returns object IDs, not the actual documents. Your application must use those IDs to look up full documents from your primary database (PostgreSQL, Redis, etc.).
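Since responses arrive as plain text lines, a client has to match the id in the PENDING line to the later EVENT line that carries the results. A hedged sketch of that parsing step (illustrative only; real client libraries do this for you):

```python
def parse_query_response(lines):
    """Extract object IDs from a search-channel exchange.

    Sonic answers a QUERY with `PENDING <id>` and later emits
    `EVENT QUERY <id> <obj> <obj> ...` carrying the matching IDs.
    """
    pending_id = None
    for line in lines:
        parts = line.split()
        if parts[:1] == ["PENDING"]:
            pending_id = parts[1]
        elif parts[:2] == ["EVENT", "QUERY"] and parts[2] == pending_id:
            return parts[3:]  # the matching object IDs
    return []

lines = ["PENDING gZcSE", "EVENT QUERY gZcSE item:1 item:4"]
print(parse_query_response(lines))  # ['item:1', 'item:4']
```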

Language Detection

Sonic auto-detects the language of indexed text and uses appropriate tokenization and stop words. You can also specify the language explicitly:

# Push text with explicit language (ISO 639-3 code)
PUSH articles main article:1 "Bonjour le monde" LANG(fra)
PUSH articles main article:2 "Hola mundo" LANG(spa)
PUSH articles main article:3 "Hello world" LANG(eng)

Supported languages include eng, fra, spa, deu, ita, por, nld, rus, jpn, zho, and many more. Language detection works automatically when text is long enough; specify it explicitly for short texts.
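Sonic expects ISO 639-3 codes, while many application stacks track the two-letter ISO 639-1 codes ("en", "fr"). A small mapping helper covering the languages listed above (the function name is illustrative):

```python
# ISO 639-1 -> ISO 639-3, covering the languages mentioned above
ISO_639_3 = {
    "en": "eng", "fr": "fra", "es": "spa", "de": "deu", "it": "ita",
    "pt": "por", "nl": "nld", "ru": "rus", "ja": "jpn", "zh": "zho",
}

def sonic_lang(iso_639_1):
    """Return the ISO 639-3 code Sonic expects, or None if unknown."""
    return ISO_639_3.get(iso_639_1)

print(sonic_lang("fr"))  # fra
```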

Application Integration

Most languages have Sonic client libraries. Example in Python:

from sonic import IngestClient, SearchClient

# Index documents
with IngestClient("127.0.0.1", 1491, "your-strong-password-here") as ingest:
    ingest.push("products", "store:main", "item:1",
                "Mechanical keyboard with Cherry MX switches")
    ingest.push("products", "store:main", "item:2",
                "Ergonomic vertical mouse")
    ingest.push("products", "store:main", "item:3",
                "USB-C hub 7 ports HDMI")

# Search and retrieve IDs
with SearchClient("127.0.0.1", 1491, "your-strong-password-here") as search:
    # Query returns a list of object IDs
    results = search.query("products", "store:main", "keyboard", limit=10)
    print(results)  # ['item:1']

    # Suggest returns word completions
    suggestions = search.suggest("products", "store:main", "keyb", limit=5)
    print(suggestions)  # ['keyboard']

# Use the returned IDs to fetch from your primary DB
import psycopg2
conn = psycopg2.connect("dbname=mydb user=myuser")
cur = conn.cursor()
for obj_id in results:
    item_id = obj_id.split(":")[1]
    cur.execute("SELECT * FROM products WHERE id = %s", (item_id,))
    print(cur.fetchone())

Example in Node.js:

const SonicChannelIngest = require("sonic-channel").Ingest;
const SonicChannelSearch = require("sonic-channel").Search;

const ingest = new SonicChannelIngest({
  host: "127.0.0.1",
  port: 1491,
  auth: "your-strong-password-here",
}).connect({
  connected() {
    ingest.push("products", "store:main", "item:1",
                "Mechanical keyboard Cherry MX", { lang: "eng" })
      .then(() => console.log("Indexed"));
  },
});

const search = new SonicChannelSearch({
  host: "127.0.0.1",
  port: 1491,
  auth: "your-strong-password-here",
}).connect({
  connected() {
    search.query("products", "store:main", "keyboard", { limit: 10, offset: 0 })
      .then((ids) => console.log("Found:", ids));
  },
});

Troubleshooting

Sonic won't start:

sudo journalctl -u sonic -f
# Check for permission issues on /var/lib/sonic or port conflicts

No search results after ingestion:

# Trigger FST graph consolidation (builds autocomplete index)
{
echo "START control your-strong-password-here"
echo "TRIGGER consolidate"
echo "QUIT"
} | nc localhost 1491

# Check object count
{
echo "START ingest your-strong-password-here"
echo "COUNT products store:main"
echo "QUIT"
} | nc localhost 1491

High disk usage:

du -sh /var/lib/sonic/store/
# KV store grows with indexed text; FST graph is smaller
# Remove old data with FLUSHB or FLUSHC

Connection refused:

ss -tlnp | grep 1491
# Verify Sonic is running and listening
grep inet /etc/sonic/config.cfg

Special characters breaking PUSH command:

  • Escape double quotes in text: \"
  • Sonic's protocol is line-based; avoid newlines in text (replace with spaces)

Conclusion

Sonic provides an extremely lightweight search backend that's perfect for applications needing fast text search without the overhead of Elasticsearch or Solr. Its simple line protocol and minimal resource requirements make it ideal for small VPS deployments, while the object-ID return model keeps Sonic stateless relative to your primary data store. Pair it with Redis for caching search results and your primary database for document retrieval to build a complete, efficient search experience.