Apache Solr Installation and Configuration

Apache Solr is a battle-tested enterprise search platform built on Apache Lucene that provides full-text search, faceted navigation, real-time indexing, and SolrCloud clustering. This guide covers deploying Solr on Linux, creating cores, designing schemas, indexing documents, faceted search, and SolrCloud setup for high availability.

Prerequisites

  • Ubuntu 20.04+ or CentOS 8+ / Rocky Linux 8+
  • Java 11 or Java 17
  • 2 GB RAM minimum (8 GB recommended for production)
  • SSD storage for index files
  • Root or sudo access

Installing Apache Solr

# Install Java 17
# Ubuntu/Debian:
sudo apt-get update && sudo apt-get install -y openjdk-17-jre-headless

# CentOS/Rocky:
sudo dnf install -y java-17-openjdk-headless

java -version

# Download Solr
SOLR_VERSION="9.5.0"
wget https://downloads.apache.org/solr/solr/${SOLR_VERSION}/solr-${SOLR_VERSION}.tgz

# Verify checksum
wget https://downloads.apache.org/solr/solr/${SOLR_VERSION}/solr-${SOLR_VERSION}.tgz.sha512
sha512sum -c solr-${SOLR_VERSION}.tgz.sha512

# Extract the install script
tar xzf solr-${SOLR_VERSION}.tgz solr-${SOLR_VERSION}/bin/install_solr_service.sh --strip-components=2

# Install as a system service
sudo bash ./install_solr_service.sh solr-${SOLR_VERSION}.tgz

# The installer:
# - Creates /opt/solr (symlink to versioned dir)
# - Creates /var/solr (data directory)
# - Creates the 'solr' system user
# - Installs /etc/init.d/solr or systemd unit

# Check status
sudo systemctl status solr
sudo -u solr /opt/solr/bin/solr status

# Access admin UI at http://your-server:8983/solr

Tune the JVM heap in /etc/default/solr.in.sh:

sudo nano /etc/default/solr.in.sh
# Uncomment and set:
SOLR_HEAP="2g"            # For 4 GB total RAM
SOLR_JAVA_HOME="/usr/lib/jvm/java-17-openjdk-amd64"
SOLR_HOST="your-server-ip"

Creating a Core

A core is a single Solr index. In standalone mode (non-cloud):

# Create a core using the built-in _default configset
sudo -u solr /opt/solr/bin/solr create_core -c mystore -d _default

# Verify the core was created (quote the URL so the shell doesn't interpret &)
curl "http://localhost:8983/solr/admin/cores?action=STATUS&core=mystore"

# Cores are stored in /var/solr/data/mystore/

Using the Admin UI:

  1. Open http://your-server:8983/solr
  2. Go to Core Admin → Add Core
  3. Fill in name and use _default as the config set

Schema Design

Solr defines field types and fields in a managed schema (managed-schema.xml, or a classic hand-edited schema.xml). With the managed schema, make changes through the Schema API rather than editing the file directly:

# Add fields via the Schema API
# Add a text field with full-text analysis
curl -X POST http://localhost:8983/solr/mystore/schema \
  -H 'Content-Type: application/json' \
  -d '{
    "add-field": [
      {"name": "title",       "type": "text_general", "indexed": true, "stored": true},
      {"name": "description", "type": "text_general", "indexed": true, "stored": true},
      {"name": "category",    "type": "string",        "indexed": true, "stored": true, "docValues": true},
      {"name": "price",       "type": "pfloat",        "indexed": true, "stored": true, "docValues": true},
      {"name": "in_stock",    "type": "boolean",       "indexed": true, "stored": true},
      {"name": "tags",        "type": "strings",       "indexed": true, "stored": true, "multiValued": true},
      {"name": "created_at",  "type": "pdate",         "indexed": true, "stored": true}
    ],
    "add-copy-field": [
      {"source": "title",       "dest": "_text_"},
      {"source": "description", "dest": "_text_"}
    ]
  }'

# Copy fields to _text_ for catch-all search
# pdate, pfloat, pint, plong are Solr's modern numeric types (use these over legacy float/date)

View the current schema:

curl http://localhost:8983/solr/mystore/schema?wt=json
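Schemas evolve after the first load. A sketch of later modifications through the same Schema API (the payload below is hypothetical; note that replace-field changes only the schema definition, so existing documents must be reindexed for a type change to take effect):

```shell
# Hypothetical follow-up changes: widen price to pdouble and drop the tags field
PAYLOAD='{
  "replace-field": {"name": "price", "type": "pdouble", "indexed": true, "stored": true, "docValues": true},
  "delete-field":  {"name": "tags"}
}'
# Validate the JSON locally before posting it to the Schema API
echo "$PAYLOAD" | python3 -m json.tool > /dev/null && echo "payload OK"
# curl -X POST http://localhost:8983/solr/mystore/schema \
#   -H 'Content-Type: application/json' -d "$PAYLOAD"
```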

Indexing Documents

# Index JSON documents
curl -X POST http://localhost:8983/solr/mystore/update/json/docs \
  -H 'Content-Type: application/json' \
  -d '[
    {
      "id": "prod-1",
      "title": "Mechanical Keyboard TKL",
      "description": "Tenkeyless layout with Cherry MX Brown switches",
      "category": "Electronics",
      "price": 89.99,
      "in_stock": true,
      "tags": ["keyboard", "mechanical", "tkl"],
      "created_at": "2024-01-15T00:00:00Z"
    },
    {
      "id": "prod-2",
      "title": "Ergonomic Mouse",
      "description": "Vertical mouse for wrist comfort",
      "category": "Electronics",
      "price": 45.00,
      "in_stock": false,
      "tags": ["mouse", "ergonomic"],
      "created_at": "2024-01-20T00:00:00Z"
    }
  ]'

# Commit the changes (make them visible for search)
curl http://localhost:8983/solr/mystore/update?commit=true

# Or auto-commit - add to solrconfig.xml:
# <autoCommit>
#   <maxTime>30000</maxTime>  <!-- commit every 30 seconds -->
#   <openSearcher>false</openSearcher>
# </autoCommit>
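With openSearcher=false, hard commits flush to disk but do not make new documents searchable. A common pattern (a sketch; tune the interval to your freshness needs) is to pair it with autoSoftCommit in solrconfig.xml:

```xml
<!-- Soft commits open a new searcher cheaply (no fsync), making recently
     indexed documents visible within ~5 seconds -->
<autoSoftCommit>
  <maxTime>5000</maxTime>
</autoSoftCommit>
```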

# Bulk index from CSV (header=true reads field names from the first row)
curl "http://localhost:8983/solr/mystore/update/csv?commit=true&header=true" \
  -H 'Content-Type: application/csv' \
  --data-binary @products.csv
# If the file has no header row, pass header=false&fieldnames=id,title,category,price

# Delete documents
curl -X POST http://localhost:8983/solr/mystore/update \
  -H 'Content-Type: application/json' \
  -d '{"delete": {"id": "prod-1"}}'

# Delete by query
curl -X POST http://localhost:8983/solr/mystore/update \
  -H 'Content-Type: application/json' \
  -d '{"delete": {"query": "in_stock:false"}}'
curl "http://localhost:8983/solr/mystore/update?commit=true"

Searching and Facets

# Basic search
curl "http://localhost:8983/solr/mystore/select?q=keyboard&wt=json&indent=true"

# Full-text search with field boosting
curl "http://localhost:8983/solr/mystore/select" \
  --get \
  --data-urlencode "q=mechanical keyboard" \
  --data-urlencode "qf=title^3 description^1 tags^2" \
  --data-urlencode "defType=edismax" \
  --data-urlencode "rows=10" \
  --data-urlencode "start=0" \
  --data-urlencode "fl=id,title,price,score" \
  --data-urlencode "sort=score desc, price asc"

# Faceted search
curl "http://localhost:8983/solr/mystore/select" \
  --get \
  --data-urlencode "q=*:*" \
  --data-urlencode "facet=true" \
  --data-urlencode "facet.field=category" \
  --data-urlencode "facet.field=tags" \
  --data-urlencode "facet.range=price" \
  --data-urlencode "facet.range.start=0" \
  --data-urlencode "facet.range.end=200" \
  --data-urlencode "facet.range.gap=50" \
  --data-urlencode "fq=in_stock:true" \
  --data-urlencode "rows=20"

# Highlighting search terms in results
curl "http://localhost:8983/solr/mystore/select" \
  --get \
  --data-urlencode "q=ergonomic" \
  --data-urlencode "hl=true" \
  --data-urlencode "hl.fl=title,description" \
  --data-urlencode "hl.snippets=2" \
  --data-urlencode "hl.fragsize=100"

# Autocomplete / suggest (requires a suggester search component and /suggest
# handler in solrconfig.xml; mySuggester is not defined in the _default configset)
curl "http://localhost:8983/solr/mystore/suggest?suggest=true&suggest.q=keyb&suggest.dictionary=mySuggester"

Key query parsers:

  • defType=lucene - default Lucene syntax
  • defType=edismax - extended DisMax, best for user-facing search with field boosts (qf)
  • fq - filter query (restricts results without affecting scores; cached separately for performance)
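curl's --data-urlencode handles percent-encoding for you; when building request URLs in other clients, remember that spaces, carets (boosts), and colons (field:value) must all be encoded. A quick illustration with Python's standard library:

```shell
# Show the encoded query string sent for the edismax request above
python3 -c '
from urllib.parse import urlencode
params = {
    "q": "mechanical keyboard",
    "defType": "edismax",
    "qf": "title^3 description^1 tags^2",
    "fq": "in_stock:true",
}
print(urlencode(params))
'
```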

SolrCloud Clustering

SolrCloud uses ZooKeeper for cluster coordination:

# Start Solr in cloud mode with embedded ZooKeeper (single node; the embedded
# ZooKeeper listens on the Solr port + 1000, i.e. 9983)
sudo -u solr /opt/solr/bin/solr start -cloud -p 8983

# For a production 3-node cluster, first start an external ZooKeeper ensemble
# Then start each Solr node pointing to ZooKeeper:
sudo -u solr /opt/solr/bin/solr start -cloud \
  -p 8983 \
  -z zk1:2181,zk2:2181,zk3:2181/solr

# Create a collection with 2 shards and 2 replicas
# (maxShardsPerNode was removed in Solr 9; replica placement is automatic)
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=mystore&numShards=2&replicationFactor=2&collection.configName=_default"

# Check cluster status
curl http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS

# Upload a custom config set to ZooKeeper
sudo -u solr /opt/solr/bin/solr zk upconfig \
  -n my-configset \
  -d /path/to/configset \
  -z zk1:2181,zk2:2181,zk3:2181/solr

Security and Authentication

Enable Basic Auth:

# Create security.json
cat > /tmp/security.json << 'EOF'
{
  "authentication": {
    "blockUnknown": true,
    "class": "solr.BasicAuthPlugin",
    "credentials": {
      "solr": "IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="
    }
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "permissions": [
      {"name": "security-edit", "role": "admin"},
      {"name": "read",          "role": "*"},
      {"name": "update",        "role": "admin"}
    ],
    "user-role": {"solr": "admin"}
  }
}
EOF

# For standalone: copy to the Solr home directory ($SOLR_HOME, here /var/solr/data)
sudo cp /tmp/security.json /var/solr/data/security.json
sudo chown solr:solr /var/solr/data/security.json
sudo systemctl restart solr

# For SolrCloud: upload to ZooKeeper
sudo -u solr /opt/solr/bin/solr zk cp /tmp/security.json zk:/security.json -z localhost:9983

The credentials above are the well-known example defaults (user solr, password SolrRocks). Change them with the Authentication API, which hashes the new password for you:

curl --user solr:SolrRocks -X POST http://localhost:8983/solr/admin/authentication \
  -H 'Content-Type: application/json' \
  -d '{"set-user": {"solr": "NewStrongPassword"}}'

Troubleshooting

Check Solr logs:

sudo tail -f /var/solr/logs/solr.log
sudo tail -f /var/solr/logs/solr_gc.log

Out of memory / GC overhead:

# Increase heap in /etc/default/solr.in.sh
SOLR_HEAP="4g"
sudo systemctl restart solr

Core fails to load:

curl "http://localhost:8983/solr/admin/cores?action=STATUS&wt=json"
# Check "initFailures" in response

Slow queries:

# Log slow queries by setting a threshold in solrconfig.xml, inside the <query> section:
# <slowQueryThresholdMillis>1000</slowQueryThresholdMillis>
# Requests exceeding the threshold are logged at WARN level in solr.log

Index corruption:

# Optimizing merges segments but does not repair corruption. To check index
# integrity, stop Solr and run Lucene's CheckIndex tool:
sudo systemctl stop solr
sudo -u solr java -cp '/opt/solr/server/solr-webapp/webapp/WEB-INF/lib/*' \
  org.apache.lucene.index.CheckIndex /var/solr/data/mystore/data/index

# To merge segments after heavy deletes (expensive; run during low traffic):
curl "http://localhost:8983/solr/mystore/update?optimize=true"

Conclusion

Apache Solr is a proven enterprise search platform with powerful full-text analysis, faceting, and SolrCloud clustering for horizontal scalability. The eDisMax query parser with field boosting handles most user-facing search scenarios, while copy fields allow catch-all searching across document fields. For production deployments, use SolrCloud with an external ZooKeeper ensemble, enable Basic Auth or LDAP authentication, and set up autoCommit to balance indexing throughput with search freshness.