Apache Solr Installation and Configuration
Apache Solr is a battle-tested enterprise search platform built on Apache Lucene that provides full-text search, faceted navigation, real-time indexing, and SolrCloud clustering. This guide covers deploying Solr on Linux, creating cores, designing schemas, indexing documents, faceted search, and SolrCloud setup for high availability.
Prerequisites
- Ubuntu 20.04+ or CentOS 8+ / Rocky Linux 8+
- Java 11 or Java 17
- 2 GB RAM minimum (8 GB recommended for production)
- SSD storage for index files
- Root or sudo access
Installing Apache Solr
# Install Java 17
# Ubuntu/Debian:
sudo apt-get update && sudo apt-get install -y openjdk-17-jre-headless
# CentOS/Rocky:
sudo dnf install -y java-17-openjdk-headless
java -version
# Download Solr (releases that are no longer current move to https://archive.apache.org/dist/solr/solr/)
SOLR_VERSION="9.5.0"
wget https://downloads.apache.org/solr/solr/${SOLR_VERSION}/solr-${SOLR_VERSION}.tgz
# Verify checksum
wget https://downloads.apache.org/solr/solr/${SOLR_VERSION}/solr-${SOLR_VERSION}.tgz.sha512
sha512sum -c solr-${SOLR_VERSION}.tgz.sha512
# Extract the install script
tar xzf solr-${SOLR_VERSION}.tgz solr-${SOLR_VERSION}/bin/install_solr_service.sh --strip-components=2
# Install as a system service
sudo bash ./install_solr_service.sh solr-${SOLR_VERSION}.tgz
# The installer:
# - Creates /opt/solr (symlink to versioned dir)
# - Creates /var/solr (data directory)
# - Creates the 'solr' system user
# - Installs /etc/init.d/solr or systemd unit
# Check status
sudo systemctl status solr
sudo -u solr /opt/solr/bin/solr status
# Access admin UI at http://your-server:8983/solr
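The service can take several seconds to come up after installation or a restart, so scripts that create cores immediately may fail. The sketch below polls Solr's system-info endpoint until it answers. The function name, the `SOLR_URL` variable and the 60-second default are assumptions. Once `blockUnknown` authentication is enabled, the endpoint returns 401 unless you add `--user` to the curl call.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll Solr's system-info endpoint until it responds,
# or give up after a timeout (in seconds).
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

wait_for_solr() {
  local timeout="${1:-60}" waited=0
  # -f makes curl fail on HTTP errors; -s silences progress output
  until curl -sf -o /dev/null "${SOLR_URL}/admin/info/system?wt=json"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Solr not reachable at ${SOLR_URL} after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done
  echo "Solr is up at ${SOLR_URL}"
}
```

Typical use: `wait_for_solr 120 && sudo -u solr /opt/solr/bin/solr status`.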
Tune the JVM heap in /etc/default/solr.in.sh:
sudo nano /etc/default/solr.in.sh
# Uncomment and set:
SOLR_HEAP="2g" # For 4 GB total RAM
SOLR_JAVA_HOME="/usr/lib/jvm/java-17-openjdk-amd64" # Debian/Ubuntu path; on CentOS/Rocky use /usr/lib/jvm/java-17-openjdk
SOLR_HOST="your-server-ip"
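As a rough starting point, the sketch below turns total RAM into a `SOLR_HEAP` line. It assumes the rule of thumb implied by the example above: about half of RAM goes to the heap, and the rest is left to the OS page cache that Lucene reads index files through. It also caps the heap at 31g so the JVM keeps compressed object pointers. `suggest_heap` is a hypothetical helper, and GC logs should drive the final value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: suggest a SOLR_HEAP value from total RAM in MB.
# Assumption: ~50% of RAM for the heap, minimum 512m, capped at 31g
# (above ~32 GB the JVM loses compressed object pointers).
suggest_heap() {
  local total_mb="${1:-}"
  if [ -z "$total_mb" ]; then
    # Read MemTotal (reported in kB) from /proc/meminfo on Linux
    total_mb=$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo)
  fi
  local heap_mb=$(( total_mb / 2 ))
  [ "$heap_mb" -lt 512 ] && heap_mb=512
  [ "$heap_mb" -gt 31744 ] && heap_mb=31744
  # Prefer whole gigabytes in solr.in.sh; fall back to megabytes otherwise
  if [ $(( heap_mb % 1024 )) -eq 0 ]; then
    echo "SOLR_HEAP=\"$(( heap_mb / 1024 ))g\""
  else
    echo "SOLR_HEAP=\"${heap_mb}m\""
  fi
}
```

For example, `suggest_heap 4096` prints `SOLR_HEAP="2g"`, matching the 4 GB example above. Calling it with no argument reads the current machine's RAM.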
Creating a Core
A core is a single Solr index. In standalone mode (non-cloud):
# Create a core from the _default configset (managed schema)
sudo -u solr /opt/solr/bin/solr create_core -c mystore -d _default
# Verify the core was created
curl "http://localhost:8983/solr/admin/cores?action=STATUS&core=mystore"
# Cores are stored in /var/solr/data/mystore/
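For repeatable provisioning, core creation can be made idempotent by checking the CoreAdmin STATUS response before calling `bin/solr create_core -c NAME -d CONFIGSET`. This is a minimal sketch. It assumes the `SOLR_URL` and `SOLR_BIN` variables shown and uses a grep-based check; a JSON parser such as jq would be more robust. `core_exists` and `ensure_core` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: create a standalone core only if it does not exist yet.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"
SOLR_BIN="${SOLR_BIN:-/opt/solr/bin/solr}"

core_exists() {
  local core="$1"
  # STATUS returns an empty object for unknown cores and a "name" entry for loaded ones
  curl -s "${SOLR_URL}/admin/cores?action=STATUS&core=${core}&wt=json" \
    | grep -Eq "\"name\" *: *\"${core}\""
}

ensure_core() {
  local core="$1" configset="${2:-_default}"
  if core_exists "$core"; then
    echo "core ${core} already exists"
  else
    sudo -u solr "$SOLR_BIN" create_core -c "$core" -d "$configset"
  fi
}
```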
Using the Admin UI:
- Open http://your-server:8983/solr
- Go to Core Admin → Add Core
- Fill in the name and use _default as the config set
Schema Design
Solr defines field types and fields in managed-schema.xml (or schema.xml when the classic schema factory is used). With the managed schema, make changes through the Schema API rather than editing the file by hand:
# Add fields via the Schema API
# Add a text field with full-text analysis
curl -X POST http://localhost:8983/solr/mystore/schema \
-H 'Content-Type: application/json' \
-d '{
"add-field": [
{"name": "title", "type": "text_general", "indexed": true, "stored": true},
{"name": "description", "type": "text_general", "indexed": true, "stored": true},
{"name": "category", "type": "string", "indexed": true, "stored": true, "docValues": true},
{"name": "price", "type": "pfloat", "indexed": true, "stored": true, "docValues": true},
{"name": "in_stock", "type": "boolean", "indexed": true, "stored": true},
{"name": "tags", "type": "strings", "indexed": true, "stored": true, "multiValued": true},
{"name": "created_at", "type": "pdate", "indexed": true, "stored": true}
],
"add-copy-field": [
{"source": "title", "dest": "_text_"},
{"source": "description", "dest": "_text_"}
]
}'
# Copy fields to _text_ for catch-all search
# pdate, pfloat, pint, plong are Solr's modern numeric types (use these over legacy float/date)
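When a script adds fields one at a time, it helps to stop on the first rejected command. The sketch wraps the Schema API call shown above and relies on curl's `--fail` flag. It assumes Solr answers a rejected command, such as a field that already exists, with an HTTP error status. `add_field` and its argument order are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add one field through the Schema API, failing loudly on errors.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

add_field() {
  local core="$1" name="$2" type="$3" docvalues="${4:-false}"
  local payload
  payload=$(printf '{"add-field": {"name": "%s", "type": "%s", "indexed": true, "stored": true, "docValues": %s}}' \
    "$name" "$type" "$docvalues")
  # --fail makes curl exit non-zero when Solr returns an HTTP error status
  if ! curl -sS --fail -X POST "${SOLR_URL}/${core}/schema" \
      -H 'Content-Type: application/json' -d "$payload" > /dev/null; then
    echo "failed to add field ${name}" >&2
    return 1
  fi
  echo "added field ${name} (${type})"
}
```

For example, `add_field mystore brand string true` would add a docValues-enabled `brand` string field, which suits faceting and sorting.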
View the current schema:
curl "http://localhost:8983/solr/mystore/schema?wt=json"
Indexing Documents
# Index JSON documents
curl -X POST http://localhost:8983/solr/mystore/update/json/docs \
-H 'Content-Type: application/json' \
-d '[
{
"id": "prod-1",
"title": "Mechanical Keyboard TKL",
"description": "Tenkeyless layout with Cherry MX Brown switches",
"category": "Electronics",
"price": 89.99,
"in_stock": true,
"tags": ["keyboard", "mechanical", "tkl"],
"created_at": "2024-01-15T00:00:00Z"
},
{
"id": "prod-2",
"title": "Ergonomic Mouse",
"description": "Vertical mouse for wrist comfort",
"category": "Electronics",
"price": 45.00,
"in_stock": false,
"tags": ["mouse", "ergonomic"],
"created_at": "2024-01-20T00:00:00Z"
}
]'
# Commit the changes (make them visible for search)
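curl "http://localhost:8983/solr/mystore/update?commit=true"
# Or commit automatically from the <updateHandler> section of solrconfig.xml.
# A hard autoCommit with openSearcher=false only flushes to disk; autoSoftCommit
# is what makes new documents visible to searches:
# <autoCommit>
# <maxTime>30000</maxTime> <!-- hard commit every 30 seconds -->
# <openSearcher>false</openSearcher>
# </autoCommit>
# <autoSoftCommit>
# <maxTime>5000</maxTime> <!-- new documents visible within about 5 seconds -->
# </autoSoftCommit>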
# Bulk index from CSV
curl "http://localhost:8983/solr/mystore/update/csv?commit=true&header=true&fieldnames=id,title,category,price" \
-H 'Content-Type: application/csv' \
--data-binary @products.csv
# Delete documents
curl -X POST http://localhost:8983/solr/mystore/update \
-H 'Content-Type: application/json' \
-d '{"delete": {"id": "prod-1"}}'
# Delete by query
curl -X POST http://localhost:8983/solr/mystore/update \
-H 'Content-Type: application/json' \
-d '{"delete": {"query": "in_stock:false"}}'
# Commit so the deletions take effect in search results
curl "http://localhost:8983/solr/mystore/update?commit=true"
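Instead of sending `commit=true` from every client, an update request can carry `commitWithin` (in milliseconds), which lets Solr batch the commits that make documents visible. This is a minimal sketch that assumes a local file containing a JSON array of documents like the one above. `index_json_file` and the 10-second default are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: post a JSON array of documents and ask Solr to make
# them searchable within the given number of milliseconds.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

index_json_file() {
  local core="$1" file="$2" within_ms="${3:-10000}"
  [ -r "$file" ] || { echo "cannot read ${file}" >&2; return 1; }
  curl -sS --fail -X POST \
    "${SOLR_URL}/${core}/update/json/docs?commitWithin=${within_ms}" \
    -H 'Content-Type: application/json' \
    --data-binary "@${file}"
}
```

For example, `index_json_file mystore products.json 5000` asks Solr to make the batch searchable within about five seconds, without a separate commit request.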
Searching and Facets
# Basic search
curl "http://localhost:8983/solr/mystore/select?q=keyboard&wt=json&indent=true"
# Full-text search with field boosting
curl "http://localhost:8983/solr/mystore/select" \
--get \
--data-urlencode "q=mechanical keyboard" \
--data-urlencode "qf=title^3 description^1 tags^2" \
--data-urlencode "defType=edismax" \
--data-urlencode "rows=10" \
--data-urlencode "start=0" \
--data-urlencode "fl=id,title,price,score" \
--data-urlencode "sort=score desc, price asc"
# Faceted search
curl "http://localhost:8983/solr/mystore/select" \
--get \
--data-urlencode "q=*:*" \
--data-urlencode "facet=true" \
--data-urlencode "facet.field=category" \
--data-urlencode "facet.field=tags" \
--data-urlencode "facet.range=price" \
--data-urlencode "facet.range.start=0" \
--data-urlencode "facet.range.end=200" \
--data-urlencode "facet.range.gap=50" \
--data-urlencode "fq=in_stock:true" \
--data-urlencode "rows=20"
# Highlighting search terms in results
curl "http://localhost:8983/solr/mystore/select" \
--get \
--data-urlencode "q=ergonomic" \
--data-urlencode "hl=true" \
--data-urlencode "hl.fl=title,description" \
--data-urlencode "hl.snippets=2" \
--data-urlencode "hl.fragsize=100"
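# Autocomplete / suggest (requires a suggester named mySuggester and a /suggest
# request handler in solrconfig.xml; the _default configset defines neither)
curl "http://localhost:8983/solr/mystore/suggest?suggest=true&suggest.q=keyb&suggest.dictionary=mySuggester"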
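The `/suggest` request above only works once a suggester is registered. The sketch below registers one through the Config API (`add-searchcomponent` plus `add-requesthandler`) and then builds its dictionary. The lookup and dictionary implementations, the `title` source field and the `text_general` analyzer are assumptions to adapt. Only the name `mySuggester` comes from the query above.

```shell
#!/usr/bin/env bash
# Register a suggester named mySuggester and a /suggest handler via the Config API.
# Assumptions: standalone core "mystore", suggestions drawn from the stored "title" field.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

add_suggester() {
  local core="${1:-mystore}"
  curl -sS --fail -X POST "${SOLR_URL}/${core}/config" \
    -H 'Content-Type: application/json' \
    -d '{
      "add-searchcomponent": {
        "name": "suggest",
        "class": "solr.SuggestComponent",
        "suggester": {
          "name": "mySuggester",
          "lookupImpl": "FuzzyLookupFactory",
          "dictionaryImpl": "DocumentDictionaryFactory",
          "field": "title",
          "suggestAnalyzerFieldType": "text_general"
        }
      },
      "add-requesthandler": {
        "name": "/suggest",
        "class": "solr.SearchHandler",
        "startup": "lazy",
        "defaults": {"suggest": true, "suggest.count": 5, "suggest.dictionary": "mySuggester"},
        "components": ["suggest"]
      }
    }'
}

build_suggester() {
  local core="${1:-mystore}"
  # The dictionary stays empty until it is built; rebuild after large reindexes
  curl -sS --fail "${SOLR_URL}/${core}/suggest?suggest=true&suggest.build=true&suggest.dictionary=mySuggester"
}
```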
Key query parsers and parameters:
- defType=lucene: the default standard Lucene query syntax
- defType=edismax: Extended DisMax, best for user-facing search with per-field boosts (qf)
- fq: filter query; restricts the result set without affecting scores, and each fq is cached independently in the filterCache
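Each `fq` is cached on its own, so a faceted UI typically sends the user's text as `q` and every selected facet value as a separate `fq`. Popular filters are then reused across searches. The sketch below shows that pattern. `drill_down` and its argument convention are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run an eDisMax query with one fq per selected facet value.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

drill_down() {
  local core="$1" text="$2"
  shift 2
  local args=(--get
    --data-urlencode "q=${text}"
    --data-urlencode "defType=edismax"
    --data-urlencode "qf=title^3 description^1 tags^2")
  local filter
  for filter in "$@"; do
    # Separate fq parameters are cached independently, unlike one combined fq
    args+=(--data-urlencode "fq=${filter}")
  done
  curl -sS "${SOLR_URL}/${core}/select" "${args[@]}"
}
```

Usage: `drill_down mystore "mechanical keyboard" 'category:"Electronics"' 'in_stock:true'`.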
SolrCloud Clustering
SolrCloud uses ZooKeeper for cluster coordination:
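# Start Solr in cloud mode with embedded ZooKeeper (single node, testing only).
# Without -z, Solr runs ZooKeeper on its port + 1000 (9983); stop the service first to free 8983.
sudo systemctl stop solr
sudo -u solr /opt/solr/bin/solr start -cloud -p 8983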
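# For a production 3-node cluster, first start an external ZooKeeper ensemble
# and create the /solr chroot once:
sudo -u solr /opt/solr/bin/solr zk mkroot /solr -z zk1:2181,zk2:2181,zk3:2181
# Then start each Solr node pointing to ZooKeeper:
sudo -u solr /opt/solr/bin/solr start -cloud \
-p 8983 \
-z zk1:2181,zk2:2181,zk3:2181/solr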
# Create a collection with 2 shards and 2 replicas per shard
# (maxShardsPerNode was removed in Solr 9.0)
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=mystore&numShards=2&replicationFactor=2&collection.configName=_default"
# Check cluster status
curl "http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS"
# Upload a custom config set to ZooKeeper
sudo -u solr /opt/solr/bin/solr zk upconfig \
-n my-configset \
-d /path/to/configset \
-z zk1:2181,zk2:2181,zk3:2181/solr
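Solr was installed as a system service, so cloud mode for that service is normally configured through `ZK_HOST` in `/etc/default/solr.in.sh` rather than by passing `-z` on the command line. The sketch below sets or replaces that line. `set_zk_host` is a hypothetical helper that uses GNU sed's `0,/regex/` address. It must run as root, followed by `sudo systemctl restart solr`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set (or replace) ZK_HOST in solr.in.sh so the system
# service starts in SolrCloud mode against an external ensemble.
set_zk_host() {
  local zk="$1" file="${2:-/etc/default/solr.in.sh}"
  if grep -Eq '^#?ZK_HOST=' "$file"; then
    # Replace only the first ZK_HOST line, commented out or not (GNU sed)
    sed -i -E "0,/^#?ZK_HOST=/s|^#?ZK_HOST=.*|ZK_HOST=\"${zk}\"|" "$file"
  else
    printf 'ZK_HOST="%s"\n' "$zk" >> "$file"
  fi
}
```

Usage: `set_zk_host 'zk1:2181,zk2:2181,zk3:2181/solr'` on each Solr node, then restart the service.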
Security and Authentication
Enable Basic Auth:
# Create security.json (the "solr" credential below is the reference-guide example; replace it before production)
cat > /tmp/security.json << 'EOF'
{
"authentication": {
"blockUnknown": true,
"class": "solr.BasicAuthPlugin",
"credentials": {
"solr": "IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="
}
},
"authorization": {
"class": "solr.RuleBasedAuthorizationPlugin",
"permissions": [
{"name": "security-edit", "role": "admin"},
{"name": "read", "role": "*"},
{"name": "update", "role": "admin"}
],
"user-role": {"solr": "admin"}
}
}
EOF
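# For standalone: place it in the Solr home directory (/var/solr/data), not in a core's conf directory
sudo cp /tmp/security.json /var/solr/data/security.json
sudo chown solr:solr /var/solr/data/security.json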
sudo systemctl restart solr
# For SolrCloud: upload to the ZooKeeper root (with the external ensemble use -z zk1:2181,zk2:2181,zk3:2181/solr)
sudo -u solr /opt/solr/bin/solr zk cp /tmp/security.json zk:/security.json -z localhost:9983
Change the password. The example credential above is user solr with password SolrRocks; the Authentication API stores a salted hash of the new password for you:
curl --user solr:SolrRocks http://localhost:8983/solr/admin/authentication \
-H 'Content-Type: application/json' \
-d '{"set-user": {"solr": "admin_password"}}'
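To prepare credentials offline, the stored value can be computed locally. Solr's Sha256AuthenticationProvider stores `base64(sha256(sha256(salt + password)))`, a space, then `base64(salt)`. The sketch below is an independent reimplementation with OpenSSL, not a Solr tool. It assumes `openssl` and GNU `base64` are installed. As a self-check, hashing `SolrRocks` with the salt from the example credential should reproduce that credential.

```shell
#!/usr/bin/env bash
# Reimplementation (not Solr's own tool) of the BasicAuthPlugin credential format:
# "<base64(sha256(sha256(salt + password)))> <base64(salt)>".
# Requires openssl and GNU coreutils base64.
solr_password_hash() {
  local password="$1"
  # Use the given base64 salt, or generate 32 random bytes as Solr does
  local salt_b64="${2:-$(openssl rand -base64 32)}"
  local hash_b64
  hash_b64=$( { printf '%s' "$salt_b64" | base64 -d; printf '%s' "$password"; } \
    | openssl dgst -sha256 -binary \
    | openssl dgst -sha256 -binary \
    | base64 )
  printf '%s %s\n' "$hash_b64" "$salt_b64"
}
```

`solr_password_hash 'new-password'` prints a value to paste under `credentials` for a user in security.json.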
Troubleshooting
Check Solr logs:
sudo tail -f /var/solr/logs/solr.log
sudo tail -f /var/solr/logs/solr_gc.log
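For quick triage, the sketch below counts WARN and ERROR lines and prints the most recent errors. It assumes Solr's default log4j2 layout, where the level is the third whitespace-separated field after the date and time. `log_summary` is hypothetical; adjust the field number if you changed the layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize WARN/ERROR counts in a Solr log, then show
# the most recent ERROR lines. Assumes "<date> <time> <LEVEL> (<thread>) ..." lines.
log_summary() {
  local log="${1:-/var/solr/logs/solr.log}" last="${2:-5}"
  awk '$3 == "ERROR" || $3 == "WARN" { count[$3]++ }
       END { printf "ERROR=%d WARN=%d\n", count["ERROR"], count["WARN"] }' "$log"
  awk '$3 == "ERROR"' "$log" | tail -n "$last"
}
```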
Out of memory / GC overhead:
# Increase heap in /etc/default/solr.in.sh
SOLR_HEAP="4g"
sudo systemctl restart solr
Core fails to load:
curl "http://localhost:8983/solr/admin/cores?action=STATUS&wt=json"
# Check "initFailures" in response
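For monitoring, the same STATUS call can be reduced to a pass/fail check. This sketch relies on Solr returning an empty `initFailures` object when every core loaded. It uses a whitespace-insensitive grep instead of a JSON parser. `check_core_init` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit non-zero (and print the raw response) when any core
# reported an init failure; prints a confirmation when initFailures is empty.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

check_core_init() {
  local response
  response=$(curl -sS "${SOLR_URL}/admin/cores?action=STATUS&wt=json") || return 2
  # Strip spaces and newlines so indented and compact JSON both match
  if printf '%s' "$response" | tr -d ' \n' | grep -q '"initFailures":{}'; then
    echo "all cores initialized"
  else
    echo "core init failures detected:" >&2
    printf '%s\n' "$response" >&2
    return 1
  fi
}
```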
Slow queries:
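# Enable slow query logging: add (or uncomment) this in the <query> section of
# solrconfig.xml; for the standalone core above, /var/solr/data/mystore/conf/solrconfig.xml:
# <slowQueryThresholdMillis>1000</slowQueryThresholdMillis>
# Then reload the core to apply it
curl "http://localhost:8983/solr/admin/cores?action=RELOAD&core=mystore"
# Slow requests are written to /var/solr/logs/solr_slow_requests.log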
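Once slow requests are being logged, ranking them by `QTime` shows the worst offenders first. The sketch assumes each logged request line carries a `QTime=<milliseconds>` token. The default path assumes Solr 9's separate slow-request log; point it at `solr.log` if your logging configuration routes slow requests there. `slowest_requests` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the N slowest logged requests as "<QTime ms> <line>".
slowest_requests() {
  local log="${1:-/var/solr/logs/solr_slow_requests.log}" n="${2:-10}"
  # Extract the number after "QTime=" and prefix it to the line for sorting
  awk 'match($0, /QTime=[0-9]+/) {
         print substr($0, RSTART + 6, RLENGTH - 6), $0
       }' "$log" \
    | sort -k1,1nr \
    | head -n "$n"
}
```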
Index corruption:
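# optimize=true only merges segments; it cannot repair a corrupt index.
# Stop Solr, then inspect the index with Lucene's CheckIndex (read-only unless -exorcise is given):
sudo -u solr java -cp '/opt/solr/server/solr-webapp/webapp/WEB-INF/lib/*' org.apache.lucene.index.CheckIndex /var/solr/data/mystore/data/index
# If segments are reported broken, restore from a backup or reindex from the source data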
Conclusion
Apache Solr is a proven enterprise search platform with powerful full-text analysis, faceting, and SolrCloud clustering for horizontal scalability. The eDisMax query parser with field boosting handles most user-facing search scenarios, while copy fields allow catch-all searching across document fields. For production deployments, use SolrCloud with an external ZooKeeper ensemble, enable Basic Auth (or another authentication plugin Solr supports, such as JWT), and configure autoCommit together with autoSoftCommit to balance indexing throughput with search freshness.


