Neo4j Graph Database Installation

Neo4j is a leading graph database that efficiently stores and queries relationships between data points. It provides a schema-optional NoSQL model with ACID guarantees and a powerful query language called Cypher. This comprehensive guide covers Neo4j installation, Cypher fundamentals, index management, the APOC plugin ecosystem, backup strategies, and production deployment considerations for relationship-focused applications.

Graph Database Concepts

Neo4j models data as a property graph consisting of nodes (entities), relationships (connections), and properties (attributes on nodes and relationships). Each relationship has a type and direction, enabling efficient traversal patterns. The graph model excels at representing complex relationships like social networks, knowledge graphs, recommendation engines, and identity management systems.

Cypher is Neo4j's declarative query language for expressing graph patterns. Where SQL operates on sets of rows, Cypher matches patterns of nodes and relationships. Neo4j typically uses indexes to locate the starting nodes of a pattern and then follows the relationship references stored with each node, so deep traversals are often faster than the equivalent chain of relational joins.
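
To make the property graph model concrete, here is a minimal sketch in plain Python. It is not how Neo4j stores data; the Node and Relationship classes and the relate and two_hop helpers are hypothetical names used only for illustration. Each node keeps its outgoing relationships, so a two-hop traversal follows references instead of searching a table.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set
    properties: dict
    outgoing: list = field(default_factory=list)  # relationships that start here


@dataclass
class Relationship:
    type: str
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)


def relate(start, rel_type, end, **properties):
    """Create a directed relationship and register it on its start node."""
    rel = Relationship(rel_type, start, end, properties)
    start.outgoing.append(rel)
    return rel


def two_hop(start, rel_type):
    """Names of nodes reached by following exactly two rel_type relationships."""
    names = set()
    for first in start.outgoing:
        if first.type != rel_type:
            continue
        for second in first.end.outgoing:
            if second.type == rel_type:
                names.add(second.end.properties["name"])
    return names


alice = Node({"Person"}, {"name": "Alice"})
bob = Node({"Person"}, {"name": "Bob"})
charlie = Node({"Person"}, {"name": "Charlie"})
relate(alice, "KNOWS", bob, since=2020)
relate(bob, "KNOWS", charlie, since=2019)

friends_of_friends = two_hop(alice, "KNOWS")
print(friends_of_friends)
```

The traversal touches only the relationships attached to Alice and Bob, which is the intuition behind pattern matching on a graph.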

Installation

Install Neo4j on Ubuntu/Debian systems using the official repository:

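# Add the Neo4j signing key and repository (apt-key is deprecated)
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://debian.neo4j.com/neotechnology.gpg.key | sudo gpg --dearmor -o /etc/apt/keyrings/neotechnology.gpg
echo "deb [signed-by=/etc/apt/keyrings/neotechnology.gpg] https://debian.neo4j.com stable latest" | sudo tee /etc/apt/sources.list.d/neo4j.list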

# Update and install
sudo apt-get update
sudo apt-get install -y neo4j

# Or install specific version
sudo apt-get install -y neo4j=1:5.11.0

# Verify installation
neo4j --version

On CentOS/RHEL:

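# Import the signing key and add the repository
sudo rpm --import https://debian.neo4j.com/neotechnology.gpg.key
sudo tee /etc/yum.repos.d/neo4j.repo > /dev/null <<EOF
[neo4j]
name=Neo4j RPM Repository
baseurl=https://yum.neo4j.com/stable/5
enabled=1
gpgcheck=1
EOF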

# Install
sudo dnf install -y neo4j

# Verify
neo4j --version

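The packages normally create the neo4j system user and the standard directories. If they are missing, or you use custom locations, create them and set ownership:

# Create the neo4j user only if it does not already exist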
sudo useradd -r -s /bin/false neo4j 2>/dev/null || true

# Create data directory
sudo mkdir -p /var/lib/neo4j/data
sudo mkdir -p /var/lib/neo4j/logs
sudo mkdir -p /var/lib/neo4j/import

# Set ownership
sudo chown -R neo4j:neo4j /var/lib/neo4j
sudo chmod 755 /var/lib/neo4j

Enable and start Neo4j:

# Enable automatic startup
sudo systemctl enable neo4j

# Start service
sudo systemctl start neo4j

# Check status
sudo systemctl status neo4j

# Monitor logs
sudo journalctl -u neo4j -f

Initial Configuration

Configure Neo4j for production use. Edit the configuration file:

sudo nano /etc/neo4j/neo4j.conf

Configure essential settings:

# Database storage locations (Neo4j 5 setting names)
server.directories.data=/var/lib/neo4j/data
server.directories.import=/var/lib/neo4j/import
server.directories.logs=/var/lib/neo4j/logs

# Network configuration
server.default_listen_address=0.0.0.0
server.default_advertised_address=192.168.1.10

# Bolt protocol port (native driver protocol)
server.bolt.listen_address=0.0.0.0:7687
server.bolt.advertised_address=192.168.1.10:7687

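# HTTP API port (unencrypted; consider disabling it once HTTPS works)
server.http.listen_address=0.0.0.0:7474
server.http.enabled=true

# HTTPS configuration (recommended for production)
server.https.enabled=true
server.https.listen_address=0.0.0.0:7473
server.https.advertised_address=192.168.1.10:7473

# SSL/TLS settings (file names are relative to base_directory)
# tls_level=OPTIONAL accepts both encrypted and plain Bolt; use REQUIRED to enforce TLS
server.bolt.tls_level=OPTIONAL
dbms.ssl.policy.bolt.enabled=true
dbms.ssl.policy.bolt.base_directory=/etc/neo4j/certificates
dbms.ssl.policy.bolt.private_key=private.key
dbms.ssl.policy.bolt.public_certificate=public.crt
dbms.ssl.policy.bolt.client_auth=NONE
dbms.ssl.policy.https.enabled=true
dbms.ssl.policy.https.base_directory=/etc/neo4j/certificates
dbms.ssl.policy.https.private_key=private.key
dbms.ssl.policy.https.public_certificate=public.crt
dbms.ssl.policy.https.client_auth=NONE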

# Memory configuration (heap plus page cache must fit in RAM with room for the OS)
server.memory.heap.initial_size=2G
server.memory.heap.max_size=4G
server.memory.pagecache.size=4G

# Transaction timeout (transactions running longer than this are terminated)
db.transaction.timeout=60s

# Query planning
dbms.cypher.min_replan_interval=10s

# Logging (debug.log verbosity is set in /etc/neo4j/server-logs.xml in Neo4j 5)
db.logs.query.enabled=INFO
db.logs.query.parameter_logging_enabled=false

# Security
dbms.security.auth_enabled=true
dbms.security.procedures.unrestricted=apoc.*

# Backup settings (online backup is Enterprise Edition only; omit on Community)
server.backup.enabled=true
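
Before restarting, it can help to check that the configured heap and page cache fit in the machine's RAM. The sketch below is one way to do that under stated assumptions: the helper names parse_size, parse_conf and reserved_memory are hypothetical, sizes are assumed to use the k, m and g suffixes, and keys are matched by suffix so both the dbms.memory.* and server.memory.* spellings are recognised.

```python
import re

# Binary multipliers for the size suffixes used in the sample below
UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def parse_size(value):
    """Convert a size such as '4G' or '512m' to bytes."""
    match = re.fullmatch(r"(\d+)([kmg]?)", value.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised size: {value!r}")
    number, unit = match.groups()
    return int(number) * UNITS.get(unit, 1)


def parse_conf(text):
    """Parse key=value lines, ignoring blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def reserved_memory(settings):
    """Sum the maximum heap and the page cache, matching keys by suffix."""
    suffixes = ("memory.heap.max_size", "memory.pagecache.size")
    return sum(
        parse_size(value)
        for key, value in settings.items()
        if key.endswith(suffixes)
    )


SAMPLE_CONF = """
# Memory configuration
server.memory.heap.initial_size=2G
server.memory.heap.max_size=4G
server.memory.pagecache.size=4G
"""

settings = parse_conf(SAMPLE_CONF)
total_gib = reserved_memory(settings) / 1024**3
print(f"heap + page cache: {total_gib:.1f} GiB")
```

With the values from this guide the total is 8 GiB, so the host needs noticeably more than that once the operating system and other processes are counted.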

Generate SSL certificates for HTTPS:

# Create certificate directory
sudo mkdir -p /etc/neo4j/certificates
sudo chown neo4j:neo4j /etc/neo4j/certificates

# Generate self-signed certificate
sudo openssl req -x509 -newkey rsa:4096 -keyout /etc/neo4j/certificates/private.key \
  -out /etc/neo4j/certificates/public.crt -days 365 -nodes \
  -subj "/CN=192.168.1.10"

# Set permissions
sudo chown neo4j:neo4j /etc/neo4j/certificates/*
sudo chmod 600 /etc/neo4j/certificates/*

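Set the initial password. This only takes effect before the database has been started for the first time:

sudo systemctl stop neo4j
sudo neo4j-admin dbms set-initial-password yourpassword
sudo systemctl start neo4j

# If Neo4j has already been started, log in with the default
# neo4j/neo4j credentials and change the password instead
cypher-shell -u neo4j -p neo4j -d system "ALTER CURRENT USER SET PASSWORD FROM 'neo4j' TO 'yourpassword'"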

Restart Neo4j to apply configuration:

sudo systemctl restart neo4j

# Verify service is running
sudo systemctl status neo4j

# Test connection
cypher-shell -u neo4j -p yourpassword "RETURN 1 as result"

Cypher Query Language

Connect to Neo4j and execute Cypher queries:

# Connect with cypher-shell
cypher-shell -u neo4j -p yourpassword -a bolt://192.168.1.10:7687

# Or use curl for HTTP API
curl -u neo4j:yourpassword -H "Content-Type: application/json" \
  -X POST http://192.168.1.10:7474/db/neo4j/tx/commit -d '
{
  "statements": [
    {
      "statement": "RETURN 1 as result"
    }
  ]
}'
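
The same HTTP request can be assembled from Python with only the standard library. This is a sketch rather than a recommended client: build_cypher_request is a hypothetical helper, and it targets the /tx/commit endpoint, which runs the statements and commits them in a single request. The network call is left commented out because it needs a reachable server.

```python
import base64
import json
import urllib.request


def build_cypher_request(base_url, database, user, password, statement, parameters=None):
    """Build (but do not send) a POST request for the transactional HTTP API."""
    payload = {
        "statements": [
            {"statement": statement, "parameters": parameters or {}}
        ]
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url}/db/{database}/tx/commit",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


request = build_cypher_request(
    "http://192.168.1.10:7474", "neo4j", "neo4j", "yourpassword",
    "RETURN $x AS result", {"x": 1},
)
print(request.get_method(), request.full_url)

# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Passing values through the parameters map instead of formatting them into the statement avoids Cypher injection and lets the server reuse the query plan.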

Create nodes and relationships:

// Create a person node
CREATE (alice:Person {name: "Alice", age: 30, email: "alice@example.com"})
RETURN alice;

// Create multiple nodes
CREATE (bob:Person {name: "Bob", age: 28}),
       (charlie:Person {name: "Charlie", age: 35})
RETURN bob, charlie;

// Create nodes with relationships
CREATE (alice:Person {name: "Alice"})-[:KNOWS]->(bob:Person {name: "Bob"})
RETURN alice, bob;

// Create complex graph pattern
CREATE (alice:Person {name: "Alice", email: "alice@example.com"})
CREATE (bob:Person {name: "Bob", email: "bob@example.com"})
CREATE (charlie:Person {name: "Charlie"})
CREATE (alice)-[:KNOWS {since: 2020}]->(bob)
CREATE (bob)-[:KNOWS {since: 2019}]->(charlie)
CREATE (alice)-[:COLLEAGUE_OF {company: "TechCorp"}]->(charlie)
RETURN alice, bob, charlie;

Query the graph:

// Find all people
MATCH (p:Person) RETURN p;

// Find person by name
MATCH (p:Person {name: "Alice"}) RETURN p;

// Find people with age condition
MATCH (p:Person) WHERE p.age > 25 RETURN p.name, p.age ORDER BY p.age;

// Find relationships between people
MATCH (a:Person)-[knows:KNOWS]->(b:Person)
RETURN a.name, knows.since, b.name;

// Find friends of friends (two-hop relationships)
MATCH (alice:Person {name: "Alice"})-[:KNOWS]->(friend)-[:KNOWS]->(friend_of_friend)
RETURN friend_of_friend.name;

// Count relationships
MATCH (p:Person)-[r:KNOWS]->()
RETURN p.name, COUNT(r) as friend_count;

// Find people who can reach Alice within two KNOWS hops
MATCH (p1:Person)-[:KNOWS*..2]->(p2:Person {name: "Alice"})
RETURN p1.name, p2.name;

// Aggregate results
MATCH (p:Person)
RETURN AVG(p.age) as avg_age, MAX(p.age) as max_age, COUNT(p) as total;

Modify data:

// Update node properties
MATCH (p:Person {name: "Alice"})
SET p.age = 31, p.updated_at = timestamp()
RETURN p;

// Add relationship
MATCH (alice:Person {name: "Alice"}), (bob:Person {name: "Bob"})
CREATE (alice)-[:COLLEAGUE_OF {since: 2021}]->(bob)
RETURN alice, bob;

// Delete node and relationships
MATCH (p:Person {name: "Charlie"})
DETACH DELETE p;

// Remove relationship
MATCH (alice:Person {name: "Alice"})-[rel:KNOWS]->(bob:Person {name: "Bob"})
DELETE rel
RETURN alice, bob;

// Merge (create or update)
MERGE (p:Person {email: "alice@example.com"})
ON CREATE SET p.name = "Alice", p.created_at = timestamp()
ON MATCH SET p.updated_at = timestamp()
RETURN p;
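
The create-or-update behaviour of MERGE can be pictured with a small Python sketch. This only mimics the ON CREATE and ON MATCH branches against an in-memory dictionary; merge_person and the store layout are hypothetical and say nothing about how Neo4j implements MERGE internally.

```python
import time


def merge_person(store, email, name):
    """Return the person keyed by email, creating it on first use."""
    now = int(time.time() * 1000)  # milliseconds since the epoch
    person = store.get(email)
    if person is None:
        # Equivalent of the ON CREATE branch
        person = {"email": email, "name": name, "created_at": now}
        store[email] = person
    else:
        # Equivalent of the ON MATCH branch
        person["updated_at"] = now
    return person


store = {}
first = merge_person(store, "alice@example.com", "Alice")
created_only = "updated_at" not in first
second = merge_person(store, "alice@example.com", "Alice")
print(len(store), created_only, "updated_at" in second)
```

Calling the function twice leaves a single entry in the store, which is the property that makes MERGE safe to repeat.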

Data Modeling

Design effective graph schemas:

// Create domain model for social network
CREATE (alice:Person {id: "1", name: "Alice", email: "alice@example.com", age: 30})
CREATE (bob:Person {id: "2", name: "Bob", email: "bob@example.com", age: 28})
CREATE (techcorp:Company {id: "c1", name: "TechCorp", founded: 2015})
CREATE (project:Project {id: "p1", name: "AI Initiative", budget: 1000000})

// Create relationships
CREATE (alice)-[:WORKS_FOR {start_date: 2020}]->(techcorp)
CREATE (bob)-[:WORKS_FOR {start_date: 2021}]->(techcorp)
CREATE (alice)-[:LEADS]->(project)
CREATE (bob)-[:CONTRIBUTES_TO {role: "Engineer"}]->(project)
CREATE (alice)-[:KNOWS {since: 2020}]->(bob)

// Add location information
CREATE (sf:Location {name: "San Francisco", type: "City"})
CREATE (ny:Location {name: "New York", type: "City"})
CREATE (techcorp)-[:HEADQUARTERED_IN]->(sf)
CREATE (alice)-[:LIVES_IN]->(sf)
CREATE (bob)-[:LIVES_IN]->(ny);

Create recommendation graph:

// E-commerce recommendation model
CREATE (customer:Customer {id: "cust1", name: "Alice"})
CREATE (product1:Product {id: "p1", name: "Laptop", price: 1000, category: "Electronics"})
CREATE (product2:Product {id: "p2", name: "Mouse", price: 30, category: "Electronics"})
CREATE (product3:Product {id: "p3", name: "Book", price: 20, category: "Books"})

// Purchases
CREATE (customer)-[:PURCHASED {date: "2024-01-15", rating: 5}]->(product1)
CREATE (customer)-[:PURCHASED {date: "2024-01-20", rating: 4}]->(product2);

// Find similar customers through purchases (c1.id < c2.id lists each pair once)
MATCH (c1:Customer)-[:PURCHASED]->(p:Product)<-[:PURCHASED]-(c2:Customer)
WHERE c1.id < c2.id
RETURN c1.name, c2.name, COUNT(p) as common_purchases
ORDER BY common_purchases DESC;

// Recommend products based on similar purchases
MATCH (customer:Customer {id: "cust1"})-[:PURCHASED]->(p1:Product),
      (other:Customer)-[:PURCHASED]->(p1),
      (other)-[:PURCHASED]->(p2:Product)
WHERE NOT (customer)-[:PURCHASED]->(p2)
RETURN p2.name, COUNT(*) as recommendation_score
ORDER BY recommendation_score DESC
LIMIT 5;
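
The scoring in the recommendation query can be easier to follow in ordinary code. The sketch below reproduces the same counting in plain Python over a dictionary of purchases; the recommend function, the customers cust2 to cust4 and the Keyboard product are hypothetical additions for illustration.

```python
from collections import Counter


def recommend(purchases, customer, limit=5):
    """Score products the customer lacks by overlap with other buyers.

    A product gains one point for every (shared product, other customer)
    pair, which matches COUNT(*) grouped by the recommended product.
    """
    owned = purchases.get(customer, set())
    scores = Counter()
    for other, items in purchases.items():
        if other == customer:
            continue
        shared = owned & items
        if not shared:
            continue  # no common purchase, so no pattern match
        for item in items - owned:
            scores[item] += len(shared)
    return scores.most_common(limit)


purchases = {
    "cust1": {"Laptop", "Mouse"},
    "cust2": {"Laptop", "Mouse", "Keyboard"},
    "cust3": {"Laptop", "Book"},
    "cust4": {"Book"},
}

recommendations = recommend(purchases, "cust1")
print(recommendations)
```

Here Keyboard scores 2 because cust2 shares two purchases with cust1, and Book scores 1 through cust3. The customer cust4 contributes nothing because there is no shared purchase.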

Indexes and Constraints

Create indexes for query performance:

// Create index on single property
CREATE INDEX person_name FOR (p:Person) ON (p.name);

// Create compound index
CREATE INDEX FOR (p:Person) ON (p.age, p.email);

// Create unique constraint (also creates an index; this fails if a
// single-property index on p.email already exists)
CREATE CONSTRAINT unique_email FOR (p:Person) REQUIRE p.email IS UNIQUE;

// Create existence constraint (Enterprise Edition)
CREATE CONSTRAINT person_has_name FOR (p:Person) REQUIRE p.name IS NOT NULL;

// Create relationship property constraint (Enterprise Edition)
CREATE CONSTRAINT works_for_has_start FOR ()-[rel:WORKS_FOR]-() REQUIRE rel.start_date IS NOT NULL;

// List all indexes
SHOW INDEXES YIELD name, type, labelsOrTypes, properties, state;

// List all constraints
SHOW CONSTRAINTS YIELD name, type, labelsOrTypes, properties;

// Drop index
DROP INDEX index_name;

// Drop constraint
DROP CONSTRAINT constraint_name;

Optimize query performance:

// Use EXPLAIN to see the query plan without running the query
EXPLAIN MATCH (p:Person {email: "alice@example.com"}) RETURN p;

// Use PROFILE to run the query and see actual rows and database hits
PROFILE MATCH (p:Person {email: "alice@example.com"}) RETURN p;

// List currently running queries, longest first
// (completed slow queries are recorded in the query log)
SHOW TRANSACTIONS
YIELD currentQuery, elapsedTime
RETURN currentQuery, elapsedTime ORDER BY elapsedTime DESC;

// Range predicate: uses an index on p.age if one exists, otherwise a label scan
MATCH (p:Person)
WHERE p.age > 25
RETURN p;

// Equality on an indexed property becomes an index seek
MATCH (p:Person)
WHERE p.email = "alice@example.com"
RETURN p;
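
The difference between a scan and an index seek, and the effect of a uniqueness constraint, can be sketched in a few lines of Python. This is a conceptual model only: UniquePropertyIndex and scan are hypothetical names, and Neo4j's real indexes are on-disk structures rather than dictionaries.

```python
class UniquePropertyIndex:
    """Maps one property value to one node and rejects duplicates."""

    def __init__(self, property_name):
        self.property_name = property_name
        self._entries = {}

    def add(self, node):
        value = node[self.property_name]
        if value in self._entries:
            raise ValueError(f"{self.property_name}={value!r} already exists")
        self._entries[value] = node

    def seek(self, value):
        """Direct lookup, independent of how many nodes exist."""
        return self._entries.get(value)


def scan(nodes, property_name, value):
    """Check every node, as a label scan with a filter would."""
    return [node for node in nodes if node.get(property_name) == value]


people = [
    {"name": "Alice", "email": "alice@example.com"},
    {"name": "Bob", "email": "bob@example.com"},
]
index = UniquePropertyIndex("email")
for person in people:
    index.add(person)

found_by_seek = index.seek("bob@example.com")
found_by_scan = scan(people, "email", "bob@example.com")

try:
    index.add({"name": "Impostor", "email": "alice@example.com"})
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(found_by_seek["name"], len(found_by_scan), duplicate_rejected)
```

Both lookups return Bob, but the scan inspects every node while the seek goes straight to the entry. EXPLAIN and PROFILE show which of the two strategies the planner chose.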

APOC Plugins

Install and use the APOC library for extended functionality:

# Neo4j 5 ships APOC Core in the labs directory; copy it into plugins
# (the APOC version must match the Neo4j version)
sudo cp /var/lib/neo4j/labs/apoc-*-core.jar /var/lib/neo4j/plugins/

# Set permissions
sudo chown neo4j:neo4j /var/lib/neo4j/plugins/apoc-*.jar

# Restart Neo4j
sudo systemctl restart neo4j

# Verify APOC is loaded
cypher-shell -u neo4j -p yourpassword "RETURN apoc.version()"

Use common APOC procedures:

// Path finding with cost (assumes KNOWS relationships have a numeric weight property)
MATCH (alice:Person {name: "Alice"}), (charlie:Person {name: "Charlie"})
CALL apoc.algo.dijkstra(alice, charlie, 'KNOWS', 'weight')
YIELD path, weight
RETURN path, weight;

// JSON operations
WITH {name: "Alice", age: 30} as data
RETURN apoc.convert.toJson(data) as json_str;

// Create relationships with properties
// (this links every ordered pair of people; narrow the MATCH on real data)
MATCH (p1:Person), (p2:Person)
WHERE p1 <> p2
CALL apoc.create.relationship(p1, 'KNOWS', {since: 2024}, p2)
YIELD rel
RETURN rel;

// Export to CSV (requires apoc.export.file.enabled=true in apoc.conf)
CALL apoc.export.csv.query("MATCH (p:Person) RETURN p.name, p.age", "/tmp/people.csv", {})
YIELD file, nodes, relationships, properties, time, rows
RETURN file, nodes, relationships, properties, time, rows;

// Path expansion with depth
MATCH (alice:Person {name: "Alice"})
CALL apoc.path.expand(alice, 'KNOWS', '', 0, 3)
YIELD path
RETURN path;

// Detect cycles that pass through Person nodes
MATCH (p:Person)
WITH collect(p) AS people
CALL apoc.nodes.cycles(people, {relTypes: ['KNOWS']})
YIELD path
RETURN path;

// Community detection (Graph Data Science plugin, not APOC)
CALL gds.graph.project('people', 'Person', 'KNOWS');
CALL gds.labelPropagation.stream('people')
YIELD nodeId, communityId
RETURN communityId, COUNT(*) as size
ORDER BY size DESC;
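
To show what a weighted shortest-path procedure computes, here is a minimal Dijkstra sketch in plain Python. It is a textbook version of the algorithm, not APOC's implementation, and the weights in the sample graph are made up for illustration.

```python
import heapq


def dijkstra(graph, start, goal):
    """Cheapest path in a graph given as {node: [(neighbour, weight), ...]}.

    Returns (path, total_weight), or (None, inf) if goal is unreachable.
    Weights must be non-negative.
    """
    queue = [(0, start, [start])]
    best = {start: 0}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return path, cost
        if cost > best.get(node, float("inf")):
            continue  # a cheaper route to this node was already processed
        for neighbour, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour, path + [neighbour]))
    return None, float("inf")


# Hypothetical weights on KNOWS-style edges
graph = {
    "Alice": [("Bob", 1), ("Charlie", 5)],
    "Bob": [("Charlie", 2)],
    "Charlie": [],
}

path, weight = dijkstra(graph, "Alice", "Charlie")
print(path, weight)
```

The direct edge from Alice to Charlie costs 5, so the route through Bob with a total weight of 3 is returned instead.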

Backup and Recovery

Implement backup procedures:

# Perform offline backup (database must be stopped)
sudo systemctl stop neo4j

# Create a dump (the target directory must be writable by the neo4j user)
BACKUP_PATH=/backup/neo4j-$(date +%Y%m%d)
sudo install -d -o neo4j -g neo4j "$BACKUP_PATH"
sudo -u neo4j neo4j-admin database dump neo4j --to-path="$BACKUP_PATH"

# Start database
sudo systemctl start neo4j

# Or perform an online backup with a script (Enterprise Edition only)
#!/bin/bash
BACKUP_DIR="/backup"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)

# Back up the running database
mkdir -p "$BACKUP_DIR/neo4j-$TIMESTAMP"
neo4j-admin database backup --to-path="$BACKUP_DIR/neo4j-$TIMESTAMP" neo4j

# Verify backup
ls -la "$BACKUP_DIR/neo4j-$TIMESTAMP/"
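
Dated backups accumulate, so some pruning is usually needed. The sketch below shows one possible retention check, assuming backups are named neo4j-YYYYMMDD as in the offline example; backups_to_delete and the seven-day default are hypothetical choices, and the function only returns names without deleting anything.

```python
from datetime import datetime, timedelta


def backups_to_delete(names, now, keep_days=7):
    """Return backup names older than keep_days, oldest first."""
    cutoff = now - timedelta(days=keep_days)
    expired = []
    for name in names:
        try:
            stamp = datetime.strptime(name, "neo4j-%Y%m%d")
        except ValueError:
            continue  # leave entries that do not follow the naming scheme alone
        if stamp < cutoff:
            expired.append(name)
    return sorted(expired)


names = ["neo4j-20240110", "neo4j-20240101", "neo4j-20240108", "notes.txt"]
expired = backups_to_delete(names, now=datetime(2024, 1, 10))
print(expired)
```

In a real script the names would come from listing the backup directory, and each expired entry would be removed only after the newest backup has been verified.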

Restore from backup:

# Stop Neo4j
sudo systemctl stop neo4j

# Load a dump created with "neo4j-admin database dump", replacing the existing database
# (for online backup artifacts use "neo4j-admin database restore" instead)
sudo -u neo4j neo4j-admin database load neo4j --from-path=/backup/neo4j-20240101 --overwrite-destination=true

# Verify integrity
sudo -u neo4j neo4j-admin database check neo4j

# Start Neo4j
sudo systemctl start neo4j

# Verify restoration
cypher-shell -u neo4j -p yourpassword "MATCH (n) RETURN COUNT(n) as node_count"

Bolt Configuration

Configure Bolt protocol for native driver connections:

# In neo4j.conf
server.bolt.listen_address=0.0.0.0:7687
server.bolt.advertised_address=192.168.1.10:7687
# OPTIONAL accepts encrypted and plain connections; REQUIRED enforces TLS
# (connection lifetime is a driver-side setting, not a server setting)
server.bolt.tls_level=OPTIONAL

# SSL/TLS for Bolt
dbms.ssl.policy.bolt.enabled=true
dbms.ssl.policy.bolt.base_directory=/etc/neo4j/certificates
dbms.ssl.policy.bolt.private_key=private.key
dbms.ssl.policy.bolt.public_certificate=public.crt

Connect via Bolt in Python:

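from neo4j import GraphDatabase

# bolt+ssc:// encrypts the connection and accepts self-signed certificates.
# Use bolt+s:// once the server presents a CA-signed certificate.
driver = GraphDatabase.driver(
    "bolt+ssc://192.168.1.10:7687",
    auth=("neo4j", "yourpassword"),
)

# Fail early if the server is unreachable or the credentials are wrong
driver.verify_connectivity()

def get_people(session):
    # Aliases give the record fields short, stable names
    result = session.run("MATCH (p:Person) RETURN p.name AS name, p.age AS age")
    for record in result:
        print(f"{record['name']}: {record['age']}")

with driver.session(database="neo4j") as session:
    get_people(session)

driver.close()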
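
Queries that take user input should pass it as parameters, not format it into the Cypher text. The sketch below assumes a session object whose run method accepts keyword arguments as query parameters, as the official Python driver's does. FakeSession is a hypothetical stand-in so the example runs without a server, and find_people_older_than is an illustrative name.

```python
def find_people_older_than(session, min_age):
    """Return (name, age) tuples for people older than min_age."""
    query = (
        "MATCH (p:Person) WHERE p.age > $min_age "
        "RETURN p.name AS name, p.age AS age ORDER BY p.age"
    )
    result = session.run(query, min_age=min_age)
    return [(record["name"], record["age"]) for record in result]


class FakeSession:
    """Stand-in for a driver session; filters and sorts rows in memory."""

    def __init__(self, rows):
        self.rows = rows
        self.calls = []

    def run(self, query, **parameters):
        self.calls.append((query, parameters))
        matching = [r for r in self.rows if r["age"] > parameters["min_age"]]
        return sorted(matching, key=lambda r: r["age"])


fake = FakeSession([
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 28},
    {"name": "Dana", "age": 22},
])
people = find_people_older_than(fake, 25)
print(people)
```

With a real driver the same function can be called inside a `with driver.session() as session:` block, and the value of min_age never becomes part of the query string.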

Performance Monitoring

Monitor Neo4j performance:

// Check node, relationship, and label counts
CALL apoc.meta.stats()
YIELD nodeCount, relCount, labels
RETURN nodeCount, relCount, labels;

// Check store and transaction log sizes
CALL apoc.monitor.store();

// List query-related configuration settings
CALL apoc.config.list() YIELD key, value
WHERE key CONTAINS 'query'
RETURN key, value;

// Show kernel version and store information
CALL apoc.monitor.kernel();

// List running transactions
SHOW TRANSACTIONS
YIELD database, transactionId, currentQuery, startTime, status, elapsedTime;

// Terminate a long-running transaction (use an ID from SHOW TRANSACTIONS)
TERMINATE TRANSACTIONS "neo4j-transaction-123";

Monitor via JMX:

# Enable JMX in neo4j.conf (neo4j-wrapper.conf is not used by current releases)
sudo nano /etc/neo4j/neo4j.conf

# Add JMX settings (no authentication or TLS here: restrict the port to a trusted network)
server.jvm.additional=-Dcom.sun.management.jmxremote
server.jvm.additional=-Dcom.sun.management.jmxremote.port=3637
server.jvm.additional=-Dcom.sun.management.jmxremote.authenticate=false
server.jvm.additional=-Dcom.sun.management.jmxremote.ssl=false

# Restart Neo4j, then connect with monitoring tools
jconsole 192.168.1.10:3637

Conclusion

Neo4j provides a powerful platform for applications where relationships are as important as the data itself. Its intuitive Cypher query language makes relationship traversal and pattern matching straightforward while maintaining excellent performance through intelligent indexing and query optimization. The APOC library extends Neo4j's capabilities with advanced algorithms, data processing, and integration features. By properly designing your graph schema, implementing appropriate indexes, and leveraging APOC procedures, you can build efficient applications for recommendation engines, social networks, knowledge graphs, and any scenario where relationship analysis drives business value.