Neo4j Graph Database Installation

Neo4j is a leading graph database that stores data as nodes and relationships and queries those relationships efficiently. It provides a schema-optional NoSQL model with ACID transactions and a declarative query language called Cypher. This guide covers Neo4j installation, Cypher fundamentals, index management, the APOC plugin library, backup strategies, and production deployment considerations for relationship-focused applications.

Graph Database Concepts

Neo4j models data as a property graph consisting of nodes (entities, optionally grouped by labels such as :Person), relationships (connections), and properties (key-value attributes on both nodes and relationships). Each relationship has exactly one type and a direction, which enables efficient traversal patterns. The graph model excels at representing densely connected data such as social networks, knowledge graphs, recommendation engines, and identity management systems.

Cypher is Neo4j's declarative query language for expressing graph patterns. Unlike SQL, which operates on sets of rows, Cypher describes paths through the graph. Neo4j uses indexes to find the starting points of a pattern and then follows relationships directly from node to node (index-free adjacency), so multi-hop traversals avoid the repeated join operations a relational database would need.
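The idea of following stored relationships instead of joining tables can be sketched with a toy in-memory property graph in Python. This is only an illustration under simplifying assumptions, not Neo4j's storage engine; the class and method names (`PropertyGraph`, `add_node`, `add_rel`, `neighbours`) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    labels: set
    props: dict


@dataclass
class Relationship:
    rel_type: str  # exactly one type per relationship
    start: int     # direction: start -> end
    end: int
    props: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy property graph: each node keeps a list of its outgoing relationships."""

    def __init__(self):
        self.nodes = {}                    # node_id -> Node
        self.outgoing = defaultdict(list)  # node_id -> [Relationship, ...]

    def add_node(self, node_id, *labels, **props):
        self.nodes[node_id] = Node(node_id, set(labels), props)
        return node_id

    def add_rel(self, start, rel_type, end, **props):
        self.outgoing[start].append(Relationship(rel_type, start, end, props))

    def neighbours(self, node_id, rel_type):
        # Read the node's own relationship list: no table scan or join is needed
        return [self.nodes[r.end] for r in self.outgoing[node_id] if r.rel_type == rel_type]


g = PropertyGraph()
alice = g.add_node(1, "Person", name="Alice")
bob = g.add_node(2, "Person", name="Bob")
g.add_rel(alice, "KNOWS", bob, since=2020)
print([n.props["name"] for n in g.neighbours(alice, "KNOWS")])  # ['Bob']
```

The cost of `neighbours` depends only on how many relationships the starting node has, not on the total size of the graph, which is the property that makes deep traversals cheap.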

Instalación

Install Neo4j on Ubuntu/Debian systems using the official repository:

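# Add the Neo4j signing key and repository (apt-key is deprecated)
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://debian.neo4j.com/neotechnology.gpg.key | sudo gpg --dearmor -o /etc/apt/keyrings/neotechnology.gpg
echo "deb [signed-by=/etc/apt/keyrings/neotechnology.gpg] https://debian.neo4j.com stable latest" | sudo tee /etc/apt/sources.list.d/neo4j.list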

# Update and install
sudo apt-get update
sudo apt-get install -y neo4j

# Or install a specific version
sudo apt-get install -y neo4j=1:5.11.0

# Verify the installation
neo4j --version

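On CentOS/RHEL:

# Import the signing key and add the repository (Neo4j 5.x)
sudo rpm --import https://debian.neo4j.com/neotechnology.gpg.key
sudo tee /etc/yum.repos.d/neo4j.repo > /dev/null << 'EOF'
[neo4j]
name=Neo4j RPM Repository
baseurl=https://yum.neo4j.com/stable/5
enabled=1
gpgcheck=1
EOF

# Install
sudo dnf install -y neo4j

# Verify
neo4j --version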

Create a dedicated user and directories (the package normally creates both, so these commands are safe to rerun):

# Neo4j runs as the neo4j user by default
sudo useradd -r -s /bin/false neo4j 2>/dev/null || true

# Create the data directories
sudo mkdir -p /var/lib/neo4j/data
sudo mkdir -p /var/lib/neo4j/logs
sudo mkdir -p /var/lib/neo4j/import

# Set ownership
sudo chown -R neo4j:neo4j /var/lib/neo4j
sudo chmod 755 /var/lib/neo4j

Enable and start Neo4j:

# Enable automatic startup
sudo systemctl enable neo4j

# Start the service
sudo systemctl start neo4j

# Check status
sudo systemctl status neo4j

# Follow the logs
sudo journalctl -u neo4j -f

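Initial Configuration

Configure Neo4j for production use by editing the configuration file:

sudo nano /etc/neo4j/neo4j.conf

Configure the essential settings. The names below are Neo4j 5 settings; Neo4j 5 refuses to start if the file contains unrecognized settings, so do not mix in 4.x names: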

# Database storage locations
server.directories.data=/var/lib/neo4j/data
server.directories.import=/var/lib/neo4j/import
server.directories.logs=/var/lib/neo4j/logs

# Network configuration
server.default_listen_address=0.0.0.0
server.default_advertised_address=192.168.1.10

# Bolt protocol port (native driver protocol)
server.bolt.enabled=true
server.bolt.listen_address=0.0.0.0:7687
server.bolt.advertised_address=192.168.1.10:7687
# OPTIONAL accepts encrypted and unencrypted clients; REQUIRED rejects unencrypted ones
server.bolt.tls_level=OPTIONAL

# HTTP API port (unencrypted; consider disabling it once HTTPS works)
server.http.enabled=true
server.http.listen_address=0.0.0.0:7474

# HTTPS configuration (recommended for production)
server.https.enabled=true
server.https.listen_address=0.0.0.0:7473
server.https.advertised_address=192.168.1.10:7473

# SSL/TLS policies for Bolt and HTTPS (certificates are generated below)
dbms.ssl.policy.bolt.enabled=true
dbms.ssl.policy.bolt.base_directory=/etc/neo4j/certificates
dbms.ssl.policy.bolt.private_key=private.key
dbms.ssl.policy.bolt.public_certificate=public.crt
dbms.ssl.policy.bolt.client_auth=NONE
dbms.ssl.policy.https.enabled=true
dbms.ssl.policy.https.base_directory=/etc/neo4j/certificates
dbms.ssl.policy.https.private_key=private.key
dbms.ssl.policy.https.public_certificate=public.crt
dbms.ssl.policy.https.client_auth=NONE

# Memory configuration
server.memory.heap.initial_size=2G
server.memory.heap.max_size=4G
server.memory.pagecache.size=4G

# Transaction timeout (every query runs inside a transaction)
db.transaction.timeout=60s

# Query planning and caching
dbms.cypher.min_replan_interval=10s
server.db.query_cache_size=1000

# Query logging (debug log levels are set in server-logs.xml)
db.logs.query.enabled=INFO
db.logs.query.threshold=1s
db.logs.query.parameter_logging_enabled=false

# Security
dbms.security.auth_enabled=true
dbms.security.procedures.unrestricted=apoc.*

# Online backup listener (Enterprise Edition only)
server.backup.enabled=true
server.backup.listen_address=127.0.0.1:6362

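Generate a self-signed certificate for HTTPS and Bolt (use a CA-signed certificate in production):

# Create the certificate directory (plus the default trusted/ and revoked/ subdirectories)
sudo mkdir -p /etc/neo4j/certificates/trusted /etc/neo4j/certificates/revoked
sudo chown -R neo4j:neo4j /etc/neo4j/certificates

# Generate a self-signed certificate (-nodes leaves the private key unencrypted)
sudo openssl req -x509 -newkey rsa:4096 -keyout /etc/neo4j/certificates/private.key \
  -out /etc/neo4j/certificates/public.crt -days 365 -nodes \
  -subj "/CN=192.168.1.10"

# Set permissions on the key and certificate files only
sudo chown neo4j:neo4j /etc/neo4j/certificates/private.key /etc/neo4j/certificates/public.crt
sudo chmod 600 /etc/neo4j/certificates/private.key /etc/neo4j/certificates/public.crt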

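Set the initial password. This must run before the server starts for the first time, and Neo4j 5 requires at least 8 characters by default:

# Stop the service, set the password as the neo4j user, then start again
sudo systemctl stop neo4j
sudo -u neo4j neo4j-admin dbms set-initial-password yourpassword
sudo systemctl start neo4j

# If the server has already been started, log in with the default neo4j/neo4j
# credentials instead and change the password when cypher-shell prompts you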

Restart Neo4j to apply the configuration:

sudo systemctl restart neo4j

# Verify the service is running
sudo systemctl status neo4j

# Test the connection
cypher-shell -u neo4j -p yourpassword "RETURN 1 AS result"

Cypher Query Language

Connect to Neo4j and run Cypher queries:

# Connect with cypher-shell
cypher-shell -u neo4j -p yourpassword -a bolt://192.168.1.10:7687

# Or use curl against the HTTP API (/tx/commit runs the statements in one auto-committed transaction)
curl -u neo4j:yourpassword -H "Content-Type: application/json" \
  -X POST http://192.168.1.10:7474/db/neo4j/tx/commit -d '
{
  "statements": [
    {
      "statement": "RETURN 1 as result"
    }
  ]
}'
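The same request can be built with Python's standard library, which also shows how to pass query parameters instead of concatenating values into the Cypher string. This is a sketch: `build_cypher_request` is a hypothetical helper that only constructs the request without sending it, and it assumes the `/db/<database>/tx/commit` endpoint, which opens, runs, and commits a transaction in one request.

```python
import base64
import json
import urllib.request


def build_cypher_request(base_url, database, user, password, statement, parameters=None):
    """Build (but do not send) a POST to Neo4j's transactional HTTP endpoint."""
    body = {"statements": [{"statement": statement, "parameters": parameters or {}}]}
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/db/{database}/tx/commit",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Basic {token}",  # same credentials as curl -u
        },
        method="POST",
    )


req = build_cypher_request(
    "http://192.168.1.10:7474", "neo4j", "neo4j", "yourpassword",
    "MATCH (p:Person {name: $name}) RETURN p.age AS age",
    {"name": "Alice"},
)
print(req.get_method(), req.full_url)  # POST http://192.168.1.10:7474/db/neo4j/tx/commit
# Sending it would be: urllib.request.urlopen(req)
```

Passing `$name` as a parameter lets Neo4j reuse one cached query plan for every value and keeps user input out of the query text.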

Create nodes and relationships (CREATE always adds new nodes, so running every snippet below creates duplicate Alice and Bob nodes; MERGE, shown later, avoids this):

// Create a person node
CREATE (alice:Person {name: "Alice", age: 30, email: "alice@example.com"})
RETURN alice;

// Create multiple nodes
CREATE (bob:Person {name: "Bob", age: 28}),
       (charlie:Person {name: "Charlie", age: 35})
RETURN bob, charlie;

// Create nodes together with a relationship
CREATE (alice:Person {name: "Alice"})-[:KNOWS]->(bob:Person {name: "Bob"})
RETURN alice, bob;

// Create a more complex graph pattern
CREATE (alice:Person {name: "Alice", email: "alice@example.com"})
CREATE (bob:Person {name: "Bob", email: "bob@example.com"})
CREATE (charlie:Person {name: "Charlie"})
CREATE (alice)-[:KNOWS {since: 2020}]->(bob)
CREATE (bob)-[:KNOWS {since: 2019}]->(charlie)
CREATE (alice)-[:COLLEAGUE_OF {company: "TechCorp"}]->(charlie)
RETURN alice, bob, charlie;

Query the graph:

// Find all people
MATCH (p:Person) RETURN p;

// Find a person by name
MATCH (p:Person {name: "Alice"}) RETURN p;

// Filter people by age
MATCH (p:Person) WHERE p.age > 25 RETURN p.name, p.age ORDER BY p.age;

// Find relationships between people
MATCH (a:Person)-[knows:KNOWS]->(b:Person)
RETURN a.name, knows.since, b.name;

// Find friends of friends (two-hop relationships)
MATCH (alice:Person {name: "Alice"})-[:KNOWS]->(friend)-[:KNOWS]->(friend_of_friend)
RETURN DISTINCT friend_of_friend.name;

// Count outgoing KNOWS relationships per person
MATCH (p:Person)-[r:KNOWS]->()
RETURN p.name, COUNT(r) AS friend_count;

// Find people who reach Alice within one or two KNOWS hops
MATCH (p1:Person)-[:KNOWS*..2]->(p2:Person {name: "Alice"})
RETURN DISTINCT p1.name, p2.name;

// Aggregate results
MATCH (p:Person)
RETURN AVG(p.age) AS avg_age, MAX(p.age) AS max_age, COUNT(p) AS total;
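The semantics of the two-hop and variable-length (`*..2`) patterns above can be checked against a small Python sketch over an adjacency dictionary. The sample data and function names are hypothetical, and the sketch returns distinct names, which corresponds to using DISTINCT in the Cypher queries.

```python
# Directed KNOWS relationships as an adjacency dict (hypothetical sample data)
KNOWS = {
    "Alice": ["Bob"],
    "Bob": ["Charlie"],
    "Charlie": ["Alice"],
    "Dana": ["Charlie"],
}


def friends_of_friends(graph, start):
    """Targets of (start)-[:KNOWS]->(friend)-[:KNOWS]->(fof), deduplicated."""
    return {fof for friend in graph[start] for fof in graph[friend]}


def reaches_within(graph, target, max_hops):
    """Names with a KNOWS path of 1..max_hops hops ending at target,
    like MATCH (p1)-[:KNOWS*..max_hops]->(target)."""
    # Reverse the edges so the search can walk backwards from the target
    incoming = {name: [] for name in graph}
    for src, dsts in graph.items():
        for dst in dsts:
            incoming[dst].append(src)
    found, frontier = set(), {target}
    for _ in range(max_hops):
        # Expand one more hop backwards from the current frontier
        frontier = {src for name in frontier for src in incoming[name]}
        found |= frontier
    return found


print(friends_of_friends(KNOWS, "Alice"))         # {'Charlie'}
print(sorted(reaches_within(KNOWS, "Alice", 2)))  # ['Bob', 'Charlie', 'Dana']
```

With `max_hops=1` only Charlie qualifies; raising it to 2 adds Bob and Dana, which is the same widening you see when changing `*..1` to `*..2` in Cypher.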

Modify data:

// Update node properties
MATCH (p:Person {name: "Alice"})
SET p.age = 31, p.updated_at = timestamp()
RETURN p;

// Add a relationship between existing nodes
MATCH (alice:Person {name: "Alice"}), (bob:Person {name: "Bob"})
CREATE (alice)-[:COLLEAGUE_OF {since: 2021}]->(bob)
RETURN alice, bob;

// Delete a node together with its relationships
MATCH (p:Person {name: "Charlie"})
DETACH DELETE p;

// Remove a relationship
MATCH (alice:Person {name: "Alice"})-[rel:KNOWS]->(bob:Person {name: "Bob"})
DELETE rel
RETURN alice, bob;

// Merge (create the node if it is missing, otherwise update it)
MERGE (p:Person {email: "alice@example.com"})
ON CREATE SET p.name = "Alice", p.created_at = timestamp()
ON MATCH SET p.updated_at = timestamp()
RETURN p;
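MERGE behaves like a get-or-create keyed on the properties in its pattern. The Python sketch below mimics that logic with a dictionary keyed on email; `merge_person` is a hypothetical helper, and Neo4j performs this check inside the database rather than in client code.

```python
import time


def merge_person(store, email, name):
    """Get-or-create keyed on email, like
    MERGE (p:Person {email: $email}) ON CREATE SET ... ON MATCH SET ...
    Returns (person, created)."""
    now = time.time_ns() // 1_000_000  # epoch milliseconds, like Cypher's timestamp()
    person = store.get(email)
    if person is None:
        # ON CREATE SET branch: the pattern did not match, so create the node
        person = {"email": email, "name": name, "created_at": now}
        store[email] = person
        return person, True
    # ON MATCH SET branch: the node already exists, so only update it
    person["updated_at"] = now
    return person, False


people = {}
first, created = merge_person(people, "alice@example.com", "Alice")
again, created_again = merge_person(people, "alice@example.com", "Alice")
print(created, created_again, len(people))  # True False 1
```

In Neo4j itself, MERGE is only guaranteed not to create duplicates under concurrent writes when a uniqueness constraint exists on the merge key (here `Person.email`).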

Data Modeling

Design effective graph schemas:

// Create a domain model for a social network (one statement that ends at the semicolon)
CREATE (alice:Person {id: "1", name: "Alice", email: "alice@example.com", age: 30})
CREATE (bob:Person {id: "2", name: "Bob", email: "bob@example.com", age: 28})
CREATE (techcorp:Company {id: "c1", name: "TechCorp", founded: 2015})
CREATE (project:Project {id: "p1", name: "AI Initiative", budget: 1000000})

// Create relationships
CREATE (alice)-[:WORKS_FOR {start_date: 2020}]->(techcorp)
CREATE (bob)-[:WORKS_FOR {start_date: 2021}]->(techcorp)
CREATE (alice)-[:LEADS]->(project)
CREATE (bob)-[:CONTRIBUTES_TO {role: "Engineer"}]->(project)
CREATE (alice)-[:KNOWS {since: 2020}]->(bob)

// Add location information
CREATE (sf:Location {name: "San Francisco", type: "City"})
CREATE (ny:Location {name: "New York", type: "City"})
CREATE (techcorp)-[:HEADQUARTERED_IN]->(sf)
CREATE (alice)-[:LIVES_IN]->(sf)
CREATE (bob)-[:LIVES_IN]->(ny);

Create a recommendation graph:

// E-commerce recommendation model
CREATE (customer:Customer {id: "cust1", name: "Alice"})
CREATE (product1:Product {id: "p1", name: "Laptop", price: 1000, category: "Electronics"})
CREATE (product2:Product {id: "p2", name: "Mouse", price: 30, category: "Electronics"})
CREATE (product3:Product {id: "p3", name: "Book", price: 20, category: "Books"})

// Purchases (the semicolon ends the write statement before the reads below)
CREATE (customer)-[:PURCHASED {date: "2024-01-15", rating: 5}]->(product1)
CREATE (customer)-[:PURCHASED {date: "2024-01-20", rating: 4}]->(product2);

// Find similar customers through shared purchases
MATCH (c1:Customer)-[:PURCHASED]->(p:Product)<-[:PURCHASED]-(c2:Customer)
WHERE c1.id <> c2.id
RETURN c1.name, c2.name, COUNT(p) AS common_purchases
ORDER BY common_purchases DESC;

// Recommend products bought by customers with overlapping purchases
MATCH (customer:Customer {id: "cust1"})-[:PURCHASED]->(p1:Product),
      (other:Customer)-[:PURCHASED]->(p1),
      (other)-[:PURCHASED]->(p2:Product)
WHERE other <> customer AND NOT (customer)-[:PURCHASED]->(p2)
RETURN p2.name, COUNT(*) AS recommendation_score
ORDER BY recommendation_score DESC
LIMIT 5;
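The recommendation score above counts one row per (shared product, other customer, candidate product) combination. A minimal Python sketch of the same scoring, assuming purchases are available as sets per customer (the data and the `recommend` function are hypothetical), makes that arithmetic explicit.

```python
from collections import Counter

# customer -> products they purchased (hypothetical sample data)
PURCHASES = {
    "Alice": {"Laptop", "Mouse"},
    "Bob": {"Laptop", "Book"},
    "Carol": {"Laptop", "Mouse", "Monitor"},
    "Dana": {"Book", "Lamp"},
}


def recommend(purchases, customer, limit=5):
    """Score products the customer has not bought by co-purchase overlap.

    Each (shared product, other customer, candidate product) combination
    adds 1, which is what COUNT(*) counts in the Cypher query.
    """
    mine = purchases[customer]
    scores = Counter()
    for other, theirs in purchases.items():
        if other == customer:
            continue  # the other <> customer filter
        shared = len(mine & theirs)  # rows this customer contributes per candidate
        if shared == 0:
            continue  # no common purchase, so the pattern never matches
        for product in theirs - mine:  # the NOT (customer)-[:PURCHASED]->(p2) filter
            scores[product] += shared
    return scores.most_common(limit)


print(recommend(PURCHASES, "Alice"))  # [('Monitor', 2), ('Book', 1)]
```

Carol shares two purchases with Alice, so her Monitor scores 2; Dana shares none, so her Lamp is never suggested.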

Indexes and Constraints

Create indexes for query performance:

// Create an index on a single property (skip this if you create the
// uniqueness constraint on email below, which brings its own index)
CREATE INDEX FOR (p:Person) ON (p.email);

// Create a composite index
CREATE INDEX FOR (p:Person) ON (p.age, p.email);

// Create a uniqueness constraint (also creates a backing index; it fails if a
// plain index already exists on the same label and property)
CREATE CONSTRAINT unique_email FOR (p:Person) REQUIRE p.email IS UNIQUE;

// Create a property existence constraint (Enterprise Edition only)
CREATE CONSTRAINT person_has_name FOR (p:Person) REQUIRE p.name IS NOT NULL;

// Create a relationship property existence constraint (Enterprise Edition only)
CREATE CONSTRAINT works_for_start_date FOR ()-[rel:WORKS_FOR]-() REQUIRE rel.start_date IS NOT NULL;

// List all indexes (Neo4j 5 replaces db.indexes() with SHOW INDEXES)
SHOW INDEXES YIELD name, type, labelsOrTypes, properties, state;

// List all constraints (replaces db.constraints())
SHOW CONSTRAINTS YIELD name, type, labelsOrTypes, properties;

// Drop an index by name
DROP INDEX index_name;

// Drop a constraint by name
DROP CONSTRAINT constraint_name;
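Conceptually, a uniqueness constraint is an index that refuses a second node with the same value. The toy Python class below (hypothetical, not Neo4j's index implementation) shows both halves: a direct seek that replaces a scan over every node with the label, and rejection of duplicates. As with Neo4j's uniqueness constraints, nodes that lack the property are not checked.

```python
class UniqueIndex:
    """Toy model of a uniqueness constraint on (label, property)."""

    def __init__(self, label, prop):
        self.label, self.prop = label, prop
        self._entries = {}  # property value -> node id

    def insert(self, node_id, props):
        value = props.get(self.prop)
        if value is None:
            return  # nodes without the property are not checked for uniqueness
        owner = self._entries.get(value)
        if owner is not None and owner != node_id:
            raise ValueError(
                f"duplicate {self.label}.{self.prop} value {value!r} (already on node {owner})"
            )
        self._entries[value] = node_id

    def seek(self, value):
        # Hash lookup instead of checking every node with the label
        return self._entries.get(value)


people = UniqueIndex("Person", "email")
people.insert(1, {"name": "Alice", "email": "alice@example.com"})
people.insert(2, {"name": "NoEmail"})
print(people.seek("alice@example.com"))  # 1
try:
    people.insert(3, {"name": "Alice again", "email": "alice@example.com"})
except ValueError as err:
    print("rejected:", err)
```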

Optimize query performance:

// Use EXPLAIN to see the query plan without running the query
EXPLAIN MATCH (p:Person {email: "alice@example.com"}) RETURN p;

// Use PROFILE to run the query and see actual rows and database hits
PROFILE MATCH (p:Person {email: "alice@example.com"}) RETURN p;

// Show running queries, slowest first (completed slow queries go to the query log)
SHOW TRANSACTIONS YIELD transactionId, currentQuery, elapsedTime
RETURN transactionId, currentQuery, elapsedTime ORDER BY elapsedTime DESC;

// Range predicates in WHERE can use a range index on p.age, if one exists
MATCH (p:Person)
WHERE p.age > 25
RETURN p;

// Equality in WHERE is equivalent to the inline map form and can use the email index
MATCH (p:Person)
WHERE p.email = "alice@example.com"
RETURN p;

APOC Plugins

Install and use the APOC library for extended functionality. The APOC version must match the Neo4j version; Neo4j 5 packages ship a matching APOC Core jar in the labs directory:

# Copy the bundled APOC Core jar into the plugins directory
# (or download the matching release from https://github.com/neo4j/apoc/releases)
cd /var/lib/neo4j/plugins
sudo cp /var/lib/neo4j/labs/apoc-*-core.jar .

# Set permissions
sudo chown neo4j:neo4j apoc-*.jar

# Restart Neo4j
sudo systemctl restart neo4j

# Verify APOC is loaded
cypher-shell -u neo4j -p yourpassword "RETURN apoc.version()"

Use common APOC procedures:

// Weighted shortest path; relationships without a weight property cost 1.0
MATCH (alice:Person {name: "Alice"}), (charlie:Person {name: "Charlie"})
CALL apoc.algo.dijkstra(alice, charlie, 'KNOWS', 'weight', 1.0)
YIELD path, weight
RETURN path, weight;

// JSON operations
WITH {name: "Alice", age: 30} AS data
RETURN apoc.convert.toJson(data) AS json_str;

// Create a relationship whose type is passed as a string (useful for dynamic types)
MATCH (p1:Person {name: "Alice"}), (p2:Person {name: "Bob"})
CALL apoc.create.relationship(p1, 'KNOWS', {since: 2024}, p2)
YIELD rel
RETURN rel;

// Export query results to CSV (requires apoc.export.file.enabled=true in apoc.conf;
// the file is written relative to the import directory)
CALL apoc.export.csv.query("MATCH (p:Person) RETURN p.name, p.age", "people.csv", {})
YIELD file, nodes, relationships, properties, time, rows
RETURN file, nodes, relationships, properties, time, rows;

// Path expansion from 0 to 3 KNOWS hops
MATCH (alice:Person {name: "Alice"})
CALL apoc.path.expand(alice, 'KNOWS', '', 0, 3)
YIELD path
RETURN path;

// Detect cycles with plain Cypher (KNOWS paths of 1 to 5 hops that return to the start)
MATCH path = (p:Person)-[:KNOWS*1..5]->(p)
RETURN path
LIMIT 10;

// Community detection is provided by the Graph Data Science (GDS) plugin, not APOC
CALL gds.graph.project('people', 'Person', 'KNOWS');
CALL gds.labelPropagation.stream('people')
YIELD nodeId, communityId
RETURN communityId, COUNT(*) AS size
ORDER BY size DESC;
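To see what a weighted shortest-path call such as `apoc.algo.dijkstra` computes, here is a plain Python version of Dijkstra's algorithm over directed relationships. It is a sketch with hypothetical sample data, it assumes non-negative weights, and its `default_weight` argument mimics the optional default-weight argument of `apoc.algo.dijkstra` for relationships that have no `weight` property.

```python
import heapq


def dijkstra(edges, start, goal, weight_key="weight", default_weight=1.0):
    """Cheapest directed path from start to goal.

    edges: iterable of (start, end, props). A relationship without weight_key
    costs default_weight. Returns (path, cost), or (None, inf) if unreachable.
    """
    adjacency = {}
    for src, dst, props in edges:
        adjacency.setdefault(src, []).append((dst, float(props.get(weight_key, default_weight))))

    queue = [(0.0, start, [start])]  # (cost so far, node, path)
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return path, cost
        if node in settled:
            continue  # already reached more cheaply
        settled.add(node)
        for nxt, weight in adjacency.get(node, []):
            if nxt not in settled:
                heapq.heappush(queue, (cost + weight, nxt, path + [nxt]))
    return None, float("inf")


# Hypothetical KNOWS relationships; Alice -> Dana has no weight property
EDGES = [
    ("Alice", "Bob", {"weight": 1}),
    ("Bob", "Charlie", {"weight": 1}),
    ("Alice", "Charlie", {"weight": 5}),
    ("Alice", "Dana", {}),
    ("Dana", "Charlie", {"weight": 0.5}),
]
print(dijkstra(EDGES, "Alice", "Charlie"))  # (['Alice', 'Dana', 'Charlie'], 1.5)
```

Raising `default_weight` to 10.0 makes the unweighted Alice -> Dana hop expensive, and the cheapest route switches to Alice -> Bob -> Charlie at cost 2.0, which shows why the default matters when weights are missing.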

Backup and Recovery

Implement backup procedures:

# Offline dump (works in Community Edition; the database must be stopped)
BACKUP_PATH=/backup/neo4j-$(date +%Y%m%d)
sudo mkdir -p "$BACKUP_PATH"
sudo chown neo4j:neo4j "$BACKUP_PATH"
sudo systemctl stop neo4j

# Create the dump (writes neo4j.dump into the target directory)
sudo -u neo4j neo4j-admin database dump neo4j --to-path="$BACKUP_PATH"

# Start the database
sudo systemctl start neo4j

# Or take an online backup with a script like this one
# (Enterprise Edition only; requires server.backup.enabled=true)
#!/bin/bash
BACKUP_DIR="/backup"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
mkdir -p "$BACKUP_DIR/neo4j-$TIMESTAMP"

# Back up the neo4j database through the running server's backup port
neo4j-admin database backup --from=localhost:6362 --to-path="$BACKUP_DIR/neo4j-$TIMESTAMP" neo4j

# Verify the backup
ls -la "$BACKUP_DIR/neo4j-$TIMESTAMP/"
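If dumps are driven from a Python scheduler rather than a shell script, building the command as an argument list keeps paths from being re-parsed by a shell. The sketch below only builds a Neo4j 5 `neo4j-admin database dump` command and its timestamped target directory; `dump_command` is a hypothetical helper, and it assumes you create the directory, stop the server, and then run the result yourself, for example with `subprocess.run(argv, check=True)`.

```python
import shlex
from datetime import datetime
from pathlib import PurePosixPath


def dump_command(database="neo4j", backup_root="/backup", now=None):
    """Return (target_dir, argv) for an offline `neo4j-admin database dump`.

    Nothing is executed here; the caller must create target_dir and stop
    Neo4j before running argv.
    """
    now = now or datetime.now()
    target = PurePosixPath(backup_root) / f"{database}-{now:%Y%m%d_%H%M%S}"
    argv = ["neo4j-admin", "database", "dump", database, f"--to-path={target}"]
    return target, argv


target, argv = dump_command(now=datetime(2024, 1, 1, 2, 30))
print(target)            # /backup/neo4j-20240101_023000
print(shlex.join(argv))  # neo4j-admin database dump neo4j --to-path=/backup/neo4j-20240101_023000
```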

Restore from backup:

# Stop Neo4j
sudo systemctl stop neo4j

# Load the dump, replacing the current database
# (for Enterprise backup artifacts use: neo4j-admin database restore --from-path=<dir> --overwrite-destination=true neo4j)
sudo -u neo4j neo4j-admin database load neo4j --from-path=/backup/neo4j-20240101 --overwrite-destination=true

# Verify integrity
sudo -u neo4j neo4j-admin database check neo4j

# Start Neo4j
sudo systemctl start neo4j

# Verify the restoration
cypher-shell -u neo4j -p yourpassword "MATCH (n) RETURN COUNT(n) AS node_count"

Bolt Configuration

Configure the Bolt protocol for native driver connections:

# In neo4j.conf
server.bolt.listen_address=0.0.0.0:7687
server.bolt.advertised_address=192.168.1.10:7687
# OPTIONAL accepts TLS and plain clients; REQUIRED would reject a plain bolt:// cypher-shell
server.bolt.tls_level=OPTIONAL

# SSL/TLS for Bolt (connection lifetime is configured in the client driver, not here)
dbms.ssl.policy.bolt.enabled=true
dbms.ssl.policy.bolt.base_directory=/etc/neo4j/certificates
dbms.ssl.policy.bolt.private_key=private.key
dbms.ssl.policy.bolt.public_certificate=public.crt

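Connect via Bolt in Python (neo4j driver 5.x):

from neo4j import GraphDatabase

# bolt+ssc:// encrypts the connection and accepts a self-signed certificate;
# switch to bolt+s:// once the server has a CA-signed certificate
URI = "bolt+ssc://192.168.1.10:7687"
AUTH = ("neo4j", "yourpassword")


def get_people(session):
    # Alias the columns so each record can be read by a simple key
    result = session.run("MATCH (p:Person) RETURN p.name AS name, p.age AS age")
    for record in result:
        print(f"{record['name']}: {record['age']}")


# The driver and the session are context managers and close automatically
with GraphDatabase.driver(URI, auth=AUTH) as driver:
    driver.verify_connectivity()
    with driver.session(database="neo4j") as session:
        get_people(session)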

Performance Monitoring

Monitor Neo4j performance:

// Node, relationship, label, and relationship-type counts (APOC Core)
CALL apoc.meta.stats() YIELD nodeCount, relCount, labels, relTypes;

// Store file sizes, including the transaction log
// (apoc.monitor.* procedures may require APOC Extended in Neo4j 5)
CALL apoc.monitor.store();

// Show query-related server settings, such as the query cache size
SHOW SETTINGS YIELD name, value WHERE name CONTAINS 'query'
RETURN name, value;

// Kernel and store information
CALL apoc.monitor.kernel();

// List running transactions (replaces dbms.listTransactions() in Neo4j 5)
SHOW TRANSACTIONS YIELD database, transactionId, currentQuery, startTime, status;

// Terminate a long-running transaction by ID
TERMINATE TRANSACTIONS "neo4j-transaction-123";

Monitor via JMX:

# Neo4j 5 takes JVM options from neo4j.conf (neo4j-wrapper.conf no longer exists)
sudo nano /etc/neo4j/neo4j.conf

# Add JMX settings (no authentication or SSL: firewall port 3637 to trusted hosts only)
server.jvm.additional=-Dcom.sun.management.jmxremote
server.jvm.additional=-Dcom.sun.management.jmxremote.port=3637
server.jvm.additional=-Dcom.sun.management.jmxremote.authenticate=false
server.jvm.additional=-Dcom.sun.management.jmxremote.ssl=false
server.jvm.additional=-Djava.rmi.server.hostname=192.168.1.10

# Restart Neo4j, then connect with monitoring tools
sudo systemctl restart neo4j
jconsole 192.168.1.10:3637

Conclusion

Neo4j provides a powerful platform for applications where relationships are as important as the data itself. Cypher makes relationship traversal and pattern matching straightforward, while indexes and a cost-based query planner keep queries fast. The APOC library extends Neo4j with utility procedures for data conversion, import and export, and path expansion. By designing your graph schema carefully, creating appropriate indexes and constraints, and using APOC where it helps, you can build efficient applications for recommendation engines, social networks, knowledge graphs, and any scenario where relationship analysis drives business value.