Ollama
Run AI models like Llama, Mistral and Gemma on your server. OpenAI-compatible REST API. Complete privacy without external services.
Getting started
Ollama lets you run large language models (LLMs) like Llama, Mistral, and Gemma directly on your server. It exposes an OpenAI-compatible REST API for easy integration with existing tools.
- Go to https://my.cubepath.com/deploy
- Select a VPS plan (recommended: gp.medium or higher)
- Under Operating System, choose "Ollama"
- Click Deploy
Once deployment is complete, the Ollama API will be available at http://YOUR-IP:11434. You'll need to pull at least one model before you can use it.
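You can pull your first model directly through the REST API. A minimal sketch (llama3.2 is just an example; substitute any model from the Ollama library):
# Pull a model via the Ollama API
curl http://YOUR-IP:11434/api/pull -d '{
  "model": "llama3.2"
}'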
Deploy via CubePath Cloud API
Via CubeCLI
cubecli vps create \
  --name ollama-server \
  --plan gp.medium \
  --template "Ollama" \
  --location us-mia-1 \
  --project <project-id>
Via API
curl -X POST https://api.cubepath.com/vps \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "ollama-server",
    "plan": "gp.medium",
    "template": "Ollama",
    "location": "us-mia-1",
    "project_id": "YOUR_PROJECT_ID"
  }'
Technical information
Access:
- API URL: http://YOUR-IP:11434
- No web interface (API only)
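To verify the service is reachable, query the root endpoint, which returns a plain-text status message:
# Quick health check (returns "Ollama is running")
curl http://YOUR-IP:11434/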
Installed software:
- Docker
- Ollama (official ollama/ollama image)
Ports used:
- 11434: Ollama API
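Note that the Ollama API has no built-in authentication, so anyone who can reach port 11434 can use your server. A minimal sketch restricting access with ufw, assuming ufw is installed on the host and YOUR-CLIENT-IP is a placeholder for the machine that should keep access:
# Allow the Ollama port only from a trusted address, block everything else
ufw allow from YOUR-CLIENT-IP to any port 11434 proto tcp
ufw deny 11434/tcp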
File locations:
- Data and models: /opt/ollama/
- Downloaded models: /opt/ollama/models/
System requirements:
- Minimum RAM: 8 GB (plan gp.small)
- Recommended RAM: 16 GB or more (plan gp.medium)
- Disk space: Minimum 20 GB (most models take 2-8 GB each)
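Before pulling large models, it's worth checking what the server actually has available. A quick sketch with standard Linux tools, using the data directory listed above:
# Check available RAM and free disk space for the model directory
free -h
df -h /opt/ollama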
Recommended models:
- 8 GB RAM: llama3.2 (2 GB), gemma (2 GB), mistral (4 GB)
- 16 GB RAM: llama3.1 (5 GB), codellama (4 GB)
- 32 GB+ RAM: llama3.1:70b and other larger models
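Once a model is loaded, ollama ps shows how much memory it is actually using, which helps confirm your plan's RAM is sufficient:
# Show loaded models and their memory usage
docker exec -it ollama ollama ps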
Useful commands
# Download a model
docker exec -it ollama ollama pull llama3.2
# List installed models
docker exec -it ollama ollama list
# Run model in chat mode
docker exec -it ollama ollama run llama3.2
# Remove a model
docker exec -it ollama ollama rm llama3.2
# View logs
docker logs ollama
# Restart Ollama
docker restart ollama
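To update Ollama itself, pull the latest image and recreate the container. This is a sketch that assumes the container was started with the data volume at /opt/ollama and default flags; adjust the docker run options to match your actual setup. Models persist because they live in the mounted volume.
# Update to the latest Ollama image (assumes default container settings)
docker pull ollama/ollama
docker stop ollama && docker rm ollama
docker run -d --name ollama \
  -v /opt/ollama:/root/.ollama \
  -p 11434:11434 \
  --restart unless-stopped \
  ollama/ollama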
Using the API
# List available models
curl http://YOUR-IP:11434/api/tags
# Generate text
curl http://YOUR-IP:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
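Since Ollama also exposes OpenAI-compatible endpoints under /v1, existing OpenAI client libraries can be pointed at the server with just a base-URL change. A minimal sketch using the chat completions endpoint (the model must already be pulled):
# Chat via the OpenAI-compatible endpoint
curl http://YOUR-IP:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [
      {"role": "user", "content": "Why is the sky blue?"}
    ]
  }'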