
Ollama

Run AI models like Llama, Mistral and Gemma on your server. OpenAI-compatible REST API. Complete privacy without external services.

Version 3.2 · 1-Click Install

Getting started

Ollama lets you run large language models (LLMs) such as Llama, Mistral, and Gemma directly on your server. It exposes an OpenAI-compatible REST API for easy integration.

  1. Go to https://my.cubepath.com/deploy
  2. Select a VPS plan (recommended: gp.medium or higher)
  3. Under Operating System, choose "Ollama"
  4. Click Deploy

Once deployment is complete, Ollama will be available at http://YOUR-IP:11434. You'll need to download models before using it.
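
To confirm the service is reachable after deployment, you can query the API directly. The commands below are a quick sanity check against Ollama's standard endpoints (/api/version returns the running version, and the root path returns a plain-text status):

# Check that the Ollama API is reachable (replace YOUR-IP with your server's address)
curl http://YOUR-IP:11434/api/version

# The root path returns "Ollama is running" when the service is healthy
curl http://YOUR-IP:11434/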

Deploy via CubePath Cloud API

Via CubeCLI

cubecli vps create \
  --name ollama-server \
  --plan gp.medium \
  --template "Ollama" \
  --location us-mia-1 \
  --project <project-id>

Via API

curl -X POST https://api.cubepath.com/vps \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "ollama-server",
    "plan": "gp.medium",
    "template": "Ollama",
    "location": "us-mia-1",
    "project_id": "YOUR_PROJECT_ID"
  }'

Technical information

Access:
- API URL: http://YOUR-IP:11434
- No web interface (API only)

Installed software:
- Docker
- Ollama (official ollama/ollama image)

Ports used:
- 11434: Ollama API

File locations:
- Data and models: /opt/ollama/
- Downloaded models: /opt/ollama/models/
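
Models can take up significant space, so it helps to keep an eye on disk usage. A quick check on the host, using the paths above:

# Disk space used by downloaded models
du -sh /opt/ollama/models/

# Total size of the Ollama data directory
du -sh /opt/ollama/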

System requirements:
- Minimum RAM: 8 GB (plan gp.small)
- Recommended RAM: 16 GB or more (plan gp.medium)
- Disk space: Minimum 20 GB (models take 2-8 GB each)

Recommended models:
- 8 GB RAM: llama3.2 (2GB), gemma (2GB), mistral (4GB)
- 16 GB RAM: llama3.1 (5GB), codellama (4GB)
- 32 GB+ RAM: llama3.1:70b, larger models
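
To match a model to your plan, check the server's available memory and then pull a size-specific tag (for example, llama3.1:8b selects the 8B-parameter variant; tag names follow the Ollama model library):

# Check available memory before choosing a model
free -h

# Pull a size-specific variant (example: the 8B version of Llama 3.1)
docker exec -it ollama ollama pull llama3.1:8b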

Useful commands

# Download a model
docker exec -it ollama ollama pull llama3.2

# List installed models
docker exec -it ollama ollama list

# Run model in chat mode
docker exec -it ollama ollama run llama3.2

# Remove a model
docker exec -it ollama ollama rm llama3.2

# View logs
docker logs ollama

# Restart Ollama
docker restart ollama
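
Models can also be pulled over the REST API instead of docker exec, using Ollama's /api/pull endpoint (recent releases accept a "model" field in the request body; older ones use "name"):

# Pull a model via the API instead of docker exec
curl http://YOUR-IP:11434/api/pull -d '{
  "model": "llama3.2"
}'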

Using the API

# List available models
curl http://YOUR-IP:11434/api/tags

# Generate text
curl http://YOUR-IP:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
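
Because the API is OpenAI-compatible, recent Ollama releases also expose /v1 endpoints, so existing OpenAI clients can be pointed at http://YOUR-IP:11434/v1. A chat completion request looks like this:

# Chat completion via the OpenAI-compatible endpoint
curl http://YOUR-IP:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [
      {"role": "user", "content": "Why is the sky blue?"}
    ]
  }'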

Links