Build AI Workflows in n8n with Ollama and Claude on a VPS
Connect n8n to AI models through two paths: Ollama for free local inference and the Claude API for cloud intelligence. Build a content classification workflow with both, running on a self-hosted VPS.
n8n has native AI nodes built on LangChain. You can plug in a local model through Ollama or a cloud model through the Claude API. Both options connect the same way: as sub-nodes inside n8n's AI Agent node. This tutorial sets up both paths on a self-hosted VPS, builds a practical content classification workflow, and shows you how to swap between local and cloud inference by changing a single node.
If you haven't installed n8n yet, start with our guide on installing n8n with Docker Compose on a VPS.
What do you need before adding AI to n8n?
You need a running n8n instance on a VPS with Docker Compose, a domain with TLS, and SSH access. For Ollama, you need at least 8 GB of RAM on your VPS to run small models (7-8B parameters). For Claude, you need an Anthropic API key. No GPU is required for Ollama on CPU, but inference will be slower.
Prerequisites checklist:
- A VPS with at least 8 GB RAM (4 vCPU recommended). A Virtua Cloud VCS-8 works well.
- n8n running via Docker Compose (see the n8n installation guide)
- SSH access as a non-root user with sudo
- A domain pointing to your VPS with TLS configured (see our reverse proxy and auth guide)
- For the Claude path: an Anthropic Console account with an API key
How do you add Ollama to your n8n Docker Compose stack?
Add Ollama as a service in your existing Docker Compose file, on the same network as n8n. n8n reaches Ollama through Docker's internal DNS using the service name as hostname. No API key needed. Ollama stays on the internal network only, never exposed to the internet.
Open your existing docker-compose.yml and add the ollama service:
```yaml
services:
  # ... your existing n8n service ...

  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    volumes:
      - ollama_data:/root/.ollama
    networks:
      - n8n-network
    environment:
      - OLLAMA_HOST=0.0.0.0
    deploy:
      resources:
        limits:
          memory: 6G
        reservations:
          memory: 4G
    healthcheck:
      test: ["CMD", "ollama", "ps"]
      interval: 30s
      timeout: 10s
      retries: 3

volumes:
  # ... your existing volumes ...
  ollama_data:
```
Key points about this config:
- No published ports. Ollama listens on 11434 inside the container, but we don't map it to the host. Only containers on `n8n-network` can reach it, which prevents anyone on the internet from using your Ollama instance.
- `OLLAMA_HOST=0.0.0.0` tells Ollama to listen on all interfaces inside the container. Without it, Ollama binds to localhost only and n8n can't reach it from another container.
- Memory limits prevent Ollama from consuming all VPS RAM. Adjust them based on your model size.
- The health check uses `ollama ps` to query the server; if Ollama becomes unresponsive, Docker marks the container unhealthy (visible in `docker ps`). The Ollama image doesn't include `curl`, so we use the built-in CLI instead.
If your VPS has an NVIDIA GPU and you've installed the NVIDIA Container Toolkit, add GPU passthrough:
```yaml
  ollama:
    # ... same as above, plus:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```
Start the updated stack:
```bash
docker compose up -d ollama
```
Verify Ollama is running:
```bash
docker compose logs ollama --tail 20
```

The output should include `Listening on [::]:11434`. Now pull a model:

```bash
docker compose exec ollama ollama pull llama3.2:3b
```
This downloads the 3B parameter Llama 3.2 model (about 2 GB). For an 8 GB VPS, this is the safest starting point. See the model sizing table below for other options.
Verify the model loaded:
```bash
docker compose exec ollama ollama list
```
Test inference directly:
```bash
docker compose exec ollama ollama run llama3.2:3b "Say hello in one sentence"
```
You should get a response in a few seconds. If this works, Ollama is ready for n8n.
Which Ollama model should you pick for your VPS?
The right model depends on your available RAM. A rule of thumb: 1 billion parameters requires roughly 1 GB of RAM at Q4 quantization. On a VPS without a GPU, the model runs on CPU, which is slower but functional for batch and background workflows.
| Model | Parameters | Disk size | RAM needed | Best for |
|---|---|---|---|---|
| llama3.2:3b | 3B | ~2 GB | 4 GB | Light tasks, limited RAM |
| llama3.1:8b | 8B | ~4.9 GB | 8 GB | General purpose, 128K context |
| mistral:7b | 7B | ~4.4 GB | 7 GB | Fast inference, European model |
| qwen2.5:7b | 7B | ~4.7 GB | 8 GB | Multilingual, coding tasks |
| gemma3:4b | 4B | ~3.3 GB | 5 GB | Multimodal, good quality/size ratio |
On a 4 vCPU, 8 GB RAM VPS (like the Virtua Cloud VCS-8), llama3.2:3b runs with headroom for n8n and the OS. The 7-8B models fit but leave less room. For those, consider a 16 GB VPS.
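The rule of thumb above can be sketched as a quick estimator (a hypothetical helper for planning, not part of Ollama or n8n):

```python
def estimate_ram_gb(params_billion: float, overhead_gb: float = 1.0) -> float:
    """Rough RAM estimate for a Q4-quantized model: about 1 GB per billion
    parameters, plus a small allowance for the KV cache and runtime."""
    return params_billion * 1.0 + overhead_gb

# A 3B model needs roughly 4 GB, leaving headroom on an 8 GB VPS;
# an 8B model needs roughly 9 GB, which is why 16 GB is more comfortable.
for name, size in [("llama3.2:3b", 3), ("llama3.1:8b", 8)]:
    print(f"{name}: ~{estimate_ram_gb(size):.0f} GB RAM")
```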
Pull your chosen model before continuing. All steps below work with any model in the table.
How do you connect n8n to the Claude API?
Create an Anthropic credential in n8n with your API key. Then use the Anthropic Chat Model sub-node inside any AI Agent workflow. n8n handles the API calls natively. No HTTP Request node needed.
Generate an API key
- Go to Anthropic Console > Settings > API Keys
- Click Create Key
- Name it something identifiable, like `n8n-vps`
- Copy the key immediately. You won't see it again.
Store the key securely. Don't paste it in files on disk. You'll enter it directly in n8n's credential manager, which encrypts it.
Add the credential in n8n
- In n8n, go to Credentials in the left sidebar
- Click Add Credential
- Search for Anthropic API
- Paste your API key
- Click Save
n8n tests the connection on save. A "Connection tested successfully" message appears. If it fails, check that your API key is valid and your VPS can reach https://api.anthropic.com (outbound HTTPS must not be blocked by your firewall).
How does n8n's AI Agent system work?
n8n's AI capabilities are built on LangChain. The architecture uses two types of nodes: root nodes (also called cluster nodes) that define the agent's behavior, and sub-nodes that provide specific capabilities like the language model, memory, and tools. Understanding this structure helps you build and debug workflows.
Root nodes:
- AI Agent node: the main orchestrator. It receives input, sends it to the language model, can use tools, and returns a response. This is what you'll use most.
- Basic LLM Chain: simpler than the Agent. Takes input, sends to LLM, returns output. No tool use, no reasoning loop.
Sub-nodes (attach to root nodes):
- Chat Model (Ollama Chat Model or Anthropic Chat Model): the LLM that generates responses
- Memory (Window Buffer Memory, etc.): stores conversation history
- Tools (HTTP Request, Code, Calculator, etc.): actions the agent can take
- Output Parser: structures the LLM response into usable data
The important point: swapping between Ollama and Claude means swapping one sub-node. The rest of the workflow stays identical. This is why n8n's architecture works well for testing both local and cloud inference.
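Conceptually, the root node runs a loop like the following. This is a simplified Python sketch of the agent pattern, not n8n's actual implementation; the `TOOL:` convention and the fake LLM are invented for illustration:

```python
from typing import Callable

def run_agent(llm: Callable[[str], str], tools: dict, memory: list, user_input: str) -> str:
    """Minimal agent loop: the root node sends input plus history to the
    chat-model sub-node, optionally runs a tool, and returns the answer."""
    memory.append(f"user: {user_input}")
    reply = llm("\n".join(memory))
    # If the model requests a tool (toy convention: "TOOL:name:arg"),
    # run it and ask the model again with the result in context.
    if reply.startswith("TOOL:"):
        _, name, arg = reply.split(":", 2)
        result = tools[name](arg)
        memory.append(f"tool {name}: {result}")
        reply = llm("\n".join(memory))
    memory.append(f"assistant: {reply}")
    return reply

# Swapping Ollama for Claude only changes the `llm` callable; the loop stays the same.
fake_llm = lambda prompt: "TOOL:calc:2+2" if "calc" not in prompt else "The answer is 4."
print(run_agent(fake_llm, {"calc": lambda expr: eval(expr)}, [], "What is 2+2?"))
```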
How do you build an AI classification workflow in n8n?
This workflow receives content via a webhook, sends it to an LLM for classification and summarization, then routes the result based on urgency. It's a practical pattern for email triage, support ticket routing, or content moderation. We'll build it first with Ollama, then swap to Claude.
Step 1: Create the webhook trigger
- Create a new workflow in n8n
- Add a Webhook node as the trigger
- Set the HTTP method to `POST`
- Set the path to something like `classify`
- Under Response, select "Respond to Webhook" (we'll add that node later)
- Save and note the test webhook URL
The webhook will receive JSON like this:
```json
{
  "title": "Server disk full alert",
  "body": "Production server db-01 has reached 95% disk usage. Immediate action required.",
  "source": "monitoring"
}
```
Step 2: Add the AI Agent node with Ollama
- Add an AI Agent node after the webhook
- In the Agent's settings, set the system prompt:
```
You are a content classifier. For each incoming message, respond with valid JSON only:
{
  "urgency": "high" or "low",
  "category": "infrastructure" or "security" or "billing" or "general",
  "summary": "one sentence summary"
}
Do not include any text outside the JSON object.
```
- Connect an Ollama Chat Model sub-node to the Agent:
  - Click the Chat Model connector on the AI Agent
  - Select Ollama Chat Model
  - In the credentials dropdown, click Create New
  - Set the Base URL to `http://ollama:11434` (the Docker service name)
  - Save the credential
  - Select your model (e.g., `llama3.2:3b`)
- Connect the webhook output to the AI Agent input. Map the message text using an expression:

```
Title: {{ $json.title }}
Body: {{ $json.body }}
Source: {{ $json.source }}
```
Step 3: Parse and route the response
- Add an IF node after the AI Agent
- Set the condition: check if the AI response contains `"urgency": "high"`, or parse the JSON and check the `urgency` field
- True branch (high urgency): add a notification node (Slack, email, or HTTP Request to your alerting endpoint)
- False branch (low urgency): add a different action (log to a spreadsheet, send a digest email, etc.)
- Add a Respond to Webhook node at the end of each branch to return the classification result
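The IF-node decision in step 3 is equivalent to this small routing function (a sketch of the logic, not n8n code — the branch names are illustrative):

```python
import json

def route(ai_response: str) -> str:
    """Parse the classifier's JSON output and pick a branch,
    mirroring what the IF node does with the urgency field."""
    data = json.loads(ai_response)
    if data.get("urgency") == "high":
        return "notify"   # True branch: Slack/email/alerting endpoint
    return "digest"       # False branch: log or digest

assert route('{"urgency": "high", "category": "infrastructure", "summary": "Disk almost full"}') == "notify"
assert route('{"urgency": "low", "category": "general", "summary": "FYI"}') == "digest"
```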
Step 4: Test the workflow
Activate the workflow for testing. Send a test request:
```bash
curl -X POST https://your-n8n-domain.com/webhook-test/classify \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Server disk full alert",
    "body": "Production server db-01 reached 95% disk usage. Immediate action required.",
    "source": "monitoring"
  }'
```
Check the n8n execution history. The result shows:
- The webhook received the payload
- The AI Agent sent it to Ollama
- Ollama returned a JSON classification
- The IF node routed it to the correct branch
Look at the execution time on the AI Agent node. With Ollama on CPU (llama3.2:3b), expect 3-8 seconds depending on your VPS specs. Fine for background automation, too slow for real-time user-facing responses.
How does the same workflow run on Ollama vs Claude?
Swapping from Ollama to Claude takes about 30 seconds. The workflow structure stays identical. Only the Chat Model sub-node changes.
- Click the AI Agent node
- Delete the Ollama Chat Model sub-node
- Add an Anthropic Chat Model sub-node instead
- Select your Anthropic credential
- Choose the model (e.g., `claude-sonnet-4-6`)
- Run the same test curl command
Side-by-side comparison:
| Aspect | Ollama (llama3.2:3b, CPU) | Claude (claude-sonnet-4-6) |
|---|---|---|
| Response time | 3-8 seconds | 0.5-1.5 seconds |
| JSON formatting | Occasionally adds text outside JSON | Follows JSON-only instruction reliably |
| Classification accuracy | Good for clear-cut cases | Better with ambiguous or nuanced content |
| Cost per request | Free | Per-token (see Anthropic pricing) |
| Data privacy | Content never leaves your VPS | Content sent to Anthropic's API |
The output format is identical. Your IF node and routing logic don't need changes. This makes it practical to use Ollama for development and testing, then switch to Claude for production workflows that need speed or better reasoning.
When should you use a local LLM vs a cloud API?
Use Ollama when data privacy matters, you want zero API costs, or you process batch jobs where latency is acceptable. Use Claude when you need fast responses, strong reasoning, or handle real-time user-facing workflows. You can swap models in n8n by changing one sub-node, so this isn't a permanent decision.
Choose Ollama when:
- Sensitive data can't leave your infrastructure (medical records, financial data, internal documents)
- You run batch processing where a few seconds per request is fine (nightly email digests, log analysis)
- You want predictable costs. After the VPS cost, inference is free no matter how many requests
- You're prototyping and iterating quickly without worrying about API bills
Choose Claude when:
- You need sub-second responses for user-facing features (chatbots, real-time classification)
- The task requires strong reasoning or nuanced understanding (legal document analysis, complex summarization)
- You process low volume but high-value requests where quality matters more than cost
- You need very long context windows (Claude Sonnet supports up to 1M tokens)
Hybrid approach: Many production setups use both. Route simple, high-volume tasks to Ollama. Route complex, low-volume tasks to Claude. n8n's IF node can inspect the incoming data and choose the right path.
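That hybrid routing can be sketched as a simple heuristic. In n8n this would be an IF or Code node choosing between two AI Agent branches; the threshold and the `high_value` flag here are illustrative assumptions, not fixed rules:

```python
def pick_backend(text: str, high_value: bool, max_local_chars: int = 2000) -> str:
    """Route short, routine content to local Ollama; send long or
    high-value content to the Claude API. Thresholds are illustrative."""
    if high_value or len(text) > max_local_chars:
        return "claude"
    return "ollama"

assert pick_backend("Routine log line from monitoring", high_value=False) == "ollama"
assert pick_backend("Contract clause needing careful analysis", high_value=True) == "claude"
```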
How do you add RAG with Qdrant to your n8n AI workflow?
RAG (Retrieval-Augmented Generation) lets your AI workflow search through your own documents before generating a response. Add Qdrant as a vector store to your Docker Compose stack. n8n has native Qdrant nodes that connect to the AI Agent as a tool.
Add Qdrant to your docker-compose.yml:
```yaml
  qdrant:
    image: qdrant/qdrant:latest
    container_name: qdrant
    restart: unless-stopped
    volumes:
      - qdrant_data:/qdrant/storage
    networks:
      - n8n-network
    environment:
      - QDRANT__SERVICE__API_KEY=${QDRANT_API_KEY}

volumes:
  qdrant_data:
```
Generate a strong API key for Qdrant:
```bash
openssl rand -base64 32
```
Add the key to your .env file:
```
QDRANT_API_KEY=<your-generated-key>
```
Set restrictive permissions on the .env file:
```bash
chmod 600 .env
```
Start Qdrant:
```bash
docker compose up -d qdrant
```
In n8n, you can now:
- Add a Qdrant Vector Store node as a tool for the AI Agent
- Create an Ollama Embeddings sub-node (or use another embedding model) to vectorize your documents
- Build an ingestion workflow that loads documents into Qdrant
- The AI Agent will search Qdrant for relevant context before generating its response
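The ingestion step usually splits documents into overlapping chunks before embedding, so retrieval returns focused passages. A minimal sketch of that chunking step (the chunk size and overlap values are illustrative, not Qdrant or n8n defaults):

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks that overlap slightly, so
    context isn't lost at chunk boundaries when embedding."""
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
    return chunks

doc = "word " * 300  # ~1500 characters of sample text
chunks = chunk_text(doc)
print(f"{len(chunks)} chunks, each up to 500 chars, overlapping by 50")
```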
RAG is a deep topic. For a full walkthrough, see our guide on self-hosting AI agents on a VPS. The n8n self-hosted AI starter kit bundles n8n, Ollama, Qdrant, and PostgreSQL in a single Docker Compose file. It's a good reference for RAG architecture, though it's designed for proof-of-concept rather than production.
What are the resource requirements for AI workflows on a VPS?
Running Ollama on a VPS requires enough RAM for the model plus the OS and other services. n8n itself uses about 300-500 MB. Docker overhead adds another 200-300 MB. The rest goes to Ollama and your chosen model. CPU count affects inference speed but not whether the model loads.
Resource planning table:
| VPS size | Available for Ollama | Best model fit | Workflow type |
|---|---|---|---|
| 4 GB RAM | ~2.5 GB | llama3.2:3b (tight) | Light classification only |
| 8 GB RAM | ~5-6 GB | llama3.2:3b or mistral:7b | General automation |
| 16 GB RAM | ~12-13 GB | llama3.1:8b or qwen2.5:14b | Complex agents, RAG |
| 32 GB RAM | ~28 GB | Multiple models loaded | Production multi-agent |
If you only use the Claude API path (no Ollama), a 4 GB VPS is enough for n8n. The LLM runs on Anthropic's infrastructure.
Monitor resource usage after deploying:
```bash
docker stats --no-stream
```
This shows CPU and memory consumption per container. Watch the Ollama container during inference. Memory usage spikes when processing a request and drops afterward.
Check Ollama logs for performance issues:
```bash
docker compose logs ollama --tail 50
```
If you see out-of-memory errors, switch to a smaller model or increase your VPS RAM.
Troubleshooting
n8n can't connect to Ollama:
- Verify both containers are on the same Docker network: `docker network inspect n8n-network`
- Check the Ollama credential base URL is `http://ollama:11434` (not `localhost`)
- Check Ollama is running: `docker compose exec ollama ollama ps`
- Check `OLLAMA_HOST=0.0.0.0` is set in the Ollama environment
Ollama is slow or unresponsive:
- Check memory: `docker stats ollama`
- Try a smaller model. The 3B models are significantly faster than 8B on CPU.
- If inference takes more than 30 seconds, the model may be too large for your RAM. Ollama swaps to disk, which kills performance.
Claude API returns errors:
- Verify your API key in n8n credentials (re-enter it if needed)
- Check outbound HTTPS from your VPS: `curl -I https://api.anthropic.com`
- Look at the n8n execution log for the specific error message. Common issues: expired key, rate limit, insufficient credits.
AI Agent returns garbled or non-JSON output:
- Improve the system prompt. Be explicit about the output format.
- Claude follows formatting instructions more reliably than small local models.
- Add an Output Parser sub-node (Structured Output Parser) to enforce JSON schema.
- With Ollama, larger models (8B+) follow instructions better than 3B models.
Where are the logs?
```bash
# n8n logs
docker compose logs n8n --tail 100

# Ollama logs
docker compose logs ollama --tail 100

# All services
docker compose logs --tail 50
```
n8n also keeps execution history in its web UI under Executions in the left sidebar. Each execution shows the input/output of every node, which is the fastest way to debug workflow issues.
Copyright 2026 Virtua.Cloud. All rights reserved. This content is original work by the Virtua.Cloud team. Reproduction, republication, or redistribution without written permission is prohibited.