Build AI Workflows in n8n with Ollama and Claude on a VPS
Connect n8n to AI models through two paths: Ollama for free local inference and the Claude API for cloud intelligence. Build a content classification workflow with both, running on a self-hosted VPS.
n8n has native AI nodes built on LangChain. You can plug in a local model through Ollama or a cloud model through the Claude API. Both options connect the same way: as sub-nodes inside n8n's AI Agent node. This tutorial sets up both paths on a self-hosted VPS, builds a practical content classification workflow, and shows you how to swap between local and cloud inference by changing a single node.
If you haven't installed n8n yet, start with our guide on installing n8n with Docker Compose on a VPS.
What do you need before adding AI to n8n?
You need a running n8n instance on a VPS with Docker Compose, a domain with TLS, and SSH access. For Ollama, you need at least 8 GB of RAM on your VPS to run small models (7-8B parameters). For Claude, you need an Anthropic API key. No GPU is required for Ollama on CPU, but inference will be slower.
Prerequisites checklist:
- A VPS with at least 8 GB RAM (4 vCPU recommended). A Virtua Cloud VCS-8 works well.
- n8n running via Docker Compose (see the n8n installation guide)
- SSH access as a non-root user with sudo
- A domain pointing to your VPS with TLS configured (see our reverse proxy and auth guide)
- For the Claude path: an Anthropic Console account with an API key
How do you add Ollama to your n8n Docker Compose stack?
Add Ollama as a service in your existing Docker Compose file, on the same network as n8n. n8n reaches Ollama through Docker's internal DNS using the service name as hostname. No API key needed. Ollama stays on the internal network only, never exposed to the internet.
Open your existing docker-compose.yml and add the ollama service:
```yaml
services:
  # ... your existing n8n service ...

  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    volumes:
      - ollama_data:/root/.ollama
    networks:
      - n8n-network
    environment:
      - OLLAMA_HOST=0.0.0.0
    deploy:
      resources:
        limits:
          memory: 6G
        reservations:
          memory: 4G
    healthcheck:
      test: ["CMD", "ollama", "ps"]
      interval: 30s
      timeout: 10s
      retries: 3

volumes:
  # ... your existing volumes ...
  ollama_data:
```
Key points about this config:
- No published ports. Ollama listens on 11434 inside the container, but we don't map it to the host. Only containers on `n8n-network` can reach it, which prevents anyone on the internet from using your Ollama instance.
- `OLLAMA_HOST=0.0.0.0` tells Ollama to listen on all interfaces inside the container. Without it, Ollama binds to localhost only and n8n can't reach it from another container.
- Memory limits prevent Ollama from consuming all VPS RAM. Adjust them based on your model size.
- The health check uses `ollama ps` to query the server; if Ollama becomes unresponsive, Docker marks the container unhealthy (visible in `docker ps`). The Ollama image doesn't include `curl`, so we use the built-in CLI instead.
If your VPS has an NVIDIA GPU and you've installed the NVIDIA Container Toolkit, add GPU passthrough:
```yaml
  ollama:
    # ... same as above, plus:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```
Start the updated stack:
```bash
docker compose up -d ollama
```
Verify Ollama is running:
```bash
docker compose logs ollama --tail 20
```

The output should include `Listening on [::]:11434`. Now pull a model:

```bash
docker compose exec ollama ollama pull llama3.2:3b
```
This downloads the 3B parameter Llama 3.2 model (about 2 GB). For an 8 GB VPS, this is the safest starting point. See the model sizing table below for other options.
Verify the model loaded:
```bash
docker compose exec ollama ollama list
```
Test inference directly:
```bash
docker compose exec ollama ollama run llama3.2:3b "Say hello in one sentence"
```
You should get a response in a few seconds. If this works, Ollama is ready for n8n.
Which Ollama model should you pick for your VPS?
The right model depends on your available RAM. A rule of thumb: 1 billion parameters requires roughly 1 GB of RAM at Q4 quantization. On a VPS without a GPU, the model runs on CPU, which is slower but functional for batch and background workflows.
| Model | Parameters | Disk size | RAM needed | Best for |
|---|---|---|---|---|
| llama3.2:3b | 3B | ~2 GB | 4 GB | Light tasks, limited RAM |
| llama3.1:8b | 8B | ~4.9 GB | 8 GB | General purpose, 128K context |
| mistral:7b | 7B | ~4.4 GB | 7 GB | Fast inference, European model |
| qwen2.5:7b | 7B | ~4.7 GB | 8 GB | Multilingual, coding tasks |
| gemma3:4b | 4B | ~3.3 GB | 5 GB | Multimodal, good quality/size ratio |
On a 4 vCPU, 8 GB RAM VPS (like the Virtua Cloud VCS-8), llama3.2:3b runs with headroom for n8n and the OS. The 7-8B models fit but leave less room. For those, consider a 16 GB VPS.
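The rule of thumb above can be sketched as a quick estimator (a hypothetical helper for planning, not part of Ollama or n8n):

```python
def estimate_ram_gb(params_billion: float, overhead_gb: float = 1.0) -> float:
    """Rough RAM estimate for a Q4-quantized model: about 1 GB per billion
    parameters, plus a small allowance for the KV cache and runtime."""
    return params_billion * 1.0 + overhead_gb

# A 3B model needs roughly 4 GB, leaving headroom on an 8 GB VPS;
# an 8B model needs roughly 9 GB, which is why 16 GB is more comfortable.
for name, size in [("llama3.2:3b", 3), ("llama3.1:8b", 8)]:
    print(f"{name}: ~{estimate_ram_gb(size):.0f} GB RAM")
```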
Pull your chosen model before continuing. All steps below work with any model in the table.
How do you connect n8n to the Claude API?
Create an Anthropic credential in n8n with your API key. Then use the Anthropic Chat Model sub-node inside any AI Agent workflow. n8n handles the API calls natively. No HTTP Request node needed.
Generate an API key
- Go to Anthropic Console > Settings > API Keys
- Click Create Key
- Name it something identifiable, like `n8n-vps`
- Copy the key immediately. You won't see it again.
Store the key securely. Don't paste it in files on disk. You'll enter it directly in n8n's credential manager, which encrypts it.
Add the credential in n8n
- In n8n, go to Credentials in the left sidebar
- Click Add Credential
- Search for Anthropic API
- Paste your API key
- Click Save
n8n tests the connection on save. A "Connection tested successfully" message appears. If it fails, check that your API key is valid and your VPS can reach https://api.anthropic.com (outbound HTTPS must not be blocked by your firewall).
How does n8n's AI Agent system work?
n8n's AI capabilities are built on LangChain. The architecture uses two types of nodes: root nodes (also called cluster nodes) that define the agent's behavior, and sub-nodes that provide specific capabilities like the language model, memory, and tools. Understanding this structure helps you build and debug workflows.
Root nodes:
- AI Agent node: the main orchestrator. It receives input, sends it to the language model, can use tools, and returns a response. This is what you'll use most.
- Basic LLM Chain: simpler than the Agent. Takes input, sends to LLM, returns output. No tool use, no reasoning loop.
Sub-nodes (attach to root nodes):
- Chat Model (Ollama Chat Model or Anthropic Chat Model): the LLM that generates responses
- Memory (Window Buffer Memory, etc.): stores conversation history
- Tools (HTTP Request, Code, Calculator, etc.): actions the agent can take
- Output Parser: structures the LLM response into usable data
The important point: swapping between Ollama and Claude means swapping one sub-node. The rest of the workflow stays identical. This is why n8n's architecture works well for testing both local and cloud inference.
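Conceptually, the root node runs a loop like the following. This is a simplified Python sketch of the agent pattern, not n8n's actual implementation; the `TOOL:` convention and the fake LLM are invented for illustration:

```python
from typing import Callable

def run_agent(llm: Callable[[str], str], tools: dict, memory: list, user_input: str) -> str:
    """Minimal agent loop: the root node sends input plus history to the
    chat-model sub-node, optionally runs a tool, and returns the answer."""
    memory.append(f"user: {user_input}")
    reply = llm("\n".join(memory))
    # If the model requests a tool (toy convention: "TOOL:name:arg"),
    # run it and ask the model again with the result in context.
    if reply.startswith("TOOL:"):
        _, name, arg = reply.split(":", 2)
        result = tools[name](arg)
        memory.append(f"tool {name}: {result}")
        reply = llm("\n".join(memory))
    memory.append(f"assistant: {reply}")
    return reply

# Swapping Ollama for Claude only changes the `llm` callable; the loop stays the same.
fake_llm = lambda prompt: "TOOL:calc:2+2" if "calc" not in prompt else "The answer is 4."
print(run_agent(fake_llm, {"calc": lambda expr: eval(expr)}, [], "What is 2+2?"))
```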
How do you build an AI classification workflow in n8n?
This workflow receives content via a webhook, sends it to an LLM for classification and summarization, then routes the result based on urgency. It's a practical pattern for email triage, support ticket routing, or content moderation. We'll build it first with Ollama, then swap to Claude.
Step 1: Create the webhook trigger
- Create a new workflow in n8n
- Add a Webhook node as the trigger
- Set the HTTP method to `POST`
- Set the path to something like `classify`
- Under Response, select "Respond to Webhook" (we'll add that node later)
- Save and note the test webhook URL
The webhook will receive JSON like this:
```json
{
  "title": "Server disk full alert",
  "body": "Production server db-01 has reached 95% disk usage. Immediate action required.",
  "source": "monitoring"
}
```
Step 2: Add the AI Agent node with Ollama
- Add an AI Agent node after the webhook
- In the Agent's settings, set the system prompt:
```
You are a content classifier. For each incoming message, respond with valid JSON only:
{
  "urgency": "high" or "low",
  "category": "infrastructure" or "security" or "billing" or "general",
  "summary": "one sentence summary"
}
Do not include any text outside the JSON object.
```
- Connect an Ollama Chat Model sub-node to the Agent:
  - Click the Chat Model connector on the AI Agent
  - Select Ollama Chat Model
  - In the credentials dropdown, click Create New
  - Set the Base URL to `http://ollama:11434` (the Docker service name)
  - Save the credential
  - Select your model (e.g., `llama3.2:3b`)
- Connect the webhook output to the AI Agent input. Map the message text using an expression:

```
Title: {{ $json.title }}
Body: {{ $json.body }}
Source: {{ $json.source }}
```
Step 3: Parse and route the response
- Add an IF node after the AI Agent
- Set the condition: check if the AI response contains `"urgency": "high"`, or parse the JSON and check the `urgency` field
- True branch (high urgency): add a notification node (Slack, email, or HTTP Request to your alerting endpoint)
- False branch (low urgency): add a different action (log to a spreadsheet, send a digest email, etc.)
- Add a Respond to Webhook node at the end of each branch to return the classification result
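The IF-node decision in step 3 is equivalent to this small routing function (a sketch of the logic, not n8n code — the branch names are illustrative):

```python
import json

def route(ai_response: str) -> str:
    """Parse the classifier's JSON output and pick a branch,
    mirroring what the IF node does with the urgency field."""
    data = json.loads(ai_response)
    if data.get("urgency") == "high":
        return "notify"   # True branch: Slack/email/alerting endpoint
    return "digest"       # False branch: log or digest

assert route('{"urgency": "high", "category": "infrastructure", "summary": "Disk almost full"}') == "notify"
assert route('{"urgency": "low", "category": "general", "summary": "FYI"}') == "digest"
```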
Step 4: Test the workflow
Activate the workflow for testing. Send a test request:
```bash
curl -X POST https://your-n8n-domain.com/webhook-test/classify \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Server disk full alert",
    "body": "Production server db-01 reached 95% disk usage. Immediate action required.",
    "source": "monitoring"
  }'
```
Check the n8n execution history. The result shows:
- The webhook received the payload
- The AI Agent sent it to Ollama
- Ollama returned a JSON classification
- The IF node routed it to the correct branch
Look at the execution time on the AI Agent node. With Ollama on CPU (llama3.2:3b), expect 3-8 seconds depending on your VPS specs. Fine for background automation, too slow for real-time user-facing responses.
How does the same workflow run on Ollama vs Claude?
Swapping from Ollama to Claude takes about 30 seconds. The workflow structure stays identical. Only the Chat Model sub-node changes.
- Click the AI Agent node
- Delete the Ollama Chat Model sub-node
- Add an Anthropic Chat Model sub-node instead
- Select your Anthropic credential
- Choose the model (e.g., `claude-sonnet-4-6`)
- Run the same test curl command
Side-by-side comparison:
| Aspect | Ollama (llama3.2:3b, CPU) | Claude (claude-sonnet-4-6) |
|---|---|---|
| Response time | 3-8 seconds | 0.5-1.5 seconds |
| JSON formatting | Occasionally adds text outside JSON | Follows JSON-only instruction reliably |
| Classification accuracy | Good for clear-cut cases | Better with ambiguous or nuanced content |
| Cost per request | Free | Per-token (see Anthropic pricing) |
| Data privacy | Content never leaves your VPS | Content sent to Anthropic's API |
The output format is identical. Your IF node and routing logic don't need changes. This makes it practical to use Ollama for development and testing, then switch to Claude for production workflows that need speed or better reasoning.
When should you use a local LLM vs a cloud API?
Use Ollama when data privacy matters, you want zero API costs, or you process batch jobs where latency is acceptable. Use Claude when you need fast responses, strong reasoning, or handle real-time user-facing workflows. You can swap models in n8n by changing one sub-node, so this isn't a permanent decision.
Choose Ollama when:
- Sensitive data can't leave your infrastructure (medical records, financial data, internal documents)
- You run batch processing where a few seconds per request is fine (nightly email digests, log analysis)
- You want predictable costs. After the VPS cost, inference is free no matter how many requests
- You're prototyping and iterating quickly without worrying about API bills
Choose Claude when:
- You need sub-second responses for user-facing features (chatbots, real-time classification)
- The task requires strong reasoning or nuanced understanding (legal document analysis, complex summarization)
- You process low volume but high-value requests where quality matters more than cost
- You need very long context windows (Claude Sonnet supports up to 1M tokens)
Hybrid approach: Many production setups use both. Route simple, high-volume tasks to Ollama. Route complex, low-volume tasks to Claude. n8n's IF node can inspect the incoming data and choose the right path.
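That hybrid routing can be sketched as a simple heuristic. In n8n this would be an IF or Code node choosing between two AI Agent branches; the threshold and the `high_value` flag here are illustrative assumptions, not fixed rules:

```python
def pick_backend(text: str, high_value: bool, max_local_chars: int = 2000) -> str:
    """Route short, routine content to local Ollama; send long or
    high-value content to the Claude API. Thresholds are illustrative."""
    if high_value or len(text) > max_local_chars:
        return "claude"
    return "ollama"

assert pick_backend("Routine log line from monitoring", high_value=False) == "ollama"
assert pick_backend("Contract clause needing careful analysis", high_value=True) == "claude"
```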
How do you add RAG with Qdrant to your n8n AI workflow?
RAG (Retrieval-Augmented Generation) lets your AI workflow search through your own documents before generating a response. Add Qdrant as a vector store to your Docker Compose stack. n8n has native Qdrant nodes that connect to the AI Agent as a tool.
Add Qdrant to your docker-compose.yml:
```yaml
  qdrant:
    image: qdrant/qdrant:latest
    container_name: qdrant
    restart: unless-stopped
    volumes:
      - qdrant_data:/qdrant/storage
    networks:
      - n8n-network
    environment:
      - QDRANT__SERVICE__API_KEY=${QDRANT_API_KEY}

volumes:
  qdrant_data:
```
Generate a strong API key for Qdrant:
```bash
openssl rand -base64 32
```
Add the key to your .env file:
```
QDRANT_API_KEY=<your-generated-key>
```
Set restrictive permissions on the .env file:
```bash
chmod 600 .env
```
Start Qdrant:
```bash
docker compose up -d qdrant
```
In n8n, you can now:
- Add a Qdrant Vector Store node as a tool for the AI Agent
- Create an Ollama Embeddings sub-node (or use another embedding model) to vectorize your documents
- Build an ingestion workflow that loads documents into Qdrant
- The AI Agent will search Qdrant for relevant context before generating its response
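The ingestion step usually splits documents into overlapping chunks before embedding, so retrieval returns focused passages. A minimal sketch of that chunking step (the chunk size and overlap values are illustrative, not Qdrant or n8n defaults):

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks that overlap slightly, so
    context isn't lost at chunk boundaries when embedding."""
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
    return chunks

doc = "word " * 300  # ~1500 characters of sample text
chunks = chunk_text(doc)
print(f"{len(chunks)} chunks, each up to 500 chars, overlapping by 50")
```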
RAG is a deep topic. For a full walkthrough, see our guide on self-hosting AI agents on a VPS. The n8n self-hosted AI starter kit bundles n8n, Ollama, Qdrant, and PostgreSQL in a single Docker Compose file. It's a good reference for RAG architecture, though it's designed for proof-of-concept rather than production.
What are the resource requirements for AI workflows on a VPS?
Running Ollama on a VPS requires enough RAM for the model plus the OS and other services. n8n itself uses about 300-500 MB. Docker overhead adds another 200-300 MB. The rest goes to Ollama and your chosen model. CPU count affects inference speed but not whether the model loads.
Resource planning table:
| VPS size | Available for Ollama | Best model fit | Workflow type |
|---|---|---|---|
| 4 GB RAM | ~2.5 GB | llama3.2:3b (tight) | Light classification only |
| 8 GB RAM | ~5-6 GB | llama3.2:3b or mistral:7b | General automation |
| 16 GB RAM | ~12-13 GB | llama3.1:8b or qwen2.5:14b | Complex agents, RAG |
| 32 GB RAM | ~28 GB | Multiple models loaded | Production multi-agent |
If you only use the Claude API path (no Ollama), a 4 GB VPS is enough for n8n. The LLM runs on Anthropic's infrastructure.
Monitor resource usage after deploying:
```bash
docker stats --no-stream
```
This shows CPU and memory consumption per container. Watch the Ollama container during inference. Memory usage spikes when processing a request and drops afterward.
Check Ollama logs for performance issues:
```bash
docker compose logs ollama --tail 50
```
If you see out-of-memory errors, switch to a smaller model or increase your VPS RAM.
Troubleshooting
n8n can't connect to Ollama:
- Verify both containers are on the same Docker network: `docker network inspect n8n-network`
- Check the Ollama credential base URL is `http://ollama:11434` (not `localhost`)
- Check Ollama is running: `docker compose exec ollama ollama ps`
- Check `OLLAMA_HOST=0.0.0.0` is set in the Ollama environment
Ollama is slow or unresponsive:
- Check memory: `docker stats ollama`
- Try a smaller model. The 3B models are significantly faster than 8B on CPU.
- If inference takes more than 30 seconds, the model may be too large for your RAM. Ollama swaps to disk, which kills performance.
Claude API returns errors:
- Verify your API key in n8n credentials (re-enter it if needed)
- Check outbound HTTPS from your VPS: `curl -I https://api.anthropic.com`
- Look at the n8n execution log for the specific error message. Common issues: expired key, rate limit, insufficient credits.
AI Agent returns garbled or non-JSON output:
- Improve the system prompt. Be explicit about the output format.
- Claude follows formatting instructions more reliably than small local models.
- Add an Output Parser sub-node (Structured Output Parser) to enforce JSON schema.
- With Ollama, larger models (8B+) follow instructions better than 3B models.
Where are the logs?
```bash
# n8n logs
docker compose logs n8n --tail 100

# Ollama logs
docker compose logs ollama --tail 100

# All services
docker compose logs --tail 50
```
n8n also keeps execution history in its web UI under Executions in the left sidebar. Each execution shows the input/output of every node, which is the fastest way to debug workflow issues.
Copyright 2026 Virtua.Cloud. All rights reserved. This content is original work by the Virtua.Cloud team. Reproduction, republication, or redistribution without written permission is prohibited.