Self-Host AI Agents on a VPS
A practical guide to running AI agents like Claude Code, OpenClaw, and Hermes on your own VPS. Covers agent types, infrastructure sizing, communication protocols, security, and cost.
AI agents write code, manage servers, automate workflows, and talk to external services on your behalf. Running them on managed platforms means paying subscriptions, handing your data to third parties, and accepting their rate limits.
A VPS changes that equation. Your agents run 24/7 on hardware you control. Your data stays on your server. Nobody throttles your API calls.
This guide covers what AI agents are, which ones you can self-host, what hardware they need, how they communicate, and how to lock them down. Each section links to a hands-on tutorial.
What are AI agents and how do they work?
An AI agent is an autonomous program that uses a large language model to decide what to do and then do it. Unlike a chatbot that answers one prompt at a time, an agent runs continuously. It keeps context across tasks, calls external tools, reads and writes files, runs shell commands, and chains actions together without waiting for human approval at each step.
In practice, an agent works in a loop:
- Observe -- read input from a user message, a file change, a webhook, or a scheduled trigger
- Reason -- the LLM decides what action to take given the current context and available tools
- Act -- execute that action (run a command, call an API, edit a file, send a message)
- Evaluate -- check the result and decide whether the task is complete or needs another iteration
The LLM itself typically runs remotely via API (Anthropic, OpenAI, or a self-hosted model). What runs on your VPS is the agent harness: the code that manages the loop, tool execution, memory, and communication channels. This is why most agents need surprisingly little local compute. The heavy inference happens elsewhere.
Some agents also support local models through Ollama or vLLM. In that case, your VPS needs a GPU or significantly more RAM. But for most self-hosting scenarios, a 2-4 GB VPS handles the agent harness while the LLM provider handles inference.
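The loop above can be sketched in a few lines of shell. This is a minimal illustration, assuming curl and jq are installed and an Anthropic API key is set; the model name and the action dispatch are placeholders, and a real harness handles memory, tool execution, and error recovery on top of this skeleton.

```shell
#!/usr/bin/env bash
# Sketch of the observe -> reason -> act -> evaluate loop.
# Assumes: curl, jq, and ANTHROPIC_API_KEY in the environment.
# "claude-sonnet-4-5" and the dispatch step are placeholders.

# Build the Messages API request body for one reasoning step
build_request() {
  jq -n --arg task "$1" \
    '{model: "claude-sonnet-4-5", max_tokens: 1024,
      messages: [{role: "user", content: $task}]}'
}

# Reason: ask the model what action to take next
reason() {
  curl -s https://api.anthropic.com/v1/messages \
    -H "x-api-key: $ANTHROPIC_API_KEY" \
    -H "anthropic-version: 2023-06-01" \
    -H "content-type: application/json" \
    -d "$(build_request "$1")" | jq -r '.content[0].text'
}

# The loop: observe a task, reason about it, act, then evaluate and repeat
agent_loop() {
  while read -r task; do             # observe: one task per line on stdin
    action=$(reason "$task")         # reason: the LLM proposes an action
    echo "proposed action: $action"  # act: a real harness dispatches a tool here
  done                               # evaluate: then checks the result and loops
}
```

Everything the harness does beyond this, memory, sandboxing, tool routing, is engineering around that core loop.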
Why self-host AI agents instead of using managed platforms?
Self-hosting on a VPS costs less ($5-14/month base vs $20-50+/month subscriptions), keeps your data on your server, removes rate limits, and runs 24/7 without depending on your laptop. You pick which models to call, which tools to install, and how the agent behaves. Managed platforms decide all of that for you.
Here is how the costs compare:
| Option | Monthly cost | What you get |
|---|---|---|
| ChatGPT Plus | $20 | Web chat, limited agent features, OpenAI controls your data |
| Claude Pro | $20 | Web/desktop chat, usage caps, data processed by Anthropic |
| Claude Max | $100-200 | Higher limits, still cloud-only |
| Managed agent platform | $30-50+ | Vendor lock-in, opaque infrastructure, data leaves your control |
| VPS + API keys | $5-14 + API usage | Full control, your data stays on your server, no rate limits beyond API tier |
The VPS cost is the base. You still pay for LLM API calls, but you control exactly which model you call, how often, and what data you send. There is no middleman markup.
Beyond cost: why self-hosting matters
Data sovereignty. Your prompts, agent memory, and outputs never leave your server. For anyone handling client data, GDPR-regulated information, or proprietary code, this is not optional. Managed platforms process your data on their infrastructure under their terms.
No rate limits. Managed platforms throttle heavy users. On your VPS, the only limits are your API tier with the LLM provider and your server resources.
24/7 uptime. Agents that monitor, automate, or respond to events need to run continuously. A VPS stays on when your laptop sleeps.
Full customization. Install any tool or library you want. No waiting for a platform to add support for the MCP server you need.
What types of AI agents can you self-host?
The agent world in 2026 splits into four categories: coding agents, general-purpose assistants, workflow automation tools, and custom agents you build yourself.
| Agent | Purpose | Min RAM | Needs GPU? | Protocol support | Difficulty |
|---|---|---|---|---|---|
| Claude Code | Coding, refactoring, git workflows | 2 GB | No | MCP (native) | Low |
| OpenClaw | General assistant, messaging, automation | 4 GB (8 GB with browser) | No | MCP, custom skills | Medium |
| Hermes Agent | Persistent memory assistant | 2 GB | No | MCP, agentskills.io | Low |
| n8n | Workflow automation with AI nodes | 2 GB (4 GB recommended) | No | HTTP, webhooks | Medium |
| Custom agent | Whatever you build | Varies | Optional | Whatever you implement | High |
What is Claude Code and why run it on a VPS?
Claude Code is Anthropic's agentic coding tool. It lives in your terminal, reads your entire codebase, edits files, runs commands, manages git workflows, and spawns sub-agents for parallel tasks. It uses Claude Opus 4.6 as its reasoning engine and scores 80.8% on SWE-bench Verified.
Running Claude Code on a VPS means your coding agent works around the clock. It can run CI pipelines, monitor repositories, handle scheduled refactoring tasks, and respond to webhooks. You keep your codebase on a server you control instead of pulling it through a managed platform.
Claude Code supports MCP natively. You can connect it to databases, APIs, file systems, and custom tools through MCP servers running on the same VPS. It also supports agent teams: multiple Claude Code sessions coordinating on a shared project, with one session acting as team lead.
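As a sketch of what that wiring looks like, a project-scoped `.mcp.json` file can declare the MCP servers Claude Code should launch. The filesystem server package below is the reference implementation from the MCP project; the directory path is a placeholder for your own workspace.

```shell
# Sketch: declare an MCP server for Claude Code in a project-scoped
# .mcp.json. The path is a placeholder; adjust to your workspace.
cat > .mcp.json <<'EOF'
{
  "mcpServers": {
    "files": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/home/agent-claude/projects"]
    }
  }
}
EOF
```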
Resource-wise, Claude Code is lightweight. The agent harness needs about 2 GB of RAM. All inference happens via Anthropic's API. [-> run-claude-code-vps]
What is OpenClaw?
OpenClaw (formerly Clawdbot/Moltbot) is the most popular open-source AI agent, with over 250,000 GitHub stars as of March 2026. Created by Peter Steinberger, it is a general-purpose assistant that connects to messaging platforms like Signal, Telegram, Discord, and WhatsApp.
Unlike Claude Code, which focuses on coding, OpenClaw acts as a personal assistant. It manages files, sends emails, controls APIs, automates workflows, and browses the web. It supports multiple LLM backends: Claude, GPT, DeepSeek, and local models through Ollama.
Self-hosting OpenClaw requires more resources than Claude Code. The minimum is 2 vCPUs and 4 GB of RAM. If you enable browser automation (Playwright), plan for 8 GB because each browser instance consumes 1-2 GB on its own. Storage should be NVMe SSD: OpenClaw is I/O-sensitive during Docker operations.
Security warning: OpenClaw has faced serious security issues. Palo Alto Networks identified a "lethal trifecta" of risks: access to private data, exposure to untrusted content, and the ability to perform external communications while retaining memory. In early 2026, Koi Security audited 2,857 skills on ClawHub and found 341 malicious ones, roughly one in eight packages. Treat OpenClaw's skill ecosystem as untrusted. Audit every skill before installing it, and run OpenClaw in a sandboxed environment.
What is Hermes Agent?
Hermes Agent is an open-source AI agent built by Nous Research, released in February 2026. What sets it apart is persistent memory: Hermes remembers your preferences, projects, and environment across sessions. When it solves a hard problem, it writes a reusable skill document so it never forgets the solution.
Hermes runs on a $5/month VPS. It ships with 40+ built-in tools and connects to Telegram, Discord, Slack, WhatsApp, Signal, and CLI through a single gateway process. All data stays on your machine. No telemetry, no tracking.
Skills follow the agentskills.io open standard, so they are portable and searchable across agents. The longer Hermes runs, the more capable it becomes. MIT licensed.
Workflow automation: n8n with AI nodes
n8n is not an AI agent in itself, but it becomes one when you add AI nodes. You can build workflows that call LLMs, process responses, and trigger actions based on AI decisions. Think of it as the glue layer: connect your AI agent to 400+ integrations without writing custom code for each one.
Self-hosting n8n requires 2 vCPUs and 4 GB of RAM for production use. Use PostgreSQL instead of SQLite for anything beyond testing. If you run a vector database (Qdrant, Pinecone) alongside n8n, add another 2-4 GB of RAM.
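A minimal sketch of that setup with Docker, assuming the official n8nio/n8n image and its documented DB_* environment variables; the credentials are placeholders you should replace:

```shell
# Sketch: n8n backed by PostgreSQL on a private Docker network.
# Image names and DB_* variables follow n8n's Docker docs; the
# password is a placeholder.
docker network create n8n-net

docker run -d --name n8n-db --network n8n-net \
  -e POSTGRES_DB=n8n -e POSTGRES_USER=n8n -e POSTGRES_PASSWORD=change-me \
  -v n8n-db-data:/var/lib/postgresql/data postgres:16

docker run -d --name n8n --network n8n-net -p 5678:5678 \
  -e DB_TYPE=postgresdb \
  -e DB_POSTGRESDB_HOST=n8n-db \
  -e DB_POSTGRESDB_DATABASE=n8n \
  -e DB_POSTGRESDB_USER=n8n \
  -e DB_POSTGRESDB_PASSWORD=change-me \
  -v n8n-data:/home/node/.n8n n8nio/n8n
```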
How do agent protocols work? (MCP, A2A, ANP)
Three protocols define how AI agents communicate in 2026. They are not competing standards but complementary layers. Each solves a different problem, and knowing what they do helps you plan your self-hosted setup.
| Protocol | Created by | What it does | When you need it |
|---|---|---|---|
| MCP (Model Context Protocol) | Anthropic | Connects an agent to tools and data sources | Always. This is how your agent reads files, queries databases, calls APIs |
| A2A (Agent-to-Agent) | Google (now Linux Foundation) | Lets agents delegate tasks to other agents | When you run multiple agents that need to collaborate |
| ANP (Agent Network Protocol) | Community/AAIF | Agent discovery and routing across networks | When agents need to find and authenticate with agents outside your server |
MCP: agent-to-tools
MCP is a JSON-RPC protocol that standardizes how an agent accesses external capabilities. Instead of hardcoding API calls, you run MCP servers that expose tools (read a database, fetch a URL, execute a query) and the agent connects to them as a client.
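To make the JSON-RPC framing concrete, this is roughly the shape of a tool invocation a client sends to an MCP server. The method and field names follow the MCP specification; `read_file` stands in for whatever tool the connected server actually exposes.

```shell
# Sketch: the JSON-RPC shape of an MCP tool call, built with jq.
# "read_file" and the path are placeholders for a server's real tools.
jq -n '{
  jsonrpc: "2.0", id: 2,
  method: "tools/call",
  params: {name: "read_file", arguments: {path: "/etc/hostname"}}
}'
```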
MCP has crossed 97 million monthly SDK downloads (Python + TypeScript combined) as of February 2026. Every major AI provider supports it: Anthropic, OpenAI, Google, Microsoft, Amazon.
On a self-hosted VPS, MCP servers run as local processes. Your agent connects to them over stdio or HTTP. You control which tools are available, what permissions they have, and what data they can access. No third-party servers involved.
A2A: agent-to-agent
A2A enables peer-to-peer task delegation between agents. One agent can ask another to perform a task, track its progress, and receive the result. Google created it in April 2025, donated it to the Linux Foundation in June 2025, and in December 2025 the Agentic AI Foundation (AAIF) became its permanent home alongside MCP.
You need A2A when you run multiple agents with different specializations. For example: a coding agent that delegates documentation tasks to a writing agent, or a monitoring agent that triggers a deployment agent when tests pass.
ANP: agent discovery
ANP handles discovery and routing. It lets agents find each other across organizational boundaries, authenticate, and establish communication channels. Think of it as DNS for agents.
For most self-hosted setups with agents running on a single VPS, you will not need ANP yet. It becomes relevant when your agents need to interact with agents on other servers or in other organizations. [-> ai-agent-protocols-explained]
What server specs do AI agents need?
Most AI agents are lighter than you would think. The LLM runs remotely via API. Your VPS only runs the agent harness, tools, and whatever local services you add (databases, message queues, web servers).
Here are tested minimums for common setups:
| Setup | vCPU | RAM | Storage | Monthly cost (Virtua) |
|---|---|---|---|---|
| Single agent (Claude Code or Hermes) | 1 | 2 GB | 40 GB SSD | €12 |
| OpenClaw (text only) | 2 | 4 GB | 80 GB NVMe | €28 |
| OpenClaw + browser automation | 4 | 8 GB | 160 GB NVMe | €56 |
| Multiple agents + database | 4 | 8 GB | 160 GB SSD | €48 |
| n8n + vector DB + agent | 4 | 8 GB | 160 GB NVMe | €56 |
| Full stack (3+ agents, DB, monitoring) | 6 | 12 GB | 240 GB NVMe | €84 |
When do you need a GPU? Only if you run a local LLM (Ollama, vLLM) instead of using an API. For models like Llama 3 or Mistral, you need at least 16 GB of VRAM. Most self-hosted agent setups do not need a GPU because the inference happens at the API provider.
Storage matters. Use SSD or NVMe. Agents that use Docker (OpenClaw, n8n) are I/O-sensitive during container operations. HDD causes noticeable lag on container starts and workspace operations.
Leave headroom. Keep at least 30% of RAM free under typical load. Agents can spike during complex reasoning chains or when processing large context windows. If your VPS starts swapping, agent response times degrade fast.
How do you secure a self-hosted AI agent?
AI agents are not normal applications. They execute arbitrary code based on LLM output. An agent with shell access can do anything your user account can do. A prompt injection attack can turn your coding agent into a data exfiltration tool. That reality shapes every decision in this section.
Treat agents as untrusted code
The LLM that drives your agent processes external input: user messages, file contents, API responses, web pages. Any of these can contain prompt injection payloads. Assume that at some point, your agent will try to do something it should not.
Principle of least privilege. Run each agent as a dedicated system user with minimal permissions. Never run agents as root. Give the agent user access only to the directories and commands it needs.
```bash
# Create a dedicated user for your agent
sudo useradd -r -m -s /bin/bash agent-claude
sudo chmod 700 /home/agent-claude
```
Sandbox agent execution
A standard Docker container is not a security boundary. Containers share the host kernel, and a motivated attacker (or a confused LLM) can escape a permissive container. For real isolation:
- MicroVMs (Firecracker, Kata Containers): Each agent gets its own kernel. Strongest isolation. Best for agents that execute untrusted code.
- gVisor: Intercepts syscalls in user space. Lighter than microVMs but stronger than bare containers. Good middle ground.
- Hardened containers: Acceptable for trusted agents only. Use --read-only and --security-opt no-new-privileges, drop all capabilities, and mount minimal volumes.
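For the hardened-container case, a sketch of what those restrictions look like in practice. These are standard Docker flags; the image name and mount paths are placeholders for your own agent.

```shell
# Sketch: a locked-down container for a *trusted* agent.
# Replace your-agent-image and the mount paths with your own.
docker run -d --name agent \
  --read-only \
  --security-opt no-new-privileges \
  --cap-drop ALL \
  --memory 2g --cpus 1 \
  --tmpfs /tmp:rw,size=256m \
  -v /srv/agent/workspace:/workspace \
  your-agent-image
```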
Network isolation
Agents should not have unrestricted network access. An agent that can reach any IP can exfiltrate data or participate in attacks.
```bash
# Allow only the specific API endpoints your agent needs
sudo ufw default deny outgoing
sudo ufw allow out to any port 443 proto tcp  # HTTPS for API calls
sudo ufw allow out to any port 53 proto udp   # DNS
sudo ufw enable
```
Refine this further by restricting outbound connections to specific IP ranges for your LLM provider. Block everything else.
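A sketch of that refinement. The CIDR below is a documentation placeholder; substitute the IP ranges your LLM provider actually publishes.

```shell
# Sketch: swap the blanket 443 rule for provider-specific ranges.
# 203.0.113.0/24 is a placeholder; use your provider's published ranges.
sudo ufw delete allow out to any port 443 proto tcp
sudo ufw allow out to 203.0.113.0/24 port 443 proto tcp
```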
Manage secrets properly
Never hardcode API keys in agent configuration files. Use environment files with restricted permissions.
```bash
# Create a secrets file
sudo mkdir -p /etc/agent-claude
echo "ANTHROPIC_API_KEY=sk-ant-..." | sudo tee /etc/agent-claude/env > /dev/null
sudo chmod 600 /etc/agent-claude/env
sudo chown agent-claude:agent-claude /etc/agent-claude/env
```
Reference this in a systemd unit with EnvironmentFile=/etc/agent-claude/env. The key never appears in process listings or config files readable by other users.
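A sketch of such a unit, combining the dedicated user, the env file, and systemd's own sandboxing directives. The ExecStart command is a placeholder; substitute your agent's real launch command.

```shell
# Sketch: systemd unit running the agent as its dedicated user.
# ExecStart is a placeholder for your agent's actual command.
sudo tee /etc/systemd/system/agent-claude.service > /dev/null <<'EOF'
[Unit]
Description=Claude agent harness
After=network-online.target

[Service]
User=agent-claude
EnvironmentFile=/etc/agent-claude/env
WorkingDirectory=/home/agent-claude
ExecStart=/usr/bin/env claude-agent --headless
Restart=on-failure
NoNewPrivileges=true
ProtectSystem=strict
ReadWritePaths=/home/agent-claude

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now agent-claude
```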
Monitor and log everything
Agents that run autonomously can behave unexpectedly. Log all agent actions and review them regularly.
```bash
# Watch agent logs in real time
journalctl -u agent-claude -f

# Check for unusual outbound connections
ss -tnp | grep agent-claude
```
Set up alerts for unusual patterns: high CPU usage, unexpected network connections, rapid file system changes, or agents running commands outside their normal scope.
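As a starting point, a cron-able sketch that flags when the agent user's memory use crosses a budget. The 2 GB threshold is an arbitrary example, and logger is used in place of whatever alerting channel you prefer.

```shell
# Sketch: alert when the agent-claude user exceeds a memory budget.
# The 2 GB threshold is a placeholder; route alerts however you like.
rss_kb=$(ps -u agent-claude -o rss= | awk '{s+=$1} END {print s+0}')
if [ "$rss_kb" -gt $((2 * 1024 * 1024)) ]; then
  echo "agent-claude memory use ${rss_kb} kB exceeds 2 GB" | logger -t agent-watch
fi
```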
For a complete security hardening walkthrough, see the dedicated server hardening tutorial.
How to get started with your first self-hosted agent
Pick one agent and get it running. Do not try to set up the entire stack at once.
If you want a coding assistant: Start with Claude Code. Install via npm, authenticate, and you have a working agent in minutes. Needs the least resources of any option here. [-> run-claude-code-vps]
If you want a personal assistant on your messaging apps: Deploy OpenClaw. It takes more setup (Docker, messaging platform configuration, skill selection) but gives you the most versatile general-purpose agent. Budget 4-8 hours for initial setup.
If you want a persistent memory agent: Try Hermes. Single-command install, MIT licensed, and it gets better the longer it runs.
If you want AI-powered workflow automation: Set up n8n with AI nodes. Connect your existing tools and services through visual workflows. Best for non-coding automation tasks.
Your first steps
Regardless of which agent you choose:
- Provision a VPS. Start with 4 GB RAM if you are unsure. You can resize later.
- Secure the server. SSH keys only, firewall enabled, non-root user created. Do this before installing anything else.
- Install the agent. Follow the specific tutorial for your chosen agent.
- Restrict permissions. Run the agent as a dedicated user. Limit network access. Store secrets in protected files.
- Test from outside. Verify the agent works by connecting from your local machine, not just from the server itself.
- Set up monitoring. At minimum, watch logs with journalctl. Ideally, set up resource alerts.
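Step 2 above, securing the server, can be sketched as a single pass on a fresh Debian/Ubuntu VPS. This assumes you are logged in as root with an SSH key already in place; "admin" is a placeholder username.

```shell
# Sketch: minimum hardening before installing any agent.
# Assumes a fresh Debian/Ubuntu server, logged in as root with an
# SSH key present. "admin" is a placeholder username.
adduser --disabled-password --gecos "" admin
usermod -aG sudo admin
mkdir -p /home/admin/.ssh
cp ~/.ssh/authorized_keys /home/admin/.ssh/
chown -R admin:admin /home/admin/.ssh
chmod 700 /home/admin/.ssh

# Key-only SSH, no root login
sed -i 's/^#\?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin no/' /etc/ssh/sshd_config
systemctl restart ssh

# Firewall: SSH in, everything else denied inbound
ufw default deny incoming
ufw default allow outgoing
ufw allow OpenSSH
ufw --force enable
```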
Every tutorial linked in this guide includes verification at each stage. Start with one agent, get comfortable, then expand.
Copyright 2026 Virtua.Cloud. All rights reserved. This content is original work by the Virtua.Cloud team. Reproduction, republication, or redistribution without written permission is prohibited.
Ready to try it yourself?
Deploy your own server in seconds. Linux, Windows, or FreeBSD.
See VPS Plans