Self-Host AI Agents on a VPS
A practical guide to running AI agents like Claude Code, OpenClaw, and Hermes on your own VPS. Covers agent types, infrastructure sizing, communication protocols, security, and cost.
AI agents write code, manage servers, automate workflows, and talk to external services on your behalf. Running them on managed platforms means paying subscriptions, handing your data to third parties, and accepting their rate limits.
A VPS changes that equation. Your agents run 24/7 on hardware you control. Your data stays on your server. Nobody throttles your API calls.
This guide covers what AI agents are, which ones you can self-host, what hardware they need, how they communicate, and how to lock them down. Each section links to a hands-on tutorial.
What are AI agents and how do they work?
An AI agent is an autonomous program that uses a large language model to decide what to do and then do it. Unlike a chatbot that answers one prompt at a time, an agent runs continuously. It keeps context across tasks, calls external tools, reads and writes files, runs shell commands, and chains actions together without waiting for human approval at each step.
In practice, an agent works in a loop:
- Observe -- read input from a user message, a file change, a webhook, or a scheduled trigger
- Reason -- the LLM decides what action to take given the current context and available tools
- Act -- execute that action (run a command, call an API, edit a file, send a message)
- Evaluate -- check the result and decide whether the task is complete or needs another iteration
The LLM itself typically runs remotely via API (Anthropic, OpenAI, or a self-hosted model). What runs on your VPS is the agent harness: the code that manages the loop, tool execution, memory, and communication channels. This is why most agents need surprisingly little local compute. The heavy inference happens elsewhere.
Some agents also support local models through Ollama or vLLM. In that case, your VPS needs a GPU or significantly more RAM. But for most self-hosting scenarios, a 2-4 GB VPS handles the agent harness while the LLM provider handles inference.
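The loop above can be sketched in a few lines of shell. This is a minimal illustration, assuming curl and jq are installed and an Anthropic API key is set; the model name and the action dispatch are placeholders, and a real harness handles memory, tool execution, and error recovery on top of this skeleton.

```shell
#!/usr/bin/env bash
# Sketch of the observe -> reason -> act -> evaluate loop.
# Assumes: curl, jq, and ANTHROPIC_API_KEY in the environment.
# "claude-sonnet-4-5" and the dispatch step are placeholders.

# Build the Messages API request body for one reasoning step
build_request() {
  jq -n --arg task "$1" \
    '{model: "claude-sonnet-4-5", max_tokens: 1024,
      messages: [{role: "user", content: $task}]}'
}

# Reason: ask the model what action to take next
reason() {
  curl -s https://api.anthropic.com/v1/messages \
    -H "x-api-key: $ANTHROPIC_API_KEY" \
    -H "anthropic-version: 2023-06-01" \
    -H "content-type: application/json" \
    -d "$(build_request "$1")" | jq -r '.content[0].text'
}

# The loop: observe a task, reason about it, act, then evaluate and repeat
agent_loop() {
  while read -r task; do             # observe: one task per line on stdin
    action=$(reason "$task")         # reason: the LLM proposes an action
    echo "proposed action: $action"  # act: a real harness dispatches a tool here
  done                               # evaluate: then checks the result and loops
}
```

Everything the harness does beyond this, memory, sandboxing, tool routing, is engineering around that core loop.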
Why self-host AI agents instead of using managed platforms?
Self-hosting on a VPS costs less ($5-14/month base vs $20-50+/month subscriptions), keeps your data on your server, removes rate limits, and runs 24/7 without depending on your laptop. You pick which models to call, which tools to install, and how the agent behaves. Managed platforms decide all of that for you.
Here is how the costs compare:
| Option | Monthly cost | What you get |
|---|---|---|
| ChatGPT Plus | $20 | Web chat, limited agent features, OpenAI controls your data |
| Claude Pro | $20 | Web/desktop chat, usage caps, data processed by Anthropic |
| Claude Max | $100-200 | Higher limits, still cloud-only |
| Managed agent platform | $30-50+ | Vendor lock-in, opaque infrastructure, data leaves your control |
| VPS + API keys | $5-14 + API usage | Full control, your data stays on your server, no rate limits beyond API tier |
The VPS cost is the base. You still pay for LLM API calls, but you control exactly which model you call, how often, and what data you send. There is no middleman markup.
Beyond cost: why self-hosting matters
Data sovereignty. Your prompts, agent memory, and outputs never leave your server. For anyone handling client data, GDPR-regulated information, or proprietary code, this is not optional. Managed platforms process your data on their infrastructure under their terms.
No rate limits. Managed platforms throttle heavy users. On your VPS, the only limits are your API tier with the LLM provider and your server resources.
24/7 uptime. Agents that monitor, automate, or respond to events need to run continuously. A VPS stays on when your laptop sleeps.
Full customization. Install any tool or library you want. No waiting for a platform to add support for the MCP server you need.
What types of AI agents can you self-host?
The agent world in 2026 splits into four categories: coding agents, general-purpose assistants, workflow automation tools, and custom agents you build yourself.
| Agent | Purpose | Min RAM | Needs GPU? | Protocol support | Difficulty |
|---|---|---|---|---|---|
| Claude Code | Coding, refactoring, git workflows | 2 GB | No | MCP (native) | Low |
| OpenClaw | General assistant, messaging, automation | 4 GB (8 GB with browser) | No | MCP, custom skills | Medium |
| Hermes Agent | Persistent memory assistant | 2 GB | No | MCP, agentskills.io | Low |
| n8n | Workflow automation with AI nodes | 2 GB (4 GB recommended) | No | HTTP, webhooks | Medium |
| Custom agent | Whatever you build | Varies | Optional | Whatever you implement | High |
What is Claude Code and why run it on a VPS?
Claude Code is Anthropic's agentic coding tool. It lives in your terminal, reads your entire codebase, edits files, runs commands, manages git workflows, and spawns sub-agents for parallel tasks. It uses Claude Opus 4.6 as its reasoning engine and scores 80.8% on SWE-bench Verified.
Running Claude Code on a VPS means your coding agent works around the clock. It can run CI pipelines, monitor repositories, handle scheduled refactoring tasks, and respond to webhooks. You keep your codebase on a server you control instead of pulling it through a managed platform.
Claude Code supports MCP natively. You can connect it to databases, APIs, file systems, and custom tools through MCP servers running on the same VPS. It also supports agent teams: multiple Claude Code sessions coordinating on a shared project, with one session acting as team lead.
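As a sketch of what that wiring looks like, a project-scoped `.mcp.json` file can declare the MCP servers Claude Code should launch. The filesystem server package below is the reference implementation from the MCP project; the directory path is a placeholder for your own workspace.

```shell
# Sketch: declare an MCP server for Claude Code in a project-scoped
# .mcp.json. The path is a placeholder; adjust to your workspace.
cat > .mcp.json <<'EOF'
{
  "mcpServers": {
    "files": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/home/agent-claude/projects"]
    }
  }
}
EOF
```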
Resource-wise, Claude Code is lightweight. The agent harness needs about 2 GB of RAM. All inference happens via Anthropic's API. [-> run-claude-code-vps]
What is OpenClaw?
OpenClaw (formerly Clawdbot/Moltbot) is the most popular open-source AI agent, with over 250,000 GitHub stars as of March 2026. Created by Peter Steinberger, it is a general-purpose assistant that connects to messaging platforms like Signal, Telegram, Discord, and WhatsApp.
Unlike Claude Code, which focuses on coding, OpenClaw acts as a personal assistant. It manages files, sends emails, controls APIs, automates workflows, and browses the web. It supports multiple LLM backends: Claude, GPT, DeepSeek, and local models through Ollama.
Self-hosting OpenClaw requires more resources than Claude Code. The minimum is 2 vCPUs and 4 GB of RAM. If you enable browser automation (Playwright), plan for 8 GB because each browser instance consumes 1-2 GB on its own. Storage should be NVMe SSD: OpenClaw is I/O-sensitive during Docker operations.
Security warning: OpenClaw has faced serious security issues. Palo Alto Networks identified a "lethal trifecta" of risks: access to private data, exposure to untrusted content, and the ability to perform external communications while retaining memory. In early 2026, Koi Security audited 2,857 skills on ClawHub and found 341 malicious ones, roughly one in eight packages. Treat OpenClaw's skill ecosystem as untrusted. Audit every skill before installing it, and run OpenClaw in a sandboxed environment.
What is Hermes Agent?
Hermes Agent is an open-source AI agent built by Nous Research, released in February 2026. What sets it apart is persistent memory: Hermes remembers your preferences, projects, and environment across sessions. When it solves a hard problem, it writes a reusable skill document so it never forgets the solution.
Hermes runs on a $5/month VPS. It ships with 40+ built-in tools and connects to Telegram, Discord, Slack, WhatsApp, Signal, and CLI through a single gateway process. All data stays on your machine. No telemetry, no tracking.
Skills follow the agentskills.io open standard, so they are portable and searchable across agents. The longer Hermes runs, the more capable it becomes. MIT licensed.
Workflow automation: n8n with AI nodes
n8n is not an AI agent in itself, but it becomes one when you add AI nodes. You can build workflows that call LLMs, process responses, and trigger actions based on AI decisions. Think of it as the glue layer: connect your AI agent to 400+ integrations without writing custom code for each one.
Self-hosting n8n requires 2 vCPUs and 4 GB of RAM for production use. Use PostgreSQL instead of SQLite for anything beyond testing. If you run a vector database (Qdrant, Pinecone) alongside n8n, add another 2-4 GB of RAM.
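A minimal sketch of that setup with Docker, assuming the official n8nio/n8n image and its documented DB_* environment variables; the credentials are placeholders you should replace:

```shell
# Sketch: n8n backed by PostgreSQL on a private Docker network.
# Image names and DB_* variables follow n8n's Docker docs; the
# password is a placeholder.
docker network create n8n-net

docker run -d --name n8n-db --network n8n-net \
  -e POSTGRES_DB=n8n -e POSTGRES_USER=n8n -e POSTGRES_PASSWORD=change-me \
  -v n8n-db-data:/var/lib/postgresql/data postgres:16

docker run -d --name n8n --network n8n-net -p 5678:5678 \
  -e DB_TYPE=postgresdb \
  -e DB_POSTGRESDB_HOST=n8n-db \
  -e DB_POSTGRESDB_DATABASE=n8n \
  -e DB_POSTGRESDB_USER=n8n \
  -e DB_POSTGRESDB_PASSWORD=change-me \
  -v n8n-data:/home/node/.n8n n8nio/n8n
```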
How do agent protocols work? (MCP, A2A, ANP)
Three protocols define how AI agents communicate in 2026. They are not competing standards but complementary layers. Each solves a different problem, and knowing what they do helps you plan your self-hosted setup.
| Protocol | Created by | What it does | When you need it |
|---|---|---|---|
| MCP (Model Context Protocol) | Anthropic | Connects an agent to tools and data sources | Always. This is how your agent reads files, queries databases, calls APIs |
| A2A (Agent-to-Agent) | Google (now Linux Foundation) | Lets agents delegate tasks to other agents | When you run multiple agents that need to collaborate |
| ANP (Agent Network Protocol) | Community/AAIF | Agent discovery and routing across networks | When agents need to find and authenticate with agents outside your server |
MCP: agent-to-tools
MCP is a JSON-RPC protocol that standardizes how an agent accesses external capabilities. Instead of hardcoding API calls, you run MCP servers that expose tools (read a database, fetch a URL, execute a query) and the agent connects to them as a client.
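To make the JSON-RPC framing concrete, this is roughly the shape of a tool invocation a client sends to an MCP server. The method and field names follow the MCP specification; `read_file` stands in for whatever tool the connected server actually exposes.

```shell
# Sketch: the JSON-RPC shape of an MCP tool call, built with jq.
# "read_file" and the path are placeholders for a server's real tools.
jq -n '{
  jsonrpc: "2.0", id: 2,
  method: "tools/call",
  params: {name: "read_file", arguments: {path: "/etc/hostname"}}
}'
```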
MCP has crossed 97 million monthly SDK downloads (Python + TypeScript combined) as of February 2026. Every major AI provider supports it: Anthropic, OpenAI, Google, Microsoft, Amazon.
On a self-hosted VPS, MCP servers run as local processes. Your agent connects to them over stdio or HTTP. You control which tools are available, what permissions they have, and what data they can access. No third-party servers involved.
A2A: agent-to-agent
A2A enables peer-to-peer task delegation between agents. One agent can ask another to perform a task, track its progress, and receive the result. Google created it in April 2025, donated it to the Linux Foundation in June 2025, and in December 2025 the Agentic AI Foundation (AAIF) became its permanent home alongside MCP.
You need A2A when you run multiple agents with different specializations. For example: a coding agent that delegates documentation tasks to a writing agent, or a monitoring agent that triggers a deployment agent when tests pass.
ANP: agent discovery
ANP handles discovery and routing. It lets agents find each other across organizational boundaries, authenticate, and establish communication channels. Think of it as DNS for agents.
For most self-hosted setups with agents running on a single VPS, you will not need ANP yet. It becomes relevant when your agents need to interact with agents on other servers or in other organizations. [-> ai-agent-protocols-explained]
What server specs do AI agents need?
Most AI agents are lighter than you would think. The LLM runs remotely via API. Your VPS only runs the agent harness, tools, and whatever local services you add (databases, message queues, web servers).
Here are tested minimums for common setups:
| Setup | vCPU | RAM | Storage | Monthly cost (Virtua) |
|---|---|---|---|---|
| Single agent (Claude Code or Hermes) | 1 | 2 GB | 40 GB SSD | €12 |
| OpenClaw (text only) | 2 | 4 GB | 80 GB NVMe | €28 |
| OpenClaw + browser automation | 4 | 8 GB | 160 GB NVMe | €56 |
| Multiple agents + database | 4 | 8 GB | 160 GB SSD | €48 |
| n8n + vector DB + agent | 4 | 8 GB | 160 GB NVMe | €56 |
| Full stack (3+ agents, DB, monitoring) | 6 | 12 GB | 240 GB NVMe | €84 |
When do you need a GPU? Only if you run a local LLM (Ollama, vLLM) instead of using an API. For models like Llama 3 or Mistral, you need at least 16 GB of VRAM. Most self-hosted agent setups do not need a GPU because the inference happens at the API provider.
Storage matters. Use SSD or NVMe. Agents that use Docker (OpenClaw, n8n) are I/O-sensitive during container operations. HDD causes noticeable lag on container starts and workspace operations.
Leave headroom. Keep at least 30% of RAM free under typical load. Agents can spike during complex reasoning chains or when processing large context windows. If your VPS starts swapping, agent response times degrade fast.
How do you secure a self-hosted AI agent?
AI agents are not normal applications. They execute arbitrary code based on LLM output. An agent with shell access can do anything your user account can do. A prompt injection attack can turn your coding agent into a data exfiltration tool. That reality shapes every decision in this section.
Treat agents as untrusted code
The LLM that drives your agent processes external input: user messages, file contents, API responses, web pages. Any of these can contain prompt injection payloads. Assume that at some point, your agent will try to do something it should not.
Principle of least privilege. Run each agent as a dedicated system user with minimal permissions. Never run agents as root. Give the agent user access only to the directories and commands it needs.
```bash
# Create a dedicated user for your agent
sudo useradd -r -m -s /bin/bash agent-claude
sudo chmod 700 /home/agent-claude
```
Sandbox agent execution
A standard Docker container is not a security boundary. Containers share the host kernel, and a motivated attacker (or a confused LLM) can escape a permissive container. For real isolation:
- MicroVMs (Firecracker, Kata Containers): Each agent gets its own kernel. Strongest isolation. Best for agents that execute untrusted code.
- gVisor: Intercepts syscalls in user space. Lighter than microVMs but stronger than bare containers. Good middle ground.
- Hardened containers: Acceptable for trusted agents only. Use --read-only and --security-opt no-new-privileges, drop all capabilities, and mount minimal volumes.
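For the hardened-container case, a sketch of what those restrictions look like in practice. These are standard Docker flags; the image name and mount paths are placeholders for your own agent.

```shell
# Sketch: a locked-down container for a *trusted* agent.
# Replace your-agent-image and the mount paths with your own.
docker run -d --name agent \
  --read-only \
  --security-opt no-new-privileges \
  --cap-drop ALL \
  --memory 2g --cpus 1 \
  --tmpfs /tmp:rw,size=256m \
  -v /srv/agent/workspace:/workspace \
  your-agent-image
```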
Network isolation
Agents should not have unrestricted network access. An agent that can reach any IP can exfiltrate data or participate in attacks.
```bash
# Allow only the specific API endpoints your agent needs
sudo ufw default deny outgoing
sudo ufw allow out to any port 443 proto tcp  # HTTPS for API calls
sudo ufw allow out to any port 53 proto udp   # DNS
sudo ufw enable
```
Refine this further by restricting outbound connections to specific IP ranges for your LLM provider. Block everything else.
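A sketch of that refinement. The CIDR below is a documentation placeholder; substitute the IP ranges your LLM provider actually publishes.

```shell
# Sketch: swap the blanket 443 rule for provider-specific ranges.
# 203.0.113.0/24 is a placeholder; use your provider's published ranges.
sudo ufw delete allow out to any port 443 proto tcp
sudo ufw allow out to 203.0.113.0/24 port 443 proto tcp
```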
Manage secrets properly
Never hardcode API keys in agent configuration files. Use environment files with restricted permissions.
```bash
# Create a secrets file
sudo mkdir -p /etc/agent-claude
echo "ANTHROPIC_API_KEY=sk-ant-..." | sudo tee /etc/agent-claude/env > /dev/null
sudo chmod 600 /etc/agent-claude/env
sudo chown agent-claude:agent-claude /etc/agent-claude/env
```
Reference this in a systemd unit with EnvironmentFile=/etc/agent-claude/env. The key never appears in process listings or config files readable by other users.
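A sketch of such a unit, combining the dedicated user, the env file, and systemd's own sandboxing directives. The ExecStart command is a placeholder; substitute your agent's real launch command.

```shell
# Sketch: systemd unit running the agent as its dedicated user.
# ExecStart is a placeholder for your agent's actual command.
sudo tee /etc/systemd/system/agent-claude.service > /dev/null <<'EOF'
[Unit]
Description=Claude agent harness
After=network-online.target

[Service]
User=agent-claude
EnvironmentFile=/etc/agent-claude/env
WorkingDirectory=/home/agent-claude
ExecStart=/usr/bin/env claude-agent --headless
Restart=on-failure
NoNewPrivileges=true
ProtectSystem=strict
ReadWritePaths=/home/agent-claude

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now agent-claude
```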
Monitor and log everything
Agents that run autonomously can behave unexpectedly. Log all agent actions and review them regularly.
```bash
# Watch agent logs in real time
journalctl -u agent-claude -f

# Check for unusual outbound connections
ss -tnp | grep agent-claude
```
Set up alerts for unusual patterns: high CPU usage, unexpected network connections, rapid file system changes, or agents running commands outside their normal scope.
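As a starting point, a cron-able sketch that flags when the agent user's memory use crosses a budget. The 2 GB threshold is an arbitrary example, and logger is used in place of whatever alerting channel you prefer.

```shell
# Sketch: alert when the agent-claude user exceeds a memory budget.
# The 2 GB threshold is a placeholder; route alerts however you like.
rss_kb=$(ps -u agent-claude -o rss= | awk '{s+=$1} END {print s+0}')
if [ "$rss_kb" -gt $((2 * 1024 * 1024)) ]; then
  echo "agent-claude memory use ${rss_kb} kB exceeds 2 GB" | logger -t agent-watch
fi
```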
For a complete security hardening walkthrough, see the dedicated server hardening tutorial.
How to get started with your first self-hosted agent
Pick one agent and get it running. Do not try to set up the entire stack at once.
If you want a coding assistant: Start with Claude Code. Install via npm, authenticate, and you have a working agent in minutes. Needs the least resources of any option here. [-> run-claude-code-vps]
If you want a personal assistant on your messaging apps: Deploy OpenClaw. It takes more setup (Docker, messaging platform configuration, skill selection) but gives you the most versatile general-purpose agent. Budget 4-8 hours for initial setup.
If you want a persistent memory agent: Try Hermes. Single-command install, MIT licensed, and it gets better the longer it runs.
If you want AI-powered workflow automation: Set up n8n with AI nodes. Connect your existing tools and services through visual workflows. Best for non-coding automation tasks.
Your first steps
Regardless of which agent you choose:
- Provision a VPS. Start with 4 GB RAM if you are unsure. You can resize later.
- Secure the server. SSH keys only, firewall enabled, non-root user created. Do this before installing anything else.
- Install the agent. Follow the specific tutorial for your chosen agent.
- Restrict permissions. Run the agent as a dedicated user. Limit network access. Store secrets in protected files.
- Test from outside. Verify the agent works by connecting from your local machine, not just from the server itself.
- Set up monitoring. At minimum, watch logs with journalctl. Ideally, set up resource alerts.
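Step 2 above, securing the server, can be sketched as a single pass on a fresh Debian/Ubuntu VPS. This assumes you are logged in as root with an SSH key already in place; "admin" is a placeholder username.

```shell
# Sketch: minimum hardening before installing any agent.
# Assumes a fresh Debian/Ubuntu server, logged in as root with an
# SSH key present. "admin" is a placeholder username.
adduser --disabled-password --gecos "" admin
usermod -aG sudo admin
mkdir -p /home/admin/.ssh
cp ~/.ssh/authorized_keys /home/admin/.ssh/
chown -R admin:admin /home/admin/.ssh
chmod 700 /home/admin/.ssh

# Key-only SSH, no root login
sed -i 's/^#\?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin no/' /etc/ssh/sshd_config
systemctl restart ssh

# Firewall: SSH in, everything else denied inbound
ufw default deny incoming
ufw default allow outgoing
ufw allow OpenSSH
ufw --force enable
```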
Every tutorial linked in this guide includes verification at each stage. Start with one agent, get comfortable, then expand.
Copyright 2026 Virtua.Cloud. All rights reserved. This content is original work by the Virtua.Cloud team. Reproduction, republication, or redistribution without written permission is prohibited.
Ready to try it yourself?
Deploy your own server in seconds. Linux, Windows, or FreeBSD.
See VPS Plans