Headscale vs Tailscale vs NetBird vs Cloudflare Mesh for Private Networking

I’ve been running Tailscale across my Hetzner nodes and desktop for a while now. It works. It’s painless. You install the client, sign in, and your devices find each other — done. But “it works” and “I’m happy with it” aren’t the same thing, and lately I’ve been thinking harder about what Tailscale’s coordination server actually knows about my infrastructure and whether I could do better on privacy and performance.

Then Cloudflare dropped Mesh this week. NetBird keeps shipping features. Headscale hit a level of maturity that makes it genuinely viable. Suddenly there are four real options for private mesh networking, each with a fundamentally different philosophy about who controls what, and each at a different level of maturity.

This isn’t a “which one is best” article — that question is meaningless without context. Instead, I’m going to compare all four on equal footing: architecture, installation, features, performance, security, and operational burden. At the end, I’ll explain what I’m actually going to do with my own infrastructure and why.

The Four Contenders

Before diving into details, here’s the landscape at a glance. All four solve the same fundamental problem — connecting your devices into a private network — but they approach it from very different directions.

Tailscale is the incumbent. Cloud-hosted coordination server, peer-to-peer WireGuard data plane, polished experience. You trade some metadata visibility for zero operational burden. It just works.

Headscale is the lightest self-hosting path. It’s an open-source reimplementation of Tailscale’s coordination server — you replace only the control plane while keeping the official Tailscale clients. Single binary, narrow scope, minimal ops.

NetBird is the fully independent option. Own client, own management server, own signal server, own relay infrastructure. Everything is open source (Apache 2.0), everything is self-hostable. More moving parts, but zero dependency on Tailscale Inc. for anything.

Cloudflare Mesh is the new entrant, announced April 14, 2026. Edge-routed through Cloudflare’s global network rather than peer-to-peer. Zero self-hosting, deep integration with Cloudflare One, but all your mesh traffic passes through Cloudflare’s infrastructure.

|  | Tailscale | Headscale | NetBird | CF Mesh |
|---|---|---|---|---|
| Philosophy | Convenience-first | Minimal self-hosting | Maximum independence | Platform integration |
| Data plane | P2P WireGuard | P2P WireGuard | P2P WireGuard | Edge-routed (Cloudflare) |
| Self-hostable | No (client only) | Yes (server) | Yes (everything) | No |
| Client | Tailscale | Tailscale (same) | Own client | Cloudflare One / Mesh Node |
| Free tier | 100 devices / 3 users | Unlimited | Unlimited (self-hosted) | 50 nodes / 50 users |
| License | Client: BSD; Server: proprietary | BSD | Apache 2.0 | Proprietary |

Architecture: How They Actually Work


All four create an overlay network that lets your devices communicate as if they were on the same LAN. The differences are in how they coordinate that network and where the traffic flows.

Tailscale and Headscale: The Coordination Model

Tailscale and Headscale share the same architecture because Headscale is literally a reimplementation of Tailscale’s server. The model works like this: a coordination server distributes WireGuard public keys, IP address assignments, and ACL policies to all nodes in your network. When node A wants to talk to node B, it gets B’s public key and endpoint information from the coordination server, then establishes a direct WireGuard tunnel. The actual traffic flows directly between the two nodes — peer-to-peer, encrypted, never touching the coordination server.

The coordination server’s role is purely administrative: it’s a phone book, not a post office. It tells nodes how to find each other, but it doesn’t carry their mail. This is a critical distinction for the security discussion later — the coordination server knows who is in your network and where they are, but it never sees what they’re saying to each other.

When direct peer-to-peer connections fail — symmetric NAT, strict corporate firewalls, mobile networks stuck behind carrier-grade NAT — traffic falls back to DERP relay servers. DERP (Designated Encrypted Relay for Packets) is Tailscale’s relay protocol. The relay forwards encrypted WireGuard packets between nodes that can’t reach each other directly. With Tailscale’s hosted service, these relays are shared public infrastructure distributed globally — free to use but with no performance guarantees. With Headscale, you can enable an embedded DERP server on the same box, keeping even fallback relay traffic on your own infrastructure.
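You can check from any node whether a given peer path is direct or relayed. A quick sketch (the peer hostname is a placeholder):

# Reports whether packets reach the peer directly or via a DERP region
tailscale ping some-node

# Lists all peers; relayed connections show the relay region in use
tailscale status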

The key difference between Tailscale and Headscale: Tailscale’s coordination server is proprietary and cloud-hosted. Headscale gives you the exact same architecture with the coordination server on your own box. Same clients, same protocol, same data plane — different trust model for the control plane.

NetBird: The Independent Stack

NetBird uses WireGuard for the data plane just like Tailscale, but everything else is built from scratch — own client, own server infrastructure, own relay system. The server side has three components instead of one:

The Management Server handles authentication, policy distribution, and peer coordination. Think of it as the equivalent of the Tailscale/Headscale coordination server, but with a built-in web dashboard and API-driven policy management instead of JSON config files.

The Signal Server handles WebRTC signaling for peer discovery — it’s what lets nodes find each other and negotiate direct connections. This is a separate concern from policy management, which is why it’s a separate service.

The TURN/Relay Server provides NAT traversal fallback, similar to Tailscale’s DERP but using the Coturn implementation (or since v0.29.0, a newer WebSocket-based relay). When peers can’t connect directly, traffic goes through here.

This three-component architecture is more complex to deploy and maintain than Headscale’s single binary. But it gives you complete independence from the Tailscale ecosystem — if Tailscale Inc. changes their client license, breaks API compatibility, or goes in a direction you don’t like, NetBird is entirely unaffected. You’re running a fully independent stack.

NetBird also supports kernel-level WireGuard on Linux by default (with userspace as fallback), which can give a performance edge on older hardware where Tailscale’s userspace optimizations haven’t caught up yet.

Cloudflare Mesh: The Edge Model

Cloudflare Mesh is architecturally different from the other three. There’s no peer-to-peer — every connection routes through Cloudflare’s global edge network across 330+ cities. This eliminates NAT traversal problems entirely (agents connect outbound to Cloudflare, no inbound ports needed), but it means all your private mesh traffic transits Cloudflare’s infrastructure. The underlying protocol isn’t publicly disclosed — they don’t confirm WireGuard.

Architecture summary
  • Tailscale/Headscale/NetBird: your traffic goes directly between your devices (peer-to-peer WireGuard). Only coordination metadata touches the server.
  • Cloudflare Mesh: all traffic routes through Cloudflare’s edge network. No peer-to-peer path exists.

Installation: Getting Started with Each

Let’s get practical. Here’s what it actually takes to set up each solution from scratch on Ubuntu/Debian servers — the kind of thing you’d do on a Hetzner VPS.

Tailscale

The easiest of the four. One command to add the repo, one to install, one to authenticate:

curl -fsSL https://tailscale.com/install.sh | sh
tailscale up

A browser window opens, you sign in with your SSO provider (Google, GitHub, Microsoft, etc.), and the device joins your tailnet. Repeat on every device. That’s it — your nodes can now reach each other by Tailscale IP or MagicDNS name.

For a headless server, use an auth key instead:

# Generate auth key in Tailscale admin console first
tailscale up --authkey=tskey-auth-xxxxx

To advertise a subnet route or enable an exit node:

# Advertise a local network through this node
tailscale up --advertise-routes=192.168.1.0/24

# Use this node as an exit node (route all internet traffic through it)
tailscale up --advertise-exit-node

Then approve the routes in the Tailscale admin console. ACL policies are managed through a HuJSON file in the admin dashboard — powerful but requires learning the syntax.
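For a flavor of that syntax, here is a minimal policy sketch (the tag name is illustrative; the autogroups are Tailscale built-ins):

// HuJSON: JSON with comments and trailing commas
{
  "tagOwners": {
    // Admins may apply the server tag to nodes
    "tag:server": ["autogroup:admin"],
  },
  "acls": [
    // Any member may reach tagged servers on SSH and HTTPS only
    {"action": "accept", "src": ["autogroup:member"], "dst": ["tag:server:22,443"]},
  ],
}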

Total time: under 5 minutes per node. No server to deploy, no config files to write, no TLS certs to manage. The trade-off is clear: maximum convenience, zero control over the coordination infrastructure.

Headscale

You need a server with a public IP, a domain name pointing to it, and TLS certificates. Let’s set it up:

# Download a release binary. Check https://github.com/juanfont/headscale/releases
# for the current version; release assets are versioned (headscale_<version>_linux_amd64)
HEADSCALE_VERSION="0.25.1"
wget -O /usr/local/bin/headscale \
  "https://github.com/juanfont/headscale/releases/download/v${HEADSCALE_VERSION}/headscale_${HEADSCALE_VERSION}_linux_amd64"
chmod +x /usr/local/bin/headscale

# Create config directory and start from the upstream example config
mkdir -p /etc/headscale
wget -O /etc/headscale/config.yaml \
  https://raw.githubusercontent.com/juanfont/headscale/main/config-example.yaml

Edit the config — the key settings:

# /etc/headscale/config.yaml (key sections)
server_url: https://hs.yourdomain.net:443
listen_addr: 0.0.0.0:443
tls_cert_path: /etc/letsencrypt/live/hs.yourdomain.net/fullchain.pem
tls_key_path: /etc/letsencrypt/live/hs.yourdomain.net/privkey.pem

# Custom DNS domain — one of the big wins
dns:
  base_domain: mesh.yourdomain.net
  magic_dns: true
  nameservers:
    global:
      - 1.1.1.1
      - 9.9.9.9

# Enable embedded DERP for self-hosted relay
derp:
  server:
    enabled: true
    region_id: 999
    stun_listen_addr: 0.0.0.0:3478

Create a systemd service and start it:

cat > /etc/systemd/system/headscale.service << 'EOF'
[Unit]
Description=Headscale coordination server
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/headscale serve
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable --now headscale

Create a user (Headscale’s equivalent of a tailnet namespace):

headscale users create myuser

Now on your client machines, install Tailscale as usual but point it at your Headscale server:

curl -fsSL https://tailscale.com/install.sh | sh
tailscale up --login-server https://hs.yourdomain.net
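For headless servers, skip the interactive login and pre-authorize the node with a key minted on the Headscale side. A sketch using the myuser user from above (check headscale --help for the exact subcommand flags in your version):

# On the Headscale server: create a single-use pre-auth key
headscale preauthkeys create --user myuser --expiration 1h

# On the client: join non-interactively with that key
tailscale up --login-server https://hs.yourdomain.net --authkey <key-from-above>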

To enable ACLs, create a policy file:

# /etc/headscale/acl.yaml
acls:
  - action: accept
    src:
      - "myuser"
    dst:
      - "myuser:*"

# Auto-approve routes and exit nodes
autoApprovers:
  routes:
    "192.168.1.0/24":
      - "myuser"
  exitNode:
    - "myuser"

Enable it in your config:

policy:
  path: /etc/headscale/acl.yaml

To remove Tailscale’s public DERP servers and use only your own (for full sovereignty):

# In config.yaml
derp:
  urls: []  # Remove default Tailscale DERP list
  server:
    enabled: true
    region_id: 999
    stun_listen_addr: 0.0.0.0:3478
  auto_update_enabled: false

Total time: 30–60 minutes including TLS cert setup. Ongoing maintenance: binary updates (check GitHub releases), Let’s Encrypt cert renewal (automate with certbot), and occasional config tweaks. The binary itself is tiny and resource-light — Headscale barely registers in htop alongside your other services.
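One wrinkle worth noting: with Headscale holding port 443, certbot can still renew over port 80 (HTTP-01 standalone), and a deploy hook keeps Headscale on the fresh certificate. A sketch, assuming certbot is installed and port 80 is free:

# Initial issuance: HTTP-01 on port 80, Headscale keeps 443
certbot certonly --standalone -d hs.yourdomain.net

# On renewal, restart Headscale so it reloads the new cert files
certbot renew --deploy-hook "systemctl restart headscale"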

⚠ CGNAT breaks this
Headscale needs a public IP. If your ISP uses Carrier-Grade NAT, you’ll need a VPS — which somewhat undermines the self-hosting argument since you’re then trusting a VPS provider. Though the VPS only sees encrypted WireGuard traffic and coordination data, which is a much narrower trust surface than Tailscale’s full view of your network.

NetBird

Self-hosted NetBird requires three server components. The quickest path is Docker Compose:

# Download the setup script
curl -fsSL https://github.com/netbirdio/netbird/releases/latest/download/getting-started-with-zitadel.sh -o setup.sh
chmod +x setup.sh

# Run setup — this configures Management, Signal, TURN, and Dashboard
./setup.sh

The script walks you through configuring your domain, OIDC provider (Zitadel by default, or bring your own), and relay infrastructure. Under the hood it creates a docker-compose.yml with all the services:

# What gets deployed:
# - Management Server (HTTPS/gRPC) — handles auth, policy, coordination
# - Signal Server — WebRTC signaling for peer discovery
# - Coturn — TURN/STUN relay for NAT traversal
# - Dashboard — web UI for managing your network
# - Zitadel — OIDC identity provider (optional, can use external)

On client machines, install the NetBird client:

curl -fsSL https://pkgs.netbird.io/install.sh | sh
netbird up --management-url https://netbird.yourdomain.net

Total time: several hours for initial setup, including OIDC configuration and relay server setup. The Docker Compose approach simplifies deployment but there’s still meaningful configuration involved.
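Once a client has joined, netbird status is the quick sanity check that it reached your self-hosted management plane:

# Should show your management URL, signal connection, and relay status
netbird status

# Per-peer detail, including whether each connection is direct or relayed
netbird status -d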

Once running, the NetBird Dashboard gives you a web UI for managing peers, groups, and access policies — no JSON files to edit. Create groups like “servers” and “desktops”, then define rules:

# Example policy (configured via Dashboard UI or API):
# Allow "desktops" group to reach "servers" group on SSH and HTTPS
Source: desktops
Destination: servers
Protocol: TCP
Ports: 22, 443

NetBird’s posture checks let you go further — deny mesh access if a device doesn’t meet requirements:

# Posture check examples (via Dashboard):
# - Minimum OS version: Ubuntu 22.04+
# - Required process running: crowdstrike-falcon
# - Block if: OS version < threshold

Since v0.65 (February 2026), NetBird includes a built-in reverse proxy for exposing internal services publicly — auto TLS via Let’s Encrypt, custom domains, path-based routing. This is the self-hosted answer to Tailscale Funnel that Headscale doesn’t have.

NetBird cloud alternative
If self-hosting the full stack sounds like too much, NetBird offers a managed cloud option. Free tier available, paid plans from $5/user/month. You get the NetBird client and dashboard without deploying any server infrastructure. The self-hosted complexity is the main barrier — if you don’t need posture checks or the reverse proxy, Headscale is a much simpler self-hosting path.

Cloudflare Mesh

If you’re already a Cloudflare customer, this is the fastest path to a mesh network after Tailscale.

For servers (Linux VMs), deploy a Mesh Node — a lightweight headless agent:

# Install cloudflared if not already present
curl -fsSL https://pkg.cloudflare.com/cloudflared-ascii.pub | gpg --dearmor -o /usr/share/keyrings/cloudflared-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/cloudflared-archive-keyring.gpg] https://pkg.cloudflare.com/cloudflared $(lsb_release -cs) main" | tee /etc/apt/sources.list.d/cloudflared.list
apt update && apt install cloudflared

# Register as a Mesh Node via Cloudflare Zero Trust dashboard
# or via CLI with a connector token from the dashboard

For desktops and mobile, install the Cloudflare One client (WARP) — available for macOS, Windows, iOS, Android. Enable the Mesh network in your Zero Trust dashboard, and enrolled devices can reach each other by their Mesh IPs.

Every enrolled device gets a private Mesh IP. Any participant can reach any other by IP — client-to-client, not just client-to-server. Subnet routing is supported via CIDR routes through Mesh nodes, with active-passive replicas for HA.

The Zero Trust integration is where Cloudflare Mesh shines for existing customers. Gateway policies you already have configured apply to Mesh traffic automatically. Device posture checks validate connecting devices. DNS filtering and traffic inspection are built in. If you’ve already invested time in Cloudflare Access rules, you don’t duplicate anything — it all just works across Mesh traffic too.

⚠ Still very new
Cloudflare Mesh was announced on April 14, 2026 — days ago as of this writing. Mesh DNS (automatic hostname resolution like postgres-staging.mesh) is still on the roadmap, not shipped. Docker container support is “expected later 2026.” Linux desktop client situation is unclear — Mesh Nodes are headless server connectors. The 50-node free tier cap is a hard limit. Expect rough edges and missing features for a while.

Total time: 15–30 minutes if you already have a Cloudflare account. Zero ongoing server maintenance — Cloudflare handles everything. The trade-off: all your private mesh traffic routes through their infrastructure.

Installation Summary

|  | Tailscale | Headscale | NetBird | CF Mesh |
|---|---|---|---|---|
| Setup time | 5 min | 30–60 min | 2–4 hours | 15–30 min |
| Server required | No | Yes (public IP) | Yes (public IP) | No |
| Components | Client only | 1 binary + client | 3+ services + client | Agent only |
| TLS certs needed | No | Yes | Yes | No |
| OIDC setup | Built-in SSO | Optional | Required | Via CF Access |
| Ongoing maintenance | None | Low (single binary) | Medium (3+ services) | None |

Feature Comparison

Here’s where the differences get concrete. I’ve mapped out the features that actually matter for a sysadmin running a homelab or small business infrastructure.

| Feature | Tailscale | Headscale | NetBird | CF Mesh |
|---|---|---|---|---|
| MagicDNS / Internal DNS | Yes (<tailnet>.ts.net) | Yes (custom domain) | Yes (embedded resolver) | Planned (“Mesh DNS”) |
| Custom DNS domain | No | Yes | Yes (self-hosted) | TBD |
| Subnet routing | Yes | Yes | Yes (automated) | Yes (CIDR routes) |
| Exit nodes | Yes | Yes | Yes | Not mentioned |
| ACLs | HuJSON files | HuJSON (API) | UI-driven policies | CF Access/Gateway |
| SSO / OIDC | Yes | Yes | Yes | Via CF Access |
| Admin UI | Yes (polished) | Community only | Yes (built-in) | CF dashboard |
| Public service exposure | Funnel/Serve | No (planned) | Yes (reverse proxy v0.65+) | No |
| File sharing | Taildrop | Taildrop | No | No |
| SSH without keys | Yes (Tailscale SSH) | Yes (Tailscale SSH) | No | No |
| Posture checks | No | No | Yes (OS, processes, EDR) | Via CF device posture |
| SCIM provisioning | Enterprise only | No | Yes | Via CF Identity |
| HA routes/exit nodes | Premium ($18/user/mo) | Yes | Yes (all plans) | Yes |
| Network flow logs | Yes | No (planned) | Yes (self-hosted) | Via CF Gateway |
| Dynamic ACLs | Yes | No | Via group policies | Via CF Access |
| Multiple networks | Yes (tailnets) | No (single) | Yes | Yes (via CF One) |
| Relay infrastructure | Shared DERP (public) | Embedded DERP (own) | Coturn/WebSocket (own) | Cloudflare edge (always) |

A few things stand out. Headscale and NetBird both let you configure a custom DNS domain for your mesh — server.mesh.yourdomain.net instead of server.tail12ab.ts.net. That sounds minor until you’ve typed the wrong tailnet hash for the hundredth time in an SSH config. It’s one of those quality-of-life things that compounds across every script, bookmark, and muscle memory pattern.
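The payoff shows up in places like ~/.ssh/config, where one wildcard block can cover the whole mesh. A sketch using the hypothetical base domain from the Headscale config earlier:

# ~/.ssh/config
# MagicDNS resolves these names, so no per-host entries are needed
Host *.mesh.yourdomain.net
    User admin
    ForwardAgent no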

NetBird’s posture checks are unique — the ability to deny mesh access based on device state (OS version, running processes, EDR status) is something neither Tailscale nor Headscale offer. If you’re in a regulated environment, this matters.

The Funnel/Serve gap is notable. Tailscale’s proprietary features for exposing services publicly don’t exist in Headscale. NetBird closes this gap with a built-in reverse proxy since v0.65 (February 2026) — auto TLS via Let’s Encrypt, custom domains, path-based routing. If you rely on Tailscale Funnel today, NetBird is the only self-hosted alternative that covers this without a separate Cloudflare Tunnel or nginx setup.

Cloudflare Mesh inherits the entire Cloudflare One security stack — Gateway policies, Access rules, device posture checks — which is either a massive advantage (if you’re already a Cloudflare customer) or irrelevant (if you’re not). But Mesh DNS for automatic hostname resolution is still on the roadmap, not shipped. For a product announced this week, that’s expected, but it means you’re working with IPs for now.

One more thing that surprised me looking at this matrix: HA routes and exit nodes. Tailscale gates this behind their Premium tier at $18/user/month. Both Headscale and NetBird include it for free, across all plans. If you need redundant exit nodes or failover routing — and for a business setup, you probably do — this is a real cost difference that adds up.

The admin UI situation is also worth noting. Tailscale has a polished web dashboard. NetBird has a built-in dashboard that’s functional and improving. Headscale has… community options. headscale-ui exists but doesn’t cover all features, and the primary interface is CLI. If you’re comfortable with headscale commands and don’t need a GUI, this is fine. If you’re managing the network with non-CLI people, it’s a pain point.

Finally, network flow logs — the ability to see what’s communicating with what across your mesh. Tailscale has this. NetBird has this (on your own infrastructure, which is great for auditing). Headscale doesn’t, and it’s only planned. If compliance or troubleshooting visibility matters to you, this is a gap.

One feature worth highlighting for SSH-heavy users: Tailscale SSH lets you SSH into nodes without managing SSH keys at all — authentication goes through Tailscale’s identity layer, so you just ssh user@node and it works. No authorized_keys files, no key rotation, no agent forwarding headaches. Headscale supports this too (same Tailscale client), which is a nice bonus if you’re migrating. NetBird and Cloudflare Mesh don’t have an equivalent — you manage SSH keys the traditional way.
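Enabling it is a one-flag change on each node plus an ssh section in the ACL policy. A minimal sketch (the rule mirrors Tailscale's documented example; adjust to taste):

# On the node: advertise Tailscale SSH
tailscale up --ssh

// In the ACL policy (HuJSON): members may SSH to their own devices
// as non-root users
"ssh": [
  {
    "action": "accept",
    "src":    ["autogroup:member"],
    "dst":    ["autogroup:self"],
    "users":  ["autogroup:nonroot"],
  },
],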

Performance: The Numbers


This is where opinions meet data. NetBird published a detailed benchmark in April 2026 comparing the two independent P2P stacks (NetBird and Tailscale) against Cloudflare Mesh: Cloudflare Mesh vs NetBird vs Tailscale: Performance Compared (also available as a YouTube video). Real iperf3 tests across multiple regions and providers.

Test setup: NetBird 0.68.3, Tailscale 1.96.4, Cloudflare WARP 2026.3.846.0, iperf3 3.16 on Ubuntu 24.04 LTS. Both NetBird and Tailscale running in userspace mode for fair comparison.

One gap in these benchmarks: they only measure throughput (Mbps) and UDP packet loss — no latency, jitter, or RTT data. For interactive use cases like SSH, latency matters more than bandwidth. Given that Cloudflare Mesh adds two extra hops (client → edge → destination) compared to P2P’s direct tunnel, you’d expect higher latency on CF Mesh for regional connections. For intercontinental routes, Cloudflare’s optimized backbone might actually reduce latency compared to public internet routing. Until someone publishes proper latency benchmarks, treat the throughput numbers as only part of the performance picture.
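Collecting your own latency numbers is straightforward if you have nodes on more than one mesh; something like this against each solution's mesh IP (the address is a placeholder):

# 50-probe RTT sample over each mesh; compare min/avg/max across solutions
ping -c 50 <peer-mesh-ip>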

The headline: NetBird ≈ Tailscale, P2P >> Cloudflare (regionally)

NetBird and Tailscale perform “basically the same when it comes to network performance” — results fluctuate within normal network variation. Both use peer-to-peer WireGuard, so this makes sense. The real story is P2P versus edge-routed.

Since Headscale uses the exact same Tailscale clients and WireGuard data plane, its performance is identical to Tailscale for direct connections. The only difference is relay performance, where your own DERP server replaces Tailscale’s shared public relays.

European and regional routes: P2P dominates

| Route | NetBird | Tailscale | CF Mesh | Verdict |
|---|---|---|---|---|
| Hetzner DE → Hetzner DE | 1,260 / 1,260 Mbps | 1,300 / 1,220 Mbps | 250 / 290 Mbps | P2P ~5x faster |
| Helsinki → Germany | 810 / 746 Mbps | 842 / 750 Mbps | 249 / 349 Mbps | P2P ~2-3x faster |
| Hetzner → GCP EU-West3 | 1,410 / 1,220 Mbps | 1,380 / 1,390 Mbps | 273 / 387 Mbps | P2P ~3-5x faster |
| AWS US East → West | 358 / 423 Mbps | 347 / 339 Mbps | 162 / 300 Mbps | P2P wins |
| Berlin residential → Hetzner | 50 / 492 Mbps | 44 / 466 Mbps | 47 / 248 Mbps | P2P 2x on download |

(Format: upload / download)

For anyone running Hetzner-to-Hetzner — which includes my setup — P2P solutions deliver 1,200+ Mbps while Cloudflare Mesh tops out at 250–350 Mbps on the same routes. Not even close.

Intercontinental routes: Cloudflare’s backbone wins

| Route | NetBird | Tailscale | CF Mesh | Verdict |
|---|---|---|---|---|
| Japan → Berlin residential | 81 / 28 Mbps | 24 / 8 Mbps | 158 / 43 Mbps | CF ~1.5-2x faster |
| Japan → Hetzner Nuremberg | 48 / 32 Mbps | 47 / 21 Mbps | 224 / 269 Mbps | CF ~5-8x faster |
| Berlin → AWS US East | 39 / 186 Mbps | 37 / 181 Mbps | 44 / 287 Mbps | CF wins on download |
| Hetzner DE → AWS US East | 209 / 148 Mbps | 206 / 154 Mbps | 198 / 215 Mbps | Roughly even |

(Format: upload / download)

On long-distance international routes, Cloudflare’s optimized backbone genuinely outperforms P2P — sometimes dramatically. The Japan-to-Europe route shows 5-8x faster throughput through Cloudflare’s edge than direct WireGuard. Their network finds better paths than the public internet.

UDP: Where Cloudflare Mesh falls apart

This is the number that should make you pause. At 300 Mbps fixed rate, Hetzner Germany → AWS US West:

| Solution | Sent | Received | Packet Loss |
|---|---|---|---|
| NetBird | 300 Mbps | 295 Mbps | 1.2% |
| Cloudflare Mesh | 300 Mbps | 257 Mbps | 14% |

14% packet loss. If you’re running VoIP, video conferencing, gaming, or anything real-time through Cloudflare Mesh, you’re going to have a bad time. This is an inherent trade-off of edge-routed architecture — every packet takes two extra hops (to and from Cloudflare’s edge) compared to a direct WireGuard tunnel.
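The benchmark's fixed-rate UDP test is easy to reproduce with iperf3 if you want to verify this on your own routes (the peer address is a placeholder):

# On the receiving node
iperf3 -s

# On the sender: 300 Mbps fixed-rate UDP for 30 seconds; the
# "Lost/Total Datagrams" line in the report is the packet-loss figure
iperf3 -c <peer-mesh-ip> -u -b 300M -t 30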

Relay performance: The hidden differentiator

Direct P2P connections perform identically across Tailscale, Headscale, and NetBird — they’re all WireGuard. The performance gap shows up when direct connections fail and traffic falls back to relays. This happens more than you’d think: symmetric NAT, strict corporate firewalls, mobile networks, double-NAT.

On Tailscale’s free tier, relay goes through shared public DERP servers with no throughput guarantees. Users regularly report dramatically slower speeds through DERP compared to direct connections. With Headscale’s embedded DERP or NetBird’s self-hosted TURN, your relay performance is bounded by your own server’s bandwidth — which for a decent Hetzner node is going to crush a congested shared relay.

This is one of the most tangible day-to-day improvements of self-hosting: not theoretical privacy benefits, but actual throughput when you need relay.
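To see which relay your client would actually use, and how far away it is, the Tailscale client ships a diagnostic that works against Headscale's embedded DERP as well:

# Lists reachable DERP regions with measured latencies and the preferred relay
tailscale netcheck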

Performance implications for different use cases

Let’s cut through the numbers and talk about what actually matters for specific scenarios:

Server-to-server backups and replication: If you’re running ZFS send/recv, rsync, or database replication between Hetzner nodes, P2P WireGuard (Tailscale/Headscale/NetBird) is the clear winner. 1,200+ Mbps versus 250-350 Mbps through Cloudflare Mesh is not a marginal difference — it’s the difference between a 10-minute backup job and an hour-long one. Don’t route bulk data through Cloudflare Mesh if you have any other option.

SSH and web admin: Performance differences are completely irrelevant. An SSH session or web dashboard uses kilobits per second, not gigabits. Use whichever tool is most convenient to connect — the extra latency through Cloudflare’s edge is imperceptible for interactive use.

VoIP and video calls: Cloudflare Mesh’s 14% UDP packet loss is a non-starter. If you’re running any kind of real-time communication through your mesh, stick with P2P solutions. This is an inherent limitation of edge-routed architecture, not something likely to be “fixed” — every packet takes extra hops.

Accessing services from abroad: If you’re travelling in Asia and need to reach European servers, Cloudflare Mesh genuinely wins. Their backbone finds better intercontinental paths than the public internet. The 5-8x speed improvement on the Japan-to-Europe route is not a rounding error — it’s Cloudflare’s core competency showing.

Mobile on unreliable networks: This is where self-hosted relay matters most. If your phone is on hotel WiFi with strict NAT and can’t establish a direct WireGuard connection, the relay is all you have. Tailscale’s shared DERP might work or might be frustratingly slow. Your own Headscale DERP on a good server will give you consistent, fast relay. Cloudflare Mesh will also be consistent, since edge routing is always the path.

Tailscale’s userspace performance evolution
Worth noting: Tailscale has made major optimizations to their userspace WireGuard. On modern bare metal (i5-12400), they hit 13.0 Gbps — actually surpassing kernel WireGuard (11.8 Gbps). The old “Tailscale is slow because userspace” narrative is outdated. Source

Security and Privacy: The Sovereignty Spectrum


This is the core question for a lot of people evaluating these tools: who sees what?

What the coordination server knows

Even with Tailscale’s hosted service, your actual traffic is encrypted end-to-end via WireGuard and never passes through their servers (except briefly via DERP relays when direct connections fail). But the coordination server necessarily knows:

  • Every device in your network — public keys, hostnames, OS
  • When each device connects and disconnects
  • IP addresses of every node — public-facing IPs reveal physical location
  • Network topology — which nodes exist and how they’re grouped
  • ACL policies — your entire access control structure

For many users this metadata is benign. For others — regulated industries, privacy-conscious setups, strict data sovereignty requirements — this is sensitive information worth controlling.

The spectrum

These four tools sit on a clear sovereignty spectrum:

Level 1: Full sovereignty — Headscale with self-hosted DERP, or self-hosted NetBird. You control everything: coordination server, relay infrastructure, and (with NetBird) even the client. Network metadata never leaves your infrastructure. Even relay traffic stays on your boxes. This is the maximum-control option.

Level 2: Metadata exposure only — Tailscale. Traffic is peer-to-peer WireGuard, encrypted end-to-end. Only coordination metadata touches Tailscale’s servers. Tailscale’s position is that the coordination server is a low-trust component — it distributes public keys and policies, never private keys or traffic. This is architecturally true, but “low trust” isn’t “zero trust.”

Level 3: Full traffic through third party — Cloudflare Mesh. All traffic routes through Cloudflare’s edge. You’re not just exposing metadata — Cloudflare’s infrastructure is the transport layer for all your private network communication. Encrypted, yes, but passing through their network.

Self-hosted DERP: Closing the last gap

With Headscale, enabling the embedded DERP server and removing Tailscale’s public DERP servers from the config means even relay traffic never leaves your infrastructure. This is a frequently overlooked benefit: it’s not just coordination metadata that stays on your box, but relay path data too.

Key configuration details: embedded DERP is disabled by default — you must enable it. Tailscale’s public DERP servers are included as fallback by default — remove them for full sovereignty, but accept that your server becomes the only relay option. DERP access can be restricted to your tailnet members only.

The “already-in-Cloudflare” counterargument

The sovereignty concern with Cloudflare Mesh assumes they’re a new third party in your stack. But if you already run your public infrastructure through Cloudflare — DNS, CDN, WAF, DDoS protection, Cloudflare Tunnel — the trust decision is already made. All your WWW traffic already transits their network. Adding Mesh extends an existing trust boundary rather than creating a new one.

For these users, the practical benefits are real: unified trust boundary across public and private networking, shared security policies (Access rules, Gateway policies, device posture checks apply automatically), and secure server access without exposing SSH ports. Instead of having Cloudflare for public + Tailscale for private + possibly separate firewall rules for SSH access, everything consolidates into one provider, one set of policies, one dashboard.

What self-hosting doesn’t change

A few things stay the same regardless of whether you self-host:

WireGuard encryption is identical. The security of actual traffic is the same across Tailscale, Headscale, and NetBird — it’s WireGuard in all cases. Self-hosting doesn’t make the encryption stronger or weaker.

Client software trust. With Headscale, you’re still running Tailscale’s client code. You’ve replaced the server, not the client. With NetBird, you’re running their client — fully open source, but you’re still trusting code you probably haven’t audited line by line. The trust model shifts, it doesn’t disappear.

Operational security is on you. Self-hosting means you own the security hardening: firewall rules, updates, access controls, monitoring. A poorly secured Headscale server is worse than Tailscale’s hosted service, because you’ve added a point of failure without the team of security engineers Tailscale has maintaining theirs.

Cost Comparison

Let’s talk money. The pricing models are fundamentally different, and the right choice depends heavily on your scale.

|  | Tailscale | Headscale | NetBird | CF Mesh |
|---|---|---|---|---|
| Free tier | 100 devices, 3 users | Unlimited (self-hosted) | Unlimited (self-hosted) | 50 nodes, 50 users |
| First paid tier | $6/user/month (Starter) | Free forever | $5/user/month (Team cloud) | TBD |
| HA routes/exit nodes | $18/user/month (Premium) | Free | Free (all plans) | Free |
| SCIM provisioning | Enterprise (custom pricing) | N/A | Team plan ($5/user/mo) | Via CF Identity |
| Hidden costs (self-hosted) | None | Server + domain + time | Server + domain + more time | None |

For a solo homelab with a handful of devices, all four are effectively free. The economics diverge at scale:

At 10 users / 50 devices: Tailscale Starter costs $60/month. Headscale is free (assuming you have a server). NetBird self-hosted is free; cloud is $50/month. Cloudflare Mesh is free under the 50-node cap.

At 50 users / 200 devices: Tailscale Starter jumps to $300/month. Headscale: still free. NetBird cloud: $250/month. Cloudflare Mesh: exceeds the free tier, paid pricing TBD.

If you need HA routes (and for business use, you probably do), Tailscale’s Premium tier at $18/user/month makes it expensive fast. Both Headscale and NetBird include HA for free.

The hidden cost of self-hosting is your time — setup, maintenance, troubleshooting. For a single sysadmin who already manages servers, this is marginal. For a team without dedicated ops, it’s a real factor. Tailscale’s pricing buys you freedom from operational burden. Whether that’s worth $6-18/user/month depends on how you value your time.

Practical Concerns

Self-hosting requirements

| Concern | Headscale | NetBird |
|---|---|---|
| Public IP needed | Yes | Yes |
| CGNAT compatible | No (need VPS) | No (need VPS) |
| Server components | 1 (single binary) | 3+ (management, signal, relay) |
| Initial setup time | ~30–60 min | Several hours |
| TLS certs required | Yes | Yes |
| OIDC provider needed | Optional | Required |
| Ongoing maintenance | Low (updates, backups) | Medium (3 services, relay infra) |

OPSEC: Don’t reveal your infrastructure

This applies to both Headscale and NetBird self-hosted: your coordination server needs a public DNS entry, which is visible to anyone. Don’t run it on the same server as your main public-facing services. If headscale.example.com resolves to the same IP as www.example.com, you’ve publicly announced “this IP is also my mesh coordination server” — painting a target and revealing your infrastructure topology.

Better: a dedicated VM with a hostname that doesn’t obviously tie back to your main domain. hs.unrelated-domain.net reveals nothing about what else you run. Someone scanning your web server shouldn’t learn it’s also the brain of your private mesh network.
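A quick outside-in audit, assuming your own hostnames here:

# If these resolve to the same address, your coordination server is
# trivially linkable to your public site
dig +short hs.unrelated-domain.net
dig +short www.example.com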

Scale

Tailscale is battle-tested at massive scale. Headscale has known instability beyond ~300 nodes (CLI timeouts, unreliable pings) — fine for homelab and small business, but a ceiling to know about. NetBird’s scale limits aren’t well documented yet. Cloudflare Mesh caps at 50 nodes on the free tier.

Client configuration friction

Tailscale and Cloudflare Mesh: install → authenticate → done. Simple for anyone.

Headscale: install the regular Tailscale client, then set a custom control plane URL that’s buried in a debug menu on mobile. Fine for sysadmins, annoying if you’re managing devices for less technical family members.

NetBird: own client, own onboarding flow. Not harder, but different — and switching from an existing Tailscale deployment means replacing the client on every device rather than just changing a URL.

When Each Tool Makes Sense

Choose Tailscale when:

  • Operational simplicity matters most — it just works
  • You rely on Funnel or Serve for exposing services
  • You need multiple tailnets or device sharing across organizations
  • You’re behind CGNAT and don’t want a VPS
  • You’re scaling beyond a few hundred devices
  • Polished admin UI and documentation matter to you

Choose Headscale when:

  • You want the lightest self-hosting lift — single binary, keep your existing Tailscale clients
  • Data sovereignty with minimal operational overhead
  • You have an existing server that can take on the role (free incremental cost)
  • Custom MagicDNS domain matters (server.mesh.yourdomain.net)
  • You want self-hosted DERP relay for both privacy and performance
  • A single tailnet is fine for your needs

Choose NetBird when:

  • Full self-hosting AND you want features Headscale lacks — dashboard, posture checks, reverse proxy, HA on free tier
  • Regulated environments where posture checks and SCIM provisioning matter
  • You want self-hosted Funnel/Serve equivalent (built-in reverse proxy since v0.65)
  • Maximum independence from any single vendor, including Tailscale
  • You’re starting fresh and don’t care about Tailscale client compatibility

Choose Cloudflare Mesh when:

  • You’re already a Cloudflare customer — extends existing trust boundary
  • Zero ops overhead — no servers, relays, or certs to maintain
  • Quick access to servers without exposing ports
  • Long-distance international routes where CF’s backbone excels
  • You don’t need data sovereignty or already trust Cloudflare with your traffic
  • Avoid if: UDP-sensitive workloads, regional high-throughput needs

Decision Framework: Choosing for Your Situation

Rather than picking a “winner,” here’s how to think about which tools fit your infrastructure. The decision depends on three things: what you already have, what you’re willing to maintain, and where you draw your trust boundaries.

If you’re starting from zero

Just start with Tailscale. Seriously. Get your mesh working, understand what you actually use it for, learn where it helps and where it frustrates you. Then evaluate alternatives from a position of experience, not theory. The free tier covers 100 devices and 3 users — more than enough to learn on.

If you’re running Tailscale and want to self-host

Headscale is the obvious path. Same clients, config-change migration, single binary to maintain. You lose Funnel/Serve and the admin UI, you gain custom DNS, own DERP, and full metadata control. If your Tailscale frustrations are about relay speed, DNS naming, or metadata — Headscale addresses all three.

If you also need posture checks, a built-in admin dashboard, SCIM provisioning, or a self-hosted Funnel/Serve equivalent — and you’re willing to accept the operational complexity of a three-component stack — NetBird is the more feature-complete self-hosted option.

If you’re already a Cloudflare customer

Cloudflare Mesh is worth trying as a secondary access path at minimum. It extends a trust boundary you already live inside, costs nothing within the 50-node free tier, and requires zero infrastructure. Whether it becomes your primary mesh depends on your throughput and UDP sensitivity requirements — the regional performance penalty and packet loss are real limitations.

If you have an underutilized server

This changes the economics entirely. The cost argument for managed services evaporates when the box is already paid for and maintained. Headscale’s incremental operational cost on an existing server is close to zero — it’s a single binary that barely shows up in htop. Self-hosting becomes the obvious choice when you’re not adding a monthly bill to do it.

If resilience matters

Run two. Any combination of the above gives you independent control planes with no shared failure modes. The hybrid approach — self-hosted primary plus managed backup — is something most comparison articles never consider, but it’s how experienced sysadmins actually build infrastructure.

Other Tools Worth Mentioning

One more name comes up in these conversations, but it solves a different problem:

Pangolin is a self-hosted tunneling tool using Traefik for reverse proxying. It’s geared toward selective service exposure rather than full mesh networking — more of a self-hosted Cloudflare Tunnel competitor. If you don’t need a full mesh VPN and just want to expose specific services, it’s worth a look. (Source)

My Scenario: What I’m Actually Going to Do

Theory is nice. Here’s my actual situation and what I’m planning.

I run two Hetzner nodes. One is my main web server — public-facing, all traffic already goes through Cloudflare (DNS, CDN, WAF, Tunnel). The other used to be my secondary DNS server, but DNS has moved to Cloudflare too. Now it mostly runs IRC, and I’m considering Matrix or Mattermost on it. It’s underutilized but already paid for and in my maintenance rotation.

I’m currently running Tailscale across everything. It works, but I’ve been bothered by three things: the relay speed on the free tier when direct connections fail (it can be atrocious), the fact that my MagicDNS names are Tailscale-assigned hashes, and the metadata exposure — Tailscale knows my full device inventory, connection patterns, and network topology.

My plan: both Headscale and Cloudflare Mesh.

Headscale on the Hetzner Cloud node

This is my server-to-server backbone. Zero additional cost — the box is already paid for. It adds Headscale alongside IRC/chat as an incremental workload, not a whole new server. I get full sovereignty over coordination and relay, custom MagicDNS domain, and DERP performance bounded by my own Hetzner bandwidth (which is excellent). The Headscale instance won’t be on the same hostname or IP as my main web server — OPSEC matters.

Since I’m already running Tailscale clients everywhere, switching to Headscale is a config change per device: point it at my new coordination server instead of Tailscale’s. No client reinstallation needed.

Cloudflare Mesh for desktop-to-server access

My daily workflow is SSH from my desktop to Hetzner nodes. For this, Cloudflare Mesh is perfect: install the Cloudflare One client on my desktop, enroll the servers as Mesh Nodes, done. No firewall rules to punch, no inbound ports to expose — agents connect outbound to Cloudflare’s edge. Since all my public traffic already flows through Cloudflare, this doesn’t add a new trust relationship.

More importantly, it’s my backup path. If I somehow get fail2banned from my own Headscale node (it happens), or the Hetzner VM goes down for maintenance, or I break something while setting up Mattermost — Cloudflare Mesh still gets my desktop to my servers through a completely independent control plane.

What I gain over current Tailscale

Spelling it out, because these are the specific things that bother me today and how each piece of the new setup addresses them:

Relay speed: Tailscale’s free DERP relays have been the single biggest pain point. When a direct connection fails — and it does, especially on mobile networks or behind hotel WiFi — the fallback relay speed drops off a cliff. With my own embedded DERP on the Headscale node, relay performance is bounded by my Hetzner connection, which is excellent. This alone justifies the switch.

Custom DNS domain: Instead of server.tail12ab.ts.net, I get server.mesh.mydomain.net. Every SSH config, every script, every bookmark becomes cleaner and consistent with my existing DNS. It’s the kind of thing that sounds trivial but compounds across hundreds of daily interactions.

Metadata sovereignty: My full device inventory, connection patterns, public IPs, and network topology stay on my infrastructure instead of Tailscale’s coordination server. Whether this matters depends on your threat model — for me, managing both personal and business infrastructure through the same mesh, it does.

Backup access path: If I lock myself out of my own Headscale node (it happens — fat-finger a firewall rule, fail2ban yourself, botch an update), Cloudflare Mesh gives me a completely independent way back in. This is something I don’t have today with Tailscale as my only mesh.

Zero additional cost: The ex-DNS Hetzner node is already in my budget. Headscale is free. Cloudflare Mesh’s free tier covers my needs easily. The total incremental cost of this migration is zero euros per month.

Why both?

Two independent mesh networks with zero shared failure modes. Headscale for the things that matter — server-to-server throughput, metadata sovereignty, custom DNS, fast relay. Cloudflare Mesh for the things where convenience matters — quick desktop access, no ports to manage, and a resilient backup path that doesn’t depend on anything I run.

If Cloudflare has an outage or changes their terms, my Headscale mesh keeps running. If my Headscale node goes down, Cloudflare Mesh still gets me to my servers. Neither depends on the other. That’s the kind of resilience you can’t get from going all-in on a single solution.

The migration path

The beauty of the Headscale approach is that migration from Tailscale is incremental, not a big-bang cutover. The steps:

  1. Set up Headscale on the ex-DNS Hetzner node. Get it running, configure the custom DNS domain, enable embedded DERP, set up ACLs.
  2. Generate pre-auth keys and start moving nodes one at a time: tailscale up --login-server https://hs.mydomain.net --authkey=...
  3. Test thoroughly with a couple of nodes before moving the rest. Existing Tailscale connections keep working on nodes you haven’t migrated yet — nothing breaks during the transition.
  4. Once everything is on Headscale, remove Tailscale’s public DERP servers from the config for full sovereignty.
  5. In parallel, enroll the same nodes in Cloudflare Mesh as a backup path. The two networks operate independently — no conflicts.

If something goes wrong, tailscale up without --login-server points the node back to Tailscale’s hosted service. The safety net is always there.

What I’m not doing, and why

I’m not going with NetBird, even though it has more features than Headscale. The three-component server stack is more complexity than I want to maintain for a hybrid homelab/business setup, and I don’t need posture checks or SCIM provisioning. If I were starting from scratch with no Tailscale clients deployed, or if I were in a regulated environment requiring device compliance enforcement, NetBird would be the stronger choice. For my use case — moving an existing Tailscale deployment to self-hosted with minimal disruption — Headscale’s single binary and Tailscale client compatibility is the path of least resistance.

I’m also not going all-in on Cloudflare Mesh as my primary network. The performance data is clear: for Hetzner-to-Hetzner traffic (my bread and butter), P2P WireGuard is 3-5x faster. And 14% UDP packet loss rules out Cloudflare Mesh for anything latency-sensitive. But as a secondary access path that costs zero infrastructure and lives in a trust boundary I’m already inside? Perfect.

The bottom line

Most comparison articles assume you’re picking one tool. In practice, sysadmins layer tools based on trust boundaries and existing infrastructure. The question isn’t “which mesh VPN is best?” — it’s “where are my trust boundaries and what do I already have?”

If you already have a box sitting around, Headscale is practically free to add. If you’re already a Cloudflare customer, Mesh is practically free to try. If you need maximum independence, NetBird gives you everything self-hosted at the cost of more ops. And if none of this matters to you and you just want things to work, Tailscale remains excellent at what it does.

You probably don’t need to choose one. You need to decide what matters to you — sovereignty, convenience, performance, resilience — and pick the right tool for each layer. The tools are getting good enough that the “just pick one” era is over. Mix, match, and build something that reflects how you actually think about your infrastructure.

The mesh VPN space in 2026 is genuinely good. Tailscale proved the concept and made it mainstream. Headscale proved you could self-host the control plane without losing the client experience. NetBird proved you could build a fully independent stack with enterprise features. And now Cloudflare is proving that edge-routed mesh has a place alongside peer-to-peer, especially for users already in their ecosystem.

Competition is making all of these better. Tailscale’s userspace performance improvements came from being pushed by alternatives. NetBird’s reverse proxy feature directly addresses Tailscale Funnel’s proprietary lock-in. Cloudflare entering the space forces everyone to think about pricing and convenience more seriously. Whatever you choose, you’re choosing well — the floor has risen dramatically, and the differences are increasingly about philosophy and fit rather than basic capability.

I’ll write a follow-up once I’ve actually migrated and lived with the Headscale + Cloudflare Mesh setup for a few weeks. Theory is cheap; experience is what matters.


Self-hosted Git with Forgejo on RHEL

I keep my own hardware, my own backups, and my own rules. GitHub is fine for open source — but for personal projects, config experiments, and anything I might eventually pipe into my own Nextcloud instance, I wanted the whole chain on my own server. No Microsoft in the middle, no AI training on my commits, no dependency on someone else’s uptime: my infra is up, my stuff is up.

This is a straight install guide for Forgejo on RHEL (10) — single binary, systemd service, MariaDB backend, Apache reverse proxy, and direct SSH access on port 2222. If you’re running a similar LAMP stack, this drops in cleanly.


What is Forgejo

Forgejo is a community-driven fork of Gitea — a lightweight, self-hosted Git forge. Web UI, issue tracker, pull requests, CI webhooks, the works. Single Go binary, no runtime dependencies, runs happily on modest hardware. It’s what Gitea should have remained before the commercial drift.

Prerequisites
This guide assumes RHEL 10 (or AlmaLinux/Rocky equivalent), Apache already running, MariaDB already running, and a domain with DNS pointing at your server. Adjust paths for other distros as needed.

Step 1: Create the git user

Forgejo runs as a dedicated system user. SSH git operations will also authenticate through this user.

useradd --system --shell /bin/bash --comment "Forgejo" --create-home --home-dir /home/git git

Create the directory structure Forgejo expects:

mkdir -p /var/lib/forgejo/{custom,data,log,repos}
chown -R git:git /var/lib/forgejo
chmod -R 750 /var/lib/forgejo

mkdir /etc/forgejo
chown root:git /etc/forgejo
chmod 770 /etc/forgejo

Step 2: Download the Forgejo binary

Grab the latest release from forgejo.org/releases. Check the current stable version before running this — Forgejo has short support windows (typically six to eight weeks per release).

FORGEJO_VERSION="14.0.3"
wget -O /usr/local/bin/forgejo \
  "https://codeberg.org/forgejo/forgejo/releases/download/v${FORGEJO_VERSION}/forgejo-${FORGEJO_VERSION}-linux-amd64"

chmod +x /usr/local/bin/forgejo

Verify the binary runs:

forgejo --version

Step 3: Create the MariaDB database

mysql -u root -p
CREATE DATABASE forgejo CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
CREATE USER 'forgejo'@'127.0.0.1' IDENTIFIED BY 'STRONG_PASSWORD_HERE';
GRANT ALL PRIVILEGES ON forgejo.* TO 'forgejo'@'127.0.0.1';
FLUSH PRIVILEGES;
EXIT;
Why 127.0.0.1 and not localhost?
In MariaDB, localhost means “connect via Unix socket” while 127.0.0.1 means “connect via TCP”. Forgejo connects via TCP to 127.0.0.1:3306, so the grant must match. If you create a user with @localhost, Forgejo won’t be able to authenticate.
⚠ charset matters
utf8mb4 is required. Plain utf8 in MariaDB is a broken 3-byte subset that chokes on emoji and some Unicode. Don’t skip the collation line.
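Before wiring up Forgejo, it's worth verifying the grant the same way Forgejo will connect, over TCP:

# -h 127.0.0.1 forces TCP; if this returns a row, Forgejo can authenticate too
mysql -h 127.0.0.1 -P 3306 -u forgejo -p forgejo -e "SELECT 1;"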

Step 4: Create the app.ini configuration

Forgejo reads its config from /etc/forgejo/app.ini. Create it as root, then we’ll lock it down after first run.

cat > /etc/forgejo/app.ini << 'EOF'
APP_NAME = git.raatti.net
RUN_USER = git
RUN_MODE = prod
WORK_PATH = /var/lib/forgejo

[server]
DOMAIN           = git.raatti.net
HTTP_ADDR        = 127.0.0.1
HTTP_PORT        = 3001
ROOT_URL         = https://git.raatti.net/
DISABLE_SSH      = false
SSH_DOMAIN       = git.raatti.net
SSH_PORT         = 2222
START_SSH_SERVER = true
OFFLINE_MODE     = true

[database]
DB_TYPE  = mysql
HOST     = 127.0.0.1:3306
NAME     = forgejo
USER     = forgejo
PASSWD   = STRONG_PASSWORD_HERE
CHARSET  = utf8mb4

[repository]
ROOT = /var/lib/forgejo/repos

[log]
MODE      = file
LEVEL     = info
ROOT_PATH = /var/lib/forgejo/log

[security]
INSTALL_LOCK        = false
SECRET_KEY          =
INTERNAL_TOKEN      =

[service]
DISABLE_REGISTRATION = true
REQUIRE_SIGNIN_VIEW  = true

[mailer]
ENABLED = false
EOF

chown root:git /etc/forgejo/app.ini
chmod 640 /etc/forgejo/app.ini
Note on INSTALL_LOCK
INSTALL_LOCK = false here is intentional — it tells Forgejo to show the first-run setup wizard. After you complete the wizard in Step 8, Forgejo automatically sets this to true in app.ini. If it stays false, the installer will appear on every page load.
Note on port 3001
We use port 3001 instead of the default 3000 to avoid conflicts with other services (Grafana, for example, defaults to 3000). Pick any unused high port.
Note on registration
DISABLE_REGISTRATION = true and REQUIRE_SIGNIN_VIEW = true lock the instance down to invited users only. You’ll create your admin account during first-run setup, then this takes effect. Personal forge, not a public service.

Step 5: Create the systemd service

cat > /etc/systemd/system/forgejo.service << 'EOF'
[Unit]
Description=Forgejo - Beyond coding. We Forge.
After=network.target mariadb.service

[Service]
Type=simple
User=git
Group=git
WorkingDirectory=/var/lib/forgejo
ExecStart=/usr/local/bin/forgejo web --config /etc/forgejo/app.ini
Restart=on-failure
RestartSec=5s
EnvironmentFile=-/etc/forgejo/forgejo.env

PrivateTmp=true
NoNewPrivileges=true

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable --now forgejo

Confirm it’s running:

systemctl status forgejo
ss -tlnp | grep 3001
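If the service doesn't come up cleanly, the journal plus a local HTTP probe usually pinpoints the problem:

# Follow service output live
journalctl -u forgejo -f

# Probe the web listener directly; expect an HTTP response, not a refusal
curl -I http://127.0.0.1:3001/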

Step 6: Apache reverse proxy and Cloudflare Tunnel

We need two things: an Apache vhost to reverse proxy to Forgejo, and a Cloudflare Tunnel route so the outside world can reach it without opening firewall ports.

SELinux configuration

If you have SELinux enforcing (you should), Apache needs permission to make network connections to the Forgejo backend:

# Allow Apache to connect to network services
setsebool -P httpd_can_network_connect 1

# Label port 3001 as an HTTP port
semanage port -a -t http_port_t -p tcp 3001

If port 3001 is already labeled for something else, use -m instead of -a to modify it.

Apache vhost

Add a new vhost for git.raatti.net. Apache listens on 80 — Cloudflare Tunnel handles TLS termination at the edge.

cat > /etc/httpd/conf.d/git.raatti.net.conf << 'EOF'
<VirtualHost *:80>
    ServerName git.raatti.net

    ProxyPreserveHost On
    ProxyRequests Off

    ProxyPass        / http://127.0.0.1:3001/
    ProxyPassReverse / http://127.0.0.1:3001/

    RequestHeader set X-Forwarded-Proto "https"
    RequestHeader set X-Real-IP "expr=%{REMOTE_ADDR}"

    ErrorLog  /var/log/httpd/forgejo_error.log
    CustomLog /var/log/httpd/forgejo_access.log combined
</VirtualHost>
EOF

apachectl configtest && systemctl reload httpd

Cloudflare Tunnel

If you’re already running Cloudflare Tunnel (see my Cloudflare Tunnel guide), you need to do two things: create the DNS route and add the ingress rule. Both are required — if you skip either one, you’ll get a 404.

First, create the DNS route so Cloudflare knows to send traffic for this hostname to your tunnel:

cloudflared tunnel route dns your-tunnel-name git.raatti.net

Then open /etc/cloudflared/config.yml in your editor and add a new ingress rule for the git hostname. Insert it before the catch-all http_status:404 rule at the end. For example, if your existing config looks like this:

tunnel: your-tunnel-id
credentials-file: /etc/cloudflared/credentials.json

ingress:
  - hostname: raatti.net
    service: http://localhost:80
  - hostname: www.raatti.net
    service: http://localhost:80
  - service: http_status:404

Add the git.raatti.net line so it becomes:

tunnel: your-tunnel-id
credentials-file: /etc/cloudflared/credentials.json

ingress:
  - hostname: raatti.net
    service: http://localhost:80
  - hostname: www.raatti.net
    service: http://localhost:80
  - hostname: git.raatti.net
    service: http://localhost:80
  - service: http_status:404

The DNS route tells Cloudflare’s edge to send traffic to your tunnel. The ingress rule tells cloudflared where to forward it locally. Without the ingress rule, cloudflared receives the request but doesn’t match any hostname and returns 404.
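
Before restarting, you can sanity-check the config from the CLI. cloudflared validates the ingress section and shows which rule a given URL would match (point it at /etc/cloudflared/config.yml with the --config flag if it lives outside the default search paths):

cloudflared tunnel ingress validate
cloudflared tunnel ingress rule https://git.raatti.net

The second command should report the git.raatti.net rule, not the http_status:404 catch-all.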

Restart cloudflared to pick up the config change:

systemctl restart cloudflared

The tunnel routes traffic through Cloudflare’s network to your server via an outbound connection — no inbound firewall rules needed for HTTP/HTTPS. Your origin stays invisible.

Why localhost:80 and not localhost:3001?
The tunnel points at Apache (port 80), which then proxies to Forgejo (port 3001). This keeps all HTTP routing in one place and lets Apache handle headers, logging, and any future vhost complexity.

Step 7: Restrict SSH to the Tailscale interface

Forgejo’s built-in SSH server listens on port 2222. Rather than opening this to the public internet, we bind it exclusively to the Tailscale network interface — so only trusted devices on your tailnet can reach it. No exposure, no port scanners, no brute force attempts.

First, find your Tailscale interface name and IP:

ip addr show tailscale0
# or: tailscale ip -4

Update app.ini to bind the SSH server to the Tailscale IP only, then restart Forgejo (systemctl restart forgejo) so the new bind address takes effect:

[server]
; ... other settings ...
; SSH_LISTEN_HOST is your server's Tailscale IP
SSH_LISTEN_HOST  = 100.x.x.x
SSH_PORT         = 2222

Now open port 2222 in firewalld, but scoped to the Tailscale interface only — not the public zone:

# Add tailscale0 to the trusted or internal zone (not public)
firewall-cmd --permanent --zone=trusted --add-interface=tailscale0
firewall-cmd --permanent --zone=trusted --add-port=2222/tcp
firewall-cmd --reload

Confirm port 2222 is not reachable from the public zone:

firewall-cmd --zone=public --list-ports   # 2222 should NOT appear here
firewall-cmd --zone=trusted --list-ports  # 2222 should appear here

If you have SELinux enforcing, label the port (as with 3001 earlier, use -m instead of -a if 2222 is already labeled for something else):

semanage port -a -t ssh_port_t -p tcp 2222
Why Tailscale and not a firewall allowlist?
Tailscale uses WireGuard under the hood and authenticates devices with your identity provider — only enrolled devices can join the network at all. There’s no open port on the public internet for anyone to probe. Compared to an IP allowlist (which breaks when your ISP changes your address), it’s both more secure and more convenient.
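
One ordering caveat: Forgejo can only bind to the Tailscale IP if tailscale0 is already up when the service starts. A systemd drop-in makes that dependency explicit. This is a minimal sketch, assuming the standard tailscaled.service unit name from the official packages:

mkdir -p /etc/systemd/system/forgejo.service.d
cat > /etc/systemd/system/forgejo.service.d/tailscale.conf << 'EOF'
[Unit]
# Do not start Forgejo until the Tailscale daemon is up
After=tailscaled.service
Wants=tailscaled.service
EOF

systemctl daemon-reload
systemctl restart forgejo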

Step 8: First-run setup

Navigate to https://git.raatti.net. The installer will appear. Most fields are pre-filled from your app.ini — verify the database credentials and set your admin account.

Create admin account now
Since we set DISABLE_REGISTRATION = true, you won’t be able to create accounts after the initial setup. Scroll down to the “Administrator Account Settings” section and create your admin user before clicking “Install Forgejo”.
⚠ Be patient on first install
The first-run setup creates all database tables and initial data. This can take several minutes — you may see a 502 gateway timeout on your first attempt. Wait 5–6 minutes and refresh. Don’t click “Install” multiple times.

After completing the wizard, Forgejo writes the generated SECRET_KEY and INTERNAL_TOKEN into app.ini, and sets INSTALL_LOCK = true automatically. Lock the file down once that’s done:

chown root:git /etc/forgejo/app.ini
chmod 640 /etc/forgejo/app.ini

Step 9: SSH key setup for git access

On your client machine, make sure it’s enrolled in Tailscale, then add the SSH config so git uses port 2222 via the Tailscale IP transparently:

# ~/.ssh/config
Host git.raatti.net
    User git
    Port 2222
    # your server's Tailscale IP:
    HostName 100.x.x.x
    IdentityFile ~/.ssh/id_ed25519

In the Forgejo web UI, add your public key under Settings → SSH / GPG Keys. Then test:

ssh -T git@git.raatti.net

You should see: Hi username! You’ve successfully authenticated…

Clone URL pattern for your repos will be:

git clone git@git.raatti.net:username/repo.git
# or via HTTPS (available publicly through Cloudflare Tunnel):
git clone https://git.raatti.net/username/repo.git
SSH vs HTTPS access model
SSH (push/pull over port 2222) is Tailscale-only — trusted devices only. HTTPS read access goes through Cloudflare Tunnel and is gated by REQUIRE_SIGNIN_VIEW = true, so unauthenticated visitors see nothing. You get a private forge that’s invisible to the internet at the transport layer.

Migrating from GitHub

Forgejo has a built-in migration tool under + → New Migration → GitHub. It pulls the repo, issues, labels, milestones, and wiki — all you need is a GitHub personal access token. For private repos you want to fully exit, it’s the cleanest path.

For repos you want to keep public, Forgejo supports push mirrors — your Forgejo instance is the source of truth, and it pushes automatically to GitHub (or GitLab, Codeberg, wherever). You stay in control of the canonical copy while maintaining a public face wherever your audience is.
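
Push mirrors can also be created through the API rather than the UI. A hedged sketch follows: the push_mirrors endpoint exists in recent Gitea/Forgejo releases, but check /api/swagger on your instance for the exact option names; the token, repo names, and GitHub PAT here are placeholders:

curl -X POST "https://git.raatti.net/api/v1/repos/username/repo/push_mirrors" \
  -H "Authorization: token YOUR_FORGEJO_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "remote_address": "https://github.com/username/repo.git",
        "remote_username": "username",
        "remote_password": "YOUR_GITHUB_PAT",
        "interval": "8h0m0s",
        "sync_on_commit": true
      }'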


Keeping Forgejo updated automatically

Forgejo stable releases have short support windows — typically around six to eight weeks. Staying current isn’t optional; an unsupported release won’t get security patches. Rather than tracking this manually, a weekly systemd timer handles it cleanly.

The approach: check the Codeberg API for the latest release, compare against the installed version, download and verify if newer, swap the binary, restart the service. Everything logged. No action taken if already current.

Create the update script:

cat > /usr/local/sbin/forgejo-update.sh << 'SCRIPT'
#!/usr/bin/env bash
set -euo pipefail

LOG=/var/log/forgejo-update.log
BIN=/usr/local/bin/forgejo
ARCH=linux-amd64

log() { echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "$LOG"; }

LATEST=$(curl -fsSL \
  "https://codeberg.org/api/v1/repos/forgejo/forgejo/releases?limit=10&pre-release=false" \
  | grep -oP '"tag_name":\s*"v\K[0-9]+\.[0-9]+\.[0-9]+' \
  | head -1)

if [[ -z "$LATEST" ]]; then
  log "ERROR: Could not fetch latest version from Codeberg API"
  exit 1
fi

CURRENT=$("$BIN" --version 2>&1 | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -1)

if [[ -z "$CURRENT" ]]; then
  log "ERROR: Could not determine installed Forgejo version"
  exit 1
fi

log "Installed: v${CURRENT}  |  Latest: v${LATEST}"

if [[ "$CURRENT" == "$LATEST" ]]; then
  log "Already up to date, nothing to do."
  exit 0
fi

log "Updating Forgejo v${CURRENT} -> v${LATEST}"

DOWNLOAD_URL="https://codeberg.org/forgejo/forgejo/releases/download/v${LATEST}/forgejo-${LATEST}-${ARCH}"
SHA_URL="${DOWNLOAD_URL}.sha256"
TMPBIN=$(mktemp /tmp/forgejo.XXXXXX)

log "Downloading binary..."
curl -fsSL -o "$TMPBIN" "$DOWNLOAD_URL"
curl -fsSL "$SHA_URL" | awk '{print $1}' > /tmp/forgejo.sha256.expected
echo "$(sha256sum $TMPBIN | awk '{print $1}')" > /tmp/forgejo.sha256.actual

if ! diff -q /tmp/forgejo.sha256.expected /tmp/forgejo.sha256.actual >/dev/null 2>&1; then
  log "ERROR: SHA256 checksum mismatch, aborting update."
  rm -f "$TMPBIN" /tmp/forgejo.sha256.*
  exit 1
fi

log "Checksum OK. Swapping binary and restarting service..."
chmod +x "$TMPBIN"
mv -f "$TMPBIN" "$BIN"
rm -f /tmp/forgejo.sha256.*

systemctl restart forgejo
sleep 3

if systemctl is-active --quiet forgejo; then
  log "Forgejo restarted successfully on v${LATEST}."
else
  log "ERROR: Forgejo failed to restart after update. Check: journalctl -u forgejo"
  exit 1
fi
SCRIPT

chmod +x /usr/local/sbin/forgejo-update.sh

Create the systemd service and timer:

cat > /etc/systemd/system/forgejo-update.service << 'EOF'
[Unit]
Description=Forgejo automatic update
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/local/sbin/forgejo-update.sh
EOF

cat > /etc/systemd/system/forgejo-update.timer << 'EOF'
[Unit]
Description=Weekly Forgejo update check

[Timer]
OnCalendar=Mon *-*-* 03:00:00
RandomizedDelaySec=1800
Persistent=true

[Install]
WantedBy=timers.target
EOF

systemctl daemon-reload
systemctl enable --now forgejo-update.timer

Confirm the timer is scheduled:

systemctl list-timers forgejo-update.timer

Run it manually for the first time to confirm everything works:

systemctl start forgejo-update.service
tail -f /var/log/forgejo-update.log
What this does and doesn’t do
The script updates to the latest stable release only — pre-releases are excluded. It verifies the SHA256 checksum before swapping the binary. If the checksum fails or the service doesn’t come back up, it logs the error and exits. Your app.ini and data directories are untouched — only the binary is replaced.
⚠ Major version upgrades
Forgejo occasionally requires a database migration on major version bumps (e.g. v14 → v15). The update script handles patch and minor releases safely, but review the release notes before a major version jump. If you’d rather approve major upgrades manually, add a version check: if [[ "${LATEST%%.*}" != "${CURRENT%%.*}" ]]; then log "Major version bump detected, skipping."; exit 0; fi

Backing up to Nextcloud with rclone

The goal: daily git bundles of all repos, synced to Nextcloud over WebDAV. The Nextcloud URL and credentials stay in a protected config file — nothing exposed in scripts or logs. rclone handles retries, resume on interrupted transfers, and bandwidth limiting.

Install rclone

rclone is available in EPEL, or grab the latest static binary directly:

# From EPEL
dnf install rclone

# Or latest from rclone.org
curl -O https://downloads.rclone.org/rclone-current-linux-amd64.zip
unzip rclone-current-linux-amd64.zip
cp rclone-*-linux-amd64/rclone /usr/local/bin/
chmod +x /usr/local/bin/rclone

Configure the Nextcloud remote

Run the interactive config as root (since the backup script runs as root):

rclone config

Follow the prompts:

n) New remote
name> nextcloud
Storage> webdav
url> https://nxYYYY.your-storageshare.de/remote.php/dav/files/USERNAME/
vendor> nextcloud
user> your-username
password> (enter an app password, not your main password)
bearer_token> (leave blank)
Edit advanced config? n

This creates /root/.config/rclone/rclone.conf. The password is automatically obscured. Lock it down:

chmod 600 /root/.config/rclone/rclone.conf

Test the connection:

rclone lsd nextcloud:

You should see your Nextcloud folders listed.

Finding the correct WebDAV URL
The WebDAV URL varies between Nextcloud providers. To find yours: log into Nextcloud, go to Files, click the Settings gear icon at the bottom left, and look for the WebDAV URL. For Hetzner Storage Share it’s https://nxYYYY.your-storageshare.de/remote.php/dav/files/USERNAME/. The trailing slash matters.
Use an app password
In Nextcloud, go to Settings → Security → Devices & sessions and create an app password specifically for rclone. This way you can revoke it independently without changing your main password, and it bypasses 2FA. Regular password login is often disabled for WebDAV.

Create the backup script

This script creates git bundles from all repositories and syncs them to Nextcloud. Bundles are self-contained — they include the full history and can recreate the repo from scratch.

cat > /usr/local/sbin/forgejo-backup.sh << 'SCRIPT'
#!/usr/bin/env bash
set -euo pipefail

REPO_ROOT="/var/lib/forgejo/repos"
BACKUP_DIR="/var/lib/forgejo/backups"
REMOTE="nextcloud:Backups/forgejo"
LOG="/var/log/forgejo-backup.log"
RETENTION_DAYS=7
BANDWIDTH="5M"

log() { echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "$LOG"; }

log "=== Starting Forgejo backup ==="
mkdir -p "$BACKUP_DIR"

BUNDLE_COUNT=0
while IFS= read -r -d '' repo; do
    rel_path="${repo#$REPO_ROOT/}"
    rel_path="${rel_path%.git}"
    bundle_name="$(echo "$rel_path" | tr '/' '_')_$(date +%Y%m%d).bundle"
    bundle_path="$BACKUP_DIR/$bundle_name"
    log "Bundling: $rel_path"
    # The script runs as root on repos owned by git — set safe.directory
    # per-invocation so modern git doesn't refuse with "dubious ownership"
    if git -c safe.directory="$repo" -C "$repo" bundle create "$bundle_path" --all 2>/dev/null; then
        ((BUNDLE_COUNT++)) || true
    else
        log "WARNING: Failed to bundle $rel_path (might be empty repo)"
    fi
done < <(find "$REPO_ROOT" -maxdepth 3 -type d -name "*.git" -print0)

log "Created $BUNDLE_COUNT bundles"
find "$BACKUP_DIR" -name "*.bundle" -mtime +"$RETENTION_DAYS" -delete

log "Syncing to Nextcloud..."
if rclone copy "$BACKUP_DIR" "$REMOTE" \
    --bwlimit "$BANDWIDTH" \
    --retries 5 \
    --retries-sleep 30s \
    --log-file "$LOG" \
    --log-level INFO; then
    log "Sync completed successfully"
else
    log "ERROR: Sync failed"
    exit 1
fi

rclone delete "$REMOTE" --min-age "${RETENTION_DAYS}d" --log-file "$LOG" --log-level INFO || true
log "=== Backup complete ==="
SCRIPT

chmod +x /usr/local/sbin/forgejo-backup.sh

Create the systemd timer

Run the backup daily at 4 AM:

cat > /etc/systemd/system/forgejo-backup.service << 'EOF'
[Unit]
Description=Forgejo backup to Nextcloud
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/local/sbin/forgejo-backup.sh
EOF

cat > /etc/systemd/system/forgejo-backup.timer << 'EOF'
[Unit]
Description=Daily Forgejo backup

[Timer]
OnCalendar=*-*-* 04:00:00
RandomizedDelaySec=900
Persistent=true

[Install]
WantedBy=timers.target
EOF

systemctl daemon-reload
systemctl enable --now forgejo-backup.timer

Confirm the timer is scheduled:

systemctl list-timers forgejo-backup.timer

Run it manually to test:

systemctl start forgejo-backup.service
tail -f /var/log/forgejo-backup.log

Restoring from a bundle

If you ever need to restore a repo from a bundle:

# Clone from bundle
git clone repo_name_20260406.bundle restored-repo

# Or restore into existing repo
cd existing-repo
git pull /path/to/repo_name_20260406.bundle
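
Before relying on a bundle, have git check it for completeness:

# Confirms the bundle is self-contained and lists the refs it carries
git bundle verify repo_name_20260406.bundle
git bundle list-heads repo_name_20260406.bundle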
What gets backed up
Git bundles contain the complete repository history — all branches, all tags, all commits. What they don’t contain: issues, pull requests, wiki content, webhooks, or user settings. Those live in the MariaDB database. For a complete disaster recovery solution, also back up /etc/forgejo/app.ini and run mysqldump forgejo on a schedule.
⚠ Bandwidth limiting
The script defaults to 5 MB/s upload limit. Adjust the BANDWIDTH variable as needed. rclone also supports time-based limits like "08:00,1M 23:00,10M" to throttle during business hours.
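
To round out the disaster-recovery note above, here is a minimal sketch of a daily database-and-config dump. It writes into the same BACKUP_DIR, so the existing rclone job carries the files to Nextcloud for free. It assumes root can authenticate to MariaDB over the unix socket (the default on RHEL):

cat > /etc/cron.daily/forgejo-db-backup << 'EOF'
#!/bin/bash
set -euo pipefail

DEST=/var/lib/forgejo/backups

# Dump issues, PRs, users, and the rest of the database state
mysqldump --single-transaction forgejo | gzip > "$DEST/forgejo-db-$(date +%Y%m%d).sql.gz"

# Keep a dated copy of the config alongside it
install -m 600 /etc/forgejo/app.ini "$DEST/app.ini.$(date +%Y%m%d)"

# Same 7-day retention as the bundles
find "$DEST" -name 'forgejo-db-*.sql.gz' -mtime +7 -delete
find "$DEST" -name 'app.ini.*' -mtime +7 -delete
EOF

chmod +x /etc/cron.daily/forgejo-db-backup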

Summary

Forgejo sits at around 50MB RAM at idle on this server — barely noticeable alongside the rest of the stack. The whole setup took under an hour from binary download to first commit pushed. If you’re already running Apache and MariaDB, there are almost no additional moving parts.

The access model is deliberately layered. The web UI and HTTPS cloning go through Cloudflare Tunnel — always encrypted, origin hidden, login required. SSH access for push/pull is bound to the Tailscale interface only, invisible to the public internet entirely. Port 2222 simply doesn’t exist from the outside.

For repos that need a public face, Forgejo’s push mirror feature handles that cleanly — your server stays the source of truth, GitHub or Codeberg get a read-only copy. When the next GitHub acquisition scare or terms-of-service change rolls around, you’re already somewhere else. Your code stays on your hardware, under your control, and nobody else can bring it down.

]]>
https://www.raatti.net/2026/04/11/self-hosted-git-with-forgejo-on-rhel/feed/ 0
Simplicity is a Feature: Migrating to Cloudflare Tunnel on Red Hat Linux https://www.raatti.net/2026/03/22/simplicity-is-a-feature-migrating-to-cloudflare-tunnel-on-red-hat-linux/ https://www.raatti.net/2026/03/22/simplicity-is-a-feature-migrating-to-cloudflare-tunnel-on-red-hat-linux/#respond Sun, 22 Mar 2026 12:18:33 +0000 https://www.raatti.net/?p=138 ...]]>

“Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.”
— Antoine de Saint-Exupéry

Unnecessary is the enemy of perfect. This is a principle I keep coming back to when managing server infrastructure. It is not enough to have something that works — if it works through unnecessary complexity, it is already a liability waiting to become a problem.

This post is about removing something that worked perfectly fine, because it did not need to exist.

For the full story of how that setup came to be — including the bandwidth mystery, wp-cron self-hammering, and compiling mod_evasive from source — see From Bandwidth Mystery to Hardened Origin.


The Old Setup

My server at raatti.net runs Red Hat Linux with Apache and php-fpm. Traffic goes through Cloudflare, which handles DDoS protection, caching, and TLS termination.

To prevent anyone from bypassing Cloudflare and hitting the origin directly, I maintained firewall rules that restricted HTTP and HTTPS access to Cloudflare’s published IP ranges:

rule family="ipv4" source ipset="cloudflare-ipv4" port port="80" protocol="tcp" accept
rule family="ipv6" source ipset="cloudflare-ipv6" port port="80" protocol="tcp" accept
rule family="ipv4" source ipset="cloudflare-ipv4" port port="443" protocol="tcp" accept
rule family="ipv6" source ipset="cloudflare-ipv6" port port="443" protocol="tcp" accept
rule family="ipv4" source address="127.0.0.1" port port="80" protocol="tcp" accept

But Cloudflare manages dozens of IP ranges across IPv4 and IPv6, and they update them. Which means either you script the updates, or you do them manually, or you quietly forget about it and hope nothing changes. None of these options are great. All of them are unnecessary.

Five rules. Two IP families. One nagging feeling that Cloudflare updated their ranges last Tuesday.

There is a better way: Cloudflare Tunnel.


What is Cloudflare Tunnel?

Instead of accepting inbound connections from Cloudflare’s IPs, your server initiates an outbound connection to Cloudflare’s network. Cloudflare then routes traffic through that connection to your origin. No inbound ports. No IP allowlists. No firewall rules to maintain for HTTP/HTTPS at all.

The tunnel is persistent, runs as a systemd service, and is included in Cloudflare’s free plan. It is production-ready — not to be confused with TryCloudflare quick tunnels, which are for development only.


Installation on Red Hat Linux

1. Add the Cloudflare repository and install cloudflared

# Add Cloudflare's RPM repository
sudo dnf config-manager --add-repo https://pkg.cloudflare.com/cloudflared.repo

# Install cloudflared
sudo dnf install -y cloudflared

Verify the installation:

cloudflared --version

2. Authenticate with Cloudflare

cloudflared tunnel login

This opens a browser window. Log in to your Cloudflare account and select the domain you want to use. A certificate (cert.pem) is saved to ~/.cloudflared/.

3. Create the tunnel

cloudflared tunnel create raatti-tunnel

Note the tunnel UUID from the output — you will need it shortly. You can also list tunnels at any time:

cloudflared tunnel list

4. Create the configuration file

Create the config directory and file:

sudo mkdir -p /etc/cloudflared
sudo nano /etc/cloudflared/config.yml

Give it the following content:

tunnel: <YOUR-TUNNEL-UUID>
credentials-file: /root/.cloudflared/<YOUR-TUNNEL-UUID>.json

ingress:
  - hostname: raatti.net
    service: http://localhost:80
  - hostname: www.raatti.net
    service: http://localhost:80
  - service: http_status:404

The catch-all rule at the end is required — requests that do not match any hostname return a 404. cloudflared will refuse to start without it. It has standards.

Note: Traffic between cloudflared and Apache is local (localhost), so no TLS is needed there. Cloudflare handles TLS termination at the edge.

5. Route DNS to the tunnel

Your tunnel UUID is needed here. Find it with:

cloudflared tunnel list

If this is a fresh domain with no existing DNS records, let cloudflared create them automatically:

cloudflared tunnel route dns raatti-tunnel raatti.net
cloudflared tunnel route dns raatti-tunnel www.raatti.net

If you already have A records pointing to your server (like most migrations), cloudflared will error with “A record with that host already exists”. You have two options:

Option A — Delete and recreate (dashboard + CLI):

  1. Go to Cloudflare Dashboard → your domain → DNS → Records
  2. Delete the existing A records for raatti.net and www.raatti.net
  3. Then run the cloudflared tunnel route dns commands above as normal

Option B — Edit in place (UI):

  1. Go to Cloudflare Dashboard → your domain → DNS → Records
  2. Edit each existing A record:
    • Change type from A to CNAME
    • Set target to <YOUR-TUNNEL-UUID>.cfargotunnel.com
    • Make sure Proxy status is set to Proxied (orange cloud)
  3. Save

Either way, the result is the same: a proxied CNAME pointing at your tunnel.

6. Install and start as a systemd service

sudo cloudflared service install
sudo systemctl enable --now cloudflared

Check that it is running:

sudo systemctl status cloudflared

You should see the tunnel connect and show a Healthy status in the Cloudflare dashboard under Zero Trust → Networks → Tunnels.
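
The same health check is available from the CLI; tunnel info lists the active connectors:

cloudflared tunnel info raatti-tunnel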


Simplifying the Firewall

This is the satisfying part. All those Cloudflare IP rules can go:

sudo firewall-cmd --permanent --remove-rich-rule='rule family="ipv4" source ipset="cloudflare-ipv4" port port="80" protocol="tcp" accept'
sudo firewall-cmd --permanent --remove-rich-rule='rule family="ipv6" source ipset="cloudflare-ipv6" port port="80" protocol="tcp" accept'
sudo firewall-cmd --permanent --remove-rich-rule='rule family="ipv4" source ipset="cloudflare-ipv4" port port="443" protocol="tcp" accept'
sudo firewall-cmd --permanent --remove-rich-rule='rule family="ipv6" source ipset="cloudflare-ipv6" port port="443" protocol="tcp" accept'
sudo firewall-cmd --reload

Verify what remains open:

sudo firewall-cmd --list-all

Note: the localhost rule for port 80 is intentionally kept — both cloudflared (proxying tunnel traffic to Apache) and WordPress cron make local HTTP requests to 127.0.0.1:80.

HTTP and HTTPS no longer need to be reachable from the internet at all. The tunnel dials out to Cloudflare on port 7844, and outbound traffic is almost certainly already permitted by default. Your origin is now unreachable except through Cloudflare.
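
A quick way to prove it to yourself from a machine outside Cloudflare (the origin IP below is a placeholder for your server’s public address):

# Direct origin access should now time out or be refused
curl -m 5 -I http://203.0.113.10/

# The site through Cloudflare still answers
curl -I https://raatti.net/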


SELinux on Red Hat Linux

Red Hat Linux runs SELinux in enforcing mode by default. cloudflared works out of the box without any additional policy changes — it runs as a system service communicating over standard ports, which SELinux handles without complaint.

You can verify this yourself:

sudo ausearch -m avc -ts recent | grep cloudflared

No output means no denials. Nothing to do here. I was almost disappointed.


The Result

The Apache and php-fpm configuration did not change. The site works exactly as before. What changed is what is no longer there: no IP allowlist, no maintenance burden, no firewall rules for HTTP or HTTPS.

The server now has fewer open ports, no inbound web traffic, and one less thing to think about.

That is the point. Perfection is not adding better firewall rules. It is not needing them. Cloudflare Tunnel makes this possible — for free, I might add. Love Cloudflare ❤

]]>
https://www.raatti.net/2026/03/22/simplicity-is-a-feature-migrating-to-cloudflare-tunnel-on-red-hat-linux/feed/ 0
From Bandwidth Mystery to Hardened Origin: A Day of Server Security https://www.raatti.net/2026/03/20/from-bandwidth-mystery-to-hardened-origin-a-day-of-server-security/ https://www.raatti.net/2026/03/20/from-bandwidth-mystery-to-hardened-origin-a-day-of-server-security/#respond Fri, 20 Mar 2026 20:15:10 +0000 https://www.raatti.net/?p=100 apache logs image

Date: March 20, 2026
Server: RHEL 10.1, bare metal, Apache, WordPress (raatti.net), Cloudflare

It Started with Bandwidth

The trigger was simple: unusual bandwidth usage, direction unknown. What followed was a full security audit that touched firewall architecture, WordPress internals, and in a moment of desperation, compiling software from source like it’s 2003 and we’re configuring Gentoo.

What We Found

1. Attack Traffic, Not Compromise

The top bandwidth consumers looked alarming at first: 350MB to /wp-login.php (credential stuffing), hits to /wp-content/themes/seotheme/db.php (known malware backdoor path), and hits to /wp-content/plugins/fix/up.php (generic webshell path).

All returned 301/404. The files didn’t exist. The server wasn’t compromised — it was being probed by the usual internet background radiation of bots, scanners, and script kiddies who’ve automated their disappointment. The bandwidth came from Apache politely responding to thousands of automated attack requests.

Lesson: High bandwidth doesn’t mean breach. Check status codes before panicking.

2. WordPress Cron Calling Its Own Public IP

robot talking to robot

The second-highest source IP by request count was the server itself — hammering wp-cron.php via its own public interface. Classic WordPress behavior: every page load can trigger a cron run, which calls back to the public URL instead of just running a cron job like a normal Unix citizen.

Fix, step 1: Disable the built-in cron in wp-config.php. Open the file and add this line before the /* That's all, stop editing! */ comment near the bottom:

define('DISABLE_WP_CRON', true);

Fix, step 2: Replace it with a real system cron that hits localhost with the correct Host header so Apache routes it to the right vhost:

*/5 * * * * curl -s "http://localhost/wp-cron.php?doing_wp_cron" -H "Host: www.raatti.net" > /dev/null 2>&1

Add that line to Apache’s crontab (crontab -u apache -e). The Host header is critical — without it Apache serves the default vhost and wp-cron runs in the wrong context.
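
Before trusting the schedule, fire the new command once by hand and check the status code. A 200 means Apache matched the right vhost and WordPress answered:

curl -s -o /dev/null -w '%{http_code}\n' "http://localhost/wp-cron.php?doing_wp_cron" -H "Host: www.raatti.net"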

Lesson: wp-cron.php should never call the public IP. Always route it via localhost.

What We Built

mod_evasive on RHEL 10

EPEL doesn’t yet carry mod_evasive for RHEL 10. Yes, we compiled software. No, this is not Gentoo. Rebuilding from the Fedora 43 source RPM took about 30 seconds and produced a proper RPM like a civilized person. mod_evasive is a single C file against the Apache APR API — it compiled cleanly with no spec changes. It runs inside Apache’s own process, blocks on the request itself (not after a log poll), and has zero external dependencies.

dnf install -y rpm-build httpd-devel
curl -LO https://dl.fedoraproject.org/pub/fedora/linux/releases/43/Everything/source/tree/Packages/m/mod_evasive-2.4.0-2.fc43.src.rpm
rpmbuild --rebuild mod_evasive-2.4.0-2.fc43.src.rpm
dnf install -y ~/rpmbuild/RPMS/x86_64/mod_evasive-2.4.0-2.el10.x86_64.rpm

Note: the module installs as mod_evasive24.so — the IfModule directive must use mod_evasive24.c, not mod_evasive20.c. Create the log directory before starting Apache:

mkdir -p /var/log/mod_evasive
chown apache:apache /var/log/mod_evasive
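
For reference, a starting-point configuration. If the RPM already dropped a default conf at this path, editing it in place works just as well; either way, treat the thresholds below as illustrative rather than tuned, and remember that behind Cloudflare you need mod_remoteip first or you will end up rate-limiting Cloudflare itself:

cat > /etc/httpd/conf.d/mod_evasive.conf << 'EOF'
<IfModule mod_evasive24.c>
    DOSHashTableSize    3097
    DOSPageCount        5
    DOSSiteCount        100
    DOSPageInterval     1
    DOSSiteInterval     1
    DOSBlockingPeriod   60
    DOSLogDir           /var/log/mod_evasive
</IfModule>
EOF

apachectl configtest && systemctl reload httpd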

Lesson: When a package isn’t in RHEL/EPEL yet, the nearest Fedora src.rpm is usually a clean rebuild. Verify the module name matches what the package actually installs.

Cloudflare-Only Origin Access via firewalld ipset

The goal: only Cloudflare IP ranges can reach ports 80 and 443. Direct connections to the origin IP get rejected. WordPress in particular becomes a much happier place when the only thing that can reach it is Cloudflare — no direct scans, no credential stuffing hitting Apache directly, no wasted resources responding to bots. The internet sees a Cloudflare IP; the origin stays invisible.

Step 1: Create the ipset and populate it

# Create the directory and fetch Cloudflare IP ranges
mkdir -p /etc/httpd/static
curl -s https://www.cloudflare.com/ips-v4 > /etc/httpd/static/cloudflare.lst

# Create the ipset
firewall-cmd --permanent --new-ipset=cloudflare-ipv4 --type=hash:net --option=family=inet
firewall-cmd --permanent --new-ipset=cloudflare-ipv6 --type=hash:net --option=family=inet6

# Populate IPv4 from file
firewall-cmd --permanent --ipset=cloudflare-ipv4 --add-entries-from-file=/etc/httpd/static/cloudflare.lst

# Fetch and populate IPv6 ranges (same approach as IPv4)
curl -s https://www.cloudflare.com/ips-v6 > /etc/httpd/static/cloudflare6.lst
firewall-cmd --permanent --ipset=cloudflare-ipv6 --add-entries-from-file=/etc/httpd/static/cloudflare6.lst

Step 2: Add the allow rules

# Allow Cloudflare IPv4 on 80 and 443
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source ipset="cloudflare-ipv4" port port="80" protocol="tcp" accept'
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source ipset="cloudflare-ipv4" port port="443" protocol="tcp" accept'

# Allow localhost (for wp-cron)
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="127.0.0.1" port port="80" protocol="tcp" accept'

# Allow Cloudflare IPv6
firewall-cmd --permanent --add-rich-rule='rule family="ipv6" source ipset="cloudflare-ipv6" port port="80" protocol="tcp" accept'
firewall-cmd --permanent --add-rich-rule='rule family="ipv6" source ipset="cloudflare-ipv6" port port="443" protocol="tcp" accept'

# Remove any unconditional http/https service allows and open ports
firewall-cmd --permanent --remove-service=http
firewall-cmd --permanent --remove-service=https
firewall-cmd --permanent --remove-port=80/tcp
firewall-cmd --permanent --remove-port=443/tcp

firewall-cmd --reload

Important: Do NOT add bare reject rules for ports 80/443. The zone’s default target already rejects unmatched traffic — adding explicit rejects causes them to land in filter_IN_public_deny, which fires before the allow rules. More on this below.

Step 3: Weekly sync cron to keep ranges current

cat > /etc/cron.weekly/update-cloudflare-ips << 'EOF'
#!/bin/bash
set -e

TMPFILE=$(mktemp)
LSTFILE=/etc/httpd/static/cloudflare.lst
LST6FILE=/etc/httpd/static/cloudflare6.lst

# First-run guard: if no existing list, create empty file
if [ ! -f "$LSTFILE" ]; then
    echo "First run detected, creating initial list"
    touch "$LSTFILE"
fi
if [ ! -f "$LST6FILE" ]; then
    touch "$LST6FILE"
fi

# Fetch current Cloudflare IPv4 ranges
if ! curl -sf https://www.cloudflare.com/ips-v4 > "$TMPFILE"; then
    echo "Failed to fetch Cloudflare IPs, aborting"
    rm -f "$TMPFILE"
    exit 1
fi

# Sanity check
COUNT=$(wc -l < "$TMPFILE")
if [ "$COUNT" -lt 10 ]; then
    echo "Suspiciously few ranges ($COUNT), aborting"
    rm -f "$TMPFILE"
    exit 1
fi

# Log what changed
echo "=== Removed IPv4 ranges ==="
comm -23 <(sort "$LSTFILE") <(sort "$TMPFILE")
echo "=== Added IPv4 ranges ==="
comm -13 <(sort "$LSTFILE") <(sort "$TMPFILE")

# Flush and repopulate IPv4
firewall-cmd --permanent --ipset=cloudflare-ipv4 --remove-entries-from-file="$LSTFILE"
firewall-cmd --permanent --ipset=cloudflare-ipv4 --add-entries-from-file="$TMPFILE"
cp "$TMPFILE" "$LSTFILE"

# Now sync IPv6 ranges
if curl -sf https://www.cloudflare.com/ips-v6 > "$TMPFILE"; then
    echo "=== Removed IPv6 ranges ==="
    comm -23 <(sort "$LST6FILE") <(sort "$TMPFILE")
    echo "=== Added IPv6 ranges ==="
    comm -13 <(sort "$LST6FILE") <(sort "$TMPFILE")
    firewall-cmd --permanent --ipset=cloudflare-ipv6 --remove-entries-from-file="$LST6FILE"
    firewall-cmd --permanent --ipset=cloudflare-ipv6 --add-entries-from-file="$TMPFILE"
    cp "$TMPFILE" "$LST6FILE"
else
    echo "Failed to fetch Cloudflare IPv6 ranges, skipping"
fi

rm -f "$TMPFILE"

firewall-cmd --reload
echo "Cloudflare IP ranges updated: IPv4=$COUNT ranges active"
EOF

chmod +x /etc/cron.weekly/update-cloudflare-ips

Lesson: Append-only IP allowlists accumulate stale entries. The weekly cron does a full flush+repopulate — not just append — to handle Cloudflare retiring old ranges.

drawbridge reject

The Hard-Learned Lesson: firewalld Rule Chain Order

This one hurt. After adding the Cloudflare ipset allow rules, bare reject rules were added for 80/443 to block everything else. The site immediately went down with Cloudflare 521. Smooth.

The root cause: firewalld silently routes rich rules into different nftables sub-chains depending on whether they have a source match. Source-less reject rules land in filter_IN_public_deny; source-specific accept rules land in filter_IN_public_allow. The execution order is deny then allow — so the bare port 443 reject fired before the Cloudflare ipset accept could even see the traffic.

The fix: remove the explicit reject rules entirely. The zone’s default target already rejects anything not matched by the allow chain.

firewall-cmd --permanent --remove-rich-rule='rule family="ipv4" port port="443" protocol="tcp" reject'
firewall-cmd --permanent --remove-rich-rule='rule family="ipv4" port port="80" protocol="tcp" reject'
firewall-cmd --permanent --remove-rich-rule='rule family="ipv6" port port="443" protocol="tcp" reject'
firewall-cmd --permanent --remove-rich-rule='rule family="ipv6" port port="80" protocol="tcp" reject'
firewall-cmd --reload
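
You can inspect the split directly in the generated ruleset; firewalld’s nftables backend keeps the two sub-chains separate (chain names assume the default public zone):

nft list chain inet firewalld filter_IN_public_allow
nft list chain inet firewalld filter_IN_public_deny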

Lesson: In firewalld, source-less rich rules go into the deny chain which runs before the allow chain. Never add bare port reject rules alongside source-specific accept rules for the same port — the zone default target does the job safely and in the right order.

Bonus Finds Along the Way

  • ModSecurity already active — quietly caught a path traversal attempt (/.%2e/.%2e/.%2e/bin/sh) mid-session without anyone asking it to
  • WebDAV PROPFIND requests arriving from bots — not a threat per se, but if you’re not running WebDAV (you’re not), there’s no reason to respond to it. LimitExcept GET POST HEAD is your friend.
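
If you want to shut that down, here is a minimal sketch in Apache 2.4 syntax for the relevant vhost:

<Location "/">
    <LimitExcept GET POST HEAD>
        Require all denied
    </LimitExcept>
</Location>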

Key Takeaways

  1. Start with bytes, not requests — top IPs by request count is misleading; top IPs by bytes transferred shows the real problem
  2. wp-cron belongs on localhost — set DISABLE_WP_CRON to true in wp-config.php and replace with a system cron using the correct Host header
  3. Compile from Fedora src.rpm when EPEL lags — usually works cleanly for simple C modules, takes less time than complaining about it
  4. firewalld chain order is not what you expect — bare reject rules go into the deny chain, which runs before the allow chain
  5. Explicit rejects are often redundant — the zone default target rejects unmatched traffic anyway, and does so in the right order
  6. Append-only IP lists rot — always full sync with diff logging
  7. mod_remoteip is a prerequisite for any IP-based rate limiting behind Cloudflare — without it you ban Cloudflare itself, which is a bad afternoon. See the Hetzner server guide for setup details.
  8. Cloudflare-only origin access transforms WordPress security — when attackers can’t reach the origin directly, an enormous attack surface simply disappears

Update: Two days after this post, the entire Cloudflare IP allowlist approach was replaced by Cloudflare Tunnel — which eliminates the need for inbound firewall rules entirely. The ipset technique documented here is still valid for setups that can’t use tunnels, but if you can, the tunnel is simpler.

]]>
https://www.raatti.net/2026/03/20/from-bandwidth-mystery-to-hardened-origin-a-day-of-server-security/feed/ 0
How to Build a Secure Ubuntu Web Server on Hetzner Cloud (The Right Way) https://www.raatti.net/2026/03/19/how-to-build-a-secure-ubuntu-web-server-on-hetzner-cloud-the-right-way/ https://www.raatti.net/2026/03/19/how-to-build-a-secure-ubuntu-web-server-on-hetzner-cloud-the-right-way/#respond Thu, 19 Mar 2026 20:50:22 +0000 https://www.raatti.net/?p=82 ...]]> So you’ve decided to spin up a cloud server. Brave soul. The internet is full of curious visitors — and by “curious visitors” we mean automated bots that will start hammering your SSH port approximately 4 seconds after your server gets a public IP. This guide walks through setting up a production-grade, reasonably paranoid Ubuntu 24.04 server on Hetzner Cloud: Apache 2.4 + PHP 8.3 FPM + MariaDB, protected by UFW, fail2ban, ModSecurity, and Cloudflare. Pour yourself a coffee or crack open a Battery Energy Drink. We have work to do.

🔐 The Golden Rule
Everything gets configured and locked down before we open the traffic gates. No half-built server exposed to the internet. We install, we configure, we test, then — and only then — we open the door.
RackShield — building a secure Ubuntu web server on Hetzner Cloud
Building a production-grade, reasonably paranoid server.

What We’re Building

  • A Hetzner Cloud VM running Ubuntu 24.04 LTS
  • SSH locked down to your own ISP IP range from day one
  • UFW firewall — SSH allow added before enabling, everything else locked until ready
  • Apache 2.4 + PHP 8.3 FPM
  • ModSecurity 2 with OWASP CRS — configured before opening web ports
  • mod_remoteip for Cloudflare — configured before activating Cloudflare proxy
  • MariaDB — hardened before any public traffic
  • Let’s Encrypt SSL via Certbot
  • fail2ban watching SSH and Apache — running before 443 opens
  • UFW opened for HTTP/HTTPS only after all of the above is done

]]>
https://www.raatti.net/2026/03/19/how-to-build-a-secure-ubuntu-web-server-on-hetzner-cloud-the-right-way/feed/ 0