We Spent €11/month Testing Docker Swarm So You Don't Have To

Eduardo Sanchez, December 8, 2025

Executive Summary: Load Test Results

We tested four architectures with identical code, an identical load pattern (up to 1,200 concurrent users for 4.5 minutes), and the same Hetzner infrastructure. Here’s what we learned:

Test	Architecture	vCPU	Monthly Cost	RPS	Cost per 100 RPS	P95 Latency	Errors	Result
1	Single CAX11	2	€3.79	228	€1.66	5,303ms	0.80%	❌ Failed
2	2×CAX11 Swarm (balanced)	4	€7.58	354	€2.14	3,524ms	0.00%	✅ Passed
3	Single CAX21	4	€7.59	484	€1.57	2,462ms	0.00%	🏆 Winner
4	CAX21+CAX11 Swarm (asymmetric)	6	€11.38	343	€3.32	3,557ms	0.00%	❌ Worse than Test 2
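The "Cost per 100 RPS" column is monthly cost divided by measured throughput, scaled to 100 RPS. A minimal TypeScript sketch that reproduces the column from the table's own figures (the `LoadTestRow` type and `costPer100Rps` helper are illustrative names, not part of FlagMeter):

```typescript
// Figures copied from the results table above.
interface LoadTestRow {
  architecture: string;
  monthlyCostEur: number;
  rps: number;
}

const loadTestRows: LoadTestRow[] = [
  { architecture: "Single CAX11", monthlyCostEur: 3.79, rps: 228 },
  { architecture: "2×CAX11 Swarm (balanced)", monthlyCostEur: 7.58, rps: 354 },
  { architecture: "Single CAX21", monthlyCostEur: 7.59, rps: 484 },
  { architecture: "CAX21+CAX11 Swarm (asymmetric)", monthlyCostEur: 11.38, rps: 343 },
];

// Monthly cost in euros per 100 requests/second of measured throughput.
function costPer100Rps(row: LoadTestRow): number {
  return (row.monthlyCostEur / row.rps) * 100;
}

for (const row of loadTestRows) {
  console.log(`${row.architecture}: €${costPer100Rps(row).toFixed(2)} per 100 RPS`);
}
```

Rounded to two decimals, this yields the €1.66, €2.14, €1.57 and €3.32 shown in the table.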

Single-server architecture: Everything runs in Docker Compose.

Key Findings:

🏆 Single CAX21 wins everything:

37% more throughput than balanced Swarm (484 vs 354 RPS)
30% lower latency than balanced Swarm (2.5s vs 3.5s P95)
€0.01 more expensive than Swarm (€7.59 vs €7.58)
Zero operational complexity (no overlay networks, no orchestration)
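For readers who want to check the percentages, here is a small sketch of the arithmetic. It assumes the balanced Swarm (Test 2) is the baseline and the single CAX21 (Test 3) is the candidate; `percentChange` is a hypothetical helper name.

```typescript
// Relative change from a baseline value to a candidate value, in percent.
function percentChange(baseline: number, candidate: number): number {
  return ((candidate - baseline) / baseline) * 100;
}

// Test 2 (balanced Swarm) vs Test 3 (single CAX21), from the table above.
const throughputGainPct = percentChange(354, 484); // RPS: about +37%
const p95ChangePct = percentChange(3524, 2462); // P95 in ms: about -30%
const costDeltaEur = 7.59 - 7.58; // about €0.01 per month

console.log(`Throughput: ${throughputGainPct.toFixed(0)}%`);
console.log(`P95 latency: ${p95ChangePct.toFixed(0)}%`);
console.log(`Cost difference: €${costDeltaEur.toFixed(2)}`);
```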

📉 Distributed systems tax is real:

Traefik used 5× more CPU on Swarm (180% vs 36%) for less throughput
Overlay network overhead killed performance
More servers ≠ more performance (Test 4 proved this)

At small-to-medium scale (sub-500 RPS), simple beats distributed. Docker Compose on a single server outperformed Docker Swarm by 37% at the same cost: €7.59 per month in total.

The AWS equivalent (apples-to-apples) would be €100-120/month for a single self-managed t4g.xlarge Graviton instance.

Here’s what infrastructure repatriation taught us in detail.

The Setup

If you’re building a B2B SaaS startup, you’ve heard the pitch: “Start simple, then scale with AWS.” But simple on AWS means €5,000+/month once you add the managed services your investors expect.

We’re testing infrastructure repatriation for early-stage startups: moving workloads off expensive cloud platforms back to sustainable, predictable VPS infrastructure.

Our test case: FlagMeter—a usage quota tracker for B2B SaaS products. Simple stack: TypeScript, PostgreSQL, Valkey (Redis fork), deployed via Docker Compose. Exactly the kind of app where AWS cost spirals out of control.

The startup constraint: Keep monthly costs under €10 while proving you can handle real load. Save infrastructure budget for customer acquisition, not cloud markup.

The question: What’s the simplest architecture that handles 500 requests per second while staying sustainable?

Every accelerator, every tech advisor says: “Distributed is better. Docker Swarm for small scale, Kubernetes for serious work.” The playbook is gospel: separate concerns, isolate workloads, scale horizontally.

We ran the same load test four times to challenge this dogma. Same code, same load pattern (1,200 concurrent users hammering /api/events for 4.5 minutes), same Hetzner Cloud platform. Real money, real infrastructure, real failures.
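One way to sanity-check results from a fixed-concurrency test like this is Little's Law (concurrency = throughput × mean time in system). The sketch below assumes all 1,200 users are active at once with no think time between requests. The article states neither think time nor ramp shape, so treat this as a rough cross-check only. The function name is illustrative, and the law yields a mean, not a P95.

```typescript
// Little's Law rearranged: mean time in system = concurrency / throughput.
function impliedMeanLatencySeconds(concurrentUsers: number, throughputRps: number): number {
  return concurrentUsers / throughputRps;
}

const PEAK_CONCURRENT_USERS = 1200; // peak concurrency from the load pattern

// Measured throughput per test, from the summary table.
const measuredThroughput = [
  { test: "Test 1 (single CAX11)", rps: 228 },
  { test: "Test 2 (2x CAX11 Swarm)", rps: 354 },
  { test: "Test 3 (single CAX21)", rps: 484 },
  { test: "Test 4 (asymmetric Swarm)", rps: 343 },
];

const impliedMeans = measuredThroughput.map((t) => ({
  test: t.test,
  seconds: impliedMeanLatencySeconds(PEAK_CONCURRENT_USERS, t.rps),
}));

for (const m of impliedMeans) {
  console.log(`${m.test}: ~${m.seconds.toFixed(2)} s implied mean latency`);
}
```

Under those assumptions the implied means come out at roughly 5.3 s, 3.4 s, 2.5 s and 3.5 s, in the same range as the reported P95 figures.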

The FlagMeter Architecture

Here’s the simple, sustainable stack we tested:

    graph TB
        subgraph Internet
            CLIENT[Client Apps<br/>POST /api/events]
        end

        subgraph "Hetzner CAX21 (€7.59/mo)"
            TRAEFIK[Traefik<br/>reverse proxy<br/>HTTPS termination]

            subgraph "Application Stack"
                DASH[Dashboard<br/>TanStack Start<br/>Node.js]
                WORKER[Worker<br/>queue consumer<br/>Node.js]
                VALKEY[(Valkey 7<br/>Redis fork<br/>queue + cache)]
                PG[(PostgreSQL 18<br/>tuned for writes)]
            end

            subgraph "Observability Stack"
                PROM[Prometheus<br/>metrics storage]
                GRAFANA[Grafana<br/>dashboards]
                LOKI[Loki<br/>log aggregation]
            end
        end

        CLIENT -->|HTTPS| TRAEFIK
        TRAEFIK -->|:3000| DASH
        DASH -->|write events| VALKEY
        DASH -->|read usage| PG
        WORKER -->|consume queue| VALKEY
        WORKER -->|aggregate writes| PG
        DASH -->|expose :9464| PROM
        WORKER -->|expose :9465| PROM
        GRAFANA -->|query| PROM
        GRAFANA -->|query| LOKI
        DASH -.->|pino logs| LOKI
        WORKER -.->|pino logs| LOKI
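The diagram shows the dashboard pushing events onto a Valkey queue and the worker draining it into aggregated PostgreSQL writes. The article does not show FlagMeter's code, so the following is only a minimal in-memory sketch of that batching idea. `UsageEvent`, `UsageIncrement`, their fields, and `aggregateBatch` are all hypothetical, and a plain array stands in for the Valkey queue.

```typescript
// Hypothetical event shape; the real FlagMeter payload is not shown in the article.
interface UsageEvent {
  tenantId: string;
  metric: string;
  quantity: number;
}

// One aggregated write per (tenant, metric) pair.
interface UsageIncrement {
  tenantId: string;
  metric: string;
  total: number;
}

// Collapse a batch of queued events into one increment per (tenant, metric),
// so the database sees a few aggregated writes instead of one write per event.
function aggregateBatch(batch: UsageEvent[]): UsageIncrement[] {
  const byKey: { [key: string]: UsageIncrement } = {};
  const increments: UsageIncrement[] = [];
  for (const ev of batch) {
    const key = JSON.stringify([ev.tenantId, ev.metric]);
    let entry = byKey[key];
    if (entry === undefined) {
      entry = { tenantId: ev.tenantId, metric: ev.metric, total: 0 };
      byKey[key] = entry;
      increments.push(entry); // preserve first-seen order
    }
    entry.total += ev.quantity;
  }
  return increments;
}

// Stand-in for events popped from the queue.
const queuedEvents: UsageEvent[] = [
  { tenantId: "acme", metric: "api_calls", quantity: 1 },
  { tenantId: "acme", metric: "api_calls", quantity: 2 },
  { tenantId: "globex", metric: "api_calls", quantity: 5 },
];

const aggregatedIncrements = aggregateBatch(queuedEvents);
console.log(aggregatedIncrements);
```

Here three queued events collapse into two increments. In a real worker, each increment would become a single upsert against PostgreSQL.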

What startups actually build:

Lambda functions (1GB memory, 1.5s avg execution time)
RDS Multi-AZ (because “production needs HA”)
ElastiCache (because “Redis is critical”)
ALB (because “we need load balancing”)
CloudWatch (because “we need observability”)
NAT Gateway (because Lambda needs internet)

Cost at our test load (484 RPS for 8 hours/day):

Lambda: €9,900/month (418M requests × 1.5s × 1GB × €0.0000166667/GB-second)
RDS db.m5.large Multi-AZ: €280/month
ElastiCache cache.m5.large: €180/month
ALB + NAT + CloudWatch + egress: €200/month
Total: €10,560/month

Or with lighter usage (1 hour/day): Still €1,500-2,000/month.
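The arithmetic behind these figures can be sketched as follows, assuming a 30-day month and the 1GB / 1.5s Lambda profile listed above. Helper names are illustrative. Multiplying the quoted factors directly gives roughly 10,450 for Lambda compute, a little above the €9,900 line item. The article does not explain the gap, and the sketch makes no attempt to account for it. The light-usage estimate additionally assumes the non-Lambda line items stay fixed, which is a guess.

```typescript
const SECONDS_PER_HOUR = 3600;
const DAYS_PER_MONTH = 30; // assumption: 30-day month

// Requests per month at a steady rate for a given number of hours per day.
function monthlyRequests(rps: number, hoursPerDay: number): number {
  return rps * hoursPerDay * SECONDS_PER_HOUR * DAYS_PER_MONTH;
}

// Lambda compute charge: requests × duration × memory × rate per GB-second.
function lambdaComputeCost(
  requests: number,
  avgSeconds: number,
  memoryGb: number,
  ratePerGbSecond: number,
): number {
  return requests * avgSeconds * memoryGb * ratePerGbSecond;
}

const QUOTED_RATE_PER_GB_SECOND = 0.0000166667; // rate quoted in the article

const requestsAt8h = monthlyRequests(484, 8); // 418,176,000, i.e. ~418M
const lambdaAtQuotedRate = lambdaComputeCost(requestsAt8h, 1.5, 1, QUOTED_RATE_PER_GB_SECOND);

// Line items exactly as stated in the article (EUR/month).
const statedLineItems = [9900, 280, 180, 200];
const statedTotal = statedLineItems.reduce((sum, item) => sum + item, 0);

// 1 hour/day, assuming RDS, ElastiCache and networking costs stay the same.
const lightUsageEstimate =
  lambdaComputeCost(monthlyRequests(484, 1), 1.5, 1, QUOTED_RATE_PER_GB_SECOND) + 280 + 180 + 200;

console.log({ requestsAt8h, lambdaAtQuotedRate, statedTotal, lightUsageEstimate });
```

The stated line items sum to the €10,560 total, and the light-usage estimate falls inside the €1,500-2,000 range quoted above.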

The FlagMeter dashboard: Real-time quota tracking for B2B SaaS products. Running on €7.59/month infrastructure.

Test 1: Single CAX11 (The Baseline)

Setup:

Hetzner CAX11: 2 vCPU, 4GB RAM, ARM64
Cost: €3.79/month
Everything on one server: App, Worker, PostgreSQL, Valkey, Prometheus, Grafana, Traefik

Hypothesis: “This will melt under load.”

Results:

RPS: 228
P95 Latency: 5,303ms (5.3 seconds)
Errors: 0.80% (35 5xx errors, 456 timeouts)
CPU: 100% utilized throughout (0% idle)
Load Average: 10.64 on 2 cores

Verdict: ❌ Failed. The 2-vCPU threshold is real. Services competing for CPU created cascading failures.

Key insight: When Prometheus scrapes metrics → CPU spike → dashboard slows → queue builds → timeouts cascade. No isolation = cascading failures.

Test 2: 2x CAX11 Docker Swarm (The “Industry Best Practice”)

Setup:

Manager Node: CAX11 (2 vCPU) - Traefik, Prometheus, Grafana, Loki
Worker Node: CAX11 (2 vCPU) - App, Worker, PostgreSQL, Valkey
Total: 4 vCPU, 8GB RAM, €7.58/month
Private overlay network connecting nodes

Hypothesis: “Separation prevents cascading failures. Observability isolated from application.”

Results:

RPS: 354 (+55% vs single CAX11)
P95 Latency: 3,524ms
Errors: 0.00% ✅
Manager CPU: Traefik at 180% (bottleneck!)
Worker CPU: Comfortable, plenty of headroom

Verdict: ✅ Passed (zero errors), but unexpectedly slow.

Key observation: Traefik consumed 180% CPU on manager (90% per core). Why? We didn’t know yet. But isolation worked—observability couldn’t crash the application.

Test 3: Single CAX21 (The Repatriation Champion)

Before testing complex configurations, we wanted a fair comparison: Same total vCPU as Swarm (4 cores), single-node simplicity.

Setup:

Hetzner CAX21: 4 vCPU, 8GB RAM, ARM64
Cost: €7.59/month (€0.01 more than Swarm!)
Everything on one server—the way infrastructure used to work

Hypothesis: “Should match the Swarm’s 354 RPS.”

Results:

RPS: 484 (+37% vs Swarm!)
P95 Latency: 2,462ms (-30% vs Swarm!)
Errors: 0.00% ✅
CPU: 2-7% idle until final minutes
Traefik: Only 36% CPU (vs 180% on Swarm!)
PostgreSQL: 110% CPU (the actual bottleneck)

Verdict: 🏆 Winner. Best performance at identical cost.

The lesson: Traefik used 5x less CPU (36% vs 180%) for 37% more throughput. Localhost communication eliminated the distributed systems tax. The overlay network wasn’t free—it was expensive.
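The 5x figure compares raw CPU. Normalizing by throughput gives a per-request view of the same measurements. The sketch below assumes the CPU percentages are "percent of one core" as reported in Tests 2 and 3; `proxyCpuPer100Rps` is a hypothetical helper name.

```typescript
// Proxy CPU (percent of one core) spent per 100 requests/second served.
function proxyCpuPer100Rps(cpuPercent: number, rps: number): number {
  return (cpuPercent / rps) * 100;
}

const swarmProxyCost = proxyCpuPer100Rps(180, 354); // Test 2: ≈ 50.8
const singleNodeProxyCost = proxyCpuPer100Rps(36, 484); // Test 3: ≈ 7.4
const perRequestRatio = swarmProxyCost / singleNodeProxyCost; // ≈ 6.8

console.log(`Swarm: ${swarmProxyCost.toFixed(1)}% CPU per 100 RPS`);
console.log(`Single node: ${singleNodeProxyCost.toFixed(1)}% CPU per 100 RPS`);
console.log(`Per-request ratio: ${perRequestRatio.toFixed(1)}x`);
```

Per request served, Traefik on the Swarm manager burned closer to 7x the CPU it needed on the single node.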

Test 4: “Let’s Fix the Swarm!” (The €11 Mistake)

We thought: “Traefik is bottlenecked on 2 vCPU. Upgrade the manager to CAX21 (4 vCPU) and problem solved!”

Setup:

Manager Node: CAX21 (4 vCPU) ⬆️ Upgraded!
Worker Node: CAX11 (2 vCPU)
Total: 6 vCPU, 12GB RAM, €11.38/month (+50% cost)

Hypothesis: “Traefik drops to ~60% CPU, we hit 400-450 RPS.”

Expected: 🎯 400-450 RPS
Actual: 💥 343 RPS (3% worse than balanced Swarm!)

Results:

RPS: 343 (-3% vs balanced Swarm!)
P95 Latency: 3,497ms (essentially same)
Errors: 0.00% ✅
Manager: Traefik 73% CPU (comfortable), load 1.79
Worker: Load 5.90 (295% of capacity!), 10 tasks on 2 cores
Cost: 50% more than balanced Swarm

Verdict: ❌ Disaster. Paid 50% more for 3% worse performance.

The asymmetric failure: The stronger manager pushed MORE traffic than the worker could handle. Requests queued at the worker instead of manager. We turned a Traefik bottleneck into a worker bottleneck—and made it worse.
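The "295% of capacity" figure is the load average divided by core count. A small sketch using the load averages reported in Tests 1 and 4 (`loadPerCore` and the `nodeLoads` array are illustrative names):

```typescript
// Load average divided by core count. Values well above 1.0 roughly mean
// that runnable tasks are queuing for CPU time.
function loadPerCore(loadAverage: number, cores: number): number {
  return loadAverage / cores;
}

// Load averages as reported in the test results.
const nodeLoads = [
  { node: "Test 1 single CAX11", loadAverage: 10.64, cores: 2 },
  { node: "Test 4 manager (CAX21)", loadAverage: 1.79, cores: 4 },
  { node: "Test 4 worker (CAX11)", loadAverage: 5.9, cores: 2 },
];

const loadPercentages = nodeLoads.map((n) => ({
  node: n.node,
  percentOfCapacity: Math.round(loadPerCore(n.loadAverage, n.cores) * 100),
}));

for (const entry of loadPercentages) {
  console.log(`${entry.node}: ${entry.percentOfCapacity}% of capacity`);
}
```

In Test 4 the manager sits at about 45% of capacity while the worker is at 295%, which is the imbalance described above.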

Left peak (16:40-16:50): 2x CAX11 Swarm test - 354 RPS, struggling
Right peak (17:00-17:10): Single CAX21 test - 484 RPS, smooth

Key observations:

Single CAX21 peak is 37% higher (484 vs 354 RPS)
CAX21 spike is cleaner (less variance, more stable)
Same total cost (€7.59 vs €7.58/month)
Simpler architecture = better performance

This graph captures the essence of infrastructure repatriation: simplicity wins.

The Distributed Systems Tax

Why did Traefik use 5x more CPU in Swarm vs single-node?

Single-node (sustainable):

Internet → Traefik → App (localhost:3000) → Response

One network hop
Local traffic over the host’s Docker bridge (no encapsulation, minimal overhead)
Traefik: 36% CPU for 484 RPS

Swarm (complex):

Internet → Traefik (manager) →
  Overlay Network (VXLAN) →
  App (worker) →
  Overlay Network →
  Traefik → Response

Three network hops
VXLAN encapsulation/decapsulation
Service discovery per request
Traefik: 73-180% CPU for 343-354 RPS

The penalty: ~1,000ms added latency + 5x CPU overhead. Architectural, not fixable with hardware.

What This Taught Us

1. Simplicity is sustainable

The single CAX21 outperformed every distributed configuration. No overlay networks, no service discovery, no operational complexity. One server, doing its job well.

For 90% of B2B SaaS products: a single VPS handles your first 50,000 users. By then, you have revenue to justify complexity.

2. The distributed systems tax is real

Docker Swarm’s overlay network costs:

Two additional network hops
VXLAN encapsulation overhead
Service discovery lookups
TCP connection management

The result was a ~1,000ms latency penalty plus 5x the CPU for routing. Better hardware can’t fix it; it’s architectural.

3. Asymmetric scaling fails spectacularly

Upgrading one node in a distributed system creates bottlenecks you didn’t have before. The stronger node overwhelms the weaker one.

Rule: In distributed systems, nodes must be identically sized or performance degrades unpredictably.

4. Vertical scaling continues to work

The data suggests that single-server vertical scaling remains cost-effective well beyond 500 RPS. If throughput keeps scaling linearly at €1.57 per 100 RPS (an extrapolation, not a measurement), a CAX31 (€14.90/month, 8 vCPU) could handle ~950 RPS before hitting PostgreSQL limits.

When to distribute: Only when you’ve maxed out the largest single server (CAX41: 16 vCPU, €28.49/month, estimated ~1,500-2,000 RPS) or need geographic redundancy.
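These projections are a straight-line extrapolation from the single CAX21 result. A sketch of that calculation, assuming throughput scales linearly with monthly cost (which was not measured and may not hold once PostgreSQL becomes the limit); `projectedRps` is a hypothetical helper:

```typescript
const EUR_PER_100_RPS = 1.57; // measured on the single CAX21 (Test 3)

// Linear extrapolation: RPS a server might sustain at a given monthly price.
function projectedRps(monthlyCostEur: number): number {
  return (monthlyCostEur / EUR_PER_100_RPS) * 100;
}

const cax31Projection = projectedRps(14.9); // ≈ 949 RPS
const cax41Projection = projectedRps(28.49); // ≈ 1,815 RPS

console.log(`CAX31: ~${Math.round(cax31Projection)} RPS`);
console.log(`CAX41: ~${Math.round(cax41Projection)} RPS`);
```

The CAX41 figure lands inside the ~1,500-2,000 RPS estimate quoted above.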

5. Database tuning > infrastructure scaling

Every test showed Postgres at 108-111% CPU. Tuning PostgreSQL (separate article) unlocked more capacity than adding servers.

When Distributed Systems Make Sense

We’re not anti-distributed. We’re anti-premature-distribution.

Use Swarm/K8s when:

True high availability required (multi-node failover)
RPS > 1,000 sustained
Geographic distribution mandated
Regulatory compliance demands redundancy

Don’t use distributed systems when:

“Best practices say…” (question the dogma)
“We might scale someday” (premature optimization)
“Distributed is more robust” (it’s more complex = more failure modes)
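The two lists above can be read as a checklist. The sketch below encodes it as a function; the `WorkloadProfile` shape and field names are hypothetical. Note that "we might scale someday" is deliberately not an input.

```typescript
// Hypothetical workload description; field names are illustrative.
interface WorkloadProfile {
  sustainedRps: number;
  needsMultiNodeFailover: boolean;
  needsGeoDistribution: boolean;
  hasRedundancyMandate: boolean;
}

const SUSTAINED_RPS_THRESHOLD = 1000; // from "RPS > 1,000 sustained"

// Returns the concrete reasons (if any) that justify Swarm/K8s.
// An empty list means: stay on a single server for now.
function reasonsToDistribute(profile: WorkloadProfile): string[] {
  const reasons: string[] = [];
  if (profile.needsMultiNodeFailover) reasons.push("true high availability required");
  if (profile.sustainedRps > SUSTAINED_RPS_THRESHOLD) reasons.push("sustained RPS above 1,000");
  if (profile.needsGeoDistribution) reasons.push("geographic distribution mandated");
  if (profile.hasRedundancyMandate) reasons.push("regulatory redundancy requirement");
  return reasons;
}

// The workload from Test 3: 484 RPS, no hard HA or compliance requirement.
const flagMeterToday: WorkloadProfile = {
  sustainedRps: 484,
  needsMultiNodeFailover: false,
  needsGeoDistribution: false,
  hasRedundancyMandate: false,
};

const deploymentVerdict =
  reasonsToDistribute(flagMeterToday).length === 0 ? "single server" : "distribute";
console.log(deploymentVerdict);
```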

The Raus.cloud Philosophy: Infrastructure for Bootstrapped Startups

This is why infrastructure repatriation exists. The cloud industry profits from complexity—Kubernetes, microservices, multi-cloud—as default answers. For early-stage startups, these create operational debt that burns runway before you find product-market fit.

The reality most founders face:

You launch on AWS with Lambda + RDS because “it’s serverless and scales automatically.”

Month 1 €200 (light traffic, testing)
↓
Month 3 €2,000 (some real users, CloudWatch costs climbing)
↓
Month 6 €5,000 (moderate growth, added ElastiCache because "Redis is critical")
↓
Month 12 €8,000 (investors ask about unit economics, you have no answer)

Meanwhile, your competitor runs the same workload on a €15/month VPS.

Our repatriation approach for startups:

Start simple (single VPS, Docker Compose) - Save 95% of infrastructure budget
Tune what you have (PostgreSQL config, query optimization) - Free performance gains
Scale vertically first (CAX21 → CAX31 → CAX41) - Linear cost scaling, no architecture rewrites
Distribute only when proven necessary (>1,000 RPS sustained, or regulatory HA requirements)

If you’re a bootstrapped startup spending €5,000+/month on AWS while debugging Lambda cold starts instead of talking to customers, repatriation is your path to profitability.

Next in series:

Part 2: “Zero DevOps: Deploy Production Infrastructure with Coolify” (coming soon)
Part 3: “The €8 to €800 Scaling Roadmap” (coming soon)

This article is part of our infrastructure repatriation case studies. Real tests, real costs, real lessons learned while building sustainable alternatives to cloud complexity.
