Serverless vs servers has become one of those arguments that breaks out on Twitter every six months. One camp insists servers are dead and anyone still provisioning EC2 instances is wasting money. The other camp points to horror stories of six-figure Lambda bills and argues real engineers run their own boxes. Both sides are mostly wrong. In practice, the answer for almost every production workload is “a mix of both, chosen per component,” not one or the other. This post is the framework we use at Mainix when clients ask us which to pick.
Quick definitions
When this post says serverless, it means the whole family: Lambda, Cloud Run functions, Vercel Functions, Cloudflare Workers, Azure Functions, Fly Machines, Modal. Pay-per-request, scale-to-zero, short-lived execution environments with strict memory and time limits.
When this post says actual servers, it means long-running compute: EC2, Hetzner boxes, dedicated servers, or managed container platforms like ECS Fargate, Cloud Run services, or Kubernetes. Fixed or scaled-per-second billing, no cold starts, full control over what runs.
The distinction is less about where the code lives and more about the billing and lifecycle model.
When serverless is the right call
Spiky or unpredictable traffic
If your workload goes from 0 requests to 1,000 RPS and back to 0 within an hour, serverless wins outright. Auto-scaling is instant, you pay only for what you use, and you don’t have to capacity-plan around rare peaks. Marketing sites, webhook receivers, internal tools, and APIs with long idle periods all fit here.
Background jobs and scheduled work
Cron jobs, nightly reports, data migrations, thumbnail generation, email delivery. These run a few times an hour or once a day. Paying for a 24/7 server to run a 30-second task is wasteful. Serverless with a queue (SQS, PubSub, Cloudflare Queues) is the clean answer.
Edge compute and middleware
Request routing, auth checks, A/B test assignment, geo-aware redirects. These need to run close to the user and return fast. Cloudflare Workers and Vercel Edge Functions are built for this, and no server-based setup matches their latency or operational simplicity.
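What makes this class of work a natural fit for the edge is that it's usually a pure function over the incoming request. As a sketch: a geo-aware redirect from request headers. `CF-IPCountry` is the header Cloudflare actually sets on requests; the domain mapping here is made up.

```python
# Geo-aware redirect as a pure function over request headers.
# CF-IPCountry is set by Cloudflare; the domains are illustrative.
REGIONS = {"DE": "https://example.eu", "FR": "https://example.eu"}

def geo_redirect(headers, default="https://example.com"):
    country = headers.get("CF-IPCountry", "")
    return REGIONS.get(country, default)

print(geo_redirect({"CF-IPCountry": "DE"}))  # https://example.eu
print(geo_redirect({}))                      # https://example.com
```

No state, no connection pool, nothing to warm up: that's why this code drops into an edge runtime with near-zero operational cost.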
MVPs and side services
For a two-person startup shipping its first version, serverless removes an entire category of problems. No SSH, no OS patching, no capacity planning, no server monitoring. You’ll graduate off it for some workloads eventually, but the time you save in the first year is worth the later migration cost.
When actual servers are the right call
Sustained, predictable load
If your API runs at 500 RPS around the clock, serverless is wildly more expensive than a few well-sized EC2 instances or a Fargate service. The break-even for most workloads sits around 20 to 30 percent sustained CPU utilization. Above that, servers win on cost. The exact number depends on your request profile and cloud, but the rule of thumb holds.
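A back-of-the-envelope model makes the break-even concrete. This is a sketch under assumed prices (Lambda's us-east-1 list rates; a roughly t3.medium-class hourly price for the server side); plug in your own numbers.

```python
# Rough monthly cost: Lambda pay-per-use vs. always-on servers.
# Prices are assumptions (us-east-1 list prices); adjust to taste.
SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000

def lambda_monthly_cost(rps, duration_s, memory_gb,
                        gb_second_price=0.0000166667,
                        request_price_per_million=0.20):
    """Lambda bills GB-seconds of compute plus a per-request fee."""
    requests = rps * SECONDS_PER_MONTH
    gb_seconds = requests * duration_s * memory_gb
    return gb_seconds * gb_second_price + requests / 1e6 * request_price_per_million

def server_monthly_cost(instances, price_per_hour):
    """Flat cost for always-on instances, regardless of traffic."""
    return instances * price_per_hour * 730

# A 100ms/512MB service at 50 RPS sustained vs. two small instances:
print(round(lambda_monthly_cost(50, 0.1, 0.5)))   # 134
print(round(server_monthly_cost(2, 0.0416)))      # 61
print(round(lambda_monthly_cost(5, 0.1, 0.5)))    # 13 -- at low load Lambda wins
```

At 50 RPS sustained the flat servers already win; drop to 5 RPS and Lambda is far cheaper. That crossover is the 20 to 30 percent utilization rule expressed in dollars.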
Long-running or stateful processes
Lambda caps out at 15 minutes. Cloud Run allows up to 60. If your request can run for an hour (report generation, large file processing, ML training or fine-tuning), you’re on servers. Same for anything stateful: WebSocket servers, in-memory caches, game servers, real-time video processing. Serverless is fundamentally stateless; fighting that costs more than just running a container.
Heavy ML inference
GPU workloads and large-model inference are possible on serverless (Modal, Replicate, Banana), but the economics rarely work for sustained traffic. If you’re serving a model to real users at any meaningful volume, dedicated GPU servers or Kubernetes with GPU nodes beat per-request billing badly.
Strict data residency or compliance
Healthcare, finance, and certain government workloads need fine-grained control over where data lives, who can access it, and how audit logs are retained. The big cloud providers all offer compliant serverless options, but the audit surface is smaller and easier to reason about with servers running in known regions. This is one area where boring wins.
What production actually looks like
The honest picture for most mid-sized production systems is a mix. A typical setup we build looks like this:
- Main web app and API: Fargate or Cloud Run service. Predictable load, benefits from warm instances, cost-efficient at scale.
- Database: managed DBaaS (RDS, Aurora, Neon, PlanetScale). Not really a “server vs serverless” choice; managed is always right at this layer.
- Auth and session verification: edge functions (Cloudflare Workers or Vercel Edge). Runs on every request, latency-sensitive, stateless.
- Webhook receivers: serverless functions. Spiky and unpredictable by nature.
- Background jobs: serverless functions triggered from a queue (SQS, PubSub) or cron scheduler.
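A queue-triggered job in this setup is usually just a small handler. A hedged sketch: the event shape below matches what Lambda receives from an SQS trigger (including the partial-batch-failure response format), while `send_report` is a hypothetical stand-in for the real task.

```python
import json

def send_report(payload):
    # Placeholder for the real 30-second task (report, thumbnail, email...).
    return f"report for {payload['user_id']}"

def handler(event, context=None):
    # Report per-message failures so SQS retries only what actually failed,
    # instead of redelivering the whole batch.
    failures = []
    for record in event["Records"]:
        try:
            payload = json.loads(record["body"])
            send_report(payload)
        except Exception:
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}

# Simulated SQS batch: one good message, one malformed one.
event = {"Records": [
    {"messageId": "1", "body": '{"user_id": "u42"}'},
    {"messageId": "2", "body": "not json"},
]}
print(handler(event))  # {'batchItemFailures': [{'itemIdentifier': '2'}]}
```

The handler is trivially testable locally because the queue is just a dict shape at this layer; the infrastructure (queue, trigger, retries, dead-letter queue) stays in configuration.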
- File processing / image resizing: serverless with object-storage triggers. Short-lived, isolated, easy to scale.
- AI inference: depends. Low volume and experiments on Modal or Replicate; sustained production traffic on dedicated GPU instances behind an autoscaler.
This isn’t hypothetical. It’s what almost every production system we work on looks like by month twelve, regardless of what the team thought they’d build on day one.
Cost reality check
The quick math people rarely do before picking a stack:
- Lambda at 1,000 RPS sustained, 200ms average duration, 512MB: roughly $12,000 to $18,000 per month depending on region and commitment discounts.
- ECS Fargate running an equivalent container at the same throughput: roughly $2,500 to $4,000 per month.
- EC2 or Hetzner bare metal at that load: roughly $1,200 to $2,500 per month, but with real ops overhead.
The ratios shift with workload shape, but at sustained high throughput serverless is 4 to 10 times more expensive than containers. That cost is hidden when you’re at 10 RPS and your bill is $50 a month, and it shows up suddenly when you hit scale. The opposite is also true: at 1 RPS, even the smallest Fargate service costs roughly $10 a month just to exist, closer to $30 once you add a load balancer, while Lambda rounds to zero.
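One way the Lambda figure above pencils out, under stated assumptions: us-east-1 list prices, with an API Gateway REST front door billed at the flat first-tier rate (real tiered pricing drops somewhat at this volume). Lambda alone comes in well under the quoted range; the front door is most of the bill.

```python
# Worked arithmetic for 1,000 RPS sustained, 200ms, 512MB.
# Prices are assumed us-east-1 list prices, not a quote.
rps, duration_s, memory_gb = 1000, 0.2, 0.5
seconds = 30 * 24 * 3600                      # ~one month
requests_m = rps * seconds / 1e6              # requests, in millions

compute = requests_m * 1e6 * duration_s * memory_gb * 0.0000166667  # GB-seconds
req_fee = requests_m * 0.20                   # $0.20 per million invocations
gateway = requests_m * 3.50                   # API Gateway REST, first tier

print(round(compute + req_fee))               # 4838  -- Lambda alone
print(round(compute + req_fee + gateway))     # 13910 -- with the front door
```

Run the same exercise with your own request profile before committing; the whole point is that the answer moves by an order of magnitude depending on what sits in front of the function.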
Most teams who get burned by serverless cost spikes didn’t measure before scaling. The time to model cost at your expected future load is during the initial architecture conversation, not at the end of the billing cycle.
The vendor lock-in question
Vendor lock-in is the most common argument against serverless, and it’s mostly overblown. Your business logic doesn’t care whether it runs inside a Lambda handler or a Fastify route, as long as the code is structured with clean boundaries. The actual lock-in is narrower than people claim:
- Proprietary APIs (DynamoDB Streams, Durable Objects, Step Functions): these lock you in, but you only use them in specific places. Keep them behind an interface and migration becomes tractable.
- Event shapes (Lambda event objects, Workers Request): these vary by provider. Translate at the edge of your function, keep the business logic framework-agnostic.
- Cold start and memory characteristics: these differ significantly between platforms. Code written assuming Lambda may perform badly on Workers and vice versa.
In practice, moving a well-factored serverless codebase between clouds is a two to three month project, not a two-year rewrite. That cost is real, but it’s usually worth paying for the operational simplicity you get in exchange.
Common mistakes
Serverless for everything
Teams that start on Lambda sometimes refuse to migrate even after they hit sustained load. The bill grows, cold starts hurt user experience, and debugging becomes painful, but the team insists on staying because “we don’t want to manage servers.” Managed containers (Fargate, Cloud Run) are almost as easy and far more economical at steady state.
Kubernetes for everything
The opposite mistake. A three-person startup doesn’t need Kubernetes. Running a small service on Fargate or Cloud Run takes a few hours of setup; running the same service on Kubernetes costs you a full-time DevOps engineer you can’t afford. Save Kubernetes for when you genuinely need its abstractions across dozens of services.
Picking based on what the team already knows
Sometimes the right call. Often throwing good money after bad. If your team has deep Kubernetes experience and you’re running a handful of services, there’s no reason to switch to serverless. But if your team knows Kubernetes and you’re building a new product that’s four webhooks and a marketing site, don’t spin up a cluster.
Not measuring utilization
Most teams don’t know what their actual CPU and memory utilization looks like across the day. They’re either paying for over-provisioned servers sitting at 8% CPU or paying serverless rates that would be cheaper as containers. Measure before optimizing.
A decision checklist
- Predictable sustained load >30% utilization? → servers / Fargate / Cloud Run
- Spiky, bursty, or unpredictable load? → serverless
- Requests that can run longer than 15 minutes? → servers
- WebSockets, long polling, or real-time needs? → servers (with narrow exceptions like Cloudflare Durable Objects)
- Small team, no ops specialist? → serverless or managed containers
- Cost-critical at scale? → servers with autoscaling
- Regulatory / data residency constraints? → servers in specific regions
- Brand new MVP? → serverless, migrate individual components later as they earn it
- Cron jobs, webhooks, file processing? → serverless, always
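The checklist is mechanical enough to encode. A toy sketch only: the workload fields are illustrative, not any real API, and the first matching rule wins.

```python
# Toy encoding of the decision checklist; first match wins.
def recommend(w):
    if w.get("max_request_minutes", 0) > 15:       return "servers"
    if w.get("websockets"):                        return "servers"
    if w.get("residency_constraints"):             return "servers"
    if w.get("sustained_utilization", 0) > 0.30:   return "servers"
    if w.get("spiky") or w.get("cron_or_webhook"): return "serverless"
    if w.get("new_mvp"):                           return "serverless"
    return "managed containers"

print(recommend({"cron_or_webhook": True}))       # serverless
print(recommend({"sustained_utilization": 0.6}))  # servers
print(recommend({}))                              # managed containers
```

The default branch is deliberate: when nothing else decides it, managed containers (Fargate, Cloud Run) are the safe middle ground.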
How we usually build it
For clients starting from scratch, we default to Cloud Run or Fargate for the main application, serverless functions for webhooks and jobs, managed databases for state, and CDN-edge functions for auth and routing. This mix covers 80% of the workload with minimal operational overhead and scales reasonably up to tens of thousands of RPS before anything needs rethinking.
For teams migrating from a monolith, we usually keep the monolith on servers and peel off specific workloads to serverless one at a time. Rewriting everything at once is a common failure mode. Incremental wins stack up.
The religious wars on Twitter are just noise. Pick what fits the workload, measure what you deploy, and be willing to migrate individual components as they outgrow their current home. There’s no single answer, but there are a lot of clearly wrong ones.