Designing Resilient APIs: Timeouts, Retries, and Backpressure
The three patterns that separate APIs that survive production from the ones that fall over at the first traffic spike — with concrete defaults you can ship today.
Every API works on a developer’s laptop. The interesting question is what happens when a downstream dependency gets slow, a deploy doubles latency, or a batch job floods you with ten times the usual traffic. Resilience is not a library you install — it’s a set of deliberate decisions. Here are the three that matter most.
1. Every network call needs a timeout
One of the most common production incidents is a thread (or event-loop task) blocked forever on a call that will never return. Without a timeout, one slow dependency cascades: connections pile up, pools exhaust, and your healthy service starts failing too.
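// Java: set timeouts explicitly. The defaults are almost always "infinite".
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

HttpClient client = HttpClient.newBuilder()
    .connectTimeout(Duration.ofSeconds(2))
    .build();
// `url` is assumed to be defined elsewhere.
HttpRequest request = HttpRequest.newBuilder(URI.create(url))
    .timeout(Duration.ofSeconds(3)) // per-request timeout: HttpTimeoutException if no response arrives in time
    .build();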
A good rule of thumb: set the timeout to a small multiple of your p99 latency, not your average. If p99 is 200ms, a timeout of 400ms to 1s (2–5× p99) leaves headroom for slow-but-healthy requests while still failing fast.
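As a rough illustration of that rule of thumb, the sketch below derives a timeout from a set of observed latency samples. The helper names (`percentile`, `timeoutFromP99`), the default multiplier of 3, and the floor and ceiling values are assumptions chosen for the example, not recommendations from a particular library.

```typescript
// Hypothetical helpers: names and default values are illustrative assumptions.

// Nearest-rank percentile over latency samples (milliseconds).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  // Smallest value with at least p% of the samples at or below it.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

// Timeout = multiplier x p99, clamped to a floor and a ceiling.
function timeoutFromP99(
  samples: number[],
  multiplier = 3,
  floorMs = 100,
  ceilingMs = 10_000,
): number {
  const p99 = percentile(samples, 99);
  return Math.min(ceilingMs, Math.max(floorMs, Math.round(p99 * multiplier)));
}
```

The floor keeps a very fast dependency from getting a timeout so tight that ordinary network noise trips it, and the ceiling keeps one bad measurement window from producing an effectively unbounded wait.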
2. Retry — but only safe operations, with jitter
Retries turn transient blips into successes. They also turn a small outage into a self-inflicted DDoS if you do them wrong.
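Retry idempotent operations only. Never blindly retry a POST that creates a resource unless you have an idempotency key: a request that timed out may still have succeeded on the server, so retrying it can create a duplicate.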
Use exponential backoff with full jitter so retries don’t synchronize into a thundering herd:
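// Calls fn up to `attempts` times in total, sleeping with full jitter between tries.
async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  for (let i = 0; ; i++) {
    try {
      return await fn();
    } catch (err) {
      // Out of attempts: surface the last error to the caller.
      if (i >= attempts - 1) throw err;
      const base = 100 * 2 ** i; // backoff ceiling in ms: 100, 200, 400, ...
      const delay = Math.random() * base; // full jitter: uniform in [0, base)
      await new Promise((r) => setTimeout(r, delay));
    }
  }
}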
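The retry helper retries whatever function it is handed, so the decision about what is safe to retry has to happen before the call. One possible guard is sketched below. The function name `isSafeToRetry` is hypothetical, and the `Idempotency-Key` header is assumed here as the convention your API uses; the method list follows the HTTP definition of idempotent methods.

```typescript
// Hypothetical guard: decide whether a request may be retried automatically.
// HTTP defines GET, HEAD, OPTIONS, TRACE, PUT, and DELETE as idempotent.
const IDEMPOTENT_METHODS = new Set(["GET", "HEAD", "OPTIONS", "TRACE", "PUT", "DELETE"]);

function isSafeToRetry(method: string, headers: Record<string, string> = {}): boolean {
  if (IDEMPOTENT_METHODS.has(method.toUpperCase())) return true;
  // Non-idempotent methods (POST, PATCH) qualify only with a non-empty
  // idempotency key. The header name is an assumed convention; header names
  // are compared case-insensitively.
  return Object.keys(headers).some(
    (name) => name.toLowerCase() === "idempotency-key" && headers[name].trim() !== "",
  );
}
```

A caller would check `isSafeToRetry(method, headers)` first and fall back to a single attempt when it returns false. The key only helps if the server actually deduplicates on it, which is an assumption about your backend.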
3. Backpressure: shed load before it sheds you
When you cannot keep up, the worst thing to do is accept everything and queue it indefinitely. Bounded queues, concurrency limits, and load shedding let you degrade gracefully instead of collapsing.
- Bounded concurrency: cap in-flight requests to a dependency.
- Reject early: return 429 or 503 when a queue is full. A fast failure is recoverable; a slow timeout is not.
- Circuit breakers: stop calling a dependency that’s clearly down and give it room to recover.
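The first two ideas in the list can be combined in a few lines. The following is a minimal sketch, assuming a single-process service; the class names `LoadShedder` and `OverloadedError` and the choice to reject immediately rather than queue are assumptions made for the example.

```typescript
// Hypothetical error type signalling that the request was shed, not attempted.
class OverloadedError extends Error {
  constructor() {
    super("too many requests in flight");
    this.name = "OverloadedError";
  }
}

// Caps in-flight work and rejects immediately once the cap is reached.
class LoadShedder {
  private inFlight = 0;
  private readonly maxInFlight: number;

  constructor(maxInFlight: number) {
    this.maxInFlight = maxInFlight;
  }

  // Reserves a slot and returns true, or returns false when at capacity.
  tryAcquire(): boolean {
    if (this.inFlight >= this.maxInFlight) return false;
    this.inFlight++;
    return true;
  }

  // Frees a previously reserved slot.
  release(): void {
    if (this.inFlight > 0) this.inFlight--;
  }

  // Runs fn inside a slot; throws OverloadedError without calling fn if full.
  async run<T>(fn: () => Promise<T>): Promise<T> {
    if (!this.tryAcquire()) throw new OverloadedError();
    try {
      return await fn();
    } finally {
      this.release();
    }
  }
}
```

A request handler would catch `OverloadedError` and answer with 429 or 503 right away. Rejecting outright is the simplest policy; a small bounded queue in front of the limiter is a reasonable variation when brief bursts are expected.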
Sensible defaults to start with
| Concern | Default |
|---|---|
| Connect timeout | 1–2s |
| Read timeout | 2–5× p99 |
| Retries | 2 (idempotent only) |
| Backoff | Exponential + full jitter |
| Circuit breaker | Trip at 50% errors over 10s |
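To make the circuit-breaker row concrete, here is a minimal sketch that trips at a 50% error rate over a 10-second sliding window. The class shape, the `minCalls` guard, the 5-second cooldown, and the injectable clock are all assumptions added for the example, and the recovery step is simplified: after the cooldown the breaker closes fully instead of letting through a single probe request.

```typescript
type Clock = () => number;

interface BreakerOptions {
  windowMs?: number;       // sliding window length
  errorThreshold?: number; // fraction of failures that trips the breaker
  minCalls?: number;       // assumed guard: do not trip on a handful of calls
  cooldownMs?: number;     // assumed pause before calls are allowed again
  now?: Clock;             // injectable clock, useful in tests
}

class CircuitBreaker {
  private events: { at: number; ok: boolean }[] = [];
  private openedAt: number | null = null;
  private readonly windowMs: number;
  private readonly errorThreshold: number;
  private readonly minCalls: number;
  private readonly cooldownMs: number;
  private readonly now: Clock;

  constructor(opts: BreakerOptions = {}) {
    this.windowMs = opts.windowMs ?? 10_000;
    this.errorThreshold = opts.errorThreshold ?? 0.5;
    this.minCalls = opts.minCalls ?? 10;
    this.cooldownMs = opts.cooldownMs ?? 5_000;
    this.now = opts.now ?? (() => Date.now());
  }

  // False while the breaker is open and the cooldown has not elapsed.
  allowRequest(): boolean {
    if (this.openedAt === null) return true;
    if (this.now() - this.openedAt >= this.cooldownMs) {
      // Simplified recovery: close and start a fresh window.
      this.openedAt = null;
      this.events = [];
      return true;
    }
    return false;
  }

  // Records one call outcome and trips the breaker if the window is too unhealthy.
  record(ok: boolean): void {
    const t = this.now();
    this.events.push({ at: t, ok });
    this.events = this.events.filter((e) => t - e.at < this.windowMs);
    const failures = this.events.filter((e) => !e.ok).length;
    if (
      this.events.length >= this.minCalls &&
      failures / this.events.length >= this.errorThreshold
    ) {
      this.openedAt = t;
    }
  }
}
```

Callers check `allowRequest()` before each call to the dependency and report the outcome with `record(true)` or `record(false)`. The minimum-call guard is there so that a single failure on a quiet service does not count as a 100% error rate.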
Resilience compounds. Add timeouts first, then retries, then backpressure — and you’ll have an API that bends under load instead of breaking.