Backend May 28, 2026 8 min read

Designing Resilient APIs: Timeouts, Retries, and Backpressure

The three patterns that separate APIs that survive production from the ones that fall over at the first traffic spike — with concrete defaults you can ship today.

DevCraftly Team

DevCraftly

Every API works on a developer’s laptop. The interesting question is what happens when a downstream dependency gets slow, a deploy doubles latency, or a batch job floods you with ten times the usual traffic. Resilience is not a library you install — it’s a set of deliberate decisions. Here are the three that matter most.

1. Every network call needs a timeout

One of the most common production incidents is a thread (or event-loop task) blocked indefinitely on a call that never returns. Without a timeout, one slow dependency cascades: connections pile up, pools are exhausted, and your otherwise healthy service starts failing too.

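// Java 11+ (java.net.http): set timeouts explicitly. An unset request timeout means "wait forever".
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

HttpClient client = HttpClient.newBuilder()
    .connectTimeout(Duration.ofSeconds(2)) // give up if the connection cannot be established in 2s
    .build();

// `url` is the String address of the dependency being called.
HttpRequest request = HttpRequest.newBuilder(URI.create(url))
    .timeout(Duration.ofSeconds(3)) // request timeout: HttpTimeoutException if no response arrives in 3s
    .build();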

A good rule of thumb: set the timeout to a small multiple of your p99 latency, not your average. If p99 is 200ms, a 1s timeout (5× p99) leaves generous headroom while still failing fast.
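
To make the rule of thumb concrete, here is a minimal sketch that derives a timeout from observed latencies. The helper names (`percentile`, `timeoutFromP99`), the nearest-rank percentile method, and the floor and ceiling values are assumptions for illustration, not part of any library.

```typescript
// Nearest-rank percentile over observed latencies, in milliseconds.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no latency samples");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  // Integer arithmetic avoids floating-point surprises such as 0.57 * 100.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

// Hypothetical helper: timeout = multiplier × p99, clamped to [minMs, maxMs].
function timeoutFromP99(
  samplesMs: number[],
  multiplier = 5, // assumed default; pick a small multiple that suits the dependency
  minMs = 100,    // assumed floor so a very fast dependency still gets some slack
  maxMs = 10_000, // assumed ceiling so one bad window cannot produce a huge timeout
): number {
  const p99 = percentile(samplesMs, 99);
  return Math.min(maxMs, Math.max(minMs, Math.round(p99 * multiplier)));
}

// 98 fast calls and 2 slow ones: the mean is 53ms, but p99 is 200ms.
const observed = [...new Array<number>(98).fill(50), 200, 200];
const timeoutMs = timeoutFromP99(observed); // 5 × 200ms = 1000ms
```

A timeout based on the 53ms mean would cut off the slow-but-healthy tail, which is why the p99 is the better anchor. In runtimes that support it, the resulting value could be passed to something like `AbortSignal.timeout(timeoutMs)` on a `fetch` call.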

2. Retry — but only safe operations, with jitter

Retries turn transient blips into successes. They also turn a small outage into a self-inflicted DDoS if you do them wrong.

Retry idempotent operations only. Never blindly retry a POST that creates a resource unless you have an idempotency key.
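
As a rough sketch of that rule, the client can decide whether a request is retryable, and the server can deduplicate on the key. The `Idempotency-Key` header name follows a common convention but is an assumption here, as are `isSafeToRetry` and `IdempotencyStore`; a real store would also need expiry and shared storage across instances.

```typescript
type Method = "GET" | "HEAD" | "OPTIONS" | "PUT" | "DELETE" | "POST" | "PATCH";

// Methods that HTTP semantics (RFC 9110) define as idempotent.
const IDEMPOTENT_METHODS = new Set<Method>(["GET", "HEAD", "OPTIONS", "PUT", "DELETE"]);

// Client side: retry only idempotent methods, or requests carrying an idempotency key.
function isSafeToRetry(method: Method, headers: Record<string, string>): boolean {
  if (IDEMPOTENT_METHODS.has(method)) return true;
  // Header names are case-insensitive, so compare in lower case.
  return Object.keys(headers).some(
    (name) => name.toLowerCase() === "idempotency-key" && headers[name] !== "",
  );
}

// Server side (in-memory sketch): run the operation once per key, replay the result after that.
class IdempotencyStore<T> {
  private results = new Map<string, T>();

  run(key: string, create: () => T): T {
    if (this.results.has(key)) return this.results.get(key) as T;
    const created = create();
    this.results.set(key, created);
    return created;
  }
}

// A retried POST with the same key creates the resource only once.
let ordersCreated = 0;
const store = new IdempotencyStore<{ orderId: number }>();
const first = store.run("key-123", () => ({ orderId: ++ordersCreated }));
const retried = store.run("key-123", () => ({ orderId: ++ordersCreated }));
```

Here `first` and `retried` are the same object and `ordersCreated` stays at 1, which is what makes retrying the POST safe.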

Use exponential backoff with full jitter so retries don’t synchronize into a thundering herd:

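// Calls fn up to `attempts` times in total (the first call plus attempts - 1 retries).
async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  for (let i = 0; ; i++) {
    try {
      return await fn();
    } catch (err) {
      // Out of attempts: surface the last error to the caller.
      if (i >= attempts - 1) throw err;
      // Exponential backoff ceiling: 100ms, 200ms, 400ms, ...
      const base = 100 * 2 ** i;
      // Full jitter: sleep a uniformly random time in [0, base).
      const delay = Math.random() * base;
      await new Promise((r) => setTimeout(r, delay));
    }
  }
}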

3. Backpressure: shed load before it sheds you

When you cannot keep up, the worst thing to do is accept everything and queue it indefinitely. Bounded queues, concurrency limits, and load shedding let you degrade gracefully instead of collapsing.

Bounded concurrency — cap in-flight requests to a dependency.
Reject early — return 429 or 503 when a queue is full. A fast failure is recoverable; a slow timeout is not.
Circuit breakers — stop calling a dependency that’s clearly down and give it room to recover.
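
The first two items can be combined in one small limiter. This is a sketch only: `Bulkhead`, `OverloadedError`, and the limits used are illustrative names and values, not a library API.

```typescript
// Thrown when both the in-flight slots and the wait queue are full.
class OverloadedError extends Error {
  readonly status = 503; // an HTTP layer could map this to 503 or 429

  constructor() {
    super("overloaded: concurrency and queue limits reached");
  }
}

class Bulkhead {
  private readonly maxInFlight: number;
  private readonly maxQueued: number;
  private inFlight = 0;
  private waiting: Array<() => void> = [];
  private shedCount = 0;

  constructor(maxInFlight: number, maxQueued: number) {
    this.maxInFlight = maxInFlight;
    this.maxQueued = maxQueued;
  }

  get active(): number { return this.inFlight; }
  get queued(): number { return this.waiting.length; }
  get shed(): number { return this.shedCount; }

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.inFlight >= this.maxInFlight) {
      if (this.waiting.length >= this.maxQueued) {
        this.shedCount++;
        throw new OverloadedError(); // reject early instead of queueing without bound
      }
      // Wait in the bounded queue until a finishing task hands over its slot.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    } else {
      this.inFlight++;
    }
    try {
      return await task();
    } finally {
      const next = this.waiting.shift();
      if (next) next();     // pass the slot straight to the next waiter
      else this.inFlight--; // nobody waiting: free the slot
    }
  }
}
```

A caller would wrap each call to the dependency, for example `limiter.run(() => fetch(url))`, and translate `OverloadedError` into a 429 or 503 response. The queue is kept deliberately short: a request that waits longer than the caller's own timeout is wasted work.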

Sensible defaults to start with

Concern	Default
Connect timeout	1–2s
Read timeout	2–5× p99
Retries	2 (idempotent only)
Backoff	Exponential + full jitter
Circuit breaker	Trip at 50% errors over 10s
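
The circuit-breaker default in the table could be sketched as follows. The 50% threshold and 10s window come from the table; the minimum sample count, the 5s cooldown, the single-probe half-open behavior, and the `CircuitBreaker` class itself are assumptions for illustration.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private readonly now: () => number;
  private readonly errorThreshold: number;
  private readonly windowMs: number;
  private readonly minSamples: number;
  private readonly cooldownMs: number;
  private events: Array<{ at: number; ok: boolean }> = [];
  private state: BreakerState = "closed";
  private openedAt = 0;

  constructor(
    now: () => number = () => Date.now(), // injectable clock, handy for tests
    errorThreshold = 0.5, // trip at 50% errors
    windowMs = 10_000,    // measured over the last 10s
    minSamples = 10,      // assumed: do not trip on a handful of calls
    cooldownMs = 5_000,   // assumed: how long to stay open before probing
  ) {
    this.now = now;
    this.errorThreshold = errorThreshold;
    this.windowMs = windowMs;
    this.minSamples = minSamples;
    this.cooldownMs = cooldownMs;
  }

  // Ask before calling the dependency; false means fail fast.
  allowRequest(): boolean {
    if (this.state === "open" && this.now() - this.openedAt >= this.cooldownMs) {
      this.state = "half-open"; // let exactly one probe through
      return true;
    }
    return this.state === "closed";
  }

  // Report the outcome of each call that was allowed.
  record(ok: boolean): void {
    const t = this.now();
    if (this.state === "open") return; // ignore stragglers from before the trip
    if (this.state === "half-open") {
      if (ok) {
        this.state = "closed";
        this.events = [];
      } else {
        this.state = "open";
        this.openedAt = t;
      }
      return;
    }
    this.events.push({ at: t, ok });
    this.events = this.events.filter((e) => t - e.at < this.windowMs);
    const failures = this.events.filter((e) => !e.ok).length;
    const errorRate = failures / this.events.length;
    if (this.events.length >= this.minSamples && errorRate >= this.errorThreshold) {
      this.state = "open";
      this.openedAt = t;
    }
  }
}
```

Usage is a check before the call and a report after it: if `allowRequest()` returns false, return an error immediately; otherwise make the call and pass its outcome to `record`. One limitation of this sketch is that a probe which never reports back leaves the breaker half-open, so a production version would pair the probe with a timeout.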

Resilience compounds. Add timeouts first, then retries, then backpressure — and you’ll have an API that bends under load instead of breaking.

#api #resilience #distributed-systems #backend
