Backend May 28, 2026 8 min read

Designing Resilient APIs: Timeouts, Retries, and Backpressure

The three patterns that separate APIs that survive production from the ones that fall over at the first traffic spike — with concrete defaults you can ship today.

DevCraftly Team

DevCraftly

Every API works on a developer’s laptop. The interesting question is what happens when a downstream dependency gets slow, a deploy doubles latency, or a batch job floods you with ten times the usual traffic. Resilience is not a library you install — it’s a set of deliberate decisions. Here are the three that matter most.

1. Every network call needs a timeout

One of the most common production incidents is a thread (or event-loop task) blocked indefinitely on a call that never returns. Without a timeout, one slow dependency cascades: connections pile up, pools are exhausted, and your otherwise healthy service starts failing too.

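// Java: set timeouts explicitly. The defaults are almost always "infinite".
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

HttpClient client = HttpClient.newBuilder()
    .connectTimeout(Duration.ofSeconds(2)) // max time to establish the connection
    .build();

// `url` is assumed to be a String defined elsewhere.
HttpRequest request = HttpRequest.newBuilder(URI.create(url))
    .timeout(Duration.ofSeconds(3)) // max wait for the response; not a per-read socket timeout
    .build();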

A good rule of thumb: set a timeout to a small multiple of your p99 latency, not your average. If p99 is 200ms, a timeout between 400ms and 1s (2–5× p99) leaves headroom for slow but healthy requests while still failing fast.
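The same idea carries over to TypeScript, where promises have no built-in deadline. Below is a minimal sketch, assuming a runtime with a global `AbortController` (modern Node.js or a browser). The names `withTimeout` and `TimeoutError` are hypothetical helpers for illustration, not part of any library.

```typescript
// Hypothetical error type so callers can tell a timeout from other failures.
class TimeoutError extends Error {
  constructor(ms: number) {
    super(`timed out after ${ms}ms`);
    this.name = "TimeoutError";
  }
}

// Runs `work` and rejects with TimeoutError if it has not settled within `ms`.
// The AbortSignal lets the callee cancel its own I/O when the deadline passes.
function withTimeout<T>(
  work: (signal: AbortSignal) => Promise<T>,
  ms: number,
): Promise<T> {
  const controller = new AbortController();
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      controller.abort(); // ask the callee to stop
      reject(new TimeoutError(ms)); // fail the caller right away
    }, ms);
    work(controller.signal).then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}
```

A caller would typically forward the signal to the underlying request, for example `withTimeout((signal) => fetch(url, { signal }), 1000)`, so the connection is released rather than left running in the background.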

2. Retry — but only safe operations, with jitter

Retries turn transient blips into successes. They also turn a small outage into a self-inflicted DDoS if you do them wrong.

Retry idempotent operations only. Never blindly retry a POST that creates a resource unless you have an idempotency key.
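To make the idempotency-key idea concrete, here is a minimal server-side sketch. It assumes a single process and an in-memory map; a real service would more likely use a shared store with expiry. `IdempotencyStore` and its `run` method are hypothetical names chosen for illustration.

```typescript
// Remembers the outcome of an operation per idempotency key, so a retried
// request with the same key gets the first result instead of a second write.
class IdempotencyStore<T> {
  private readonly results = new Map<string, Promise<T>>();

  run(key: string, op: () => Promise<T>): Promise<T> {
    const cached = this.results.get(key);
    if (cached !== undefined) return cached; // replay: no new side effect

    const pending = op();
    this.results.set(key, pending);
    // Forget failed attempts so the client may retry with the same key.
    pending.catch(() => this.results.delete(key));
    return pending;
  }
}
```

With this in place, a client that retries `POST /orders` with the same key triggers the create operation only once, which is what makes the retry safe.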

Use exponential backoff with full jitter so retries don’t synchronize into a thundering herd:

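// `attempts` counts total tries: the default of 3 means 1 initial call + 2 retries.
// Only pass an `fn` that is safe to repeat (idempotent).
async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  for (let i = 0; ; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i >= attempts - 1) throw err; // out of attempts: surface the last error
      const base = 100 * 2 ** i; // backoff ceiling in ms: 100, 200, 400, ...
      const delay = Math.random() * base; // full jitter: uniform in [0, base)
      await new Promise((r) => setTimeout(r, delay));
    }
  }
}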
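With more than a handful of attempts, the doubling ceiling grows quickly, so it is common to cap it. The sketch below separates the delay calculation from the retry loop. The 2000ms cap and the injectable `random` parameter are assumptions added for illustration and testing, not values from the article.

```typescript
// Upper bound of the jitter window for a 0-based retry index, capped at capMs.
function backoffCeiling(retry: number, baseMs = 100, capMs = 2000): number {
  return Math.min(capMs, baseMs * 2 ** retry);
}

// Full jitter: a uniform delay in [0, ceiling). `random` can be swapped out
// in tests to make the result deterministic.
function fullJitterDelay(
  retry: number,
  random: () => number = Math.random,
  baseMs = 100,
  capMs = 2000,
): number {
  return random() * backoffCeiling(retry, baseMs, capMs);
}
```

With these defaults the ceilings run 100, 200, 400, 800, 1600, and then stay at 2000ms, so a long outage never produces multi-minute sleeps.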

3. Backpressure: shed load before it sheds you

When you cannot keep up, the worst thing to do is accept everything and queue it indefinitely. Bounded queues, concurrency limits, and load shedding let you degrade gracefully instead of collapsing.

  • Bounded concurrency: cap in-flight requests to a dependency.
  • Reject early: return 429 or 503 when a queue is full. A fast failure lets the caller back off or fail over immediately; a slow timeout ties up resources on both sides until it expires.
  • Circuit breakers: stop calling a dependency that’s clearly down and give it room to recover.
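The first two bullets can be combined in one small component: a limiter that caps in-flight work, queues a bounded number of callers, and rejects the rest immediately. This is a sketch under stated assumptions; `ConcurrencyLimiter` and `OverloadedError` are hypothetical names, and the limits are whatever you pass in.

```typescript
// Thrown when both the in-flight slots and the wait queue are full.
class OverloadedError extends Error {
  constructor() {
    super("overloaded");
    this.name = "OverloadedError";
  }
}

class ConcurrencyLimiter {
  private inFlight = 0;
  private readonly waiting: Array<() => void> = [];
  private readonly maxInFlight: number;
  private readonly maxQueued: number;

  constructor(maxInFlight: number, maxQueued: number) {
    this.maxInFlight = maxInFlight;
    this.maxQueued = maxQueued;
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.inFlight >= this.maxInFlight) {
      if (this.waiting.length >= this.maxQueued) {
        throw new OverloadedError(); // shed load: map to 429 or 503 upstream
      }
      // Wait in the bounded queue until a finishing task hands over its slot.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    } else {
      this.inFlight++;
    }
    try {
      return await task();
    } finally {
      const next = this.waiting.shift();
      if (next) next(); // pass the slot directly to the next waiter
      else this.inFlight--; // nobody waiting: free the slot
    }
  }
}
```

An HTTP handler would wrap each call to the dependency in `limiter.run(...)` and translate `OverloadedError` into a 429 or 503 response, ideally with a `Retry-After` header.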

Sensible defaults to start with

Concern           Default
Connect timeout   1–2s
Read timeout      2–5× p99
Retries           2 (idempotent only)
Backoff           Exponential + full jitter
Circuit breaker   Trip at 50% errors over 10s
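The circuit breaker default above can be sketched as a sliding-window error ratio. The 50% threshold and 10s window come from the table; the minimum call count (10), the 5s cooldown, and the injectable clock are assumptions added here. The sketch also simplifies the usual half-open state: after the cooldown it simply closes and starts counting again.

```typescript
type BreakerOptions = {
  windowMs?: number; // how far back outcomes are counted
  errorRatio?: number; // trip when failures / calls reaches this value
  minCalls?: number; // assumption: ignore ratios over tiny samples
  cooldownMs?: number; // assumption: how long to stay open
  now?: () => number; // injectable clock for tests
};

class CircuitBreaker {
  private outcomes: Array<{ at: number; ok: boolean }> = [];
  private openedAt: number | null = null;
  private readonly windowMs: number;
  private readonly errorRatio: number;
  private readonly minCalls: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(opts: BreakerOptions = {}) {
    this.windowMs = opts.windowMs ?? 10000;
    this.errorRatio = opts.errorRatio ?? 0.5;
    this.minCalls = opts.minCalls ?? 10;
    this.cooldownMs = opts.cooldownMs ?? 5000;
    this.now = opts.now ?? (() => Date.now());
  }

  // Call before each request; false means "fail fast, do not call".
  allowRequest(): boolean {
    if (this.openedAt === null) return true;
    if (this.now() - this.openedAt >= this.cooldownMs) {
      this.openedAt = null; // cooldown over: close and start a fresh window
      this.outcomes = [];
      return true;
    }
    return false;
  }

  // Call after each request with its outcome.
  record(ok: boolean): void {
    const t = this.now();
    this.outcomes.push({ at: t, ok });
    this.outcomes = this.outcomes.filter((o) => t - o.at < this.windowMs);
    const failures = this.outcomes.filter((o) => !o.ok).length;
    const enough = this.outcomes.length >= this.minCalls;
    if (enough && failures / this.outcomes.length >= this.errorRatio) {
      this.openedAt = t; // trip: stop calling the dependency
    }
  }
}
```

The minimum call count matters more than it looks: without it, a single failed request on a quiet service would count as a 100% error rate and trip the breaker.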

Resilience compounds. Add timeouts first, then retries, then backpressure — and you’ll have an API that bends under load instead of breaking.

#api #resilience #distributed-systems #backend