When you fail, retry!

"There is no failure except in no longer trying." – Elbert Hubbard

In modern web applications, handling network requests, database operations, and other async tasks reliably is crucial. Yet these operations often fail due to temporary issues like network hiccups or service timeouts. While we could manually wrap each operation in try-catch blocks with custom retry logic, this quickly becomes repetitive and error-prone. That's where a robust retry mechanism comes in handy.

In this article, we'll build a library that makes handling retries simple yet flexible. We want a simple API that wraps async functions and handles retries automatically. The library should support configurable retry attempts, delays, and timeouts while implementing exponential backoff with jitter to prevent overwhelming services.


API Design

The library exposes a single withRetry function. It takes an asynchronous function as its first argument and an optional configuration object as its second.

const result = await withRetry(asyncFn, { ... });

The withRetry function is defined as follows:

async function withRetry<T>(
  fn: () => Promise<T>,
  config: Partial<RetryConfig> = {},
): Promise<T> { ... }

The configuration for the withRetry function is defined as follows:

type RetryConfig = {
  maxAttempts: number;
  delay: number;
  maxDelay: number;
  backoffStrategy: RetryConfigBackoffStrategy;
  jitter: RetryConfigJitter;
  retryCondition: (error: Error) => boolean;
  onRetry: (error: Error, attempt: number) => void;
  onExhausted: (error: Error) => void;
  timeout: number;
};

The different options are:

  • maxAttempts: the maximum number of attempts before we return a failure.
  • delay: the base duration (in ms) to wait between retries. The backoff strategy and jitter are applied on top of it.
  • maxDelay: a cap on how long to delay retries. This can be useful when using some backoff strategies to avoid delays growing past a reasonable point.
  • backoffStrategy: the strategy used to calculate the delay between retries.
  • jitter: a rate (between 0 and 1) of the amount of randomness to add to the delay. This is particularly useful to avoid "thundering herd" problems, where multiple callers wake up at the same time (in this case, after the same delay).
  • retryCondition: a function that, given the last retry error, decides if we should continue retrying (up to maxAttempts).
  • onRetry: a callback that runs on every retry.
  • onExhausted: a callback that runs once maxAttempts is reached or retryCondition returns false.
  • timeout: the duration (in ms) to wait for each attempt before marking it as failed. The default of 0 disables the timeout.

Backoff strategy

The backoff strategy is further defined as:

type RetryConfigBackoffStrategy =
  | "constant"
  | "linear"
  | "exponential"
  | { type: "linear"; factor: number }
  | { type: "exponential"; factor: number }
  | ((attempt: number, delay: number) => number);

A constant backoff strategy means that the delay between retries is always the same.

A linear backoff strategy multiplies the delay by the attempt number, so the delay grows linearly with each retry. If a factor is given, the delay is additionally multiplied by that factor, making it grow faster.

An exponential backoff strategy multiplies the delay by 2 to the power of the attempt number. If a factor is given, it replaces 2 as the base of the exponent, so a larger factor makes the delay grow faster between retries.

Finally, the backoff strategy also allows for custom implementations by accepting a function that takes the attempt and delay and returns a number.
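To make the strategies concrete, here is a small sketch that lists the delays each one produces over the first four attempts. It assumes a base delay of 100 ms and an attempt counter that starts at 1. The backoffDelay helper and the BackoffStrategy alias are names made up for this illustration; they mirror the formulas described above rather than the published library's code.

```typescript
type BackoffStrategy =
  | "constant"
  | "linear"
  | "exponential"
  | { type: "linear"; factor: number }
  | { type: "exponential"; factor: number }
  | ((attempt: number, delay: number) => number);

// Hypothetical helper: delay (in ms) to wait after the given attempt failed.
function backoffDelay(
  strategy: BackoffStrategy,
  attempt: number,
  delay: number,
): number {
  if (typeof strategy === "function") return strategy(attempt, delay);
  if (strategy === "constant") return delay;
  if (strategy === "linear") return delay * attempt;
  if (strategy === "exponential") return delay * 2 ** attempt;
  // Only the object forms are left at this point.
  return strategy.type === "linear"
    ? delay * strategy.factor * attempt
    : delay * strategy.factor ** attempt;
}

const attempts = [1, 2, 3, 4];
const table = {
  constant: attempts.map((a) => backoffDelay("constant", a, 100)),
  linear: attempts.map((a) => backoffDelay("linear", a, 100)),
  exponential: attempts.map((a) => backoffDelay("exponential", a, 100)),
  exponentialFactor3: attempts.map((a) =>
    backoffDelay({ type: "exponential", factor: 3 }, a, 100),
  ),
};

console.log(table);
// constant:           [100, 100, 100, 100]
// linear:             [100, 200, 300, 400]
// exponential:        [200, 400, 800, 1600]
// exponentialFactor3: [300, 900, 2700, 8100]
```

Note how quickly the exponential rows grow, which is the reason a maxDelay cap exists.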

Jitter

The jitter is further defined as:

type RetryConfigJitter =
  | boolean
  | number
  | ((attempt: number, delay: number) => number);

Setting jitter to true adds a random amount to the delay on every retry, whereas setting it to a number between 0 and 1 uses that number as a fixed coefficient applied to each delay. As with the backoff strategy, it can also be a function for custom implementations, which receives the attempt number and the delay.
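The three forms are easier to compare side by side. The sketch below assumes the additive formula used later in the implementation section (delay + delay * coefficient). The applyJitter helper and its injectable random parameter are hypothetical, added only so the example gives reproducible numbers.

```typescript
type Jitter = boolean | number | ((attempt: number, delay: number) => number);

// Hypothetical helper; `random` is injectable so the results are reproducible.
function applyJitter(
  delay: number,
  jitter: Jitter,
  attempt: number,
  random: () => number = Math.random,
): number {
  if (jitter === false) return delay;
  const coefficient =
    typeof jitter === "function"
      ? jitter(attempt, delay)
      : typeof jitter === "number"
        ? jitter
        : random(); // jitter === true
  return delay + delay * coefficient;
}

const fixedRandom = () => 0.5; // stands in for Math.random()

const examples = {
  off: applyJitter(200, false, 1, fixedRandom), // 200
  random: applyJitter(200, true, 1, fixedRandom), // 200 + 200 * 0.5 = 300
  fixed: applyJitter(200, 0.25, 1, fixedRandom), // 200 + 200 * 0.25 = 250
  custom: applyJitter(200, (attempt) => attempt / 4, 3, fixedRandom), // 350
};

console.log(examples);
```

One thing this makes visible: under this formula a numeric jitter inflates every delay by the same fraction, so it adds no randomness by itself. If the goal is to spread callers apart, true or a custom function that calls Math.random() is probably the better fit.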

Default values

By default, we set the configuration to the following values:

const DEFAULT_CONFIG: RetryConfig = {
  maxAttempts: 3,
  delay: 100,
  maxDelay: 1000,
  backoffStrategy: "constant",
  jitter: true,
  retryCondition: () => true,
  onRetry: () => {},
  onExhausted: () => {},
  timeout: 0,
};
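Since withRetry accepts a Partial<RetryConfig>, the caller's options have to be combined with these defaults somewhere. One common way to do that is an object spread, sketched below on a trimmed-down option type. The Options type, DEFAULTS constant and resolveOptions helper are hypothetical names, and whether the published library merges its configuration exactly like this is an assumption.

```typescript
// Trimmed-down stand-in for RetryConfig, to keep the example short.
type Options = { maxAttempts: number; delay: number; timeout: number };

const DEFAULTS: Options = { maxAttempts: 3, delay: 100, timeout: 0 };

// Later spreads win, so caller-provided values override the defaults.
function resolveOptions(overrides: Partial<Options> = {}): Options {
  return { ...DEFAULTS, ...overrides };
}

console.log(resolveOptions({ maxAttempts: 5 }));
// { maxAttempts: 5, delay: 100, timeout: 0 }
```

A caveat with this approach: a key that is present but explicitly set to undefined still overrides the default, so callers should omit options they do not want to set.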

Implementation

Let's dive into the implementation of this library!

At a high level, we'd need:

  • a main retry loop that calls the asynchronous function passed as argument, and keeps retrying based on configuration
  • a delay calculation function that returns how long the library should wait between subsequent retries
  • some logic to also handle timeouts (if configured)

Main retry loop

The main retry loop is defined as an infinite while loop. This gives us more flexibility over the loop's stop conditions.

async function withRetry<T>(
  fn: () => Promise<T>,
  config: Partial<RetryConfig> = {},
): Promise<T> {
  ...
  while (true) {
    try {
      return await fn();
    } catch (error) {
      ...
    }
  }
}

Here, we return whatever the function returns if it's successful. If it fails, the while loop will continue retrying that function.

The stopping conditions for our retry loop are:

  • maxAttempts has been reached, or
  • retryCondition returns false

We can check for those conditions in the catch block as follows:

async function withRetry<T>(
  fn: () => Promise<T>,
  config: Partial<RetryConfig> = {},
): Promise<T> {
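  // Fill in any option the caller left out.
  const { maxAttempts, retryCondition, onExhausted } = {
    ...DEFAULT_CONFIG,
    ...config,
  };
  let attempt = 0;

  while (true) {
    attempt += 1;

    try {
      return await fn();
    } catch (error) {
      // Stop when we run out of attempts or the error is not retryable.
      if (attempt >= maxAttempts || !retryCondition(error as Error)) {
        onExhausted(error as Error);
        throw error;
      }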
    }
  }
}

We track the number of attempts in a local attempt variable that is incremented at the start of every iteration. If we have used up maxAttempts, or if retryCondition returns false for the error, we call the onExhausted callback and rethrow the last error.
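To see the two stop conditions in action, here is a self-contained sketch of the loop at this stage (no delays yet), run against an operation that fails a fixed number of times. The names retryWithoutDelay, LoopOptions and makeFlaky are hypothetical and exist only for this demonstration.

```typescript
type LoopOptions = {
  maxAttempts: number;
  retryCondition: (error: Error) => boolean;
  onExhausted: (error: Error) => void;
};

// The retry loop described above, without delays or timeouts.
async function retryWithoutDelay<T>(
  fn: () => Promise<T>,
  options: LoopOptions,
): Promise<T> {
  let attempt = 0;
  while (true) {
    attempt += 1;
    try {
      return await fn();
    } catch (error) {
      const err = error as Error;
      if (attempt >= options.maxAttempts || !options.retryCondition(err)) {
        options.onExhausted(err);
        throw error;
      }
    }
  }
}

// Hypothetical flaky operation: fails `failures` times, then succeeds.
function makeFlaky(failures: number) {
  let calls = 0;
  const fn = async (): Promise<string> => {
    calls += 1;
    if (calls <= failures) throw new Error(`failure #${calls}`);
    return "ok";
  };
  return { fn, getCalls: () => calls };
}

async function demo(): Promise<void> {
  const flaky = makeFlaky(2);
  const result = await retryWithoutDelay(flaky.fn, {
    maxAttempts: 3,
    retryCondition: () => true,
    onExhausted: (error) => console.log("gave up:", error.message),
  });
  // Two failures followed by a success on the third and last allowed attempt.
  console.log(result, flaky.getCalls());
}

demo();
```

With makeFlaky(5) and the same options, the third failure would reach maxAttempts, trigger onExhausted once and reject with that last error.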

Delay calculation

The code we currently have will retry a function, but without any delays. Let's fix that!

We're going to add a delay in the catch block of the while loop and use setTimeout to wait for that given time.

async function withRetry<T>(
  fn: () => Promise<T>,
  config: Partial<RetryConfig> = {},
): Promise<T> {
  ...
  while (true) {
    ...
    try {
      return await fn();
    } catch (error) {
      ...
      const delay = calculateDelay(attempt, config);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

The calculateDelay function depends on 3 factors:

  • backoffStrategy
  • maxDelay
  • jitter

The backoffStrategy can be a string ("constant" | "linear" | "exponential"), an object with type and factor fields, or a function that returns a number. Depending on which form is used, we compute the delay from the base delay (config.delay) and the attempt number.

function calculateDelay(attempt: number, config: RetryConfig): number {
  let delay: number;

  if (config.backoffStrategy === "constant") {
    delay = config.delay;
  } else if (config.backoffStrategy === "linear") {
    delay = config.delay * attempt;
  } else if (config.backoffStrategy === "exponential") {
    delay = config.delay * 2 ** attempt;
  } else if (typeof config.backoffStrategy === "object") {
    if (config.backoffStrategy.type === "linear") {
      delay = config.delay * config.backoffStrategy.factor * attempt;
    } else if (config.backoffStrategy.type === "exponential") {
      delay = config.delay * config.backoffStrategy.factor ** attempt;
    } else {
      throw new Error("Invalid backoff strategy");
    }
  } else {
    delay = config.backoffStrategy(attempt, config.delay);
  }

  return delay;
}

To put a cap on the delay, in particular for "exponential" strategies, we then take the lower of the newly calculated delay and maxDelay.

function calculateDelay(attempt: number, config: RetryConfig): number {
  let delay: number;

  ...

  delay = Math.min(delay, config.maxDelay);

  return delay;
}
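As a quick worked example of the cap, the sketch below computes the delays for five retries with the "exponential" strategy, a base delay of 100 ms and a maxDelay of 1000 ms, ignoring jitter. The cappedExponentialSchedule helper is a hypothetical name used only here.

```typescript
// Hypothetical helper: capped exponential delays for attempts 1..retries.
function cappedExponentialSchedule(
  baseDelay: number,
  maxDelay: number,
  retries: number,
): number[] {
  const schedule: number[] = [];
  for (let attempt = 1; attempt <= retries; attempt += 1) {
    schedule.push(Math.min(baseDelay * 2 ** attempt, maxDelay));
  }
  return schedule;
}

const schedule = cappedExponentialSchedule(100, 1000, 5);
const totalWait = schedule.reduce((sum, ms) => sum + ms, 0);

console.log(schedule); // [200, 400, 800, 1000, 1000]
console.log(totalWait); // 3400
```

Without the cap, the last two delays would be 1600 ms and 3200 ms. Summing the schedule like this is also a handy way to estimate the worst-case time spent waiting before the caller finally sees an error.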

Finally, we need to implement the jitter based on its configuration. The jitter coefficient is a value between 0 and 1 and can be thought of as adding that fraction of the delay to itself. Depending on the configuration, the coefficient comes from a custom function, a fixed number, or Math.random().

function calculateDelay(attempt: number, config: RetryConfig): number {
  let delay: number;

  ...

  if (config.jitter) {
    const jitter =
      typeof config.jitter === "function"
        ? config.jitter(attempt, delay)
        : typeof config.jitter === "number"
          ? config.jitter
          : Math.random();

    delay = delay + delay * jitter;
  }

  return delay;
}
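One consequence of the order described above (cap first, jitter second) is that the final delay can end up above maxDelay. The sketch below compares that order with the reverse one. Both helper names are hypothetical, and which order is preferable depends on whether maxDelay is meant as a hard limit.

```typescript
// Order used in the article: cap the delay, then add jitter on top.
function capThenJitter(
  delay: number,
  maxDelay: number,
  coefficient: number,
): number {
  const capped = Math.min(delay, maxDelay);
  return capped + capped * coefficient;
}

// Alternative order (an assumption, not the article's): add jitter, then cap.
function jitterThenCap(
  delay: number,
  maxDelay: number,
  coefficient: number,
): number {
  return Math.min(delay + delay * coefficient, maxDelay);
}

console.log(capThenJitter(1600, 1000, 0.5)); // 1500, above maxDelay
console.log(jitterThenCap(1600, 1000, 0.5)); // 1000, never above maxDelay
```

The trade-off is that capping last removes all randomness once the raw delay has passed maxDelay, since every caller then waits exactly maxDelay. Capping first keeps retries spread out at the cost of treating maxDelay as a soft limit.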

Timeout

Now that we have our main retry loop and delay calculation settled, the last piece of the puzzle is to implement a timeout mechanism!

We leverage Promise.race([...]) and setTimeout to implement the timeout:

async function withRetry<T>(
  fn: () => Promise<T>,
  config: Partial<RetryConfig> = {},
): Promise<T> {
  ...

  while (true) {
    ...
    try {
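      if (config.timeout > 0) {
        let timer: ReturnType<typeof setTimeout> | undefined;
        const timeoutPromise = new Promise<never>((_, reject) => {
          timer = setTimeout(
            () => reject(new Error("Operation timed out")),
            config.timeout,
          );
        });
        try {
          return await Promise.race([fn(), timeoutPromise]);
        } finally {
          // Clear the timer so it does not linger once the race has settled.
          clearTimeout(timer);
        }
      }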

      return await fn();
    } catch (error) {
      ...
    }
  }
}

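This lets us control how long we are willing to wait for the asynchronous function to settle, which is particularly useful for network requests! Note that Promise.race only stops waiting: it does not cancel the underlying operation, which keeps running in the background after the timeout fires.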
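The timeout logic can also be pulled out into a small standalone helper, which makes it easy to try in isolation. The sketch below is one way to do that; withTimeout, sleep and demoTimeout are hypothetical names, and clearing the timer in a finally block is an addition for tidiness rather than something the approach requires.

```typescript
// Hypothetical helper: rejects if `promise` does not settle within `ms`.
async function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeoutPromise = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("Operation timed out")), ms);
  });
  try {
    return await Promise.race([promise, timeoutPromise]);
  } finally {
    // Avoid leaving a pending timer behind once the race has settled.
    clearTimeout(timer);
  }
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function demoTimeout(): Promise<string[]> {
  const outcomes: string[] = [];

  // Settles well within the limit, so the value comes through.
  outcomes.push(await withTimeout(sleep(10).then(() => "fast"), 200));

  // Takes longer than the limit, so the timeout error wins the race.
  try {
    await withTimeout(sleep(200).then(() => "slow"), 20);
  } catch (error) {
    outcomes.push((error as Error).message);
  }

  return outcomes;
}

demoTimeout().then((outcomes) => console.log(outcomes));
// ["fast", "Operation timed out"]
```

Because the timeout surfaces as an ordinary rejection, the retry loop treats it like any other failure: it goes through retryCondition and counts towards maxAttempts.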


Code

There we have it, a retry library that provides flexible configuration!

This library has been implemented and published on npm:

Antonio Villagra De La Cruz

Multicultural software engineer passionate about building products that empower people.