Fortifying your code with the Backoff Python package
Recently, I’ve been working extensively with API services, both commercial and our own, integrating them into larger systems and ensuring their robustness.
However, one problem I consistently encountered was handling API call failures gracefully. Network calls can be flaky: temporary outages, unexpected VPN disconnections, random failures, exceeded quotas… These issues can trigger fatal errors in long-running processes.
Enter the backoff Python package, which has proven to be a game-changer in fortifying my code against such failures.
Why Backoff?
Before discovering backoff, I frequently found myself writing awkward loops to handle repeated API call attempts. These loops looked something like this:
while True:
    try:
        response = make_api_call()
        if response.status_code == 200:
            break
    except NetworkError:
        continue  # transient network error: retry immediately
    except QuotaError:
        sleep(60)  # wait out the quota window before retrying
        continue
    except ServerError:
        logger.error("Server error, retrying")
This loop would endlessly attempt the call until it succeeded, with some simple error handling along the way. It was effective but inelegant, verbose, and hard to maintain. Moreover, it lacked any form of exponential delay, which meant it could overwhelm the network with retries during an outage or temporary failure.
Stumbling upon the backoff package felt like a breath of fresh air. It aligned perfectly with the solution I envisioned: automating retries while incorporating best practices like exponential backoff and jitter, without the clunky, error-prone logic.
Integrating backoff not only simplified my code but also significantly increased its reliability and readability. This package effortlessly replaced my loop with much cleaner and more efficient logic, and it only required a few lines of code.
Integrating Backoff with API Calls
Let’s dive into some examples illustrating how backoff enhances API call reliability.
Example #1: Basic API Call Handling
Consider a simple API call to a weather service, where we might occasionally encounter a failure due to rate limits or connectivity issues.
Here’s how you might wrap that call using backoff:
import backoff
import requests

@backoff.on_exception(backoff.expo, requests.exceptions.RequestException, max_tries=5)
def get_weather_data(city):
    response = requests.get(f'http://api.weather.com/v3/wx/forecast/daily/5day?city={city}')
    response.raise_for_status()
    return response.json()

weather_data = get_weather_data("New York")
With the @backoff.on_exception decorator, the call to get_weather_data will automatically apply exponential backoff on encountering a RequestException, retrying up to 5 times.
Example #2: Handling Outages and API Limits
In this scenario, we’ll implement a constant wait strategy to handle HTTP error statuses, such as 429 (Too Many Requests) or 503 (Service Unavailable):
@backoff.on_predicate(backoff.constant, lambda r: r.status_code in {429, 503}, interval=5, max_tries=8)
def fetch_user_data(user_id):
    # Return the raw response so the predicate can inspect its status code
    return requests.get(f'http://api.example.com/users/{user_id}')

user_data = fetch_user_data(12345).json()
Here, we use a constant delay of 5 seconds between retries, making it well-suited for managing rate limits or temporary service unavailability. Note that the decorated function returns the raw response rather than raising on error statuses, so the predicate can check the status code and decide whether to retry.
Example #3: Usage with asyncio
Now, let’s explore a more real-world example using asyncio to perform 50 concurrent API calls, each with its own backoff strategy:
import asyncio
import backoff
import aiohttp

@backoff.on_exception(backoff.expo, aiohttp.ClientError, max_tries=5)
async def async_get_user_data(session, user_id):
    async with session.get(f'http://api.example.com/users/{user_id}') as response:
        response.raise_for_status()
        return await response.json()

async def main():
    async with aiohttp.ClientSession() as session:
        tasks = [async_get_user_data(session, user_id) for user_id in range(50)]
        results = await asyncio.gather(*tasks)
        return results

# Running the coroutine
user_data_list = asyncio.run(main())
This example demonstrates how to integrate backoff in an asyncio context to handle multiple concurrent API calls. By utilizing aiohttp for asynchronous requests, we efficiently manage potential network issues without blocking the entire program, retrying each failed request up to 5 times with exponential backoff.
We can also achieve this without backoff by setting return_exceptions=True in asyncio.gather. However, that would require us to write custom code to retry the calls that raised exceptions, as sketched below; using backoff makes this much simpler.
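For comparison, a rough sketch of that manual approach might look like the following (the undecorated fetch_user coroutine and the single extra attempt are my own simplifications, not part of the examples above):

async def fetch_user(session, user_id):
    # Plain, undecorated request: no automatic retries
    async with session.get(f'http://api.example.com/users/{user_id}') as response:
        response.raise_for_status()
        return await response.json()

async def main_manual():
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_user(session, user_id) for user_id in range(50)]
        results = await asyncio.gather(*tasks, return_exceptions=True)
        # Manually retry each call that came back as an exception (one extra attempt each)
        for user_id, result in enumerate(results):
            if isinstance(result, Exception):
                try:
                    results[user_id] = await fetch_user(session, user_id)
                except aiohttp.ClientError:
                    results[user_id] = None  # give up after the second failure
        return results

Even this simplified version is longer and more fragile than the decorated one, with no exponential delay between attempts.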
Additional features
The backoff package offers a wider variety of decorators to handle retries and backoff strategies effectively. These are well documented on the backoff GitHub repository, but to sum them up briefly:
- @backoff.on_exception retries a function when a specified exception is raised.
- @backoff.on_predicate retries a function based on its return value.
- @backoff.runtime uses the return value or thrown exception of the decorated method to determine backoff behavior (see the sketch below).
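For instance, a sketch of @backoff.runtime (the endpoint here is hypothetical) could wait exactly as long as a 429 response advertises in its Retry-After header:

@backoff.runtime(
    predicate=lambda r: r.status_code == 429,
    value=lambda r: int(r.headers.get("Retry-After", 1)),
    jitter=None,
)
def get_report():
    # Return the raw response so the predicate and value callbacks can inspect it
    return requests.get('http://api.example.com/report')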
backoff also allows the use of multiple decorators on a single function, providing more precise control over error handling behavior.
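As a rough sketch of what stacking might look like (again with a hypothetical endpoint), one decorator can handle rate-limit responses with a patient constant wait while another handles transient connection errors with exponential backoff:

# Retry 429 responses with a patient constant wait...
@backoff.on_predicate(backoff.constant, lambda r: r.status_code == 429, interval=30, max_tries=4)
# ...and transient connection problems with exponential backoff
@backoff.on_exception(backoff.expo, requests.exceptions.ConnectionError, max_tries=5)
def fetch_orders():
    return requests.get('http://api.example.com/orders')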
Jitter
backoff supports adding randomness to backoff intervals to prevent thundering herd problems. The default jitter function, backoff.full_jitter, implements the ‘Full Jitter’ algorithm, which is nicely explained in this AWS blog post.
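Since full_jitter is already the default, the sketch below (hypothetical endpoints) only spells the setting out explicitly and shows how it can be disabled, which is generally not recommended for network calls:

# Explicit default: randomize each wait with the 'Full Jitter' strategy
@backoff.on_exception(backoff.expo, requests.exceptions.RequestException, max_tries=5, jitter=backoff.full_jitter)
def get_profile(user_id):
    return requests.get(f'http://api.example.com/profiles/{user_id}')

# jitter=None disables randomization, giving deterministic exponential waits
@backoff.on_exception(backoff.expo, requests.exceptions.RequestException, max_tries=5, jitter=None)
def get_settings(user_id):
    return requests.get(f'http://api.example.com/settings/{user_id}')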
Conclusion
The backoff package has significantly improved the robustness and reliability of my code when dealing with APIs and network calls. By leveraging its retry mechanisms, I’ve been able to make long-running processes less prone to failure due to temporary API glitches or network instability.
With just a few lines of code, backoff helps manage errors, turning big problems into small ones. It lets us focus on the main logic by handling temporary failures for us. This makes our code more efficient and reliable, ensuring it runs smoothly even in uncertain conditions.
Given its benefits, I believe this library should be considered essential when working with APIs, especially those known to be flaky or to have quotas that are easily exceeded. When you’re making a substantial number of requests, whether asynchronously or synchronously, using backoff helps ensure your entire process doesn’t break from a single failure, maintaining stability across your application.
Notes
Best Practices in Using Backoff
- Exponential Backoff & Jitter: While constant waits might make sense in some cases, exponential backoff with added jitter (randomness) is usually the best practice for network operations to prevent thundering herd problems. Be careful when configuring this: it should be handled on a per-function basis, because the backoff needed for a network error is not the same as the one needed for a rate-limit error.
- Logging: Make sure to log backoff retries and integrate them into your monitoring tools to catch systemic issues early (see the sketch after this list).
- Understand Max Retries: Opt for a reasonable balance between retry aggressiveness and the user experience, as excessive retries could mask real issues or delay error notification.
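For reference, backoff reports its retries through the standard logging module on the 'backoff' logger, and an on_backoff handler can forward details to your own monitoring. The sketch below (handler choice and endpoint are illustrative) enables both:

import logging
import backoff
import requests

# backoff logs retry attempts to the 'backoff' logger at INFO level
logging.getLogger('backoff').addHandler(logging.StreamHandler())
logging.getLogger('backoff').setLevel(logging.INFO)

def report_retry(details):
    # 'details' carries keys such as 'target', 'tries' and 'wait'
    print(f"Retrying {details['target'].__name__} (attempt {details['tries']}) after {details['wait']:.1f}s")

@backoff.on_exception(backoff.expo, requests.exceptions.RequestException,
                      max_tries=5, on_backoff=report_retry)
def ping():
    return requests.get('http://api.example.com/health')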