Fortifying your code with the Backoff Python package
Recently, I’ve been working extensively with API services, both commercial and our own, integrating them into larger systems and ensuring their robustness.
However, one problem I consistently encountered was handling API call failures gracefully. Network calls can be flaky: temporary outages, unexpected VPN disconnections, random failures, exceeded quotas… These issues can trigger fatal errors in long-running processes.
Enter the backoff Python package, which has proven to be a game-changer in fortifying my code against such failures.
Why Backoff?
Before discovering backoff, I frequently found myself writing awkward loops to handle repeated API call attempts. These loops looked something like this:
while True:
    try:
        response = make_api_call()
        if response.status_code == 200:
            break
    except NetworkError:
        continue  # transient network error: retry immediately
    except QuotaError:
        sleep(60)  # wait out the quota window before retrying
        continue
    except ServerError:
        logger.error("Server error, retrying")
This loop would endlessly attempt the call until it succeeded, with some simple error handling along the way. It was effective but inelegant, verbose, and hard to maintain. Moreover, it lacked any form of exponential delay, which meant it could overwhelm the network with retries during an outage or temporary failure.
Stumbling upon the backoff package felt like a breath of fresh air. It aligned perfectly with the solution I envisioned: automating retries while incorporating best practices like exponential backoff and jitter, without the clunky, error-prone logic.
Integrating backoff not only simplified my code but also significantly increased its reliability and readability. This package effortlessly replaced my loop with much cleaner and more efficient logic, and it only required a few lines of code.
Integrating Backoff with API Calls
Let’s dive into some examples illustrating how backoff enhances API call reliability.
Example #1: Basic API Call Handling
Consider a simple API call to a weather service, where we might occasionally encounter a failure due to rate limits or connectivity issues.
Here’s how you might wrap that call using backoff:
import backoff
import requests

@backoff.on_exception(backoff.expo, requests.exceptions.RequestException, max_tries=5)
def get_weather_data(city):
    response = requests.get(f'http://api.weather.com/v3/wx/forecast/daily/5day?city={city}')
    response.raise_for_status()
    return response.json()

weather_data = get_weather_data("New York")
With the @backoff.on_exception decorator, the call to get_weather_data will automatically apply exponential backoff on encountering a RequestException, retrying up to 5 times.
Example #2: Handling Outages and API Limits
In this scenario, we’ll implement a constant wait strategy to handle HTTP error statuses, such as 429 (Too Many Requests) or 503 (Service Unavailable):
@backoff.on_predicate(backoff.constant, lambda r: r.status_code in {429, 503}, interval=5, max_tries=8)
def fetch_user_data(user_id):
    # Return the raw response so the predicate can inspect its status code
    return requests.get(f'http://api.example.com/users/{user_id}')

user_data = fetch_user_data(12345).json()
Here, we use a constant delay of 5 seconds between retries, making it well-suited for managing rate limits or temporary service unavailability. Note that the decorated function returns the raw response rather than raising on error statuses, so the predicate can check the status code and decide whether to retry.
Example #3: Usage with asyncio
Now, let’s explore a more real-world example using asyncio to perform 50 concurrent API calls, each with its own backoff strategy:
import asyncio
import backoff
import aiohttp

@backoff.on_exception(backoff.expo, aiohttp.ClientError, max_tries=5)
async def async_get_user_data(session, user_id):
    async with session.get(f'http://api.example.com/users/{user_id}') as response:
        response.raise_for_status()
        return await response.json()

async def main():
    async with aiohttp.ClientSession() as session:
        tasks = [async_get_user_data(session, user_id) for user_id in range(50)]
        results = await asyncio.gather(*tasks)
        return results

# Running the coroutine
user_data_list = asyncio.run(main())
This example demonstrates how to integrate backoff in an asyncio context to handle multiple concurrent API calls. By utilizing aiohttp for asynchronous requests, we efficiently manage potential network issues without blocking the entire program, retrying each failed request up to 5 times with exponential backoff.
We can also achieve this without backoff by setting return_exceptions=True in asyncio.gather. However, that would require us to write custom code to retry the calls that raised exceptions, as sketched below; using backoff makes this much simpler.
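For comparison, a rough sketch of that manual approach might look like the following (the undecorated fetch_user coroutine and the single extra attempt are my own simplifications, not part of the examples above):

async def fetch_user(session, user_id):
    # Plain, undecorated request: no automatic retries
    async with session.get(f'http://api.example.com/users/{user_id}') as response:
        response.raise_for_status()
        return await response.json()

async def main_manual():
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_user(session, user_id) for user_id in range(50)]
        results = await asyncio.gather(*tasks, return_exceptions=True)
        # Manually retry each call that came back as an exception (one extra attempt each)
        for user_id, result in enumerate(results):
            if isinstance(result, Exception):
                try:
                    results[user_id] = await fetch_user(session, user_id)
                except aiohttp.ClientError:
                    results[user_id] = None  # give up after the second failure
        return results

Even this simplified version is longer and more fragile than the decorated one, with no exponential delay between attempts.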
Additional features
The backoff package offers a wider variety of decorators to handle retries and backoff strategies effectively. These are well documented on the backoff GitHub repository, but to sum them up briefly:
- @backoff.on_exception retries a function when a specified exception is raised.
- @backoff.on_predicate retries a function based on its return value.
- @backoff.runtime uses the return value or thrown exception of the decorated method to determine backoff behavior (see the sketch below).
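For instance, a sketch of @backoff.runtime (the endpoint here is hypothetical) could wait exactly as long as a 429 response advertises in its Retry-After header:

@backoff.runtime(
    predicate=lambda r: r.status_code == 429,
    value=lambda r: int(r.headers.get("Retry-After", 1)),
    jitter=None,
)
def get_report():
    # Return the raw response so the predicate and value callbacks can inspect it
    return requests.get('http://api.example.com/report')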
backoff also allows the use of multiple decorators on a single function, providing more precise control over error handling behavior.
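As a rough sketch of what stacking might look like (again with a hypothetical endpoint), one decorator can handle rate-limit responses with a patient constant wait while another handles transient connection errors with exponential backoff:

# Retry 429 responses with a patient constant wait...
@backoff.on_predicate(backoff.constant, lambda r: r.status_code == 429, interval=30, max_tries=4)
# ...and transient connection problems with exponential backoff
@backoff.on_exception(backoff.expo, requests.exceptions.ConnectionError, max_tries=5)
def fetch_orders():
    return requests.get('http://api.example.com/orders')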
Jitter
backoff supports adding randomness to backoff intervals to prevent thundering herd problems. The default jitter function, backoff.full_jitter, implements the ‘Full Jitter’ algorithm, which is nicely explained in this AWS blog post.
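Since full_jitter is already the default, the sketch below (hypothetical endpoints) only spells the setting out explicitly and shows how it can be disabled, which is generally not recommended for network calls:

# Explicit default: randomize each wait with the 'Full Jitter' strategy
@backoff.on_exception(backoff.expo, requests.exceptions.RequestException, max_tries=5, jitter=backoff.full_jitter)
def get_profile(user_id):
    return requests.get(f'http://api.example.com/profiles/{user_id}')

# jitter=None disables randomization, giving deterministic exponential waits
@backoff.on_exception(backoff.expo, requests.exceptions.RequestException, max_tries=5, jitter=None)
def get_settings(user_id):
    return requests.get(f'http://api.example.com/settings/{user_id}')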
Conclusion
The backoff package has significantly improved the robustness and reliability of my code when dealing with APIs and network calls. By leveraging its retry mechanisms, I’ve been able to make long-running processes less prone to failure due to temporary API glitches or network instability.
With just a few lines of code, backoff helps manage errors, turning big problems into small ones. It lets us focus on the main logic by handling temporary failures for us. This makes our code more efficient and reliable, ensuring it runs smoothly even in uncertain conditions.
Given its benefits, I believe this library should be considered essential when working with APIs, especially those known to be flaky or to have quotas that are easily exceeded. When you’re making a substantial number of requests, whether asynchronously or synchronously, using backoff helps ensure your entire process doesn’t break from a single failure, maintaining stability across your application.
Notes
Best Practices in Using Backoff
- Exponential Backoff & Jitter: While constant waits might make sense in some cases, exponential backoff with added jitter (randomness) is usually the best practice for network operations to prevent thundering herd problems. Be careful when configuring this: it should be handled on a per-function basis, because the backoff needed for a network error is not the same as the one needed for a rate-limit error.
- Logging: Make sure to log backoff retries and integrate them into your monitoring tools to catch systemic issues early (see the sketch after this list).
- Understand Max Retries: Opt for a reasonable balance between retry aggressiveness and the user experience, as excessive retries could mask real issues or delay error notification.
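For reference, backoff reports its retries through the standard logging module on the 'backoff' logger, and an on_backoff handler can forward details to your own monitoring. The sketch below (handler choice and endpoint are illustrative) enables both:

import logging
import backoff
import requests

# backoff logs retry attempts to the 'backoff' logger at INFO level
logging.getLogger('backoff').addHandler(logging.StreamHandler())
logging.getLogger('backoff').setLevel(logging.INFO)

def report_retry(details):
    # 'details' carries keys such as 'target', 'tries' and 'wait'
    print(f"Retrying {details['target'].__name__} (attempt {details['tries']}) after {details['wait']:.1f}s")

@backoff.on_exception(backoff.expo, requests.exceptions.RequestException,
                      max_tries=5, on_backoff=report_retry)
def ping():
    return requests.get('http://api.example.com/health')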