Improving API Performance with HTTP Keepalive

Optimizing response time by reducing unnecessary connection reestablishment

Clete Blackwell II
Staff Technology Engineer

Introduction

The performance of business functionality is important. Modern business functionality is exposed through one or more customer-facing APIs (Application Programming Interfaces), which are often backed by a series of microservices. Any unnecessary response time in a deeply nested service can slow the experience for customers, creating inefficiencies and diminishing customer satisfaction.

A well-performing, simple API might look like this:

%%{init: {"theme": "base", "sequence": {"fontFamily": "monospace,monospace;"}, "themeVariables": { "primaryColor": "#e4e3e3", "primaryBorderColor": "#acabab", "noteBkgColor": "#f2ddbb", "noteBorderColor": "#acabab" } }}%%
sequenceDiagram
    participant A as API
    participant B as Backend
    A->>+B: Request (0.5ms network latency)
    Note over B: Database query time (10 milliseconds)
    B->>-A: Response (0.5ms network latency)

In this example, we can expect an 11ms response time. However, it is far too simple for microservice environments. For instance, examine the following customer-facing flow:

%%{init: {"theme": "base", "sequence": {"fontFamily": "monospace,monospace;"}, "themeVariables": { "primaryColor": "#e4e3e3", "primaryBorderColor": "#acabab", "noteBkgColor": "#f2ddbb", "noteBorderColor": "#acabab" } }}%%
sequenceDiagram
    actor A as Customer
    participant P as Public API
    participant O as Orchestration API
    participant B1 as Backend API 1
    participant B2 as Backend API 2
    A->>+P: Request (15ms network latency)
    P->>+O: Request (0.5ms network latency)
    O->>+B1: Request (0.5ms network latency)
    Note over B1: Database query time (10 milliseconds)
    B1->>-O: Response (0.5ms network latency)
    Note over A: Customer waits 53ms total
    O->>+B2: Request (0.5ms network latency)
    Note over B2: Database query time (10 milliseconds)
    B2->>-O: Response (0.5ms network latency)
    O->>-P: Response (0.5ms network latency)
    P->>-A: Response (15ms network latency)

These performance issues can be further exacerbated by running cross-data center or cross-cloud. For instance, if your public customer-facing API needs to authorize the user with an external identity provider, that authorization will incur much greater network latency.

Defining Response Time

We’ve assumed a small amount of network latency for each packet, but in reality it is not so simple. When a customer makes a request, the client must first negotiate a TCP connection with your API’s server. Then, it negotiates Transport Layer Security (TLS). Only then can the request payload be sent. Each of these steps adds one-way network trips on top of the request itself.

%%{init: {"theme": "base", "sequence": {"fontFamily": "monospace,monospace;"}, "themeVariables": { "primaryColor": "#e4e3e3", "primaryBorderColor": "#acabab", "noteBkgColor": "#f2ddbb", "noteBorderColor": "#acabab" } }}%%
sequenceDiagram
    actor A as Customer
    participant P as Public API
    A->>P: SYN (Synchronize) (15ms)
    P->>A: SYN/ACK (Synchronize/Acknowledge) (15ms / 30ms total)
    A->>P: ACK (Acknowledge) (15ms / 45ms total)
    Note over A,P: Connection is now established.<br/>Total elapsed time >= 45ms
    A->>P: TLS Client Hello (15ms / 60ms total)
    P->>A: TLS Server Hello / Certificate Exchange (15ms / 75ms total)
    A->>P: Key Exchange / Change Cipher (15ms / 90ms total)
    P->>A: Change Cipher (15ms / 105ms total)
    Note over A,P: Connection & TLS are established.<br/>Now communication can begin.<br/>Total elapsed time >= 105ms
    A->>+P: HTTP Application Data (15ms / 120ms total)
    Note over P: Destination API processing time not included
    P->>-A: HTTP API Response (15ms+ / 135ms+ total)
    Note over A,P: Application data ACKs & FIN/ACK are not blocking in this scenario.<br/>They have been omitted for brevity.

A simple API request where the user has a network latency of 15ms can take over 135ms. Connection establishment can be a performance killer!
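As a quick sanity check on that number, here is a back-of-the-envelope calculation in Python using the 15ms one-way latency assumed above:

ONE_WAY_MS = 15  # assumed one-way network latency from the diagram above

tcp_handshake = 3 * ONE_WAY_MS     # SYN, SYN/ACK, ACK
tls_negotiation = 4 * ONE_WAY_MS   # TLSv1.2 adds two full round trips
request_response = 2 * ONE_WAY_MS  # request out, response back

total = tcp_handshake + tls_negotiation + request_response
print(f"Minimum response time: {total}ms")  # 135ms, before any server processing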

In this example, we didn’t take into account:

  • Computational inefficiencies (e.g. CPU wait)
  • Network jitter
  • Network congestion or traffic deprioritization
  • Connection establishment from the Public API to the backend APIs
  • Packet loss
  • Other delays

Real-world testing indicates that performance is, on average, significantly worse than in this theoretical example.

Additional assumptions in the above example:

  • Using either HTTP/1.1 or HTTP/2, which rely on TCP (Transmission Control Protocol). TCP establishes a connection with a three-packet SYN / SYN-ACK / ACK handshake. HTTP/3 will eliminate those three packets by running over UDP (User Datagram Protocol), which has no connection establishment phase.
  • Using TLSv1.2. TLSv1.3 reduces response time marginally by removing one round trip.

Solutions

Now that I’ve explained why connections between data centers, and over the internet, perform poorly, let’s talk about what we can do to improve performance.

| Solution | Description |
| --- | --- |
| Reduce distance between data centers | Opportunities exist to use AWS Local Zones or AWS Outposts, but this is not a solution for most use cases |
| Reduce the need to jump between data centers by hosting, pre-fetching, or caching data where it is needed | Hosting data is tricky. Pre-fetching requires foreknowledge of future incoming requests. Caching requires careful planning and has risks |
| Reduce the number of round trips needed to establish connections | TLSv1.3 removes one round trip. TLSv1.3 pre-shared keys (PSKs) enable zero round-trip TLS negotiation, but require pre-planning for the client and server. In the future, HTTP/3 will eliminate the use of TCP, further reducing connection establishment overhead |
| Use gRPC (a remote procedure call framework) | Requires rearchitecting your API systems, but also provides a robust feature set |
| Reuse established connections | Easy to do with HTTP Keepalive |

Solution Testing

I set up a test rig between AWS Lambda in AWS’s us-east-1 North Virginia region and a data center near Dallas, Texas. I tested every combination of HTTP/1.1 vs. HTTP/2, TLSv1.2 vs. TLSv1.3, and No Keepalive vs. Keepalive. See Appendix -> Testing Setup to learn more about the testing setup. Some key findings:

  • Keepalive is crucial to good performance for repeated requests
  • TLSv1.3 negotiation is faster than TLSv1.2 (noticeable on the Without Keepalive bars below)
  • TLS negotiation becomes negligible if you are using keepalive, because it happens only once out of thousands of calls
  • In an unexpected turn of events, HTTP/2 is slower than HTTP/1.1 for single-threaded API calls (more on this later; HTTP/2 can be much faster than HTTP/1.1 depending on usage)
[Chart: average response time for each combination of HTTP version, TLS version, and keepalive setting]

HTTP Keepalive

Reusing connections saves significantly on response time. Check out this graph of average AWS Lambda response times in the real world. The top line is the original Lambda invocation time. This particular Lambda invokes an on-premises API twice sequentially. Enabling Keepalive saved the initialization time twice per invocation. In reality, the results were better than my theoretical measurements. Each keepalive call saved a whopping 206ms (412ms total) when compared to the same call without HTTP Keepalive.

[Chart: average AWS Lambda response time before and after enabling keepalive]

Why is that? Let’s take a second look at the connection establishment diagram from above, but this time we’ll use keepalive:

%%{init: {"theme": "base", "sequence": {"fontFamily": "monospace,monospace;"}, "themeVariables": { "primaryColor": "#e4e3e3", "primaryBorderColor": "#acabab", "noteBkgColor": "#f2ddbb", "noteBorderColor": "#acabab" } }}%%
sequenceDiagram
    actor A as Customer
    participant P as Public API
    opt first request
        A->>P: SYN (Synchronize) (15ms)
        P->>A: SYN/ACK (Synchronize/Acknowledge) (15ms / 30ms total)
        A->>P: ACK (Acknowledge) (15ms / 45ms total)
        Note over A,P: Connection is now established.<br/>Total elapsed time >= 45ms
        A->>P: TLS Client Hello (15ms / 60ms total)
        P->>A: TLS Server Hello / Certificate Exchange (15ms / 75ms total)
        A->>P: Key Exchange / Change Cipher (15ms / 90ms total)
        P->>A: Change Cipher (15ms / 105ms total)
        Note over A,P: Connection & TLS are established.<br/>Now communication can begin.<br/>Total elapsed time >= 105ms
        A->>+P: Request (15ms latency)
        P->>-A: Response (15ms latency)
        Note over A,P: Request 1 done<br/>Total elapsed time >= 135ms<br/>...Connection idle until another request comes in...
    end
    opt second request
        A->>+P: Request (15ms latency)
        P->>-A: Response (15ms latency)
        Note over A,P: Request 2 done in >= 30ms
    end
    opt third request
        A->>+P: Request (15ms latency)
        P->>-A: Response (15ms latency)
        Note over A,P: Request 3 done in >= 30ms<br/>...and so on...
    end

The vast majority of overhead is incurred during initial connection. Here’s another view of how much keeping your connections alive can help. This data comes from a real test from a machine near Atlanta, Georgia to a data center near Dallas, Texas.

[Chart: response times with and without keepalive, Atlanta to Dallas]

Keepalive benefits are realized when you reuse an existing connection. The initial connection is significantly slower, but with keepalive we don’t have to re-establish the connection on every request. Reusing the connection is a powerful and simple way to significantly improve performance.
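As a rough illustration of that difference, here is a minimal Python sketch (assuming the requests library and a hypothetical endpoint) that times fresh connections against a reused one:

import time
import requests

URL = "https://api.example.com/health"  # hypothetical endpoint

# Without keepalive: module-level requests.get() opens a new
# connection (TCP + TLS handshakes) for every call
for i in range(3):
    start = time.perf_counter()
    requests.get(URL)
    print(f"fresh connection #{i + 1}: {(time.perf_counter() - start) * 1000:.0f}ms")

# With keepalive: a Session pools connections, so only the
# first call pays the handshake cost
session = requests.Session()
for i in range(3):
    start = time.perf_counter()
    session.get(URL)
    print(f"reused connection #{i + 1}: {(time.perf_counter() - start) * 1000:.0f}ms")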

Determining Keepalive Compatibility

Keepalive works differently depending on HTTP version:

| Version | Enabled by default? | Note |
| --- | --- | --- |
| HTTP/1.0 | No | Must set the Connection: keep-alive header to enable. HTTP/1.0 is largely not used anymore. |
| HTTP/1.1 | Yes | Header is unnecessary. Most connections use HTTP/1.1. |
| HTTP/2 | Yes | Header is explicitly prohibited. Usually enabling HTTP/2 is a conscious choice. |

In order to maintain the connection, both the client and server must agree to keep the connection open. To test whether a server keeps connections alive, run curl -v <url> and look at the very last line of output:

  • * Connection #0 to host <host> left intact (the server is willing to keep the connection open)
  • * Closing connection 0 (the server closed the connection; keepalive is not available)

The length of time and number of connections that a server is willing to keep open at any time may vary significantly based on server congestion, default timeout configurations, and other factors. Even if you intend to keep a connection open, it can still be closed at any time by the server.
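You can also check this programmatically. Here is a small sketch with Python’s requests library (endpoint hypothetical); the server signals its intent in the Connection response header:

import requests

response = requests.get("https://api.example.com/health")  # hypothetical endpoint

# "close" means the server will tear the connection down;
# "keep-alive" (or no header at all, under HTTP/1.1) means it may stay open
print(response.headers.get("Connection", "(absent; HTTP/1.1 defaults to keep-alive)"))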

Client Keepalive Enablement

In most cases, all that is required to keep the connection alive is to cache the HTTP client. Initialize your client globally or inside of an initialization block rather than inside the function that uses the client. You should also look at your client’s configuration to determine whether you can increase the time to live (TTL) of an established connection.

For example, in pseudocode:

client = new client with keepalive enabled

function mainMethod:
  client.get()

Or use lazy initialization:

client = null

function mainMethod:
  if client is null:
    client = new client with keepalive enabled

  client.get()

CAUTION:

  • Ensure that the downstream service is not maintaining any state, such as session cookies, between requests. Most APIs shouldn’t be doing this, but you should still proactively check. Make a call using curl -v to the API and check the response headers for any cookies that are set (see the sketch after this list).
  • Ensure that your client is not persisting authentication or headers between requests. Pass in the appropriate authentication “fresh” each invocation.
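For the cookie check above, a quick Python sketch (endpoint hypothetical):

import requests

response = requests.get("https://api.example.com/health")  # hypothetical endpoint

# Any cookies here mean the server is handing out state that a shared,
# long-lived client could accidentally replay across requests or users
for cookie in response.cookies:
    print(f"Server set cookie: {cookie.name}")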

See Appendix -> Keepalive Enablement Examples for code snippets for your favorite language to enable HTTP Keepalive.

TLSv1.3

TLSv1.3 can improve performance slightly by reducing round-trips required to establish connections. It is also possible to pre-share keys between the client and server to completely eliminate TLS negotiation overhead. I haven’t explored implementing pre-shared keys (PSKs), primarily because it only provides benefit during connection establishment. If we use TLSv1.3 in conjunction with HTTP Keepalive, we only have to incur TLS negotiation overhead once.

Implementing TLSv1.3 is usually automatic. To test a given server’s support for TLSv1.3, run curl -v <URL> and look in the output for TLS messages. Here is an example message stating that TLSv1.2 was negotiated:

* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
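You can also check the negotiated version programmatically. Here is a minimal sketch using Python’s standard ssl module (host is a placeholder):

import socket
import ssl

HOST = "api.example.com"  # hypothetical host

context = ssl.create_default_context()
with socket.create_connection((HOST, 443)) as sock:
    with context.wrap_socket(sock, server_hostname=HOST) as tls:
        print(tls.version())  # e.g. "TLSv1.3" or "TLSv1.2"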

HTTP/2…?

I’ll admit, I expected HTTP/2 to be faster than HTTP/1.1, even for single API calls.

Google’s blog post states that HTTP/2 reduces response time and overhead. It has a lot of great features, but most of them didn’t help my request/response API calls:

| Feature | Why it didn’t help |
| --- | --- |
| Push support to preemptively send content before the client asks for it | Not useful for APIs; designed for webpage loads to send related content like images |
| Connection multiplexing | APIs are typically request/response, and calls are not usually made to the same server in parallel |
| Compression of header fields | May make a difference in some cases, but didn’t matter for me |

Is HTTP/2 faster than HTTP/1.1? Most certainly. Is it faster for request/response API calls? In most cases, no.

To discover why HTTP/2 was slower, I had to run another load test and dig into the network traffic using Wireshark. I used a TLS keylog dump to decrypt my traffic.
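If you want to reproduce that kind of decryption, one option (a sketch, not necessarily my exact setup) is Python’s ssl module, which can write TLS session secrets to a keylog file that Wireshark reads via its "(Pre)-Master-Secret log filename" preference:

import socket
import ssl

context = ssl.create_default_context()
# Requires Python 3.8+ built against OpenSSL 1.1.1+
context.keylog_filename = "tls-keys.log"

HOST = "api.example.com"  # hypothetical host
with socket.create_connection((HOST, 443)) as sock:
    with context.wrap_socket(sock, server_hostname=HOST) as tls:
        tls.sendall(b"GET / HTTP/1.1\r\nHost: api.example.com\r\nConnection: close\r\n\r\n")
        print(tls.recv(4096).decode(errors="replace"))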

Results

I ran a one-minute load test from my machine to a remote server and I’ve included the first ~70 calls in the chart below.

[Chart: response times of the first ~70 calls, HTTP/1.1 vs. HTTP/2]

After the initial connection (100+ms), observe that there are two tiers of response time: ~28-30ms and ~40-45ms. My best guess is that this is caused by network jitter. While that finding is interesting, it doesn’t explain the discrepancy between HTTP/1 and HTTP/2. To understand the discrepancy, we need to dive into the packet capture.

Below are packet captures containing initial connection establishment followed by repeated sequential requests to the server, simulating traffic using keepalive.

HTTP/1.1

HTTP/1.1 is a simple protocol. Each request packet is followed by a single response packet containing the headers and data.

[Screenshot: Wireshark packet capture of HTTP/1.1 traffic]

HTTP/2

In the test I ran, HTTP/2 was 9% slower than HTTP/1.1.

HTTP/2 is a more robust protocol. The connection begins with settings packets defining how many streams can be opened for multiplexing, along with other connection parameters. Afterward, each request is still a single packet, but the response is broken up into two packets: a header packet and a data packet. There is always a small delay between those response packets, which does not exist in HTTP/1.1. Sometimes, network jitter delays one of the response packets by more than a marginal amount.

[Screenshot: Wireshark packet capture of HTTP/2 traffic]
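If you want to observe the protocol difference yourself, one option is the third-party httpx library, which can negotiate HTTP/2 (a sketch; assumes pip install 'httpx[http2]' and a hypothetical endpoint):

import time
import httpx

with httpx.Client(http2=True) as client:
    for i in range(3):
        start = time.perf_counter()
        response = client.get("https://api.example.com/health")  # hypothetical endpoint
        elapsed_ms = (time.perf_counter() - start) * 1000
        # http_version reports what was actually negotiated, e.g. "HTTP/2"
        print(f"call {i + 1}: {response.http_version} in {elapsed_ms:.0f}ms")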

HTTP/2 Summary

I believe, but am not 100% certain, that network jitter and the small delays between HTTP/2 reply packets are what cause HTTP/2 to be slower for single request/response calls. HTTP/2 can be significantly faster when using advanced features like push and multiplexing.

Summary

There are many different ways to improve API performance, but one of the easiest is to use HTTP Keepalive in all your clients when invoking an API. It’s easy to enable, has few (if any) downsides, and improves performance significantly. I hope you’ll give it a try. Thanks for reading and keep on building!

If you have any thoughts or comments, I’d love to hear from you: contact me.

To learn more about technology careers at State Farm, or to join our team, visit https://www.statefarm.com/careers.

Appendix

Appendix 1 - Keepalive Enablement Examples

To take advantage of keepalive, you need to cache the connection object between invocations. That means always creating the instance of the connection library outside of the handler code (AWS Lambda) or setting it at the class/singleton/static level.
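For instance, a minimal Python Lambda sketch of that pattern (handler and endpoint names are illustrative):

import requests

# Created once per execution environment (cold start)
session = requests.Session()

def handler(event, context):
    # Warm invocations reuse the pooled connection in `session`
    response = session.get("https://api.example.com/health")  # hypothetical endpoint
    return {"statusCode": response.status_code}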

Before implementing, please research all the options for keepalive that your HTTP library has. Some notes from my testing:

  • Some HTTP libraries send Keepalive heartbeat / probes to keep the connection alive. That works for containers that are always running, but will not work for Lambdas that are frozen when invocation ends (read more about Lambda lifecycle). Lack of packets may result in a stale connection sooner than anticipated.
  • Some HTTP libraries have timeouts for the maximum time a connection can live. In my testing, I set this to 300 seconds. I recommend you look into the configuration for your particular library.

Node.JS

AWS SDK

Version 3 of the AWS SDK enables keepalive by default.

For version 2, see AWS SDK documentation. The easiest method: set the environment variable AWS_NODEJS_CONNECTION_REUSE_ENABLED=1.

Axios

Axios is a promise-based library. Keepalive is enabled at the core https library level and is controlled by the keepAlive flag.

import { Agent } from 'https'
import axios from 'axios'

// It is important to create your instance outside of your application code
const axiosInstance = axios.create({ httpsAgent: new Agent({ keepAlive: true }) }) // add any other options you need

const mainMethod = async () => {
  ...
  const result = await axiosInstance.get(url)
  ...
}

Python

AWS SDK

My research says that keepalive should be enabled by default. I did not test it.

Requests Library

Keepalive is enabled by default in the requests library when you create a session. Be careful to use stateless requests across invocations.

import requests

# It is important to create the request session outside of your application code
session = requests.Session()

def main():
  session.get(url)

Go

AWS SDK

My research says that keepalive should be enabled by default. I did not test it.

Core net/http Package
import (
	"io"
	"io/ioutil"
	"net/http"
)

// It is important to initialize the client outside of your application code
var client *http.Client = &http.Client{}

func main() {
	res, err := client.Get(url)
	if err != nil {
		...
	}

	// In Go, you must read the body to completion and then close it
	// in order for the connection to be reused (keepalive)
	io.Copy(ioutil.Discard, res.Body)
	res.Body.Close()
}

Java

AWS SDK

The SDK enables keepalive by default but you can customize it.

Apache HttpClient 4.x

It’s been a while since I’ve written Java. I used to love and use it daily, but now it is a relic of my past. It may be that there is a more efficient/clean way to instantiate the client that I have not discovered. Feel free to e-mail me if my example is suboptimal.

import java.io.IOException;

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.util.EntityUtils;

public class MyClass {
  // Normally you would want to initialize this as a Spring bean
  private CloseableHttpClient client;

  MyClass() {
    client = HttpClientBuilder.create().build(); // Customize your client if you like
  }

  public void invoke() throws IOException {
    CloseableHttpResponse response = client.execute(new HttpGet(url));

    EntityUtils.consume(response.getEntity()); // Must consume the response, either by reading or discarding it

    // Must close the response so the connection can be reused
    response.close();
  }
}

Appendix 2 - Testing Setup

In order to prove out performance improvements, I needed a solid and scalable testing solution. I wanted to be able to test multiple approaches quickly, and also have a large sample size so that small “blips” would have minimal impact on the results.

%%{init: {"theme": "base", "sequence": {"fontFamily": "monospace,monospace;"}, "themeVariables": { "primaryColor": "#e4e3e3", "primaryBorderColor": "#acabab", "noteBkgColor": "#f2ddbb", "noteBorderColor": "#acabab" } }}%%
graph LR
    Laptop(My Laptop)--"Put 1000s of Messages"-->SQS
    SQS-->Lambda
    Lambda--"(1) hundreds of requests over VPN"-->Linux(Linux Server running Caddy*)
    Lambda--"(2) Store CSV of results"-->S3
    subgraph Workstation
        Laptop
    end
    subgraph AWS us-east-1 North Virginia
        SQS
        Lambda
        S3
    end
    subgraph Data Center - Dallas
        Linux
    end

* Caddy is a ridiculously simple and highly performant HTTP server.

Each message in the queue contains parameters specifying how many and what kind of requests to make. The Lambda uses a binary called Hey which is a simple HTTP load testing utility. I contributed a couple of improvements (1, 2) to Hey which are not merged into the master repository yet.

Finally, I wrote some Bash scripts: one to add a lot of messages to the queue and one to analyze the results. Each test runs for 10 minutes to get a sufficient sample size.