Breaking the Limits

Leveraging Cloud Architecture at Massive Scale

Chad Prentice
Senior Technology Engineer

As an insurance company, you’re bound to have a lot of documents to store and maintain. As a 100-year-old insurance company with over 87 million policies and accounts, State Farm REALLY has a lot of documents…to the tune of 12 billion documents weighing in at almost 3 petabytes of data. Those totals grow by more than 3 million documents (3 terabytes) a day. Think about an insurance claim for an auto accident. You have the photos of the damage, the estimate for the repair and the payment for the repairs. Each of those represents one or more documents. Or consider basic policy maintenance. You have the renewal statement and the ID cards you receive as proof of insurance. Those are documents too. And each of those documents has to be available at a moment’s notice when a customer, a claim handler or your agent needs it.

Storing and managing documents is critical to the operations of State Farm. So, if you are going to migrate all of this data to AWS, it’s important to make sure your solution can scale. That was the biggest question in our minds when we decided to migrate our enterprise document storage solution to AWS. A new solution might appear to work well at low volumes, but things can get much more challenging as you ramp up to full scale. So how did we make sure that transition happened seamlessly?

Our Design

We first had to establish a design that we believed would scale. We wanted to take advantage of serverless architecture but also leverage well-established services known for handling massive data volumes at high throughput rates. We also wanted something flexible that would work easily with many different services on AWS. This led us to select one of the oldest and most well-established services as the center of our architecture: S3. The rest of our design included other serverless components, including API Gateway, Lambda, DynamoDB and SQS:

[Architecture diagram: API Gateway, Lambda, S3, DynamoDB and SQS]
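To make that flow concrete, here is a minimal sketch of what a document upload Lambda behind API Gateway might look like with these services. This is not our production code; the bucket name, table name and payload fields are illustrative assumptions.

```python
import base64
import json
import uuid

import boto3

# Illustrative resource names; the real bucket and table are different.
s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table("document-metadata")


def handler(event, context):
    """Store an uploaded document in S3 and record its metadata in DynamoDB."""
    body = json.loads(event["body"])
    document_id = uuid.uuid4().hex
    doc_class = body["documentClass"]
    key = f"app/contents/{doc_class}/{document_id}"

    # In this sketch, API Gateway delivers the document content base64-encoded.
    s3.put_object(
        Bucket="enterprise-documents",
        Key=key,
        Body=base64.b64decode(body["content"]),
    )

    # Record where the document lives so it can be retrieved later.
    table.put_item(
        Item={"documentId": document_id, "documentClass": doc_class, "s3Key": key}
    )

    return {"statusCode": 201, "body": json.dumps({"documentId": document_id})}
```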

S3 is an object storage service that can scale to any size we need while delivering on availability, security and performance. It easily integrates with nearly any other AWS service and can be optimized for cost and access patterns to meet our business needs. This matched exactly what we were looking for in a storage solution. However, at the scale of State Farm, we have seen many products that we thought could meet our needs, only to find that things did not work out as planned.

S3 Prefixes

S3 allows you to use prefixes to organize data. These look like directories in a file system, but they are really just the string that makes up the beginning of the S3 object key; the rest of the key is the name of the object. You can refer to the S3 User Guide for more details on how prefixes work.
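As a quick illustration (using a hypothetical bucket and prefix), filtering a listing by prefix is what makes prefixes feel like directories:

```python
import boto3

s3 = boto3.client("s3")

# Everything before the object name is the prefix. Filtering a listing by a
# prefix behaves like listing a directory, even though S3 has no real folders.
response = s3.list_objects_v2(
    Bucket="enterprise-documents",  # hypothetical bucket name
    Prefix="claims/photos/",        # hypothetical prefix
)
for obj in response.get("Contents", []):
    print(obj["Key"])
```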

We quickly recognized that the prefix design for our S3 bucket would make a big difference in our performance and in whether or not S3 would scale successfully. AWS had been talking about this for a long time, for example in this blog post by AWS VP and Chief Evangelist Jeff Barr. In the past we have used “salted” keys to avoid “hotspotting”, which is caused by hitting the same region of data all at once. A lot of the documentation we found when we were doing our design (including Jeff Barr’s blog) seemed to suggest something similar was needed for S3 if your volume was high enough. Note this quote from the blog:

The trick here is to calculate a short hash (note: collisions don’t matter here, just pseudo-randomness) and pre-pend it to the string you wish to use for your object name. This way, operations are again fanned out over multiple partitions. To list keys with common prefixes, several list operations can be performed in parallel, one for each unique character prefix in your hash.
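As an illustration of that older guidance (not something our final design ended up needing), a salted key could be built by prepending a short hash of the object name:

```python
import hashlib


def salted_key(name: str, salt_len: int = 2) -> str:
    """Prepend a short, pseudo-random hash to spread keys across partitions.

    This follows the old S3 guidance quoted above; collisions are fine because
    the hash only exists to fan writes out, not to identify objects.
    """
    salt = hashlib.md5(name.encode("utf-8")).hexdigest()[:salt_len]
    return f"{salt}/{name}"


key = salted_key("claims/2021/05/photo-12345.jpg")
# -> "<2-character hash>/claims/2021/05/photo-12345.jpg"
```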

Fortunately, by the time we were building our solution, AWS had published this announcement, which included this note at the very end:

This S3 request rate performance increase removes any previous guidance to randomize object prefixes to achieve faster performance. That means you can now use logical or sequential naming patterns in S3 object naming without any performance implications.

This new guidance gave us 3,500 PUTs and 5,500 GETs per second for each prefix/partition. This meant that S3 should be able to scale to the rates we needed as long as we had enough unique prefixes. Based on this, we came up with a prefix structure that combined the internal document classification assigned to each document with a unique document id. Because the document id is part of the prefix, every document effectively gets its own prefix. In theory, this meant our scale could be unlimited.

We also had a couple of additional fields at the beginning of the prefix to separate our data from other use cases, since this bucket was shared by multiple application areas. So a sample prefix would look something like this:

app/contents/DocClass/50193B45BFE920210521164940221075217112
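In code, building a key like that might look like the following sketch; the function name and arguments are ours for illustration, not the actual implementation:

```python
def build_object_key(app: str, doc_class: str, document_id: str) -> str:
    """Build a key shaped like app/contents/DocClass/<documentId>.

    The leading fields keep our data separate from other use cases in the
    shared bucket, and the unique document id at the end means every document
    effectively gets its own prefix.
    """
    return f"{app}/contents/{doc_class}/{document_id}"


key = build_object_key("app", "DocClass", "50193B45BFE920210521164940221075217112")
# -> "app/contents/DocClass/50193B45BFE920210521164940221075217112"
```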

Slow Down!

We deployed our application and began sending data to S3 at low volumes. Everything went smoothly and worked as expected, but that was not surprising since we were not really stressing the architecture yet. Over the next couple of months, we gradually increased our ingestion rates without any major issues, but we were still far from running at full scale. Then we began to migrate our historical data from on-prem to AWS. This is when things got interesting.

Our migration flow involved Lambda functions that both read from and wrote to the S3 bucket. As we ramped up the rate of execution for these Lambda functions, we started to get strange errors from S3 that confused us:

HTTP 503: An error occurred (SlowDown) when calling the PutObject operation

With help from AWS support, we found out that the way S3 uses prefixes is not quite as straightforward as we thought. S3 actually uses partitions to manage capacity, and there is a difference between an S3 prefix and an S3 partition even though they are closely related. Just because you have a prefix does not mean there is an S3 partition already built for that prefix. In fact, you could even be sharing a partition with another bucket when your bucket is first created. (AWS does this to better manage their infrastructure. They don’t want to build partitions that are not needed.) As you begin to use the bucket, S3 will build out the necessary partitions based on your access patterns.

AWS uses AI tools to monitor your S3 traffic patterns and determine which prefixes need new partitions based on a sustained period of usage. This process is also flexible and can assign a partition anywhere in the prefix string. So, you might have one document class that needs a single partition. That partition would be assigned to a prefix that might look like this:

app/contents/DocClass1

The partitioning is not bound by where you have a ‘/’ delimiter, so you could have another document class that needs a partition for documents starting with ‘A’. That partition would look something like this:

app/contents/DocClass2/A

When you first start to approach the throughput limits (3,500 PUTs/5,500 GETs per second) of the existing partitions, you will get slowdown errors like the ones we experienced. That does not mean S3 cannot scale for your volume, but you do need retry logic to handle the errors and give S3 time to build the new partitions required for your access patterns. If you use the AWS SDKs, they include automatic retries, and you can configure the number of retries you need to ride out the slowdown errors when they occur.
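With boto3, for example, the retry behavior can be tuned through the client configuration. The values below are illustrative, not what we run in production:

```python
import boto3
from botocore.config import Config

# Adaptive retry mode adds client-side rate limiting on top of exponential
# backoff, which helps ride out 503 SlowDown responses while S3 builds out
# new partitions behind the scenes.
s3 = boto3.client(
    "s3",
    config=Config(retries={"max_attempts": 10, "mode": "adaptive"}),
)

s3.put_object(
    Bucket="enterprise-documents",            # hypothetical bucket name
    Key="app/contents/DocClass/some-doc-id",  # hypothetical key
    Body=b"document bytes",
)
```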

Working at Scale

Once we understood how the S3 partitioning worked, we were able to enhance our retry configurations and keep going. We had to make sure we kept running when the slowdown errors occurred so that S3 could see sustained traffic on those prefixes and build out the new partitions to increase our capacity. This allowed us to complete our migration relatively smoothly without any major issues. Our daily workloads, with more than 3 million new documents and 4 million document retrievals, ran fine as well.

So, the answer to our question was that S3 does scale, but you need to understand how it works. Fortunately, AWS has now documented this clearly in this guide on optimizing S3 performance and an additional guide on handling retries. We are now quite happy with our solution using S3.

To learn more about technology careers at State Farm, or to join our team, visit https://www.statefarm.com/careers.