Saving on Amazon Athena costs by using Provisioned Capacity
For heavy Athena users, a small investment in time can yield big savings
Introduction
Amazon Athena is a serverless AWS service that lets users analyze data stored in Amazon S3 with standard SQL queries. Today, Athena has many uses across State Farm®, ranging from surfacing application logs for troubleshooting to producing operational reports on contact center performance through Amazon QuickSight. Here at State Farm, data enables informed decisions, and making good decisions means generating plenty of useful data. As a result, certain Athena queries scan a lot of data even for short operational intervals. Because Athena is priced per TB of data scanned, costs have grown significantly as cloud adoption has grown.
Until April 2023, Athena’s only pricing model was based on the amount of data scanned in S3. AWS has since introduced an additional pricing model: Athena consumers now have the option to reserve provisioned capacity to run queries.
Athena Provisioned Capacity
When you pay per TB of S3 data scanned, AWS allocates compute resources from a dedicated backend pool to run each query. Consumers can now reserve their own dedicated compute resources to run their queries. These dedicated resources, called Provisioned Capacity, run the queries submitted by the non-Spark-enabled Athena workgroups associated with the reservation. Capacity reservations can be increased or decreased as necessary. Any query submitted beyond the Provisioned Capacity is queued until resources become available, up to the query timeout limit.
Provisioned Capacity is reserved on a per-DPU (Data Processing Unit) basis. A single DPU is equal to 4 vCPUs and 16 GB of RAM. Pricing is $0.30 per DPU-hour, billed per minute. The minimum reservation duration is 8 hours, and the minimum capacity is 24 DPUs, adjustable in increments of 4 DPUs. The smallest reservation for the shortest duration therefore costs $57.60 (24 DPUs × 8 hours × $0.30). This should support roughly 5 concurrent queries across all associated workgroups for 8 hours. For context, a single query that scans 8 TB of data costs $40 under per-TB pricing.
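The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not an official pricing calculator: the DPU-hour price comes from the figures quoted above, the per-TB price of $5 is inferred from the 8 TB for $40 example, and the function names are hypothetical. Check current AWS pricing for your region before relying on either number.

```python
# Prices taken or inferred from the figures quoted in this article.
# Assumption: verify both against current AWS pricing for your region.
DPU_HOUR_PRICE = 0.30  # USD per DPU-hour
PER_TB_PRICE = 5.00    # USD per TB scanned (inferred from 8 TB costing $40)


def reservation_cost(dpus: int, hours: float) -> float:
    """Cost in USD of holding `dpus` DPUs for `hours` hours."""
    return dpus * hours * DPU_HOUR_PRICE


def scan_cost(tb_scanned: float) -> float:
    """Cost in USD of scanning `tb_scanned` TB under per-TB pricing."""
    return tb_scanned * PER_TB_PRICE


def break_even_tb(dpus: int, hours: float) -> float:
    """TB scanned at which per-TB pricing costs the same as the reservation."""
    return reservation_cost(dpus, hours) / PER_TB_PRICE


if __name__ == "__main__":
    print(f"24 DPUs for 8 hours: ${reservation_cost(24, 8):.2f}")
    print(f"One 8 TB query, per-TB pricing: ${scan_cost(8):.2f}")
    print(f"Break-even scan volume: {break_even_tb(24, 8):.2f} TB")
```

Under these assumptions, the minimum reservation pays for itself once the associated workgroups scan more than 11.52 TB within the 8-hour window, provided 24 DPUs is enough capacity to run those queries in that time.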
The complete documentation for Provisioned Capacity is available in the Amazon Athena User Guide.
Capacity Considerations
Consider how much capacity a specific query requires, as well as the overall capacity needed. The tricky part is that Athena itself determines the number of DPUs required to run a particular DML query. Because of this, query tuning and optimization remain extremely important, as they reduce the number of DPUs required for query execution. The AWS capacity requirements documentation recommends starting by estimating the number of concurrently executed queries.
Concurrent Query Execution | DPU requirement |
---|---|
10 | 40 |
20 | 96 |
30+ | 240 |
The strategy, as AWS recommends, should be to estimate your DPU requirements, take the reservation for a spin, and adjust based on your own goals. For example, does lowest cost reign supreme? If so, the best bet is to find the smallest reservation that keeps query queue times tolerable. If it is critical that queries run as fast as possible, without any queueing, you will need to reserve enough capacity for peak query submissions. Naturally, understanding how Athena usage will increase or decrease over the longer term is also an important consideration.
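As a starting point for that estimate, the table above can be turned into a small lookup. This is only a sketch: the table lists three data points rather than ranges, so the function below assumes you round up to the next listed row, which is a conservative reading and not something AWS states. The function name is hypothetical.

```python
# (concurrent queries, DPUs) rows copied from the table above.
# The final "30+" row is modeled as an open-ended top tier.
DPU_TIERS = [(10, 40), (20, 96)]
TOP_TIER_DPUS = 240


def starting_dpus(concurrent_queries: int) -> int:
    """Return a first-guess DPU reservation for a given query concurrency.

    Assumption: values between table rows round UP to the next row,
    so 11 concurrent queries map to 96 DPUs and 21 map to 240 DPUs.
    """
    if concurrent_queries < 0:
        raise ValueError("concurrent_queries must be non-negative")
    for max_queries, dpus in DPU_TIERS:
        if concurrent_queries <= max_queries:
            return dpus
    return TOP_TIER_DPUS


if __name__ == "__main__":
    for n in (5, 10, 15, 20, 35):
        print(f"{n} concurrent queries -> start with {starting_dpus(n)} DPUs")
```

The result is only a first guess to test against real workloads. Because Athena decides how many DPUs each query needs, observed queue times should drive any adjustment from there.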
There are many strategies for balancing query volume, query queue times, and cost. For example, a Provisioned Capacity reservation can be modified, although the minimum duration is still 8 hours. This lets you adjust capacity up and down when query volume is known to spike on particular days or at particular times. If QuickSight data is scheduled to refresh on Mondays between 10 p.m. and 3 a.m., the reservation can be bumped up to accommodate the increased query volume and backed down after the spike is over. This can be done with Lambda functions and EventBridge rules, assigning Provisioned Capacity dynamically based on known query spikes.
Alternatively, if high query volume is not isolated to a single day or time, the day can be divided into three 8-hour slots, such as high, mid, and low query volume. Lambda and EventBridge can be used, as in the example above, to bump capacity up and down every 8 hours to match the respective query volume. At State Farm, we have many use cases for Provisioned Capacity. One in particular uses this strategy: we divide the day into three capacity slots based on consumption requirements. With that information, we can size the provisioned capacity to the needs of each stakeholder business unit within the use case. Using EventBridge rules and Lambda functions, we automate the increases and decreases in capacity each day. We must also plan for overall increases in consumption to ensure our capacity remains in line with our needs.
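The decision logic for the Monday refresh example might look like the sketch below. The DPU values and function names are hypothetical, and the code only decides which capacity a scheduled function should request at a given moment. It does not model how the 8-hour minimum duration interacts with a 5-hour window.

```python
from datetime import datetime

# Hypothetical reservation sizes; both respect the 24-DPU minimum
# and the 4-DPU increment described earlier.
BASELINE_DPUS = 24
SPIKE_DPUS = 48


def in_monday_refresh_window(now: datetime) -> bool:
    """True from Monday 22:00 up to, but not including, Tuesday 03:00.

    The window crosses midnight, so it spans two weekdays.
    `now` is assumed to already be in the local time zone of the schedule.
    """
    weekday, hour = now.weekday(), now.hour  # Monday is 0, Tuesday is 1
    return (weekday == 0 and hour >= 22) or (weekday == 1 and hour < 3)


def target_dpus(now: datetime) -> int:
    """Capacity to request at `now` for the Monday refresh scenario."""
    return SPIKE_DPUS if in_monday_refresh_window(now) else BASELINE_DPUS


if __name__ == "__main__":
    # 1 January 2024 was a Monday.
    for moment in (datetime(2024, 1, 1, 21, 0), datetime(2024, 1, 1, 23, 0),
                   datetime(2024, 1, 2, 1, 0), datetime(2024, 1, 2, 4, 0)):
        print(moment.isoformat(), "->", target_dpus(moment), "DPUs")
```

Keeping this logic in a pure function makes it easy to test the midnight boundary separately from any AWS calls.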
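A Lambda function for the three-slot approach could be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation used at State Farm: the slot boundaries, DPU sizes, and reservation name are all hypothetical. The one AWS call it relies on is the boto3 Athena client's `update_capacity_reservation`, which takes `Name` and `TargetDpus`.

```python
from datetime import datetime, timezone

# Hypothetical slot sizes; real values come from measuring each slot's volume.
SLOT_DPUS = {"low": 24, "mid": 48, "high": 96}
RESERVATION_NAME = "example-reservation"  # hypothetical reservation name


def slot_for_hour(hour: int) -> str:
    """Map an hour of the day (0-23, UTC) to one of three 8-hour slots.

    Assumed boundaries: 00-08 low, 08-16 high, 16-24 mid.
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    if hour < 8:
        return "low"
    if hour < 16:
        return "high"
    return "mid"


def apply_slot(athena_client, reservation_name: str, hour: int) -> int:
    """Request the DPU count for the slot containing `hour`; return that count."""
    dpus = SLOT_DPUS[slot_for_hour(hour)]
    athena_client.update_capacity_reservation(
        Name=reservation_name, TargetDpus=dpus
    )
    return dpus


def lambda_handler(event, context):
    """Entry point, intended to be triggered by an EventBridge schedule."""
    import boto3  # imported here so the helpers above run without boto3

    hour = datetime.now(timezone.utc).hour
    dpus = apply_slot(boto3.client("athena"), RESERVATION_NAME, hour)
    return {"reservation": RESERVATION_NAME, "target_dpus": dpus}
```

Passing the client into `apply_slot` keeps the scheduling logic testable with a stand-in object. An EventBridge rule firing at each slot boundary would invoke `lambda_handler`.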
Ultimately, each use case must have a strategy and goals in mind in order to optimize capacity and cost using provisioned capacity.
Conclusion
Provisioned Capacity can help lower Athena query costs for workloads that run in non-Spark-enabled workgroups. It can also help reduce the month-to-month bill fluctuations that come with varying amounts of data scanned by queries. Some work is required to right-size the capacity, and a little vigilance over time is needed to keep it at the correct level.
To learn more about technology careers at State Farm, or to join our team, visit https://www.statefarm.com/careers.