Using Terraform for S3 Storage with MIME Type Association

Confidently deploy your content to S3 using Terraform - here's how we did it.

Dillon Henn
Lead Software Engineer

Today, many product teams use the Amazon Simple Storage Service (S3) to store Single Page Application (SPA) resources. Oftentimes they do this by packaging infrastructure code with the application code. However, this tends to add complexity to the pipeline because the content for S3 must be uploaded and synced in a separate step.

The State Farm Cloud Infrastructure team thought maybe, just maybe, it’s possible to use Terraform to alleviate that extra step.

In this article, I’ll walk through how a team could leverage Terraform to upload objects into S3 and what issues you may encounter along the way.


Let’s get started

For demonstration purposes, let’s assume we already created the S3 bucket and will be using a sample index.html file as our illustration object. Digging through the Terraform documentation, we found that the aws_s3_bucket_object resource was a good place to start.

resource "aws_s3_bucket_object" "s3_upload" {
  bucket = "s3-upload-bucket-test"
  key    = "index.html"
  source = "${path.root}/files-example/index.html"
}

Here is the object in S3 after running a terraform apply.

That’s it, Terraform successfully uploaded the file!

Well, we thought it did…

Let’s check the attributes associated with this object.

Hmmm, the content type shows binary/octet-stream. This isn’t correct. To understand why that’s the case, you need to know what a MIME type is.

MIME stands for Multipurpose Internet Mail Extensions. To keep things simple, it’s a standard used to classify certain types of information on the internet. This classification system is incredibly important: your server must send back the correct Content-Type header to the browser. If the browser does not receive the correct type for the data, it will not know how to process it, which may result in your site functioning improperly.

Take the index.html file as an example. For the browser to handle this file correctly, the server would need to send back a type of text/html in the response header.
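
For instance, a successful response for that file would include a header along these lines (an illustrative excerpt, not captured from our setup):

HTTP/1.1 200 OK
Content-Type: text/html

Without that text/html value, the browser may download the file or treat it as plain text instead of rendering it as HTML.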

If you’d like to learn more about MIME types, you can head over to the Mozilla Documentation.

Gotcha, so what happened?

When storing objects in S3 via the AWS console, SDK, CLI, etc., the correct content-type metadata is supplied for you; the Terraform AWS provider, however, does not do this. By default, the aws_s3_bucket_object resource will set the type to binary/octet-stream unless you explicitly define it.

Of course, you can hard code the content type in the resource like this:

resource "aws_s3_bucket_object" "s3_upload" {
  //...
  content_type = "text/html" // or whatever type you need
}

However, that works only if you have one object type. In our scenario we are looping over and storing objects of all sorts, which requires a more dynamic content type association.

MIME file to the rescue

To achieve the dynamic association, we needed to make sure there was a one-to-one relationship between a MIME type and an object extension. We decided to create a simple mime.json file that maps an object extension to its corresponding MIME type. This JSON file includes many of the standard types found on the Internet Assigned Numbers Authority (IANA) official website.

{
  ".123": "application/vnd.lotus-1-2-3",
  ".3dml": "text/vnd.in3d.3dml",
  ".3g2": "video/3gpp2",
  ".3gp": "video/3gpp",
  ".a": "application/octet-stream",
  ".aab": "application/x-authorware-bin",
  ".aac": "audio/x-aac"
  // ...and so forth
}

In hindsight, we may have been able to make an HTTP request out to a CDN or S3 bucket that hosts the content, rather than having it live locally with the Terraform.
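
A minimal sketch of what that could look like, assuming the hashicorp/http provider and a hypothetical URL that hosts the same mime.json:

data "http" "mime_types" {
  # Hypothetical endpoint; any CDN or S3 website URL serving the map would do
  url = "https://example.com/mime.json"
}

locals {
  # Older versions of the http provider expose this attribute as `body` rather than `response_body`
  mime_types = jsondecode(data.http.mime_types.response_body)
}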

Retrieving the proper MIME type

Now that the static file has been set up, we can leverage a couple of Terraform functions to read the data into a local variable.

locals {
  # Decode the bundled JSON file into a map of file extension -> MIME type
  mime_types = jsondecode(file("${path.module}/data/mime.json"))
}

Next, we performed a simple lookup on that variable. Using the regular expression "\\.[^.]+$", we pulled the extension from the current object and used it as the key into the mime_types map. For example, the file index.html is run against the regex and returns .html as the key for the map lookup. Since the map has a matching key/value pair, in this case {".html": "text/html"}, the resulting content type will be "text/html".

Note: If for whatever reason the extension has no entry in the map, the lookup falls back to null, which tells the resource to use its default content type. (Keep in mind that Terraform’s regex function raises an error if the filename has no extension at all.)

resource "aws_s3_bucket_object" "s3_upload" {
  // ...omitted for brevity
  content_type = lookup(local.mime_types, regex("\\.[^.]+$", "index.html"), null)
}
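
A quick way to sanity-check the extraction and the lookup is terraform console (the inline map literal below is a stand-in for local.mime_types):

> regex("\\.[^.]+$", "index.html")
".html"
> lookup({".html" = "text/html"}, ".html", null)
"text/html"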

Eureka!

Everything is ready to go; applying the Terraform again will trigger the changes. Below is the diff that Terraform writes out to the console. The important part here is the content type:

- resource "aws_s3_bucket_object" "s3_upload" {
-   // ...
-   bucket                 = "s3-upload-bucket-test" -> null
-   content_type           = "binary/octet-stream" -> null
-   source                 = "./files-example/index.html" -> null
- }
+ resource "aws_s3_bucket_object" "s3_upload" {
+   // ...
+   bucket                 = "s3-upload-bucket-test"
+   content_type           = "text/html"
+   source                 = "./files-example/index.html"
+ }

After navigating to the console to check on the object, we saw that index.html had the correct content type!

Expanding to multi-file upload

All the work above may seem like overkill for something so trivial, but the true power comes when you loop over many objects. As we discussed earlier, many of our teams store their SPAs in S3, and those applications consist of many different types of objects spread across many different directories. We can now pass something like a whole build folder to the aws_s3_bucket_object resource and loop over each object, assigning it the corresponding content type.

resource "aws_s3_bucket_object" "s3_upload" {
  for_each = fileset("${path.root}/build", "**/*")

  bucket = "s3-upload-bucket-test"
  key    = each.value
  source = "${path.root}/build/${each.value}"

  etag         = filemd5("${path.root}/build/${each.value}")
  content_type = lookup(local.mime_types, regex("\\.[^.]+$", each.value), null)
}

Looking back

Here is a high-level overview of what we did to fulfill our needs:

  1. Deploy content to S3 via Terraform.
  2. Create a MIME map to associate dynamic content types.
  3. Leverage that map to retrieve the correct content type for each object.
  4. Upload entire recursive directories, such as a SPA build.

We’re now able to confidently deploy our content to S3 using Terraform. It took a little bit of work, but it was well worth it. It’s possible that one day this lookup functionality will be integrated into the provider itself, but for now this approach has helped our team’s development workflow!

You can find the code examples on GitHub from @d-henn!

To learn more about technology careers at State Farm, or to join our team, visit https://www.statefarm.com/careers.