Solutions to "Scala with Cats": Chapter 2

April 3, 2023

These are my solutions to the exercises of chapter 2 of Scala with Cats.

Exercise 2.3: The Truth About Monoids

For this exercise, rather than defining instances for the proposed types, I defined instances for Cats’ Monoid directly. For that purpose, we need to import cats.Monoid.
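
Concretely, the snippets in these solutions assume only the following import (Semigroup becomes relevant for the set intersection instance in exercise 2.4):

import cats.{Monoid, Semigroup}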

For the Boolean type, we can define 4 monoid instances. The first is boolean or, with combine being equal to the application of the || operator and empty being false:

val booleanOrMonoid: Monoid[Boolean] = new Monoid[Boolean] {
  def combine(x: Boolean, y: Boolean): Boolean = x || y
  def empty: Boolean = false
}

The second is boolean and, with combine being equal to the application of the && operator and empty being true:

val booleanAndMonoid: Monoid[Boolean] = new Monoid[Boolean] {
  def combine(x: Boolean, y: Boolean): Boolean = x && y
  def empty: Boolean = true
}

The third is boolean exclusive or, with combine being equal to the application of the ^ operator and empty being false:

val booleanXorMonoid: Monoid[Boolean] = new Monoid[Boolean] {
  def combine(x: Boolean, y: Boolean): Boolean = x ^ y
  def empty: Boolean = false
}

The fourth is boolean exclusive nor (the negation of exclusive or), with combine being equal to the negation of the application of the ^ operator and empty being true:

val booleanXnorMonoid: Monoid[Boolean] = new Monoid[Boolean] {
  def combine(x: Boolean, y: Boolean): Boolean = !(x ^ y)
  def empty: Boolean = true
}

To convince ourselves that the monoid laws hold for the proposed monoids, we can verify them for all combinations of Boolean values. Since there are only two values (true and false), it’s easy to check them all:

object BooleanMonoidProperties extends App {
  final val BooleanValues = List(true, false)

  def checkAssociativity(monoid: Monoid[Boolean]): Boolean =
    (for {
      a <- BooleanValues
      b <- BooleanValues
      c <- BooleanValues
    } yield monoid.combine(monoid.combine(a, b), c) == monoid.combine(a, monoid.combine(b, c))).forall(identity)

  def checkIdentityElement(monoid: Monoid[Boolean]): Boolean =
    (for {
      a <- BooleanValues
    } yield monoid.combine(a, monoid.empty) == a && monoid.combine(monoid.empty, a) == a).forall(identity)

  def checkMonoidLaws(monoid: Monoid[Boolean]): Boolean =
    checkAssociativity(monoid) && checkIdentityElement(monoid)

  assert(checkMonoidLaws(booleanOrMonoid))
  assert(checkMonoidLaws(booleanAndMonoid))
  assert(checkMonoidLaws(booleanXorMonoid))
  assert(checkMonoidLaws(booleanXnorMonoid))
}

Exercise 2.4: All Set for Monoids

Set union forms a monoid for sets:

def setUnion[A]: Monoid[Set[A]] = new Monoid[Set[A]] {
  def combine(x: Set[A], y: Set[A]): Set[A] = x.union(y)
  def empty: Set[A] = Set.empty[A]
}

Set intersection only forms a semigroup for sets, since we can’t define an identity element for the general case. In theory, the identity element would be the set containing every value of type A, but in practice we can’t produce that set for a generic type A:

def setIntersection[A]: Semigroup[Set[A]] = new Semigroup[Set[A]] {
  def combine(x: Set[A], y: Set[A]): Set[A] = x.intersect(y)
}

The book’s solutions suggest an additional monoid (symmetric difference), which didn’t occur to me at the time:

def setSymdiff[A]: Monoid[Set[A]] = new Monoid[Set[A]] {
  def combine(x: Set[A], y: Set[A]): Set[A] = (x.diff(y)).union(y.diff(x))
  def empty: Set[A] = Set.empty[A]
}
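
As a quick sanity check (a hypothetical REPL session), the symmetric difference keeps the elements that appear in exactly one of the two sets, and the empty set acts as its identity:

setSymdiff[Int].combine(Set(1, 2, 3), Set(3, 4))
// Returns Set(1, 2, 4).

setSymdiff[Int].combine(Set(1, 2, 3), Set.empty[Int])
// Returns Set(1, 2, 3).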

Exercise 2.5.4: Adding All the Things

The exercise is clearly hinting at using a monoid, but the first step can be defined in terms of Int only. The description doesn’t tell us what to do in case of an empty list, but, since we’re in a chapter about monoids, I assume we want to return the identity element:

def add(items: List[Int]): Int =
  items.foldLeft(0)(_ + _)

Changing the code above to also work with Option[Int] and making sure there is no code duplication can be achieved by introducing a dependency on a Monoid instance:

import cats.Monoid

def add[A](items: List[A])(implicit monoid: Monoid[A]): A =
  items.foldLeft(monoid.empty)(monoid.combine)

With the above in place we continue to be able to add Ints, but we’re also now able to add Option[Int]s, provided we have the appropriate Monoid instances in place:

import cats.instances.int._
import cats.instances.option._

add(List(1, 2, 3))
// Returns 6.

add(List(1))
// Returns 1.

add(List.empty[Int])
// Returns 0.

add(List(Some(1), Some(2), Some(3), None))
// Returns Some(6).

add(List(Option.apply(1)))
// Returns Some(1).

add(List.empty[Option[Int]])
// Returns None.

To be able to add Order instances without making any modifications to add, we can define a Monoid instance for Order. In this case, we’re piggybacking on the Monoid instance for Double, but we could’ve implemented the sums and the production of the identity element directly:

case class Order(totalCost: Double, quantity: Double)

object Order {
  implicit val orderMonoid: Monoid[Order] = new Monoid[Order] {
    import cats.instances.double._

    val doubleMonoid = Monoid[Double]

    def combine(x: Order, y: Order): Order =
      Order(
        totalCost = doubleMonoid.combine(x.totalCost, y.totalCost),
        quantity = doubleMonoid.combine(x.quantity, y.quantity)
      )

    def empty: Order =
      Order(
        totalCost = doubleMonoid.empty,
        quantity = doubleMonoid.empty
      )
  }
}
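
With that instance in implicit scope, add works on orders without any further changes (the order values here are made up for illustration):

add(List(Order(totalCost = 10.0, quantity = 1.0), Order(totalCost = 5.5, quantity = 2.0)))
// Returns Order(15.5, 3.0).

add(List.empty[Order])
// Returns Order(0.0, 0.0).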

Solutions to "Scala with Cats": Chapter 1

March 25, 2023

These are my solutions to the exercises of chapter 1 of Scala with Cats.

Setting Up the Scala Project

I solved the exercises in a sandbox Scala project that has Cats as a dependency. The book recommends using a Giter8 template, so that’s what I used:

$ sbt new scalawithcats/cats-seed.g8

The above command generates (at the time of writing) a minimal project with the following build.sbt file:

name := "scala-with-cats"
version := "0.0.1-SNAPSHOT"

scalaVersion := "2.13.8"

libraryDependencies += "org.typelevel" %% "cats-core" % "2.8.0"

// scalac options come from the sbt-tpolecat plugin so no need to set any here

addCompilerPlugin("org.typelevel" %% "kind-projector" % "0.13.2" cross CrossVersion.full)

The above differs a bit from what the book lists, since newer Scala 2.13 and Cats versions have been released in the meantime, but I followed along using these settings with minimal issues.

Exercise 1.3: Printable Library

The definition of the Printable type class can be as follows:

trait Printable[A] {
  def format(value: A): String
}

In terms of defining the Printable instances for Scala types, I’d probably prefer to include them in the companion object of Printable so that they would be readily available in the implicit scope, but the exercise explicitly asks us to create a PrintableInstances object:

object PrintableInstances {
  implicit val stringPrintable: Printable[String] =
    new Printable[String] {
      def format(value: String): String = value
    }

  implicit val intPrintable: Printable[Int] =
    new Printable[Int] {
      def format(value: Int): String = value.toString
    }
}

The interface methods in the companion object of Printable can be defined as follows:

object Printable {
  def format[A](value: A)(implicit p: Printable[A]): String =
    p.format(value)

  def print[A](value: A)(implicit p: Printable[A]): Unit =
    println(p.format(value))
}

In the above, the print method could have relied on the format method defined right before it, but I opted to avoid the unnecessary extra call.
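
For reference, the version that delegates to format would look something like this sketch:

def print[A](value: A)(implicit p: Printable[A]): Unit =
  println(format(value)) // Calls the companion's format defined above.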

For the Cat example, we can define a Printable instance for that data type directly in its companion object:

final case class Cat(name: String, age: Int, color: String)

object Cat {
  implicit val printable: Printable[Cat] =
    new Printable[Cat] {
      import PrintableInstances._

      val sp = implicitly[Printable[String]]
      val ip = implicitly[Printable[Int]]

      def format(value: Cat): String = {
        val name = sp.format(value.name)
        val age = ip.format(value.age)
        val color = sp.format(value.color)
        s"$name is a $age year-old $color cat."
      }
    }
}

This allows us to use the Printable instance without explicit imports:

val garfield = Cat("Garfield", 41, "ginger and black")
Printable.print(garfield)
// Prints "Garfield is a 41 year-old ginger and black cat.".

For the extension methods, we can define the PrintableSyntax object as follows:

object PrintableSyntax {
  implicit class PrintableOps[A](val value: A) extends AnyVal {
    def format(implicit p: Printable[A]): String =
      p.format(value)

    def print(implicit p: Printable[A]): Unit =
      println(p.format(value))
  }
}

I have opted to use a value class for performance reasons, but for the purpose of this exercise it was likely unnecessary.

By importing PrintableSyntax._ we can now call print directly on our Cat instance:

import PrintableSyntax._

val garfield = Cat("Garfield", 41, "ginger and black")
garfield.print
// Prints "Garfield is a 41 year-old ginger and black cat.".

Exercise 1.4.6: Cat Show

To implement the previous example using Show instead of Printable, we need to define an instance of Show for Cat. Similar to the approach taken before, we’re defining the instance directly in the companion object of Cat:

import cats.Show

final case class Cat(name: String, age: Int, color: String)

object Cat {
  implicit val show: Show[Cat] =
    new Show[Cat] {
      val stringShow = Show[String]
      val intShow = Show[Int]

      def show(t: Cat): String = {
        val name = stringShow.show(t.name)
        val age = intShow.show(t.age)
        val color = stringShow.show(t.color)
        s"$name is a $age year-old $color cat."
      }
    }
}

Cats implements summoners for the Show type class, so we no longer need to use implicitly.
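
For instance, the Show[String] call in the instance above goes through the summoner (Show.apply). An illustrative comparison with implicitly, assuming the standard string instance is in scope:

import cats.Show
import cats.instances.string._

val viaSummoner: Show[String] = Show[String]
val viaImplicitly: Show[String] = implicitly[Show[String]]
// Both resolve to the same instance.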

The Show[Cat] instance can then be used as follows:

import cats.implicits._

val garfield = Cat("Garfield", 41, "ginger and black")
println(garfield.show)
// Prints "Garfield is a 41 year-old ginger and black cat.".

Cats doesn’t have an extension method to directly print an instance using its Show instance, so we’re using println with the value returned by the show call.

Exercise 1.5.5: Equality, Liberty, and Felinity

A possible Eq instance for Cat can be implemented as follows. Similar to the above, I’ve opted to include it in the companion object of Cat.

object Cat {
  implicit val eq: Eq[Cat] =
    new Eq[Cat] {
      val stringEq = Eq[String]
      val intEq = Eq[Int]

      def eqv(x: Cat, y: Cat): Boolean =
        stringEq.eqv(x.name, y.name) && intEq.eqv(x.age, y.age) && stringEq.eqv(x.color, y.color)
    }
}

We can now use it to compare Cat instances:

import cats.implicits._

val cat1 = Cat("Garfield", 38, "orange and black")
val cat2 = Cat("Heathcliff", 33, "orange and black")
val optionCat1 = Option(cat1)
val optionCat2 = Option.empty[Cat]

cat1 === cat2
// Returns false.

cat1 === cat1
// Returns true.

optionCat1 === optionCat2
// Returns false.

Using GitHub Packages for Scala

January 14, 2023

We use Scala at $WORK for multiple projects. These projects rely on various internal libraries. Being able to share built artifacts between projects in a way that is convenient for developers in different teams is a huge benefit.

The whole company uses GitHub to manage source code, so we have recently started using GitHub Packages to share Scala artifacts privately. After working around some quirks, it has turned out to be quite a convenient way to share Scala (and other Maven) artifacts.

We use sbt as the build tool for all of our Scala projects, so the remainder of this post is written for sbt. It should be easy to adapt the instructions below to other build tools.

Setting Up Credentials to Authenticate with GitHub Packages

Authentication in GitHub Packages is done through personal access tokens. We can generate one in our GitHub personal settings. The token must have the read:packages (when we want to read packages from GitHub Packages) and the write:packages (when we want to write to GitHub Packages) permissions.

We can then set the credentials for sbt to be able to read them via the following, replacing <username> and <token> with our username and previously created token, respectively:

credentials += Credentials(
  "GitHub Package Registry",
  "maven.pkg.github.com",
  "<username>",
  "<token>")

The token is a password, so we should treat it as such. We shouldn’t commit this into our repositories, and ideally we have this set up in a global location that sbt has access to (like ~/.sbt/1.0/github-credentials.sbt).

Publishing an Artifact to GitHub Packages

When publishing artifacts in sbt, we always need to specify a repository where artifacts and descriptors are uploaded. In the case of GitHub Packages, every GitHub project provides a repository we can use to publish artifacts to. This means that, in sbt, we can define the location of our repository by setting the publishTo task key to something like the following:

publishTo := Some(
  "GitHub Package Registry (<project>)" at "https://maven.pkg.github.com/<org>/<project>"
)

In the snippet above, we should replace the <org> and <project> placeholders by the organization and project we want to publish to, respectively.

If our credentials are properly set up, this now allows us to run sbt publish and have our artifacts published to GitHub Packages. Note that packages in GitHub Packages are immutable, so we can’t directly replace a package with the same version. We can, however, delete an existing version in GitHub.

Downloading Artifacts from GitHub Packages

In order to download artifacts from GitHub Packages as dependencies of our projects we must set up the appropriate resolvers in our sbt build. For that purpose, we can set up the same location we mentioned previously when publishing artifacts:

resolvers += ("GitHub Package Registry" at "https://maven.pkg.github.com/<org>/<project>")

And then add the project as a regular library dependency:

libraryDependencies += "<org>" %% "<project>" % "<version>"

If credentials are properly set up, this now allows us to rely on GitHub Packages as a source of dependencies.

There is one slight inconvenience with the process suggested above: every project has its own resolver. When depending on multiple projects from the same organization, this can become cumbersome to manage, since every dependency would bring its own resolver. Fortunately, there’s a way to work around this and have an organization-wide resolver: the <project> section of the resolver doesn’t need to point to an existing repository, so we can reference an arbitrary one, like _:

resolvers += ("GitHub Package Registry" at "https://maven.pkg.github.com/<org>/_")

This will give us access to packages published on any repository within the organization. The personal access token we use will control our access. If the token only has access to public repositories, then this resolver won’t allow access to private ones. If it does have access to private repositories, then all artifacts will be visible.

With this resolver in place, we have convenient access to all artifacts published within the organization.

Interacting with GitHub Packages in Automated Workflows

Using GitHub Packages in a pipeline of continuous integration or continuous delivery is also possible. There are various ways to manage this. One way is to rely on an environment variable that is populated with the contents of some secret that includes a personal access token with appropriate access. For that purpose, we can set up something like the following in our sbt build:

credentials ++= {
  val githubToken = System.getenv("GITHUB_TOKEN")
  if (githubToken == null) Seq.empty
  else Seq(Credentials("GitHub Package Registry", "maven.pkg.github.com", "_", githubToken))
}

With the above in place, builds of our project will check for the existence of a GITHUB_TOKEN environment variable and use it to set up the appropriate sbt credentials. Note that the above uses _ as the username for the credentials. This works because GitHub Packages doesn’t care about the actual username that is used, only whether the token has appropriate access.

When using GitHub Actions, there’s always a GITHUB_TOKEN secret that has access to the repository where the action is executed, so we can reference that:

env:
  GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}}

Note that if we need to fetch artifacts from other projects, we need to set up a personal access token with more permissions.

Managing Snapshot Versions

It is customary for Maven artifacts to have snapshot versions, usually of the form X.Y.Z-SNAPSHOT. These snapshots are usually mutable, with new versions continuously replacing the existing snapshot. This doesn’t play very well with GitHub Packages because versions there are immutable and can’t easily be replaced. It is possible to delete the existing version and publish again, but it is cumbersome.

To allow for snapshots while using GitHub Packages, we have started using sbt-dynver, an sbt plugin that dynamically sets the version of our projects from git. The details of how sbt-dynver computes the version are documented in the project, but, essentially, when the current commit is tagged, the version of the project is the one specified in the tag; otherwise, it is a string built from the closest tag and the distance to that commit.
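
Enabling it is a one-line plugin addition. The coordinates and version below are just the ones we happened to be using at the time, so treat them as an assumption:

// project/plugins.sbt
addSbtPlugin("com.dwijnand" % "sbt-dynver" % "4.1.1")

As an illustration, a commit tagged v1.2.3 then builds as version 1.2.3, while an untagged commit a few commits later builds as something like 1.2.3+4-<short commit hash>.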

With sbt-dynver we can have snapshot-like versions with the version immutability that GitHub Packages provides.

Pricing

In terms of billing, we get a total amount of free storage and some amount of free data transfer per month. Anything above that incurs $0.008 USD per GB of storage per day and $0.50 USD per GB of data transfer. One important note is that traffic using a GITHUB_TOKEN from within GitHub Actions is always free, regardless of where the runner is hosted.

In short, using GitHub Packages is a very convenient way to share Scala artifacts within a private organization, particularly if said organization already uses GitHub to manage their source code.

Using GitHub Actions to Publish This Website

September 11, 2022

I have recently moved this website from DreamHost to AWS. While I was able to automate the setup of the infrastructure, I was still deploying changes manually. It is not a very cumbersome process and it involves the following steps after a change is created:

  1. Build the website;
  2. Sync the new website contents with the main S3 bucket;
  3. Invalidate the cache of the non-www CloudFront distribution;
  4. Invalidate the cache of the www CloudFront distribution.

In its essence, this involves running the following 4 commands, in sequence:

$ bundle exec jekyll build
$ aws s3 sync _site/ s3://jcazevedo.net/ --delete
$ aws cloudfront create-invalidation --distribution-id E1M51KVTH60PJ5 --paths '/*'
$ aws cloudfront create-invalidation --distribution-id E2YP0O47Y4BTWK --paths '/*'

Running these commands each time I introduce a new change is not terrible, but it would be easier if every push to the master branch of the repository that holds the website’s contents triggered a deploy. Fortunately, we can use GitHub Actions for this.

Setting Up the GitHub Action

In order to set that up, we first need to create a workflow. Workflows live in the .github/workflows folder, and that is where I have created the deploy.yml file.

We start by giving the workflow a name:

name: Deploy

Then, we set up which events trigger a workflow run. In this case, I want every push to the master branch to trigger it:

on:
  push:
    branches:
      - master

Following that, we can start defining our job. In this case, we need to specify in which environment the job should run and the list of steps that comprise it. We’re OK with running on the latest Ubuntu version:

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      (...)

To build the website, we need 3 steps: (1) check out the repository, (2) set up Ruby and install dependencies, and (3) run bundle exec jekyll build:

- uses: actions/checkout@v3

- uses: ruby/setup-ruby@v1
  with:
    ruby-version: 3.0
    bundler-cache: true

- run: bundle exec jekyll build

Once the site is built, we need to publish it to S3 and invalidate the caches of the CloudFront distributions. The AWS Command Line Interface is already available in GitHub-hosted virtual environments, so we just need to set up the credentials we want to use. In this case, we want to reference some repository secrets which we will set up later:

- uses: aws-actions/configure-aws-credentials@v1
  with:
    aws-access-key-id: ${{secrets.AWS_ACCESS_KEY_ID}}
    aws-secret-access-key: ${{secrets.AWS_SECRET_ACCESS_KEY}}
    aws-region: us-east-1

With the credentials set up, we can run the commands we previously listed:

- run: aws s3 sync _site/ s3://jcazevedo.net/ --delete
- run: aws cloudfront create-invalidation --distribution-id E1M51KVTH60PJ5 --paths '/*'
- run: aws cloudfront create-invalidation --distribution-id E2YP0O47Y4BTWK --paths '/*'

The full YAML for the workflow definition is as follows:

name: Deploy

on:
  push:
    branches:
      - master

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - uses: ruby/setup-ruby@v1
        with:
          ruby-version: 3.0
          bundler-cache: true

      - run: bundle exec jekyll build

      - uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{secrets.AWS_ACCESS_KEY_ID}}
          aws-secret-access-key: ${{secrets.AWS_SECRET_ACCESS_KEY}}
          aws-region: us-east-1

      - run: aws s3 sync _site/ s3://jcazevedo.net/ --delete
      - run: aws cloudfront create-invalidation --distribution-id E1M51KVTH60PJ5 --paths '/*'
      - run: aws cloudfront create-invalidation --distribution-id E2YP0O47Y4BTWK --paths '/*'

Creating a User for GitHub Actions

To set up the credentials this workflow is going to use to interact with AWS, I wanted to create a user with permissions to interact only with the relevant S3 buckets and CloudFront distributions. To do that, I have added the following to the Terraform definition (refer to the previous post for more details on the existing Terraform definition):

resource "aws_iam_user" "github-actions" {
  name = "github-actions"
}

resource "aws_iam_access_key" "github-actions" {
  user = aws_iam_user.github-actions.name
}

output "github-actions_aws_iam_access_key_secret" {
  value = aws_iam_access_key.github-actions.secret
  sensitive = true
}

resource "aws_iam_user_policy" "github-actions" {
  name = "github-actions_policy"
  user = aws_iam_user.github-actions.name
  policy = data.aws_iam_policy_document.github-actions_policy.json
}

data "aws_iam_policy_document" "github-actions_policy" {
  statement {
    sid = "S3Access"

    actions = [
      "s3:PutBucketWebsite",
      "s3:PutObject",
      "s3:PutObjectAcl",
      "s3:GetObject",
      "s3:ListBucket",
      "s3:DeleteObject"
    ]

    resources = [
      "${aws_s3_bucket.jcazevedo_net.arn}",
      "${aws_s3_bucket.www_jcazevedo_net.arn}",
      "${aws_s3_bucket.jcazevedo_net.arn}/*",
      "${aws_s3_bucket.www_jcazevedo_net.arn}/*"
    ]
  }

  statement {
    sid = "CloudFrontAccess"

    actions = [
      "cloudfront:GetInvalidation",
      "cloudfront:CreateInvalidation"
    ]

    resources = [
      "${aws_cloudfront_distribution.root_s3_distribution.arn}",
      "${aws_cloudfront_distribution.www_s3_distribution.arn}"
    ]
  }
}

This creates a new IAM user, attaches a policy to it that gives it access to the relevant S3 and CloudFront resources, and creates a new access key which we will set up as a secret in our GitHub repository. The secret access key gets stored in the Terraform state, but we define an output that allows us to read it with terraform output -raw github-actions_aws_iam_access_key_secret.

With the GitHub secrets appropriately set up, we now have a workflow that publishes this website whenever a new commit is pushed to the master branch.

Migrating This Website to AWS

September 7, 2022

I’ve decided to migrate this website from DreamHost to Amazon Web Services. The main driver for this is costs. This is a static website (consisting of only HTML, CSS, and a small portion of JavaScript), which is very suitable to be hosted in S3. This is also a very low traffic website, well within S3’s free tier. Keeping up with my current setup, I wanted to retain the jcazevedo.net domain and the HTTPS support I already had.

I also wanted to keep everything managed from within AWS, so that I could get some infrastructure automation (Terraform) to help me with setting everything up. Keeping everything managed from within AWS meant that I had to have the jcazevedo.net domain transferred and have an SSL certificate provisioned by AWS (I was previously using Let’s Encrypt). I also didn’t mind downtime (again, this is a very low traffic website).

Setting Up Terraform

It wasn’t absolutely necessary to use Terraform (or any other infrastructure-as-code tool) for this. I don’t expect to need this infrastructure to be reproducible, nor to modify it frequently. Still, it serves as documentation of what is set up on AWS, so I figured it would be a good idea.

The first step was to get Terraform set up and the providers defined. I didn’t want to keep Terraform’s state locally, so I decided to also use S3 as a state backend. I don’t need locks on the state (it’s only going to be me deploying this), so a single file on an S3 bucket would suffice. I created an _infra folder at the root of the directory tree of this website and placed a providers.tf file in it:

terraform {
  required_version = "~> 1.0.11"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"
    }
  }
  backend "s3" {
    bucket = "jcazevedo-terraform-state"
    key    = "terraform.tfstate"
    region = "us-east-1"
  }
}

provider "aws" {
  region = "us-east-1"
}

The required version is set to 1.0.11 because that’s the one I’m currently using at $WORK. Setting up the state backend required manually creating a bucket for it: having Terraform manage the bucket that stores its own state would bring back the problem of where to keep that state in the first place.

With this set up, a call to terraform init should complete successfully.

Setting Up the S3 Bucket(s)

The next step was to set up the S3 buckets. I actually went with 2 buckets: one for the root domain (jcazevedo.net) and another for the www subdomain (www.jcazevedo.net). The reason for it was to set up a redirect on the www subdomain, which S3 supports. To set the buckets up, I created an s3.tf file under the _infra folder with the following contents:

resource "aws_s3_bucket" "jcazevedo_net" {
  bucket = "jcazevedo.net"
}

resource "aws_s3_bucket" "www_jcazevedo_net" {
  bucket = "www.jcazevedo.net"
}

resource "aws_s3_bucket_cors_configuration" "jcazevedo_net" {
  bucket = aws_s3_bucket.jcazevedo_net.id
  cors_rule {
    allowed_headers = ["Authorization", "Content-Length"]
    allowed_methods = ["GET", "POST"]
    allowed_origins = ["https://jcazevedo.net"]
    max_age_seconds = 3000
  }
}

resource "aws_s3_bucket_website_configuration" "jcazevedo_net" {
  bucket = aws_s3_bucket.jcazevedo_net.bucket
  index_document {
    suffix = "index.html"
  }
  error_document {
    key = "404/index.html"
  }
}

resource "aws_s3_bucket_website_configuration" "www_jcazevedo_net" {
  bucket = aws_s3_bucket.www_jcazevedo_net.bucket
  redirect_all_requests_to {
    host_name = "jcazevedo.net"
  }
}

resource "aws_s3_bucket_policy" "jcazevedo_net_allow_public_access" {
  bucket = aws_s3_bucket.jcazevedo_net.bucket
  policy = data.aws_iam_policy_document.jcazevedo_net_allow_public_access.json
}

resource "aws_s3_bucket_policy" "www_jcazevedo_net_allow_public_access" {
  bucket = aws_s3_bucket.www_jcazevedo_net.bucket
  policy = data.aws_iam_policy_document.www_jcazevedo_net_allow_public_access.json
}

data "aws_iam_policy_document" "jcazevedo_net_allow_public_access" {
  statement {
    principals {
      type        = "AWS"
      identifiers = ["*"]
    }
    actions   = ["s3:GetObject"]
    resources = ["${aws_s3_bucket.jcazevedo_net.arn}/*"]
  }
}

data "aws_iam_policy_document" "www_jcazevedo_net_allow_public_access" {
  statement {
    principals {
      type        = "AWS"
      identifiers = ["*"]
    }
    actions   = ["s3:GetObject"]
    resources = ["${aws_s3_bucket.www_jcazevedo_net.arn}/*"]
  }
}

Most of the configuration is the same for both buckets. We want both buckets to allow GET requests from the public. The main difference is in the website configuration. The root bucket specifies an index and error document (to be served in case of errors), whereas the www bucket just configures the redirection policy.

Once this was set up, I pushed the website’s contents to the root S3 bucket by running bundle exec jekyll build followed by aws s3 sync _site/ s3://jcazevedo.net/ --delete. The site was already available via the root bucket website endpoint (http://jcazevedo.net.s3-website-us-east-1.amazonaws.com/) and the redirection was already working via the www bucket website endpoint (http://www.jcazevedo.net.s3-website-us-east-1.amazonaws.com/). At this point the domain hadn’t been migrated yet, so the redirect still ended up at the website hosted on DreamHost.

Setting Up the Domain

I had never transferred a domain before, so I followed the AWS instructions. The transfer would take a while to complete, so I figured I would create the Route53 zone beforehand and have the domain transferred directly to the new zone. For that purpose, I created a route53.tf file under the _infra folder with the following contents:

resource "aws_route53_zone" "jcazevedo_net" {
  name = "jcazevedo.net"
}

While the domain transfer was in progress, I proceeded to set up the SSL certificates.

Provisioning the SSL Certificates

I had to search for how to define ACM certificates in Terraform and integrate them with CloudFront. Fortunately, I found this blog post by Alex Hyett: Hosting a Secure Static Website on AWS S3 using Terraform (Step By Step Guide). The blog post covered pretty much what I had already done thus far, and was extremely helpful for the next steps: setting up SSL and the CloudFront distribution.

To set up SSL, I created the acm.tf file under the _infra folder with the following contents:

resource "aws_acm_certificate" "ssl_certificate" {
  domain_name               = "jcazevedo.net"
  subject_alternative_names = ["*.jcazevedo.net"]
  validation_method         = "EMAIL"
  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_acm_certificate_validation" "cert_validation" {
  certificate_arn = aws_acm_certificate.ssl_certificate.arn
}

For the validation method I used email instead of DNS since at that time I didn’t have the DNS moved yet. The email validation is performed while we’re applying the Terraform diff, so it’s quite fast.

Setting Up the CloudFront Distributions

CloudFront speeds up the distribution of static (and dynamic) web content. It can handle caching and compression, and can require viewers to use HTTPS so that connections are encrypted. The previously mentioned blog post by Alex Hyett provided instructions for setting up CloudFront distributions pointing to existing S3 buckets and using HTTPS, so I almost blindly copied the Terraform definitions. Similar to what had been done before, we needed two distributions: one for the root bucket and one for the www bucket.

resource "aws_cloudfront_distribution" "root_s3_distribution" {
  origin {
    domain_name = aws_s3_bucket.jcazevedo_net.website_endpoint
    origin_id   = "S3-.jcazevedo.net"
    custom_origin_config {
      http_port              = 80
      https_port             = 443
      origin_protocol_policy = "http-only"
      origin_ssl_protocols   = ["TLSv1", "TLSv1.1", "TLSv1.2"]
    }
  }
  enabled             = true
  is_ipv6_enabled     = true
  default_root_object = "index.html"
  aliases             = ["jcazevedo.net"]
  custom_error_response {
    error_caching_min_ttl = 0
    error_code            = 404
    response_code         = 200
    response_page_path    = "/404/index.html"
  }
  default_cache_behavior {
    allowed_methods  = ["GET", "HEAD"]
    cached_methods   = ["GET", "HEAD"]
    target_origin_id = "S3-.jcazevedo.net"
    forwarded_values {
      query_string = false
      cookies {
        forward = "none"
      }
    }
    viewer_protocol_policy = "redirect-to-https"
    min_ttl                = 31536000
    default_ttl            = 31536000
    max_ttl                = 31536000
    compress               = true
  }
  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }
  viewer_certificate {
    acm_certificate_arn      = aws_acm_certificate_validation.cert_validation.certificate_arn
    ssl_support_method       = "sni-only"
    minimum_protocol_version = "TLSv1.1_2016"
  }
}

resource "aws_cloudfront_distribution" "www_s3_distribution" {
  origin {
    domain_name = aws_s3_bucket.www_jcazevedo_net.website_endpoint
    origin_id   = "S3-www.jcazevedo.net"
    custom_origin_config {
      http_port              = 80
      https_port             = 443
      origin_protocol_policy = "http-only"
      origin_ssl_protocols   = ["TLSv1", "TLSv1.1", "TLSv1.2"]
    }
  }
  enabled         = true
  is_ipv6_enabled = true
  aliases         = ["www.jcazevedo.net"]
  default_cache_behavior {
    allowed_methods  = ["GET", "HEAD"]
    cached_methods   = ["GET", "HEAD"]
    target_origin_id = "S3-www.jcazevedo.net"
    forwarded_values {
      query_string = true
      cookies {
        forward = "none"
      }
      headers = ["Origin"]
    }
    viewer_protocol_policy = "allow-all"
    min_ttl                = 0
    default_ttl            = 86400
    max_ttl                = 31536000
  }
  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }
  viewer_certificate {
    acm_certificate_arn      = aws_acm_certificate_validation.cert_validation.certificate_arn
    ssl_support_method       = "sni-only"
    minimum_protocol_version = "TLSv1.1_2016"
  }
}

The configurations are similar, except for the caching settings, since the second distribution only points to the S3 bucket that redirects to the non-www website.

Adding Route53 Records Pointing to the CloudFront Distributions

The last part of the process involved creating new Route53 A records pointing to the CloudFront distributions created previously. For this, I’ve added the following to the route53.tf file mentioned previously:

resource "aws_route53_record" "jcazevedo_net-a" {
  zone_id = aws_route53_zone.jcazevedo_net.zone_id
  name    = "jcazevedo.net"
  type    = "A"
  alias {
    name                   = aws_cloudfront_distribution.root_s3_distribution.domain_name
    zone_id                = aws_cloudfront_distribution.root_s3_distribution.hosted_zone_id
    evaluate_target_health = false
  }
}

resource "aws_route53_record" "www_jcazevedo_net-a" {
  zone_id = aws_route53_zone.jcazevedo_net.zone_id
  name    = "www.jcazevedo.net"
  type    = "A"
  alias {
    name                   = aws_cloudfront_distribution.www_s3_distribution.domain_name
    zone_id                = aws_cloudfront_distribution.www_s3_distribution.hosted_zone_id
    evaluate_target_health = false
  }
}

There’s one record for each of the distributions (www and non-www).

Waiting for the Domain Transfer

The domain transfer from DreamHost to Route53 took around 8 days. I was notified by email when it was completed. Since everything was already pre-configured and the website contents had already been pushed to S3, the website continued to be served as expected from https://jcazevedo.net/.