Using GitHub Actions to Publish This Website

September 11, 2022

I have recently moved this website from DreamHost to AWS. While I was able to automate the setup of the infrastructure, I was still deploying changes manually. The process is not particularly cumbersome; after a change is made, it involves the following steps:

  1. Build the website;
  2. Sync the new website contents with the main S3 bucket;
  3. Invalidate the cache of the non-www CloudFront distribution;
  4. Invalidate the cache of the www CloudFront distribution.

In essence, this amounts to running the following four commands in sequence:

$ bundle exec jekyll build
$ aws s3 sync _site/ s3://jcazevedo.net/ --delete
$ aws cloudfront create-invalidation --distribution-id E1M51KVTH60PJ5 --paths '/*'
$ aws cloudfront create-invalidation --distribution-id E2YP0O47Y4BTWK --paths '/*'
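The four commands above can also be wrapped in a small helper script (a sketch; the script name is hypothetical, and it assumes the AWS CLI is configured with credentials that can write to the bucket and create invalidations):

```shell
#!/usr/bin/env bash
# deploy.sh (hypothetical): build the site and publish it in one go.
set -euo pipefail

bundle exec jekyll build
aws s3 sync _site/ s3://jcazevedo.net/ --delete

# Invalidate both CloudFront distributions (non-www and www).
for dist in E1M51KVTH60PJ5 E2YP0O47Y4BTWK; do
  aws cloudfront create-invalidation --distribution-id "$dist" --paths '/*'
done
```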

This is not terrible to run each time I introduce a new change, but it would be easier if every push to the master branch of the repository that holds the contents of the website triggered a deploy. Fortunately, we can use GitHub Actions for this.

Setting Up the GitHub Action

In order to set that up, we first need to create a workflow. Workflows live in the .github/workflows folder, and that is where I have created the deploy.yml file.

We start by giving the workflow a name:

name: Deploy

Then, we set up which events trigger a workflow run. In this case, I want every push to the master branch to trigger it:

on:
  push:
    branches:
      - master

Following that, we can define our job. We need to specify the environment the job should run in and the list of steps that comprise it. We’re OK with running on the latest Ubuntu version:

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      (...)

To build the website, we need 3 steps: (1) check out the repository, (2) set up Ruby and install dependencies, and (3) run bundle exec jekyll build:

- uses: actions/checkout@v3

- uses: ruby/setup-ruby@v1
  with:
    ruby-version: 3.0
    bundler-cache: true

- run: bundle exec jekyll build

Once the site is built, we need to publish it to S3 and invalidate the caches of the CloudFront distributions. The AWS Command Line Interface is already available in GitHub-hosted virtual environments, so we just need to set up the credentials we want to use. In this case, we want to reference some repository secrets which we will set up later:

- uses: aws-actions/configure-aws-credentials@v1
  with:
    aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
    aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    aws-region: us-east-1

With the credentials set up, we can run the commands we previously listed:

- run: aws s3 sync _site/ s3://jcazevedo.net/ --delete
- run: aws cloudfront create-invalidation --distribution-id E1M51KVTH60PJ5 --paths '/*'
- run: aws cloudfront create-invalidation --distribution-id E2YP0O47Y4BTWK --paths '/*'

The full YAML for the workflow definition is as follows:

name: Deploy

on:
  push:
    branches:
      - master

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - uses: ruby/setup-ruby@v1
        with:
          ruby-version: 3.0
          bundler-cache: true

      - run: bundle exec jekyll build

      - uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1

      - run: aws s3 sync _site/ s3://jcazevedo.net/ --delete
      - run: aws cloudfront create-invalidation --distribution-id E1M51KVTH60PJ5 --paths '/*'
      - run: aws cloudfront create-invalidation --distribution-id E2YP0O47Y4BTWK --paths '/*'

Creating a User for GitHub Actions

To set up the credentials this workflow uses to interact with AWS, I wanted to create a user with permissions to interact only with the relevant S3 bucket and CloudFront distributions. To do that, I added the following to the Terraform definition (refer to the previous post for more details on the existing Terraform definition):

resource "aws_iam_user" "github-actions" {
  name = "github-actions"
}

resource "aws_iam_access_key" "github-actions" {
  user = aws_iam_user.github-actions.name
}

output "github-actions_aws_iam_access_key_secret" {
  value = aws_iam_access_key.github-actions.secret
  sensitive = true
}

resource "aws_iam_user_policy" "github-actions" {
  name = "github-actions_policy"
  user = aws_iam_user.github-actions.name
  policy = data.aws_iam_policy_document.github-actions_policy.json
}

data "aws_iam_policy_document" "github-actions_policy" {
  statement {
    sid = "S3Access"

    actions = [
      "s3:PutBucketWebsite",
      "s3:PutObject",
      "s3:PutObjectAcl",
      "s3:GetObject",
      "s3:ListBucket",
      "s3:DeleteObject"
    ]

    resources = [
      "${aws_s3_bucket.jcazevedo_net.arn}",
      "${aws_s3_bucket.www_jcazevedo_net.arn}",
      "${aws_s3_bucket.jcazevedo_net.arn}/*",
      "${aws_s3_bucket.www_jcazevedo_net.arn}/*"
    ]
  }

  statement {
    sid = "CloudFrontAccess"

    actions = [
      "cloudfront:GetInvalidation",
      "cloudfront:CreateInvalidation"
    ]

    resources = [
      "${aws_cloudfront_distribution.root_s3_distribution.arn}",
      "${aws_cloudfront_distribution.www_s3_distribution.arn}"
    ]
  }
}

This creates a new IAM user, attaches a policy to it that gives it access to the relevant S3 and CloudFront resources, and creates a new access key which we will set up as a secret in our GitHub repository. The secret access key gets stored in the Terraform state, but we define an output that allows us to read it with terraform output -raw github-actions_aws_iam_access_key_secret.
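If you use the GitHub CLI, the secret access key can go straight from the Terraform output into the repository without ever being echoed to the terminal (a sketch; it assumes the workflow reads secrets named AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, and that gh is authenticated against the repository):

```shell
# Hypothetical: run from the _infra folder of a checkout of the website repo.
terraform output -raw github-actions_aws_iam_access_key_secret |
  gh secret set AWS_SECRET_ACCESS_KEY

# The access key ID is not sensitive; it can be read from the Terraform state
# (e.g. via `terraform state show aws_iam_access_key.github-actions`) and set
# interactively the same way:
gh secret set AWS_ACCESS_KEY_ID
```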

With the GitHub secrets appropriately set up, we now have a workflow that publishes this website whenever a new commit is pushed to the master branch.

Migrating This Website to AWS

September 7, 2022

I’ve decided to migrate this website from DreamHost to Amazon Web Services. The main driver for this is cost. This is a static website (consisting of only HTML, CSS, and a small amount of JavaScript), which makes it well suited to hosting on S3. It is also a very low-traffic website, well within S3’s free tier. Keeping up with my current setup, I wanted to retain the following:

I also wanted to keep everything managed from within AWS, so that I could use some infrastructure automation (Terraform) to help set everything up. Keeping everything within AWS meant transferring the jcazevedo.net domain and having an SSL certificate provisioned by AWS (I was previously using Let’s Encrypt). I also didn’t mind downtime (again, this is a very low-traffic website).

Setting Up Terraform

It wasn’t absolutely necessary to use Terraform (or any infrastructure-as-code tool) for this. I don’t expect to need this infrastructure to be reproducible, nor to modify it frequently. Still, it serves as documentation of what is set up on AWS, so I figured it would be a good idea.

The first step was to get Terraform set up and the providers defined. I didn’t want to keep Terraform’s state locally, so I decided to also use S3 as a state backend. I don’t need locks on the state (it’s only going to be me deploying this), so a single file in an S3 bucket would suffice. I created an _infra folder at the root of the directory tree of this website and placed a providers.tf file in it:

terraform {
  required_version = "~> 1.0.11"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"
    }
  }
  backend "s3" {
    bucket = "jcazevedo-terraform-state"
    key    = "terraform.tfstate"
    region = "us-east-1"
  }
}

provider "aws" {
  region = "us-east-1"
}

The required version is set to 1.0.11 because that’s the one I’m currently using at $WORK. Setting up the state backend required manually creating a bucket for it: using Terraform to manage that bucket would bring back the very problem of remotely managing state that the backend is meant to solve.

With this set up, a call to terraform init should complete successfully.

Setting Up the S3 Bucket(s)

The next step was to set up the S3 buckets. I went with two buckets: one for the root domain (jcazevedo.net) and another for the www subdomain (www.jcazevedo.net). The reason for this was to set up a redirect on the www subdomain, which S3 supports. To set the buckets up, I created an s3.tf file under the _infra folder with the following contents:

resource "aws_s3_bucket" "jcazevedo_net" {
  bucket = "jcazevedo.net"
}

resource "aws_s3_bucket" "www_jcazevedo_net" {
  bucket = "www.jcazevedo.net"
}

resource "aws_s3_bucket_cors_configuration" "jcazevedo_net" {
  bucket = aws_s3_bucket.jcazevedo_net.id
  cors_rule {
    allowed_headers = ["Authorization", "Content-Length"]
    allowed_methods = ["GET", "POST"]
    allowed_origins = ["https://jcazevedo.net"]
    max_age_seconds = 3000
  }
}

resource "aws_s3_bucket_website_configuration" "jcazevedo_net" {
  bucket = aws_s3_bucket.jcazevedo_net.bucket
  index_document {
    suffix = "index.html"
  }
  error_document {
    key = "404/index.html"
  }
}

resource "aws_s3_bucket_website_configuration" "www_jcazevedo_net" {
  bucket = aws_s3_bucket.www_jcazevedo_net.bucket
  redirect_all_requests_to {
    host_name = "jcazevedo.net"
  }
}

resource "aws_s3_bucket_policy" "jcazevedo_net_allow_public_access" {
  bucket = aws_s3_bucket.jcazevedo_net.bucket
  policy = data.aws_iam_policy_document.jcazevedo_net_allow_public_access.json
}

resource "aws_s3_bucket_policy" "www_jcazevedo_net_allow_public_access" {
  bucket = aws_s3_bucket.www_jcazevedo_net.bucket
  policy = data.aws_iam_policy_document.www_jcazevedo_net_allow_public_access.json
}

data "aws_iam_policy_document" "jcazevedo_net_allow_public_access" {
  statement {
    principals {
      type        = "AWS"
      identifiers = ["*"]
    }
    actions   = ["s3:GetObject"]
    resources = ["${aws_s3_bucket.jcazevedo_net.arn}/*"]
  }
}

data "aws_iam_policy_document" "www_jcazevedo_net_allow_public_access" {
  statement {
    principals {
      type        = "AWS"
      identifiers = ["*"]
    }
    actions   = ["s3:GetObject"]
    resources = ["${aws_s3_bucket.www_jcazevedo_net.arn}/*"]
  }
}

Most of the configuration is the same for both buckets: we want both to allow GET requests from the public. The main difference is in the website configuration. The root bucket specifies an index document and an error document (served in case of errors), whereas the www bucket just configures the redirection policy.

Once this was set up, I pushed this website’s contents to the root S3 bucket by running bundle exec jekyll build followed by aws s3 sync _site/ s3://jcazevedo.net/ --delete. The site was then available via the root bucket website endpoint (http://jcazevedo.net.s3-website-us-east-1.amazonaws.com/) and the redirection was already working via the www bucket website endpoint (http://www.jcazevedo.net.s3-website-us-east-1.amazonaws.com/). At this point the domain wasn’t yet migrated, so the redirect still pointed at the DreamHost instance.

Setting Up the Domain

I had never transferred a domain before, so I followed AWS’s instructions. The transfer would take a while to complete, so I figured I would create the Route53 zone beforehand and have the domain transferred directly into the new zone. For that purpose I created a route53.tf file under the _infra folder with the following contents:

resource "aws_route53_zone" "jcazevedo_net" {
  name = "jcazevedo.net"
}

While the domain transfer was in progress, I proceeded to set up the SSL certificates.

Provisioning the SSL Certificates

I had to search for how to define ACM certificates in Terraform and integrate them with CloudFront. Fortunately, I found this blog post by Alex Hyett: Hosting a Secure Static Website on AWS S3 using Terraform (Step By Step Guide). It covered pretty much what I had already done thus far, and was extremely helpful for the next steps: setting up SSL and the CloudFront distribution.

To set up SSL, I created the acm.tf file under the _infra folder with the following contents:

resource "aws_acm_certificate" "ssl_certificate" {
  domain_name               = "jcazevedo.net"
  subject_alternative_names = ["*.jcazevedo.net"]
  validation_method         = "EMAIL"
  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_acm_certificate_validation" "cert_validation" {
  certificate_arn = aws_acm_certificate.ssl_certificate.arn
}

For the validation method I used email instead of DNS, since at that time I hadn’t moved the DNS over yet. The email validation is performed while the Terraform diff is being applied, so it’s quite fast.

Setting Up the CloudFront Distributions

CloudFront speeds up the distribution of static (and dynamic) web content. It handles caching and compression, and can require viewers to use HTTPS so that connections are encrypted. The previously mentioned blog post by Alex Hyett provided instructions for setting up CloudFront distributions pointing to existing S3 buckets and using HTTPS, so I almost blindly copied the Terraform definitions. As before, we need two distributions: one for the root bucket and one for the www bucket.

resource "aws_cloudfront_distribution" "root_s3_distribution" {
  origin {
    domain_name = aws_s3_bucket.jcazevedo_net.website_endpoint
    origin_id   = "S3-.jcazevedo.net"
    custom_origin_config {
      http_port              = 80
      https_port             = 443
      origin_protocol_policy = "http-only"
      origin_ssl_protocols   = ["TLSv1", "TLSv1.1", "TLSv1.2"]
    }
  }
  enabled             = true
  is_ipv6_enabled     = true
  default_root_object = "index.html"
  aliases             = ["jcazevedo.net"]
  custom_error_response {
    error_caching_min_ttl = 0
    error_code            = 404
    response_code         = 200
    response_page_path    = "/404/index.html"
  }
  default_cache_behavior {
    allowed_methods  = ["GET", "HEAD"]
    cached_methods   = ["GET", "HEAD"]
    target_origin_id = "S3-.jcazevedo.net"
    forwarded_values {
      query_string = false
      cookies {
        forward = "none"
      }
    }
    viewer_protocol_policy = "redirect-to-https"
    min_ttl                = 31536000
    default_ttl            = 31536000
    max_ttl                = 31536000
    compress               = true
  }
  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }
  viewer_certificate {
    acm_certificate_arn      = aws_acm_certificate_validation.cert_validation.certificate_arn
    ssl_support_method       = "sni-only"
    minimum_protocol_version = "TLSv1.1_2016"
  }
}

resource "aws_cloudfront_distribution" "www_s3_distribution" {
  origin {
    domain_name = aws_s3_bucket.www_jcazevedo_net.website_endpoint
    origin_id   = "S3-www.jcazevedo.net"
    custom_origin_config {
      http_port              = 80
      https_port             = 443
      origin_protocol_policy = "http-only"
      origin_ssl_protocols   = ["TLSv1", "TLSv1.1", "TLSv1.2"]
    }
  }
  enabled         = true
  is_ipv6_enabled = true
  aliases         = ["www.jcazevedo.net"]
  default_cache_behavior {
    allowed_methods  = ["GET", "HEAD"]
    cached_methods   = ["GET", "HEAD"]
    target_origin_id = "S3-www.jcazevedo.net"
    forwarded_values {
      query_string = true
      cookies {
        forward = "none"
      }
      headers = ["Origin"]
    }
    viewer_protocol_policy = "allow-all"
    min_ttl                = 0
    default_ttl            = 86400
    max_ttl                = 31536000
  }
  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }
  viewer_certificate {
    acm_certificate_arn      = aws_acm_certificate_validation.cert_validation.certificate_arn
    ssl_support_method       = "sni-only"
    minimum_protocol_version = "TLSv1.1_2016"
  }
}

The configurations are similar, except for the caching settings, since the second distribution only points to the S3 bucket that redirects to the non-www website.

Adding Route53 Records Pointing to the CloudFront Distributions

The last part of the process involved creating new Route53 A records pointing to the CloudFront distributions created previously. For this, I’ve added the following to the route53.tf file mentioned previously:

resource "aws_route53_record" "jcazevedo_net-a" {
  zone_id = aws_route53_zone.jcazevedo_net.zone_id
  name    = "jcazevedo.net"
  type    = "A"
  alias {
    name                   = aws_cloudfront_distribution.root_s3_distribution.domain_name
    zone_id                = aws_cloudfront_distribution.root_s3_distribution.hosted_zone_id
    evaluate_target_health = false
  }
}

resource "aws_route53_record" "www_jcazevedo_net-a" {
  zone_id = aws_route53_zone.jcazevedo_net.zone_id
  name    = "www.jcazevedo.net"
  type    = "A"
  alias {
    name                   = aws_cloudfront_distribution.www_s3_distribution.domain_name
    zone_id                = aws_cloudfront_distribution.www_s3_distribution.hosted_zone_id
    evaluate_target_health = false
  }
}

There’s one record for each of the distributions (www and non-www).

Waiting for the Domain Transfer

The domain transfer from DreamHost to Route53 took around 8 days. I was notified by email when it was completed. Since everything was already pre-configured and the website contents had already been pushed to S3, the website continued to be served as expected from https://jcazevedo.net/.

September LeetCoding Challenge, Day 13: Insert Interval

December 7, 2020

This is part of a series of posts about the September LeetCoding Challenge. Check the first post for more information.

I got mildly bored of writing these blog posts for the September LeetCoding Challenge, hence the huge gap between the last post and this one. I continued solving the problems, and LeetCode continued to put up challenges in the following months. Let’s see if I can at least complete the series of posts for September.

The problem for September 13 is Insert Interval. We are given a set of non-overlapping intervals, represented by their start and end points, and we are asked to return the new set of non-overlapping intervals that results from merging a given new interval into the existing set. We are also told that the original set of non-overlapping intervals is sorted by start point.

This problem can be solved in \(\mathcal{O}(n)\) by iterating through the list of intervals while keeping track of an interval to merge (initially the new interval). Whenever we visit an interval, we decide whether to include it as-is in the final set or to merge it into the tracked interval (if they overlap). The following is an implementation of that idea:

class Solution {
public:
  vector<vector<int>>
  insert(vector<vector<int>>& intervals, vector<int>& newInterval) {
    vector<vector<int>> result;
    for (vector<int> interval : intervals) {
      if (interval[1] < newInterval[0]) {
        result.push_back(interval);
      } else if (interval[0] > newInterval[1]) {
        result.push_back(newInterval);
        newInterval = interval;
      } else if (interval[1] >= newInterval[0] ||
                 interval[0] <= newInterval[1]) {
        newInterval = {min(interval[0], newInterval[0]),
                       max(interval[1], newInterval[1])};
      }
    }
    result.push_back(newInterval);
    return result;
  }
};

To simplify handling the fact that we’ve gone past the new interval, the previous solution keeps replacing the interval to merge with the current interval once the intervals’ start points exceed the merged interval’s end point. This lets us always push the tracked interval at the end, without special-casing whether or not it has already been included. An alternative, though probably more error-prone, approach would be to keep track of whether the merged interval has already been inserted.
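For comparison, that alternative might look like the following (a sketch, not from the original submission; the function name is made up, and the LeetCode wrapper class is dropped so that the snippet stands alone):

```cpp
#include <algorithm>
#include <vector>

using namespace std;

// Alternative: explicitly track whether the merged interval was emitted.
vector<vector<int>> insertInterval(vector<vector<int>> intervals,
                                   vector<int> cur) {
  vector<vector<int>> result;
  bool inserted = false;
  for (const vector<int>& interval : intervals) {
    if (interval[1] < cur[0]) {
      result.push_back(interval);  // entirely before the merged interval
    } else if (interval[0] > cur[1]) {
      if (!inserted) {             // emit the merged interval exactly once
        result.push_back(cur);
        inserted = true;
      }
      result.push_back(interval);  // entirely after the merged interval
    } else {
      cur = {min(interval[0], cur[0]),  // overlapping: extend the merged
             max(interval[1], cur[1])}; // interval to cover both
    }
  }
  if (!inserted) result.push_back(cur);
  return result;
}
```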

September LeetCoding Challenge, Day 12: Combination Sum III

September 13, 2020

This is part of a series of posts about the September LeetCoding Challenge. Check the first post for more information.

The problem for September 12 is Combination Sum III. We are interested in finding all possible distinct combinations of \(k\) numbers that add up to a number \(n\), given that only numbers from \(1\) to \(9\) can be used and each combination should be a unique set of numbers. Since you can only use numbers from \(1\) to \(9\), both \(k\) and \(n\) can’t be negative; \(k\) is at most \(9\) and \(n\) is at most \(\sum_{i=1}^{9} i = \frac{9 \times (9 + 1)}{2} = 45\).

An exhaustive search using a DFS is possible given these limits, so it looks like a good candidate for a solution. The following is an implementation of that:

class Solution {
private:
  vector<vector<int>> ans;
  vector<int> next;

  void dfs(int curr_sum, int next_num, int n_nums, int k, int n) {
    if (n_nums == k) {
      if (curr_sum == n)
        ans.push_back(next);
      return;
    }
    for (int i = next_num; i <= 9; ++i) {
      if (curr_sum + i > n)
        continue;
      next.push_back(i);
      dfs(curr_sum + i, i + 1, n_nums + 1, k, n);
      next.pop_back();
    }
  }

public:
  vector<vector<int>> combinationSum3(int k, int n) {
    ans.clear();
    next.clear();
    dfs(0, 1, 0, k, n);
    return ans;
  }
};

September LeetCoding Challenge, Day 11: Maximum Product Subarray

September 13, 2020

This is part of a series of posts about the September LeetCoding Challenge. Check the first post for more information.

The problem for September 11 is Maximum Product Subarray. You’re given a non-empty array of integers nums and are interested in finding the contiguous non-empty subarray which has the largest product. You’re not given any limits on the size of nums.

Since we’re not given any limits on the size of nums, I didn’t assume that the naive algorithm of checking all possible subarrays in \(\mathcal{O}(n^3)\) (possibly reducing to \(\mathcal{O}(n^2)\) using dynamic programming) would work. Instead, I tried to find an algorithm whose time complexity would match the theoretical lower bound of \(\mathcal{O}(n)\). Some relevant observations are:

  • If the array has a single element, then the maximum product will be that element;
  • If the array has more than one element, then the maximum product subarray will contain an even number of negative integers, so that it is either positive or 0;
  • If there’s at least one non-empty subarray without a 0 with an even number of negative integers, then the maximum product will never be 0.
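Before deriving the linear algorithm, the quadratic check mentioned above can be sketched as follows (a hypothetical baseline, not part of the submitted solution; a long long accumulator avoids intermediate overflow):

```cpp
#include <algorithm>
#include <vector>

using namespace std;

// O(n^2) baseline: fix every starting index and extend the subarray,
// keeping a running product of the current window. Assumes nums is non-empty.
int maxProductBrute(const vector<int>& nums) {
  long long best = nums[0];
  for (size_t i = 0; i < nums.size(); ++i) {
    long long prod = 1;
    for (size_t j = i; j < nums.size(); ++j) {
      prod *= nums[j];
      best = max(best, prod);
    }
  }
  return static_cast<int>(best);
}
```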

Based on the previous observations, we can derive an \(\mathcal{O}(n)\) algorithm to solve this problem. The idea is to iterate through nums, keeping track of two quantities over products of non-empty subarrays ending at the current number: the largest possible positive product and the smallest possible negative product. When visiting a new number: if it is 0, we reset both values; if it is positive, both products are multiplied by it; if it is negative, the largest positive product becomes the smallest negative product times that number, and vice versa for the smallest negative product. At each iteration, the largest positive product (or 0) is the largest product of a non-empty subarray ending at that number. The following is an implementation of that idea:

class Solution {
public:
  int maxProduct(vector<int>& nums) {
    int best_neg = -1;
    int best_pos = -1;
    int ans = nums[0];
    if (nums[0] < 0)
      best_neg = -nums[0];
    else if (nums[0] > 0)
      best_pos = nums[0];
    else {
      best_neg = -1;
      best_pos = -1;
    }
    int N = nums.size();
    for (int i = 1; i < N; ++i) {
      if (nums[i] < 0) {
        int prev_best_pos = best_pos;
        int prev_best_neg = best_neg;
        if (prev_best_neg != -1)
          best_pos = prev_best_neg * abs(nums[i]);
        else
          best_pos = -1;
        if (prev_best_pos != -1)
          best_neg = prev_best_pos * abs(nums[i]);
        else
          best_neg = abs(nums[i]);
      }
      if (nums[i] == 0) {
        best_neg = -1;
        best_pos = -1;
        ans = max(ans, 0);
      }
      if (nums[i] > 0) {
        if (best_pos != -1)
          best_pos = best_pos * nums[i];
        else
          best_pos = nums[i];
        if (best_neg != -1)
          best_neg = best_neg * nums[i];
      }
      ans = max(ans, best_pos);
    }
    return ans;
  }
};