For this exercise, rather than defining instances for the proposed types, I
defined instances for Cats’ Monoid directly. For that purpose, we need to
import cats.Monoid.
For the Boolean type, we can define 4 monoid instances. The first is boolean
or, with combine being equal to the application of the || operator and
empty being false:
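A sketch of that instance (using a minimal stand-in trait instead of Cats' `Monoid` so the snippet is self-contained; with Cats on the classpath it would implement `cats.Monoid[Boolean]` directly):

```scala
// Minimal stand-in for cats.Monoid, so this snippet runs without Cats.
trait Monoid[A] {
  def combine(x: A, y: A): A
  def empty: A
}

// Boolean or: combine is ||, empty is false.
val booleanOrMonoid: Monoid[Boolean] = new Monoid[Boolean] {
  def combine(x: Boolean, y: Boolean): Boolean = x || y
  def empty: Boolean = false
}
```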
The fourth is boolean exclusive nor (the negation of exclusive or), with
combine being equal to the negation of the application of the ^ operator and
empty being true:
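A sketch of that instance (again with a stand-in for Cats' `Monoid` to keep the snippet self-contained):

```scala
// Minimal stand-in for cats.Monoid, so this snippet runs without Cats.
trait Monoid[A] {
  def combine(x: A, y: A): A
  def empty: A
}

// Boolean exclusive nor: combine negates ^, empty is true.
val booleanXnorMonoid: Monoid[Boolean] = new Monoid[Boolean] {
  def combine(x: Boolean, y: Boolean): Boolean = !(x ^ y)
  def empty: Boolean = true
}
```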
To convince ourselves that the monoid laws hold for the proposed monoids, we can
verify them on all instances of Boolean values. Since there are only two (true
and false), it’s easy to check them all:
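An exhaustive check can be sketched as follows (stand-in `Monoid` trait for self-containment; the same check applies to each of the four instances):

```scala
// Minimal stand-in for cats.Monoid, so this snippet runs without Cats.
trait Monoid[A] {
  def combine(x: A, y: A): A
  def empty: A
}

val booleanOr: Monoid[Boolean] = new Monoid[Boolean] {
  def combine(x: Boolean, y: Boolean): Boolean = x || y
  def empty: Boolean = false
}

// Checks associativity over every Boolean triple and both identity laws
// over every Boolean value.
def monoidLawsHold(m: Monoid[Boolean]): Boolean = {
  val values = List(true, false)
  val associative = values.forall(x => values.forall(y => values.forall(z =>
    m.combine(m.combine(x, y), z) == m.combine(x, m.combine(y, z)))))
  val identities = values.forall(x =>
    m.combine(x, m.empty) == x && m.combine(m.empty, x) == x)
  associative && identities
}
```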
Set intersection only forms a semigroup, since we can’t define an identity
element for the general case. In theory, the identity element would be the set
containing all instances of the element type, but in practice we can’t produce
that for a generic type A:
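A sketch of that semigroup (stand-in trait instead of Cats' `Semigroup` to keep the snippet self-contained):

```scala
// Minimal stand-in for cats.Semigroup, so this snippet runs without Cats.
trait Semigroup[A] {
  def combine(x: A, y: A): A
}

// Set intersection is associative, but has no identity element we can
// produce for a generic A, so a Semigroup is the best we can do.
def setIntersectionSemigroup[A]: Semigroup[Set[A]] = new Semigroup[Set[A]] {
  def combine(x: Set[A], y: Set[A]): Set[A] = x intersect y
}
```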
The exercise is clearly nudging us towards using a monoid, but the first step
can be defined in terms of Int only. The description doesn’t tell us what we
should do in the case of an empty list, but, since we’re in a chapter about
monoids, I assume we want to return the identity element:
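A sketch of that first, Int-only version, folding from 0 (the identity of addition):

```scala
// First step: adding a List[Int]. The empty list yields 0, the identity
// element of integer addition.
def add(items: List[Int]): Int = items.foldLeft(0)(_ + _)
```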
Changing the code above to also work with Option[Int] and making sure there is
no code duplication can be achieved by introducing a dependency on a Monoid
instance:
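The generic version can be sketched as follows (stand-in trait instead of Cats' `Monoid` so the snippet is self-contained):

```scala
// Minimal stand-in for cats.Monoid, so this snippet runs without Cats.
trait Monoid[A] {
  def combine(x: A, y: A): A
  def empty: A
}

implicit val intAdditionMonoid: Monoid[Int] = new Monoid[Int] {
  def combine(x: Int, y: Int): Int = x + y
  def empty: Int = 0
}

// Folds the list with the Monoid's combine, starting from its identity.
def add[A](items: List[A])(implicit m: Monoid[A]): A =
  items.foldLeft(m.empty)(m.combine)
```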
With the above in place we continue to be able to add Ints, but we’re also now
able to add Option[Int]s, provided we have the appropriate Monoid instances
in place:
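A sketch of that usage, assuming a `Monoid[Option[A]]` derived from a `Monoid[A]` (stand-in trait instead of Cats' `Monoid` to keep the snippet self-contained):

```scala
// Minimal stand-in for cats.Monoid, so this snippet runs without Cats.
trait Monoid[A] {
  def combine(x: A, y: A): A
  def empty: A
}

def add[A](items: List[A])(implicit m: Monoid[A]): A =
  items.foldLeft(m.empty)(m.combine)

implicit val intAdditionMonoid: Monoid[Int] = new Monoid[Int] {
  def combine(x: Int, y: Int): Int = x + y
  def empty: Int = 0
}

// Monoid for Option[A], given a Monoid[A]: None is the identity element.
implicit def optionMonoid[A](implicit m: Monoid[A]): Monoid[Option[A]] =
  new Monoid[Option[A]] {
    def combine(x: Option[A], y: Option[A]): Option[A] = (x, y) match {
      case (Some(a), Some(b)) => Some(m.combine(a, b))
      case (some, None)       => some
      case (None, some)       => some
    }
    def empty: Option[A] = None
  }
```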
To be able to add Order instances without making any modifications to add,
we can define a Monoid instance for Order. In this case, we’re piggybacking
on the Monoid instance for Double, but we could’ve implemented the sums and
the production of the identity element directly:
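A sketch of that instance, assuming `Order` is the book's case class with `totalCost` and `quantity` fields (stand-in trait instead of Cats' `Monoid` for self-containment):

```scala
// Minimal stand-in for cats.Monoid, so this snippet runs without Cats.
trait Monoid[A] {
  def combine(x: A, y: A): A
  def empty: A
}

implicit val doubleAdditionMonoid: Monoid[Double] = new Monoid[Double] {
  def combine(x: Double, y: Double): Double = x + y
  def empty: Double = 0.0
}

// Order as defined in the book's exercise.
final case class Order(totalCost: Double, quantity: Double)

// Combines Orders field by field, piggybacking on the Monoid for Double.
implicit def orderMonoid(implicit m: Monoid[Double]): Monoid[Order] =
  new Monoid[Order] {
    def combine(x: Order, y: Order): Order =
      Order(m.combine(x.totalCost, y.totalCost), m.combine(x.quantity, y.quantity))
    def empty: Order = Order(m.empty, m.empty)
  }
```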
I solved the exercises in a sandbox Scala project that has Cats as a
dependency. The book recommends using a Giter8 template, so that’s what I used:
$ sbt new scalawithcats/cats-seed.g8
The above command generates (at the time of writing) a minimal project with the
following build.sbt file:
name := "scala-with-cats"
version := "0.0.1-SNAPSHOT"

scalaVersion := "2.13.8"

libraryDependencies += "org.typelevel" %% "cats-core" % "2.8.0"

// scalac options come from the sbt-tpolecat plugin so no need to set any here

addCompilerPlugin("org.typelevel" %% "kind-projector" % "0.13.2" cross CrossVersion.full)
The above differs a bit from what the book lists, since newer Scala 2.13 and
Cats versions have been released in the meantime, but I followed along using
these settings with minimal issues.
Exercise 1.3: Printable Library
The definition of the Printable type class can be as follows:
trait Printable[A] {
  def format(value: A): String
}
In terms of defining the Printable instances for Scala types, I’d probably
prefer to include those in the companion object of Printable so that they were
readily available in the implicit scope, but the exercise asks us explicitly to
create a PrintableInstances object:
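A sketch of those instances and of the `Printable` interface object (names follow the book's exercise; `print` calls the instance's `format` directly, matching the note that follows):

```scala
trait Printable[A] {
  def format(value: A): String
}

object PrintableInstances {
  implicit val stringPrintable: Printable[String] = new Printable[String] {
    def format(value: String): String = value
  }

  implicit val intPrintable: Printable[Int] = new Printable[Int] {
    def format(value: Int): String = value.toString
  }
}

object Printable {
  def format[A](value: A)(implicit p: Printable[A]): String = p.format(value)

  // Calls the instance's format directly rather than going through the
  // format method above.
  def print[A](value: A)(implicit p: Printable[A]): Unit = println(p.format(value))
}
```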
In the above, the print method could have relied on the format method, but I
opted to call the instance’s format directly to avoid the unnecessary
indirection.
For the Cat example, we can define a Printable instance for that data type
directly in its companion object:
final case class Cat(name: String, age: Int, color: String)

object Cat {
  implicit val printable: Printable[Cat] = new Printable[Cat] {
    import PrintableInstances._

    val sp = implicitly[Printable[String]]
    val ip = implicitly[Printable[Int]]

    def format(value: Cat): String = {
      val name = sp.format(value.name)
      val age = ip.format(value.age)
      val color = sp.format(value.color)
      s"$name is a $age year-old $color cat."
    }
  }
}
This allows us to use the Printable instance without explicit imports:
val garfield = Cat("Garfield", 41, "ginger and black")
Printable.print(garfield) // Prints "Garfield is a 41 year-old ginger and black cat.".
For the extension methods, we can define the PrintableSyntax object as
follows:
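A sketch of that object (the post uses a value class, i.e. `extends AnyVal`, for the wrapper; this sketch omits that detail so it compiles in any context):

```scala
trait Printable[A] {
  def format(value: A): String
}

object PrintableSyntax {
  // Extension methods for any A with a Printable instance. The post wraps
  // this in a value class (extends AnyVal) for performance.
  implicit class PrintableOps[A](value: A) {
    def format(implicit p: Printable[A]): String = p.format(value)
    def print(implicit p: Printable[A]): Unit = println(p.format(value))
  }
}
```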
I have opted to use a value class for performance reasons, but for the purpose
of this exercise it was likely unnecessary.
By importing PrintableSyntax._ we can now call print directly on our Cat
instance:
import PrintableSyntax._

val garfield = Cat("Garfield", 41, "ginger and black")
garfield.print // Prints "Garfield is a 41 year-old ginger and black cat.".
Exercise 1.4.6: Cat Show
To implement the previous example using Show instead of Printable, we need
to define an instance of Show for Cat. Similar to the approach taken before,
we’re defining the instance directly in the companion object of Cat:
import cats.Show

final case class Cat(name: String, age: Int, color: String)

object Cat {
  implicit val show: Show[Cat] = new Show[Cat] {
    val stringShow = Show[String]
    val intShow = Show[Int]

    def show(t: Cat): String = {
      val name = stringShow.show(t.name)
      val age = intShow.show(t.age)
      val color = stringShow.show(t.color)
      s"$name is a $age year-old $color cat."
    }
  }
}
Cats implements summoners for the Show type class, so we no longer need to use
implicitly.
This can be used as follows:
import cats.implicits._

val garfield = Cat("Garfield", 41, "ginger and black")
println(garfield.show) // Prints "Garfield is a 41 year-old ginger and black cat.".
Cats doesn’t have an extension method to directly print an instance using its
Show instance, so we’re using println with the value returned by the show
call.
Exercise 1.5.5: Equality, Liberty, and Felinity
A possible Eq instance for Cat can be implemented as follows. Similar to the
above, I’ve opted to include it in the companion object of Cat.
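A sketch of such an instance, comparing field by field (stand-in trait instead of Cats' `Eq` to keep the snippet self-contained; with Cats you would delegate to the `Eq[String]` and `Eq[Int]` instances):

```scala
// Minimal stand-in for cats.Eq, so this snippet runs without Cats.
trait Eq[A] {
  def eqv(x: A, y: A): Boolean
}

final case class Cat(name: String, age: Int, color: String)

object Cat {
  // Two Cats are equal when all their fields are equal.
  implicit val catEq: Eq[Cat] = new Eq[Cat] {
    def eqv(x: Cat, y: Cat): Boolean =
      x.name == y.name && x.age == y.age && x.color == y.color
  }
}
```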
We use Scala at $WORK for multiple projects. These projects rely on various
internal libraries. Being able to rely on built artifacts between projects in a
way that is convenient for developers in different teams is a huge benefit.
The whole company uses GitHub to manage source code, so we have recently
started using GitHub Packages to share Scala artifacts privately. After
working around some quirks, it has turned out to be quite a convenient way to
share Scala (and other Maven) artifacts.
We use sbt as the build tool for all of our Scala projects, so the
remainder of this post is written for sbt. It should be easy to adapt the
instructions below to other build tools.
Setting Up Credentials to Authenticate with GitHub Packages
Authentication in GitHub Packages is done through personal access tokens. We can
generate one in our GitHub personal settings. The token must
have the read:packages (when we want to read packages from GitHub Packages)
and the write:packages (when we want to write to GitHub Packages) permissions.
We can then set the credentials for sbt to be able to read them via the
following, replacing <username> and <token> with our username and previously
created token, respectively:
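Such a credentials file might look like the following (the realm and host shown are the standard ones for GitHub's Maven registry):

```scala
// e.g. in ~/.sbt/1.0/github-credentials.sbt
credentials += Credentials(
  "GitHub Package Registry",  // realm used by GitHub's Maven endpoint
  "maven.pkg.github.com",
  "<username>",
  "<token>"
)
```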
The token is a password, so we should treat it as such. We shouldn’t commit this
into our repositories, and ideally we have this set up in a global location that
sbt has access to (like ~/.sbt/1.0/github-credentials.sbt).
Publishing an Artifact to GitHub Packages
When publishing artifacts in sbt, we always need to specify a repository where
artifacts and descriptors are uploaded. In the case of GitHub Packages, every
GitHub project provides a repository we can use to publish artifacts to. This
means that, in sbt, we can define the location of our repository by setting the
publishTo task key to something like the following:
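A sketch of that setting (the repository name string is arbitrary; the URL is GitHub Packages' Maven endpoint):

```scala
publishTo := Some(
  "GitHub Package Registry" at "https://maven.pkg.github.com/<org>/<project>"
)
```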
In the snippet above, we should replace the <org> and <project> placeholders
with the organization and project we want to publish to, respectively.
If our credentials are properly set up, this now allows us to run sbt publish
and have our artifacts published to GitHub Packages. Note that packages in
GitHub Packages are immutable, so we can’t directly replace a package with the
same version. We can, however, delete an existing version in GitHub.
Downloading Artifacts from GitHub Packages
In order to download artifacts from GitHub Packages as dependencies of our
projects we must set up the appropriate resolvers in our sbt build. For that
purpose, we can set up the same location we mentioned previously when publishing
artifacts:
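A sketch of that resolver (same endpoint as used for publishing):

```scala
resolvers +=
  "GitHub Package Registry" at "https://maven.pkg.github.com/<org>/<project>"
```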
If credentials are properly set up, this now allows us to rely on GitHub
Packages as a source of dependencies.
There is one slight inconvenience with the process suggested above: every
project has its own resolver. When depending on multiple projects from the same
organization, this can become cumbersome to manage, since every dependency
brings its own resolver. Fortunately, there’s a way to work around this and
have an organization-wide resolver: the <project> section of the resolver
doesn’t actually need to exist, so we can reference an arbitrary repository,
like _:
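A sketch of such an organization-wide resolver, using _ as the repository name:

```scala
resolvers +=
  "GitHub Package Registry (<org>)" at "https://maven.pkg.github.com/<org>/_"
```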
This will give us access to packages published on any repository within the
organization. The personal access token we use will control our access. If the
token only has access to public repositories, then this resolver won’t allow
access to private ones. If it does have access to private repositories, then all
artifacts will be visible.
With this resolver in place, we have convenient access to all artifacts
published within the organization.
Interacting with GitHub Packages in Automated Workflows
Using GitHub Packages in a pipeline of continuous integration or continuous
delivery is also possible. There are various ways to manage this. One way is to
rely on an environment variable that is populated with the contents of some
secret that includes a personal access token with appropriate access. For that
purpose, we can set up something like the following in our sbt build:
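One possible shape for that setting, reading the token from a GITHUB_TOKEN environment variable when it exists:

```scala
// Only adds credentials when a GITHUB_TOKEN environment variable is set.
// GitHub Packages ignores the username, so _ works as a placeholder.
credentials ++= sys.env
  .get("GITHUB_TOKEN")
  .map(token => Credentials("GitHub Package Registry", "maven.pkg.github.com", "_", token))
  .toSeq
```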
With the above in place, builds of our project will check for the existence of
a GITHUB_TOKEN environment variable and use it to set up the appropriate sbt
credentials. Note that the above uses _ as the username for the credentials.
This works because GitHub Packages doesn’t care about the actual username that
is used, only whether the token has appropriate access.
When using GitHub Actions, there’s always a GITHUB_TOKEN
secret that has access to the repository where the action is executed, so we can
reference that:
env:
  GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
Note that if we need to fetch artifacts from other projects, we need to set up a
personal access token with more permissions.
Managing Snapshot Versions
It is customary for Maven artifacts to have snapshot versions which are usually
versioned as X.Y.Z-SNAPSHOT. These snapshots are usually mutable and new
versions continuously replace the existing snapshot. This doesn’t play very well
with GitHub Packages because versions there are immutable and you can’t easily
replace one. It is possible to delete the existing one and publish again, but it
is cumbersome.
To allow for snapshots while using GitHub Packages, we have started using
sbt-dynver. sbt-dynver is an sbt plugin that dynamically sets the
version of our projects from git. You can look at the details of how sbt-dynver
derives the version, but, essentially, when there is a tag in the current tree,
the version of the project is the one specified by the tag; otherwise, the
version is a string built from the closest tag and the distance to that
reference.
With sbt-dynver we can have snapshot-like versions with the version immutability
that GitHub Packages provides.
Pricing
In terms of billing, we get a total amount of free
storage and some amount of free data transfer per month. Anything above that
incurs a cost of $0.008 USD per GB of storage per day and $0.50 USD per GB of data
transfer. One important note is that traffic using a GITHUB_TOKEN from within
GitHub Actions is always free, regardless of where the runner is hosted.
In short, using GitHub Packages is a very convenient way to share Scala
artifacts within a private organization, particularly if said organization
already uses GitHub to manage their source code.
I have recently moved
this website from DreamHost to AWS. While I was able to
automate the setup of the infrastructure, I was still deploying changes
manually. It is not a very cumbersome process; after a change is made, it
involves the following steps:
Build the website;
Sync the new website contents with the main S3 bucket;
Invalidate the cache of the non-www CloudFront distribution;
Invalidate the cache of the www CloudFront distribution.
In its essence, this involves running the following 4 commands, in sequence:
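The first two commands below are the ones used later in this post to push the site to S3; the invalidation commands assume placeholder CloudFront distribution IDs:

```shell
# Build the website.
bundle exec jekyll build
# Sync the built site with the main S3 bucket.
aws s3 sync _site/ s3://jcazevedo.net/ --delete
# Invalidate the caches of both CloudFront distributions.
aws cloudfront create-invalidation --distribution-id <non-www-distribution-id> --paths "/*"
aws cloudfront create-invalidation --distribution-id <www-distribution-id> --paths "/*"
```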
This is not terrible to run each time I introduce a new change, but it would be
easier if I could make it so that every push to the master branch of the
repository which holds the contents of the website would trigger
a deploy. Fortunately we can use GitHub Actions for this.
Setting Up the GitHub Action
In order to set that up, we first need to create a workflow. Workflows live in
the .github/workflows folder, and that is where I have created the
deploy.yml file.
We start by giving the workflow a name:
name: Deploy
Then, we set up which events trigger a workflow run. In this case, I want every
push to the master branch to trigger it:
on:
  push:
    branches:
      - master
Following that, we can start defining our job. In this case, we need to specify
in which environment the job should run and the list of steps that comprise it.
We’re OK with running on the latest Ubuntu version:
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps: (...)
To build the website, we need three steps: (1) check out the repository, (2)
set up Ruby and install dependencies, and (3) run bundle exec jekyll build:
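A sketch of those steps (step names and action versions are my own assumptions; actions/checkout and ruby/setup-ruby are the standard actions for this):

```yaml
- name: Checkout
  uses: actions/checkout@v2
- name: Setup Ruby
  uses: ruby/setup-ruby@v1
  with:
    # Installs the bundle and caches it between runs.
    bundler-cache: true
- name: Build
  run: bundle exec jekyll build
```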
Once the site is built, we need to publish it to S3 and invalidate the caches of
the CloudFront distributions. The AWS Command Line Interface is
already available in GitHub-hosted virtual environments, so we just need to set
up the credentials we want to use. In this case, we want to reference some
repository secrets which we will set up later:
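One possible shape for that step, assuming the aws-actions/configure-aws-credentials action and hypothetical secret names:

```yaml
- name: Configure AWS credentials
  uses: aws-actions/configure-aws-credentials@v1
  with:
    aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
    aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    aws-region: us-east-1
```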
To set up the credentials this workflow is going to use to interact with AWS, I
wanted to create a user with permissions to interact with the relevant S3 bucket
and CloudFront distributions only. To do that, I have added the following to the
Terraform definition (refer to the previous post for more
details on the existing Terraform definition):
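A sketch of that definition (the user name is an assumption; the policy granting access to the S3 bucket and CloudFront distributions is omitted for brevity; the output name matches the command mentioned below):

```hcl
resource "aws_iam_user" "github_actions" {
  name = "github-actions"
}

# Policy attachment granting access to the relevant S3 bucket and
# CloudFront distributions omitted for brevity.

resource "aws_iam_access_key" "github_actions" {
  user = aws_iam_user.github_actions.name
}

output "github-actions_aws_iam_access_key_secret" {
  value     = aws_iam_access_key.github_actions.secret
  sensitive = true
}
```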
This creates a new IAM user, attaches a policy to it that gives it
access to the relevant S3 and CloudFront resources, and creates a new access key
which we will set up as a secret in our GitHub repository. The secret access key
gets stored in the Terraform state, but we define an output that allows us to
read it with terraform output -raw github-actions_aws_iam_access_key_secret.
With the GitHub secrets appropriately set up, we now have a
workflow that publishes this website whenever a new commit is pushed to the
master branch.
I’ve decided to migrate this website from DreamHost to Amazon Web
Services. The main driver for this is costs. This is a static website
(consisting of only HTML, CSS, and a small portion of JavaScript) which is very
suitable to be hosted in S3. This is also a very low
traffic website, well within S3’s free tier. In keeping with my
current setup, I wanted to retain the following:
I also wanted to keep everything managed from within AWS, so that I could get
some infrastructure automation (Terraform) to help me with setting
everything up. Keeping everything managed from within AWS meant that I had to
have the jcazevedo.net domain transferred and have an SSL certificate
provisioned by AWS (I was previously using Let’s Encrypt). I also
didn’t mind downtime (again, this is a very low traffic website).
Setting Up Terraform
It wasn’t absolutely necessary to use Terraform (or any infrastructure-as-code
tool) for this. I don’t anticipate needing to reproduce this infrastructure or
to modify it frequently. Still, it serves as
documentation on what is set up on AWS, so I figured it would be a good idea.
The first step was to get Terraform set up and the providers defined. I didn’t
want to keep Terraform’s state locally, so I decided to also use S3 as a state
backend. I don’t need locks on the state (it’s only going to be me deploying
this), so a single file in an S3 bucket would suffice. So, I created an _infra
folder on the root of the directory tree of this website and placed a
providers.tf file in it:
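A sketch of what that providers.tf might contain (the state bucket name is a placeholder; the region is assumed to be us-east-1, matching the bucket endpoints mentioned later):

```hcl
terraform {
  required_version = "1.0.11"

  # The state bucket itself was created manually; its name is a placeholder.
  backend "s3" {
    bucket = "<state-bucket>"
    key    = "terraform.tfstate"
    region = "us-east-1"
  }

  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}
```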
The required version is set to 1.0.11 because that’s the one I’m currently using
at $WORK. Setting up the state backend required manually creating a bucket for
it, since using Terraform to manage that bucket would reintroduce the very
problem of where to remotely keep state that we were trying to solve.
With this set up, a call to terraform init should complete successfully.
Setting Up the S3 Bucket(s)
The next step was to set up the S3 buckets. I actually went with 2 buckets: one
for the root domain (jcazevedo.net) and another for the www subdomain
(www.jcazevedo.net). The reason for it was to set up a redirect on the www
subdomain, which S3 supports. To set the buckets up, I
created an s3.tf file under the _infra folder with the following contents:
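A sketch of those bucket definitions, using the provider's website configuration of that era (the error document name is an assumption):

```hcl
# Root bucket: serves the site, with index and error documents.
resource "aws_s3_bucket" "root" {
  bucket = "jcazevedo.net"
  acl    = "public-read"

  website {
    index_document = "index.html"
    error_document = "404.html"
  }
}

# www bucket: only redirects to the root domain.
resource "aws_s3_bucket" "www" {
  bucket = "www.jcazevedo.net"
  acl    = "public-read"

  website {
    redirect_all_requests_to = "https://jcazevedo.net"
  }
}
```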
Most of the configuration is the same for both buckets. We want both buckets to
allow GET requests from the public. The main difference is in the website
configuration. The root bucket specifies an index and error document (to be
served in case of errors), whereas the www bucket just configures the
redirection policy.
Once this was set up, I pushed this website’s contents to the root S3 bucket by
running a bundle exec jekyll build followed by an aws s3 sync _site/
s3://jcazevedo.net/ --delete. The site was already available via the root
bucket website endpoint
(http://jcazevedo.net.s3-website-us-east-1.amazonaws.com/)
and the redirection was already working via the www bucket website endpoint
(http://www.jcazevedo.net.s3-website-us-east-1.amazonaws.com/).
At this point the domain wasn’t yet migrated, so this was still redirecting to
the DreamHost instance.
Setting Up the Domain
I had never transferred a domain before, so I followed AWS
instructions.
The transfer would take a while to complete, so I figured I would create the
Route53 zone beforehand and have the domain transferred directly into the new
zone. For that purpose I created a route53.tf file under the _infra
folder with the following contents:
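A sketch of that file (resource names are my own assumptions; the certificate covers both the root domain and the www subdomain and uses email validation):

```hcl
resource "aws_route53_zone" "main" {
  name = "jcazevedo.net"
}

# Certificate for the root domain and the www subdomain, validated via email.
resource "aws_acm_certificate" "cert" {
  domain_name               = "jcazevedo.net"
  subject_alternative_names = ["www.jcazevedo.net"]
  validation_method         = "EMAIL"
}
```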
For the validation method I used email instead of DNS since at that time I
didn’t have the DNS moved yet. The email validation is performed while we’re
applying the Terraform diff, so it’s quite fast.
Setting Up the CloudFront Distributions
CloudFront speeds up the distribution of static (and dynamic) web
content. It can handle caching, compression and can require viewers to use HTTPS
so that connections are encrypted. The previously mentioned blog
post by Alex Hyett provided
instructions to set CloudFront distributions pointing to existing S3 buckets and
using HTTPS, so I almost blindly copied the Terraform definitions. Similar to
what had been done before, we needed two distributions: one for the root bucket
and one for the www bucket.
The configurations are similar, except for the caching settings, since the
second distribution only points to the S3 bucket that redirects to the non-www
website.
Adding Route53 Records Pointing to the CloudFront Distributions
The last part of the process involved creating new Route53 A records pointing to
the CloudFront distributions created previously. For this, I’ve added the
following to the route53.tf file mentioned previously:
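A sketch of those records (the resource names for the zone and the CloudFront distributions are my own assumptions):

```hcl
# Alias records pointing each domain at its CloudFront distribution.
resource "aws_route53_record" "root" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "jcazevedo.net"
  type    = "A"

  alias {
    name                   = aws_cloudfront_distribution.root.domain_name
    zone_id                = aws_cloudfront_distribution.root.hosted_zone_id
    evaluate_target_health = false
  }
}

resource "aws_route53_record" "www" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "www.jcazevedo.net"
  type    = "A"

  alias {
    name                   = aws_cloudfront_distribution.www.domain_name
    zone_id                = aws_cloudfront_distribution.www.hosted_zone_id
    evaluate_target_health = false
  }
}
```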
There’s one record for each of the distributions (www and non-www).
Waiting for the Domain Transfer
The domain transfer from DreamHost to Route53 took
around 8 days. I was notified by email when it was completed. Since everything
was already pre-configured and the website contents had already been pushed to
S3, the website continued to be served as expected from
https://jcazevedo.net/.