The problem for September 3 is Repeated Substring Pattern. The
problem statement is straightforward: given an non-empty string of at most 10000
characters, check if it can be constructed by taking a substring of it and
appending multiple (more than one) copies of it. In other words, the function
should return true if, for a given string \(s\), there exists a proper substring
\(m\) of \(s\) such that \(s = m + \dots + m = n \times m\), for \(n > 1\).

The size of the string is small enough to check all proper substrings of \(s\)
(in \(\mathcal{O}(n^2)\) time). The following is an implementation of that
strategy:

The previous solution is good enough, but some improvements can still be
performed under the same strategy. Namely, it’s not necessary to check for
substrings larger than half the size of \(s\), and there’s no need to build a
new string for the prefix (we can just keep track of the size of the substring
under consideration). However, those improvements don’t improve the asymptotic
time complexity of the solution.

One key observation for a solution with a better asymptotic time complexity is
that if we have a string \(s\) of size \(N\) composed of \(n\) repetitions of
substring \(m\) (let’s say that \(s = n \times m\)), and we append string \(s\)
onto itself (i.e. we have \(s + s = 2 \times n \times m\)), then \(s\) can also
be found in \(s + s\) starting in an index other than \(0\) or \(N\) (since
\(|s + s| = 2N\)). Building on this insight, we can append \(s\) onto itself,
remove the first and last character of it and try to find an occurrence of \(s\)
in the resulting string. If we find it, then \(s\) must be built using a
repeated substring pattern. We remove the first and last character to avoid
finding the instances of \(s\) starting at index \(0\) and index \(N\). If we’re
able to find \(s\) in the resulting string in \(\mathcal{O}(N)\), then we arrive
at an \(\mathcal{O}(N)\) solution for this problem. The
Knuth-Morris-Pratt (KMP) algorithm allows searching for occurrences of a
word \(W\) within a main text string \(S\) in \(\mathcal{O}(|W|) +
\mathcal{O}(|S|)\) using \(\mathcal{O}(|W|)\) extra space, and is therefore
suitable for our use case. I won’t go into details describing the KMP algorithm.
The following is an implementation of the previously described strategy:

September LeetCoding Challenge, Day 2: Contains Duplicate III

The problem for September 2 is Contains Duplicate III. The title of
the problem is a bit misleading, since the statement isn’t about finding
duplicates. Instead, we’re interested in finding, from an array of numbers
nums and integers k and t, whether there are two distinct indices i and
j such that the absolute difference of nums[i] and nums[j] is at most t
and the absolute difference between i and j is at most k.

One way to approach this problem is to, for each number in the array, check if
there is a number in the previous k numbers such that the absolute difference
between them is at most k. If we can keep a sorted list of k numbers at all
times, we should be able to do this in \(\mathcal{O}(n \times log(k))\), \(n\)
being the size of the nums array. The \(\mathcal{O}(n)\) part is due to the
fact that we have to iterate through all elements of nums. The
\(\mathcal{O}(log(k))\) part is due to the fact that, if we can keep a sorted
list of the previous k numbers, we can binary search for the element that
minimizes the absolute difference. A way to implement this in C++ is to use a
multiset (a simple set doesn’t work since there can be repeated elements in
nums). Elements in C++
multisets are sorted and
the interface provides both lower_bound and upper_bound methods to search
for elements in logarithmic time complexity. The following is an implementation
of that strategy:

We check two numbers in each iteration: the number returned by the lower_bound
method (which is going to be the first number that is larger than or equal to
nums[i]) and the number immediately before (the larger number that is smaller
than nums[i]). One of those is going to be the one producing the smallest
absolute difference. If the smallest absolute difference is not greater than
t, we have found a satisfying pair.

The casts to long long are necessary because, even though t is an int, the
absolute difference between two numbers in nums may not fit in an int
(consider \(|2147483647 - (-2147483648)|\) for example).

September LeetCoding Challenge, Day 1: Largest Time for Given Digits

LeetCode is a platform with programming challenges. They’re designed
around helping prepare for technical interviews, but they also host programming
contests and have some explore packages, which
are bundles of problems and articles. In April 2020, they started providing
monthly explore packages in the form of challenges, in which they’d add a
problem each day of the month. Solving all problems within 24 hours of their
release makes you eligible for some rewards. I have been solving these monthly
challenges for fun since April, but decided to start writing a bit about them in
September. It’s very easy to find solutions for most of
these problems online, but I’ll try to post these the day after the challenge,
to avoid unnecessary copying. I’m solving these problems in C++, since it’s my
language of choice for this type of algorithmic programming problems.

The problem for September 1 is Largest Time for Given Digits. The
statement is straightforward: given 4 digits, return the largest 24 hour time
(in a “HH:MM” string) that can be made using all of those digits, or an empty
string if no valid time can be made.

A possible solution for this problem is to enumerate all valid times and check
the largest one that uses all of the provided digits. Since there are at most
\(24 \times 60 = 1440\) valid times, this is good enough to avoid a time limit
exceeded. The following is an implementation of that strategy:

Another possible solution is to iterate through all the permutations of the
provided digits, and return the one that produces a valid time and is the
largest. Since there are only \(4! = 24\) permutations, this runs much faster
than the previous solution. The following is an implementation of that strategy:

After reading this problem for the first time, I was under the impression that a
greedy algorithm that would pick the next largest available digit satisfying the
following restrictions would also work:

The digit in the tens place of the hours number must not be larger than 2;

The digit in the ones place of the hours number must not be larger than 3 if
the digit in the tens place of the hours number is equal to 2;

The digit in the tens place of the minutes number must not be larger than 5.

However, it’s easy to come up with a counter example in which this algorithm
would fail. If provided with the digits \([0, 2, 6, 6]\), the correct output
would be “06:26”. However, the previously described algorithm would fail to
produce a valid answer, since it’d try to use digits \([0, 2]\) for the hours
and be left with \([6, 6]\) for the minutes, which can’t produce a valid minute
number.

WARNING: Contains Game of Thrones spoilers from Season 7 Episode 4.

HBO posted a video giving an inside look on filming the intense loot train
attack in Season 7 Episode 4. From cameras needing to move in sync with ground
explosions to how stuntman need to keep their heart rate down in order to hold
their breath while on fire, it’s filled with a lot of details on how that
amazing scene got filmed.

Apparently, storage blocks on Amazon EBS volumes that were restored from
snapshots incur in significant latency penalties on I/O operations the first
time they’re accessed. According to the user guide:

New EBS volumes receive their maximum performance the moment that they are
available and do not require initialization (formerly known as pre-warming).
However, storage blocks on volumes that were restored from snapshots must be
initialized (pulled down from Amazon S3 and written to the volume) before you
can access the block. This preliminary action takes time and can cause a
significant increase in the latency of an I/O operation the first time each
block is accessed. For most applications, amortizing this cost over the
lifetime of the volume is acceptable. Performance is restored after the data
is accessed once.

This can be particularly painful when restoring Amazon RDS DB snapshots, as the
performance can be severely impacted. In order to overcome the first touch
penalty, it is advisable to warm up the disk by performing a full table scan or
a vacuum on all tables in the database.