September LeetCoding Challenge, Day 3: Repeated Substring Pattern

September 4, 2020

This is part of a series of posts about the September LeetCoding Challenge. Check the first post for more information.

The problem for September 3 is Repeated Substring Pattern. The problem statement is straightforward: given an non-empty string of at most 10000 characters, check if it can be constructed by taking a substring of it and appending multiple (more than one) copies of it. In other words, the function should return true if, for a given string \(s\), there exists a proper substring \(m\) of \(s\) such that \(s = m + \dots + m = n \times m\), for \(n > 1\).

The size of the string is small enough to check all proper substrings of \(s\) (in \(\mathcal{O}(n^2)\) time). The following is an implementation of that strategy:

class Solution {
private:
  bool is_rep(const string& rep, const string& s) {
    int R = rep.size(), S = s.size(), ir = 0;
    if (R == S || S % R != 0)
      return false;
    for (int i = 0; i < S; ++i) {
      if (s[i] != rep[ir])
        return false;
      ir = (ir + 1) % R;
    }
    return ir == 0;
  }

public:
  bool repeatedSubstringPattern(string s) {
    string rep = "";
    for (char ch : s) {
      rep += ch;
      if (is_rep(rep, s))
        return true;
    }
    return false;
  }
};

The previous solution is good enough, but some improvements can still be performed under the same strategy. Namely, it’s not necessary to check for substrings larger than half the size of \(s\), and there’s no need to build a new string for the prefix (we can just keep track of the size of the substring under consideration). However, those improvements don’t improve the asymptotic time complexity of the solution.

One key observation for a solution with a better asymptotic time complexity is that if we have a string \(s\) of size \(N\) composed of \(n\) repetitions of substring \(m\) (let’s say that \(s = n \times m\)), and we append string \(s\) onto itself (i.e. we have \(s + s = 2 \times n \times m\)), then \(s\) can also be found in \(s + s\) starting in an index other than \(0\) or \(N\) (since \(|s + s| = 2N\)). Building on this insight, we can append \(s\) onto itself, remove the first and last character of it and try to find an occurrence of \(s\) in the resulting string. If we find it, then \(s\) must be built using a repeated substring pattern. We remove the first and last character to avoid finding the instances of \(s\) starting at index \(0\) and index \(N\). If we’re able to find \(s\) in the resulting string in \(\mathcal{O}(N)\), then we arrive at an \(\mathcal{O}(N)\) solution for this problem. The Knuth-Morris-Pratt (KMP) algorithm allows searching for occurrences of a word \(W\) within a main text string \(S\) in \(\mathcal{O}(|W|) + \mathcal{O}(|S|)\) using \(\mathcal{O}(|W|)\) extra space, and is therefore suitable for our use case. I won’t go into details describing the KMP algorithm. The following is an implementation of the previously described strategy:

class Solution {
private:
  bool kmp(const string& W, const string& S) {
    int M = W.size(), N = S.size();
    vector<int> T(M, 0);
    int len = 0;
    T[0] = 0;
    int i = 1;
    while (i < M) {
      if (W[i] == W[len]) {
        len++;
        T[i] = len;
        i++;
      } else if (len != 0) {
        len = T[len - 1];
      } else {
        T[i] = 0;
        i++;
      }
    }
    int j;
    i = j = 0;
    while (i < N) {
      if (W[j] == S[i]) {
        j++;
        i++;
      }
      if (j == M)
        return true;
      if (W[j] != S[i]) {
        if (j != 0)
          j = T[j - 1];
        else
          i = i + 1;
      }
    }
    return false;
  }

public:
  bool repeatedSubstringPattern(string s) {
    string res = (s + s).substr(1, s.size() * 2 - 2);
    return kmp(s, res);
  }
};

September LeetCoding Challenge, Day 2: Contains Duplicate III

September 3, 2020

This is part of a series of posts about the September LeetCoding Challenge. Check the first post for more information.

The problem for September 2 is Contains Duplicate III. The title of the problem is a bit misleading, since the statement isn’t about finding duplicates. Instead, we’re interested in finding, from an array of numbers nums and integers k and t, whether there are two distinct indices i and j such that the absolute difference of nums[i] and nums[j] is at most t and the absolute difference between i and j is at most k.

One way to approach this problem is to, for each number in the array, check if there is a number in the previous k numbers such that the absolute difference between them is at most k. If we can keep a sorted list of k numbers at all times, we should be able to do this in \(\mathcal{O}(n \times log(k))\), \(n\) being the size of the nums array. The \(\mathcal{O}(n)\) part is due to the fact that we have to iterate through all elements of nums. The \(\mathcal{O}(log(k))\) part is due to the fact that, if we can keep a sorted list of the previous k numbers, we can binary search for the element that minimizes the absolute difference. A way to implement this in C++ is to use a multiset (a simple set doesn’t work since there can be repeated elements in nums). Elements in C++ multisets are sorted and the interface provides both lower_bound and upper_bound methods to search for elements in logarithmic time complexity. The following is an implementation of that strategy:

class Solution {
public:
  bool containsNearbyAlmostDuplicate(vector<int>& nums, int k, int t) {
    if (nums.empty() || k == 0)
      return false;
    multiset<int> current;
    current.insert(nums[0]);
    int N = nums.size();
    for (int i = 1; i < N; ++i) {
      auto l_itr = current.lower_bound(nums[i]);
      if (l_itr != current.end() &&
          abs(((long long) *l_itr) - ((long long) nums[i])) <= t)
        return true;
      if (l_itr != current.begin() &&
          abs(((long long) *(--l_itr)) - ((long long) nums[i])) <= t)
        return true;
      current.insert(nums[i]);
      if (current.size() > k)
        current.erase(current.find(nums[i - k]));
    }
    return false;
  }
};

We check two numbers in each iteration: the number returned by the lower_bound method (which is going to be the first number that is larger than or equal to nums[i]) and the number immediately before (the larger number that is smaller than nums[i]). One of those is going to be the one producing the smallest absolute difference. If the smallest absolute difference is not greater than t, we have found a satisfying pair.

The casts to long long are necessary because, even though t is an int, the absolute difference between two numbers in nums may not fit in an int (consider \(|2147483647 - (-2147483648)|\) for example).

September LeetCoding Challenge, Day 1: Largest Time for Given Digits

September 2, 2020

LeetCode is a platform with programming challenges. They’re designed around helping prepare for technical interviews, but they also host programming contests and have some explore packages, which are bundles of problems and articles. In April 2020, they started providing monthly explore packages in the form of challenges, in which they’d add a problem each day of the month. Solving all problems within 24 hours of their release makes you eligible for some rewards. I have been solving these monthly challenges for fun since April, but decided to start writing a bit about them in September. It’s very easy to find solutions for most of these problems online, but I’ll try to post these the day after the challenge, to avoid unnecessary copying. I’m solving these problems in C++, since it’s my language of choice for this type of algorithmic programming problems.

The problem for September 1 is Largest Time for Given Digits. The statement is straightforward: given 4 digits, return the largest 24 hour time (in a “HH:MM” string) that can be made using all of those digits, or an empty string if no valid time can be made.

A possible solution for this problem is to enumerate all valid times and check the largest one that uses all of the provided digits. Since there are at most \(24 \times 60 = 1440\) valid times, this is good enough to avoid a time limit exceeded. The following is an implementation of that strategy:

class Solution {
private:
  pair<int, int> inc(pair<int, int> curr) {
    pair<int, int> ans = {curr.first, curr.second + 1};
    if (ans.second >= 60) {
      ans.first++;
      ans.second = 0;
    }
    return ans;
  }

  bool good(unordered_map<int, int> cnt, pair<int, int>& curr) {
    cnt[curr.first / 10]--;
    cnt[curr.first % 10]--;
    cnt[curr.second / 10]--;
    cnt[curr.second % 10]--;
    for (auto itr = cnt.begin(); itr != cnt.end(); ++itr) {
      if (itr->second != 0)
        return false;
    }
    return true;
  }

public:
  string largestTimeFromDigits(vector<int>& A) {
    pair<int, int> curr = {0, 0};
    pair<int, int> best = {-1, -1};
    unordered_map<int, int> cnt;
    for (int num : A)
      cnt[num]++;
    while (curr.first < 24) {
      if (good(cnt, curr))
        best = curr;
      curr = inc(curr);
    }
    if (best.first == -1)
      return "";
    string ans = "";
    ans += (best.first / 10) + '0';
    ans += (best.first % 10) + '0';
    ans += ':';
    ans += (best.second / 10) + '0';
    ans += (best.second % 10) + '0';
    return ans;
  }
};

Another possible solution is to iterate through all the permutations of the provided digits, and return the one that produces a valid time and is the largest. Since there are only \(4! = 24\) permutations, this runs much faster than the previous solution. The following is an implementation of that strategy:

class Solution {
private:
  bool good(const vector<int>& A) {
    int hours = A[0] * 10 + A[1];
    int minutes = A[2] * 10 + A[3];
    return hours <= 23 && minutes <= 59;
  }

public:
  string largestTimeFromDigits(vector<int>& A) {
    sort(A.begin(), A.end());
    vector<int> best;
    do {
      if (good(A))
        best = A;
    } while (next_permutation(A.begin(), A.end()));
    if (best.empty())
      return "";
    string ans = "";
    ans += best[0] + '0';
    ans += best[1] + '0';
    ans += ':';
    ans += best[2] + '0';
    ans += best[3] + '0';
    return ans;
  }
};

After reading this problem for the first time, I was under the impression that a greedy algorithm that would pick the next largest available digit satisfying the following restrictions would also work:

  • The digit in the tens place of the hours number must not be larger than 2;
  • The digit in the ones place of the hours number must not be larger than 3 if the digit in the tens place of the hours number is equal to 2;
  • The digit in the tens place of the minutes number must not be larger than 5.

However, it’s easy to come up with a counter example in which this algorithm would fail. If provided with the digits \([0, 2, 6, 6]\), the correct output would be “06:26”. However, the previously described algorithm would fail to produce a valid answer, since it’d try to use digits \([0, 2]\) for the hours and be left with \([6, 6]\) for the minutes, which can’t produce a valid minute number.

An Inside Look into the Loot Train Attack Scene

August 9, 2017

WARNING: Contains Game of Thrones spoilers from Season 7 Episode 4.

HBO posted a video giving an inside look on filming the intense loot train attack in Season 7 Episode 4. From cameras needing to move in sync with ground explosions to how stuntman need to keep their heart rate down in order to hold their breath while on fire, it’s filled with a lot of details on how that amazing scene got filmed.

First Touch Penalty on Amazon EBS Volumes

August 8, 2017

Apparently, storage blocks on Amazon EBS volumes that were restored from snapshots incur in significant latency penalties on I/O operations the first time they’re accessed. According to the user guide:

New EBS volumes receive their maximum performance the moment that they are available and do not require initialization (formerly known as pre-warming). However, storage blocks on volumes that were restored from snapshots must be initialized (pulled down from Amazon S3 and written to the volume) before you can access the block. This preliminary action takes time and can cause a significant increase in the latency of an I/O operation the first time each block is accessed. For most applications, amortizing this cost over the lifetime of the volume is acceptable. Performance is restored after the data is accessed once.

This can be particularly painful when restoring Amazon RDS DB snapshots, as the performance can be severely impacted. In order to overcome the first touch penalty, it is advisable to warm up the disk by performing a full table scan or a vacuum on all tables in the database.