September LeetCoding Challenge, Day 3: Repeated Substring Pattern
September 4, 2020The problem for September 3 is Repeated Substring Pattern. The problem statement is straightforward: given an non-empty string of at most 10000 characters, check if it can be constructed by taking a substring of it and appending multiple (more than one) copies of it. In other words, the function should return true if, for a given string \(s\), there exists a proper substring \(m\) of \(s\) such that \(s = m + \dots + m = n \times m\), for \(n > 1\).
The size of the string is small enough to check all proper substrings of \(s\) (in \(\mathcal{O}(n^2)\) time). The following is an implementation of that strategy:
The previous solution is good enough, but some improvements can still be performed under the same strategy. Namely, it’s not necessary to check for substrings larger than half the size of \(s\), and there’s no need to build a new string for the prefix (we can just keep track of the size of the substring under consideration). However, those improvements don’t improve the asymptotic time complexity of the solution.
One key observation for a solution with a better asymptotic time complexity is that if we have a string \(s\) of size \(N\) composed of \(n\) repetitions of substring \(m\) (let’s say that \(s = n \times m\)), and we append string \(s\) onto itself (i.e. we have \(s + s = 2 \times n \times m\)), then \(s\) can also be found in \(s + s\) starting in an index other than \(0\) or \(N\) (since \(|s + s| = 2N\)). Building on this insight, we can append \(s\) onto itself, remove the first and last character of it and try to find an occurrence of \(s\) in the resulting string. If we find it, then \(s\) must be built using a repeated substring pattern. We remove the first and last character to avoid finding the instances of \(s\) starting at index \(0\) and index \(N\). If we’re able to find \(s\) in the resulting string in \(\mathcal{O}(N)\), then we arrive at an \(\mathcal{O}(N)\) solution for this problem. The Knuth-Morris-Pratt (KMP) algorithm allows searching for occurrences of a word \(W\) within a main text string \(S\) in \(\mathcal{O}(|W|) + \mathcal{O}(|S|)\) using \(\mathcal{O}(|W|)\) extra space, and is therefore suitable for our use case. I won’t go into details describing the KMP algorithm. The following is an implementation of the previously described strategy: