Longest Repeating Subsequence

Problem Statement

🙋 Question
Given a sequence, find the length of its longest repeating subsequence (LRS). A repeating subsequence is one that appears at least twice in the original sequence without overlapping (i.e., no pair of corresponding characters in the two occurrences has the same index).
Input: "tomorrow"
Output: 2
Explanation: The longest repeating subsequence is "or", for example at indexes (1, 4) and (3, 5).
Input: "aabdbcec"
Output: 3
Explanation: The longest repeating subsequence is "abc", for example at indexes (0, 2, 5) and (1, 4, 7).
Input: "fmff"
Output: 2
Explanation: The longest repeating subsequence is "ff", at indexes (0, 2) and (2, 3). Note that the second-to-last character (index 2) is used by both occurrences, which is allowed because it sits at a different position in each.
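
To make the definition concrete, here is a minimal brute-force sketch that checks it directly: it looks for the largest k such that two increasing index tuples of length k spell the same characters and differ at every position. The function name `brute_force_lrs_length` is hypothetical and not part of the lesson's code; the approach is only practical for very short inputs.

```python
from itertools import combinations


def brute_force_lrs_length(s):
    """Return the LRS length of s by testing the definition directly."""
    n = len(s)
    # Try the longest candidate length first so the first hit is the answer.
    for k in range(n, 0, -1):
        picks = list(combinations(range(n), k))  # increasing index tuples
        for a in picks:
            for b in picks:
                # Corresponding characters must match and must not share an index.
                if all(i != j and s[i] == s[j] for i, j in zip(a, b)):
                    return k
    return 0


print(brute_force_lrs_length("tomorrow"))
print(brute_force_lrs_length("aabdbcec"))
print(brute_force_lrs_length("fmff"))
```

For the three inputs above this prints 2, 3, and 2, matching the expected outputs.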

Basic Solution

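The problem is quite similar to the Longest Common Subsequence (LCS), with two differences:

1. In LCS, we look for the longest subsequence common to two strings; in LRS, we look for two matching subsequences within a single string.
2. In LRS, corresponding characters of the two subsequences must not have the same index.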

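A basic brute-force solution could be to try all subsequences of the given sequence to find the longest repeating one. The difficulty is ensuring that corresponding characters of the two occurrences do not have the same index. To handle this, we can walk the sequence with two indexes, i1 and i2, so at any step we have two choices:

1. If the two indexes are different and the characters at both indexes are the same, we count one match and recurse on the remainder by incrementing both indexes.
2. Otherwise, we make two recursive calls, each incrementing one of the indexes, and take the maximum of the two results.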

Code

Here is the code:

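def find_LRS_length(str):
    # Start both indexes at the beginning of the sequence.
    return find_LRS_length_recursive(str, 0, 0)


def find_LRS_length_recursive(str, i1, i2):
    # Base case: either index has run past the end of the sequence.
    if i1 == len(str) or i2 == len(str):
        return 0

    # Matching characters at two different positions extend the LRS by one.
    if i1 != i2 and str[i1] == str[i2]:
        return 1 + find_LRS_length_recursive(str, i1 + 1, i2 + 1)

    # Otherwise, advance each index separately and keep the better result.
    c1 = find_LRS_length_recursive(str, i1, i2 + 1)
    c2 = find_LRS_length_recursive(str, i1 + 1, i2)

    return max(c1, c2)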


def main():
    print(find_LRS_length("tomorrow"))
    print(find_LRS_length("aabdbcec"))
    print(find_LRS_length("fmff"))


main()

👉 Complexity
The time complexity of the above algorithm is exponential: O(2^(2n)) in the worst case, where n is the length of the input sequence, because each call can branch into two and the recursion can go up to 2n levels deep. The space complexity is O(n), which is used to store the recursion stack.
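
To see the exponential growth concretely, the following sketch counts how many times the recursive function is invoked. It mirrors the recursion above but adds a counter; `count_recursive_calls` is a hypothetical helper introduced only for this illustration.

```python
def count_recursive_calls(s):
    """Count invocations of the plain recursive LRS routine on s."""
    calls = 0

    def solve(i1, i2):
        nonlocal calls
        calls += 1  # one more invocation of the recursive routine
        if i1 == len(s) or i2 == len(s):
            return 0
        if i1 != i2 and s[i1] == s[i2]:
            return 1 + solve(i1 + 1, i2 + 1)
        # No match: branch twice, exactly as in the basic solution.
        return max(solve(i1, i2 + 1), solve(i1 + 1, i2))

    solve(0, 0)
    return calls


for text in ("a", "ab", "abc", "abcdefgh"):
    print(len(text), count_recursive_calls(text))
```

For strings with no repeated characters, every call branches twice; for lengths 1, 2, and 3 the counts are 3, 11, and 39.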

Top-down Dynamic Programming with Memoization

We can use an array to store the already solved subproblems.

The two changing values to our recursive function are the two indexes, i1 and i2. Therefore, we can store the results of all the subproblems in a two-dimensional array. (An alternative is to use a hash table whose key is a string such as i1 + "|" + i2.)

Code

Here is the code:

def find_LRS_length(str):
    n = len(str)
    # dp[i1][i2] caches the result for the index pair (i1, i2); -1 means unsolved.
    dp = [[-1 for _ in range(n)] for _ in range(n)]
    return find_LRS_length_recursive(dp, str, 0, 0)


def find_LRS_length_recursive(dp, str, i1, i2):
    n = len(str)
    # Base case: either index has run past the end of the sequence.
    if i1 == n or i2 == n:
        return 0

    # Solve the subproblem only if it has not been solved before.
    if dp[i1][i2] == -1:
        if i1 != i2 and str[i1] == str[i2]:
            dp[i1][i2] = 1 + find_LRS_length_recursive(dp, str, i1 + 1, i2 + 1)
        else:
            c1 = find_LRS_length_recursive(dp, str, i1, i2 + 1)
            c2 = find_LRS_length_recursive(dp, str, i1 + 1, i2)
            dp[i1][i2] = max(c1, c2)

    return dp[i1][i2]


def main():
    print(find_LRS_length("tomorrow"))
    print(find_LRS_length("aabdbcec"))
    print(find_LRS_length("fmff"))


main()
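
The hash-table alternative mentioned above can be sketched as follows. This is a minimal illustration, not the lesson's code: it assumes a Python dict keyed by the tuple (i1, i2) in place of the "i1|i2" string key, and the name `find_LRS_length_memo` is hypothetical.

```python
def find_LRS_length_memo(s):
    """Top-down LRS length using a dict as the memo table."""
    memo = {}  # maps (i1, i2) -> LRS length for the suffixes starting there

    def solve(i1, i2):
        if i1 == len(s) or i2 == len(s):
            return 0
        key = (i1, i2)
        if key not in memo:
            if i1 != i2 and s[i1] == s[i2]:
                memo[key] = 1 + solve(i1 + 1, i2 + 1)
            else:
                memo[key] = max(solve(i1, i2 + 1), solve(i1 + 1, i2))
        return memo[key]

    return solve(0, 0)


print(find_LRS_length_memo("tomorrow"))
print(find_LRS_length_memo("aabdbcec"))
print(find_LRS_length_memo("fmff"))
```

With either the array or the dict, each (i1, i2) pair is solved at most once, so the memoized version needs O(n^2) time and space. The dict only stores the pairs that are actually visited.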

Bottom-up Dynamic Programming

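Since we want to match all the subsequences of the given string, we can use a two-dimensional array to store our results. As mentioned above, we track two indexes to avoid the overlapping problem. For each pair of indexes, i1 and i2, we choose one of the following options:

1. If i1 and i2 are different and the character str[i1] matches str[i2], the length of the LRS is one plus the length of the LRS up to indexes i1-1 and i2-1.
2. Otherwise, we take the longer LRS obtained by skipping either the character at i1 or the character at i2.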

So our recursive formula would be:

if i1 != i2 && str[i1] == str[i2] 
    dp[i1][i2] = 1 + dp[i1-1][i2-1]
else 
    dp[i1][i2] = max(dp[i1-1][i2], dp[i1][i2-1])

Code

Here is the code for our bottom-up dynamic programming approach:

def find_LRS_length(str):
    n = len(str)
    dp = [[0 for _ in range(n+1)] for _ in range(n+1)]
    max_length = 0
    # dp[i1][i2] stores the LRS length for the prefixes str[0..i1-1] and
    # str[0..i2-1]. The first row and column of dp[][] stand for prefixes of
    # length zero, so they always hold an LRS length of zero.
    for i1 in range(1, n+1):
        for i2 in range(1, n+1):
            if i1 != i2 and str[i1 - 1] == str[i2 - 1]:
                # same character at two different positions: extend the LRS
                dp[i1][i2] = 1 + dp[i1 - 1][i2 - 1]
            else:
                # skip the character at i1 or at i2, whichever gives more
                dp[i1][i2] = max(dp[i1 - 1][i2], dp[i1][i2 - 1])

            max_length = max(max_length, dp[i1][i2])

    return max_length


def main():
    print(find_LRS_length("tomorrow"))
    print(find_LRS_length("aabdbcec"))
    print(find_LRS_length("fmff"))


main()

👉 Complexity
The time and space complexity of the above algorithm is O(n^2), where n is the length of the input sequence.
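
The filled table holds enough information to recover one longest repeating subsequence, not only its length. The sketch below builds the same table and then walks back from the bottom-right cell, collecting a character whenever the match rule applies. The function name `find_LRS` is hypothetical, and when several subsequences tie for the longest, which one is returned depends on the tie-breaking chosen here.

```python
def find_LRS(s):
    """Return one longest repeating subsequence of s as a string."""
    n = len(s)
    dp = [[0] * (n + 1) for _ in range(n + 1)]
    for i1 in range(1, n + 1):
        for i2 in range(1, n + 1):
            if i1 != i2 and s[i1 - 1] == s[i2 - 1]:
                dp[i1][i2] = 1 + dp[i1 - 1][i2 - 1]
            else:
                dp[i1][i2] = max(dp[i1 - 1][i2], dp[i1][i2 - 1])

    # Walk back from dp[n][n], undoing the recurrence one step at a time.
    chars = []
    i1, i2 = n, n
    while i1 > 0 and i2 > 0:
        if i1 != i2 and s[i1 - 1] == s[i2 - 1]:
            chars.append(s[i1 - 1])  # this cell came from the diagonal
            i1 -= 1
            i2 -= 1
        elif dp[i1 - 1][i2] >= dp[i1][i2 - 1]:
            i1 -= 1  # the value came from the cell above (ties go up)
        else:
            i2 -= 1  # the value came from the cell to the left
    return "".join(reversed(chars))


print(find_LRS("fmff"))
```

For "fmff" this prints ff. The length of the returned string always equals the value the bottom-up function returns.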
