Finding the Longest Palindromic Substring in Linear Time

Fred Akalin
November 28, 2007

Another interesting problem I stumbled across on reddit is finding the longest substring of a given string that is a palindrome. I found the explanation on Johan Jeuring's blog somewhat confusing, and I had to spend some time poring over the Haskell code (eventually rewriting it in Python) and walking through examples before it "clicked." I haven't found any other explanations of the same approach, so hopefully my explanation below will help the next person who is curious about this problem.

Of course, the most naive solution would be to exhaustively examine all $\binom{n}{2}$ substrings of the given $n$-length string, test whether each one is a palindrome, and keep track of the longest one seen so far. This has complexity $O(n^3)$, but we can easily do better by realizing that a palindrome is centered on either a letter (for odd-length palindromes) or a space between letters (for even-length palindromes). Therefore we can examine all $2n + 1$ possible centers ($n$ letters plus $n + 1$ spaces, counting the spaces before the first letter and after the last) and find the longest palindrome for each center, keeping track of the overall longest palindrome. This has complexity $O(n^2)$.
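For concreteness, here is a minimal Python sketch of the two baseline approaches just described. It is not code from the original post, and the names `brute_force_longest` and `center_expansion_longest` are hypothetical. Both return the leftmost longest palindromic substring, so they can be cross-checked against each other.

```python
def is_palindrome(s):
    # A string is a palindrome if it reads the same reversed.
    return s == s[::-1]


def brute_force_longest(seq):
    """O(n^3) baseline: test every substring, keep the longest palindrome."""
    best = ""
    n = len(seq)
    for i in range(n):
        for j in range(i + 1, n + 1):
            sub = seq[i:j]
            # Only strictly longer candidates can replace the current best,
            # so ties resolve to the leftmost palindrome.
            if len(sub) > len(best) and is_palindrome(sub):
                best = sub
    return best


def center_expansion_longest(seq):
    """O(n^2) baseline: expand around each of the 2n + 1 centers."""
    n = len(seq)
    best_start, best_len = 0, 0
    for c in range(2 * n + 1):
        # Odd c: the letter seq[c // 2]; even c: the space before seq[c // 2].
        s = c // 2
        e = s + c % 2
        # Invariant: seq[s:e] is a palindrome centered at c.
        while s > 0 and e < n and seq[s - 1] == seq[e]:
            s -= 1
            e += 1
        if e - s > best_len:
            best_start, best_len = s, e - s
    return seq[best_start:best_start + best_len]
```

For example, `center_expansion_longest("forgeeksskeegfor")` returns `"geeksskeeg"`. The second function does less work because each center is expanded only as far as its palindrome reaches, instead of re-testing every substring from scratch.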

It is not immediately clear that we can do better, but if we're told that a $\Theta(n)$ algorithm exists, we can infer that the algorithm is most likely structured as an iteration through all possible centers. As an off-the-cuff first attempt, we can adapt the above algorithm by keeping track of the current center and expanding until we find the longest palindrome around that center, at which point we consider the last letter (or space) of that palindrome as the new center. The algorithm (which isn't correct) looks like this informally:

    1. Set the current center to the first letter.
    2. Loop while the current center is valid:
        a. Expand to the left and right simultaneously until we find the largest palindrome around this center.
        b. If the current palindrome is bigger than the stored maximum one, store the current one as the maximum one.
        c. Set the space following the current palindrome as the current center unless the two letters immediately surrounding it are different, in which case set the last letter of the current palindrome as the current center.
    3. Return the stored maximum palindrome.

This seems to work, but it doesn't handle all cases: consider the string "abababa". The first non-trivial palindrome we see is "a|bababa", followed by "aba|baba". Considering the current space as the center doesn't get us anywhere, but if we consider the preceding letter (the second 'a') as the center, we can expand to get "ababa|ba". From this state, considering the current space again doesn't get us anywhere, but if we consider the preceding letter as the center, we can expand to get "abababa|" (the palindrome found is again "ababa", this time ending at the last letter). However, this is incorrect, as the longest palindrome is actually the entire string! We can remedy this case by changing the algorithm to try to set the new center to be one before the end of the last palindrome, but it is clear that having a fixed "lookbehind" doesn't solve the general case, and anything more than that will probably bump us back up to quadratic time.

The key question is this: given the state from the example above, "ababa|ba", what makes the second 'b' so special that it should be the new center? To use another example, in "abcbabcba|bcba", what makes the second 'c' so special that it should be the new center? Hopefully, the answer to this question will lead to the answer to the more important question: once we stop expanding the palindrome around the current center, how do we pick the next center? To answer the first question, first notice that the current palindromes in the above examples themselves contain smaller non-trivial palindromes: "ababa" contains "aba", and "abcbabcba" contains "abcba", which also contains "bcb". Then, notice that if we expand around the "special" letters, we get a palindrome which shares a right edge with the current palindrome; that is, the longest palindromes around the special letters (within the current palindrome) are proper suffixes of the current palindrome. With a little thought, we can then answer the second question: to pick the next center, take the center of the longest palindromic proper suffix of the current palindrome. Our algorithm then looks like this:

    1. Set the current center to the first letter.
    2. Loop while the current center is valid:
        a. Expand to the left and right simultaneously until we find the largest palindrome around this center.
        b. If the current palindrome is bigger than the stored maximum one, store the current one as the maximum one.
        c. Find the maximal palindromic proper suffix of the current palindrome.
        d. Set the center of the suffix from step c as the current center and start expanding from the suffix, since it is already known to be palindromic.
    3. Return the stored maximum palindrome.
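As a rough illustration of steps c and d, the sketch below finds the longest palindromic proper suffix by brute force, which is exactly the quadratic work discussed next. The helper names are hypothetical. The center is reported as an index into the $2n + 1$ centers (odd index $2i + 1$ for the letter at $i$, even index $2i$ for the space before it), so a span `p[s:e]` has center index `s + e`.

```python
def longest_palindromic_proper_suffix(p):
    """Brute-force step c: scan suffixes from longest to shortest (quadratic)."""
    for start in range(1, len(p) + 1):
        suffix = p[start:]
        if suffix == suffix[::-1]:
            return suffix
    return ""  # only reached when p is empty


def next_center_index(p):
    """Step d: center index (in the 2n + 1 numbering) of that suffix.

    The suffix occupies p[s:e] with e == len(p), and a span p[s:e]
    has center index s + e.
    """
    suffix = longest_palindromic_proper_suffix(p)
    s = len(p) - len(suffix)
    return s + len(p)
```

For "ababa" this picks center index 7, the second 'b', and for "abcbabcba" it picks center index 13, the second 'c', matching the "special" letters identified above.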

However, unless step 2c can be done efficiently, it will cause the algorithm to be superlinear. Doing step 2c efficiently seems impossible, since we have to examine the entire current palindrome to find the longest palindromic suffix, unless we somehow keep track of extra state as we progress through the input string. Notice that the longest palindromic suffix would by definition also be a palindrome of the input string, so it might suffice to keep track of every palindrome that we see as we move through the string; hopefully, by the time we finish expanding around a given center, we would know where all the palindromes with centers lying to the left of the current one are. However, if the longest palindromic suffix has a center to the right of the current center, we would not know about it. But we also have at our disposal the very useful fact that a palindromic proper suffix of a palindrome has a corresponding dual palindromic proper prefix. For example, in "abcbabcba" from above, notice that "abcba" appears twice: once as a prefix and once as a suffix. Therefore, while we wouldn't know about all the palindromic suffixes of our current palindrome, we would know about either each one or its dual.
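The duality follows from a simple mirror argument: reflecting the span `p[i:j]` of a palindrome `p` of length `m` about its midpoint gives `p[m-j:m-i]`, which is the reversal of `p[i:j]`, so a palindromic suffix reappears unchanged as a prefix. The following sketch, with hypothetical helper names, checks this for every palindromic proper suffix.

```python
def mirror_span(m, i, j):
    """Reflect the half-open span [i, j) of a length-m string about its midpoint."""
    return m - j, m - i


def suffix_prefix_duals(p):
    """Pair each palindromic proper suffix span of palindrome p with its mirror."""
    assert p == p[::-1], "p must be a palindrome"
    m = len(p)
    pairs = []
    for i in range(1, m):
        suffix = p[i:]
        if suffix == suffix[::-1]:
            a, b = mirror_span(m, i, m)
            # In a palindrome, p[a:b] is the reversal of p[i:m]; since the
            # suffix is itself a palindrome, it reappears unchanged as a prefix.
            assert (a, b) == (0, m - i) and p[a:b] == suffix
            pairs.append(((i, m), (a, b)))
    return pairs
```

Here `suffix_prefix_duals("abcbabcba")` pairs the suffix span (4, 9) with the prefix span (0, 5), i.e. the two copies of "abcba", and the trailing "a" with the leading "a".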

Another crucial realization is that we don't have to keep track of all the palindromes we've seen. To use the example "abcbabcba" again, we don't really care about "bcb" that much, since it's already contained in the palindrome "abcba". In fact, we only really care about keeping track of the longest palindrome for each center or, equivalently, the length of the longest palindrome for each center. But this is simply a more general version of our original problem, which is to find the longest palindrome around any center! Thus, if we can keep track of this state efficiently, maybe by taking advantage of the properties of palindromes, we don't have to keep track of the maximal palindrome and can instead figure it out at the very end.

Unfortunately, we seem to be back where we started; the second naive algorithm that we have is simply to loop through all possible centers and for each one find the longest palindrome around that center. But our discussion has led us to a different incremental formulation: given a current center, the longest palindrome around that center, and a list of the lengths of the longest palindromes around the centers to the left of the current center, can we figure out the new center to consider and extend the list of longest palindrome lengths up to that center efficiently? For example, if we have the state:

<"ababa|??", [0, 1, 0, 3, 0, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?]>

where the current center is the third letter (the second 'a'), the vertical line is our current position, the question marks represent unread characters or unknown quantities, and the array represents the list of longest palindrome lengths by center (one entry for each of the $2n + 1$ centers, alternating between spaces and letters), can we get to the state:

<"ababa|??", [0, 1, 0, 3, 0, 5, 0, ?, ?, ?, ?, ?, ?, ?, ?]>

(where the current center is now the second 'b') and then to:

<"abababa|", [0, 1, 0, 3, 0, 5, 0, 7, 0, 5, 0, 3, 0, 1, 0]>

efficiently? The crucial thing to notice is that the longest palindrome lengths array (we'll call it simply the lengths array) in the final state is palindromic since the original string is palindromic. In fact, the lengths array obeys a more general property: the longest palindrome d places to the right of the current center (the d-right palindrome) is at least as long as the longest palindrome d places to the left of the current center (the d-left palindrome) if the d-left palindrome is completely contained in the longest palindrome around the current center (the center palindrome), and it is of equal length if the d-left palindrome is not a prefix of the center palindrome or if the center palindrome is a suffix of the entire string. This then implies that we can more or less fill in the values to the right of the current center from the values to the left of the current center. For example, from [0, 1, 0, 3, 0, 5, ?, ?, ?, ?, ?, ?, ?, ?, ?] we can get to [0, 1, 0, 3, 0, 5, 0, ≥3?, 0, ≥1?, 0, ?, ?, ?, ?]. This also implies that the first unknown entry (in this case, ≥3?) should be the new center because it means that the center palindrome is not a suffix of the input string (i.e., we're not done) and that the d-left palindrome is a prefix of the center palindrome.
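The mirroring rule can be sketched as a single fill step. This is an illustration under stated assumptions rather than the post's implementation: `l` is assumed to hold exact lengths for centers `0..c` with `c = len(l) - 1`, and `l[c]` is the length of the maximal palindrome at `c`. For each offset `k`, `d = l[c] - k` is the length the mirrored left palindrome would need in order to be a prefix of the center palindrome. The function (hypothetical name `fill_right`) copies `min(l[c - k], d)` until it meets the first entry where `l[c - k] == d`; that entry is only a lower bound and becomes the new center.

```python
def fill_right(l):
    """One mirror-fill step (illustrative sketch).

    Assumes l holds exact longest-palindrome lengths for center indices
    0..c, where c = len(l) - 1, and that l[c] is maximal. Returns
    (filled, new_center): the exact lengths for c + 1, c + 2, ... and the
    index of the first entry that is only known as a lower bound, or None
    if every mirrored entry up to the right edge is exact.
    """
    c = len(l) - 1
    L = l[c]
    filled = []
    for k in range(1, L + 1):
        left = l[c - k]
        # d is the length the palindrome at c - k would need to touch the
        # left edge of the center palindrome (i.e. to be a prefix of it).
        d = L - k
        if left == d:
            # Its mirror touches the right edge, so l[c + k] >= d and the
            # palindrome there might extend further: make it the new center.
            return filled, c + k
        # Shorter: copied exactly. Longer: clipped by the right edge.
        filled.append(min(left, d))
    return filled, None
```

Starting from the first state above with the current center's entry filled in, `fill_right([0, 1, 0, 3, 0, 5])` returns `([0], 7)`: one new exact entry, 0, and the new center 7 (the second 'b'), which is exactly the second state.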

From these observations we can construct our final algorithm which returns the lengths array, and from which it is easy to find the longest palindromic substring:

    1. Initialize the lengths array with one entry per possible center (2n + 1 entries for a string of length n).
    2. Set the current center to the first center.
    3. Loop while the current center is valid:
        a. Expand to the left and right simultaneously until we find the largest palindrome around this center.
        b. Fill in the appropriate entry in the longest palindrome lengths array.
        c. Iterate through the longest palindrome lengths array backwards and fill in the corresponding values to the right of the entry for the current center until an unknown value (as described above) is encountered.
        d. Set the new center to the index of this unknown value.
    4. Return the lengths array.

Note that at each step of the algorithm we're either incrementing our current position in the input string or filling in an entry in the lengths array. Since the position can advance at most $n$ times and the lengths array has $2n + 1$ entries, both linear in the length of the input string, the algorithm has worst-case linear running time. Since, given the lengths array, we can find and return the longest palindromic substring in linear time, composing these two operations gives a linear-time algorithm for finding the longest palindromic substring.
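Recovering the substring from the lengths array takes one linear scan. Here is a minimal sketch, assuming the array layout described in the docstring of `naiveLongestPalindromes` below (odd index $2i + 1$ for the letter `seq[i]`, even index $2i$ for the space before it); the function name `longest_from_lengths` is hypothetical, and ties go to the leftmost center.

```python
def longest_from_lengths(seq, l):
    """Recover the longest palindromic substring of seq from its lengths array.

    Assumes len(l) == 2 * len(seq) + 1, with l[2*i + 1] for the letter
    seq[i] and l[2*i] for the space before seq[i].
    """
    assert len(l) == 2 * len(seq) + 1
    # max() returns the first maximal index, i.e. the leftmost center.
    c = max(range(len(l)), key=l.__getitem__)
    start = c // 2 - l[c] // 2  # same formula as the naive docstring
    return seq[start:start + l[c]]
```

For example, `longest_from_lengths("xabay", [0, 1, 0, 1, 0, 3, 0, 1, 0, 1, 0])` returns `"aba"`.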

Here is Python code that implements the above algorithm (although it is closer to Johan Jeuring's Haskell implementation than to the above description).

* An exercise for the reader: at the line marked # * in the code below, you might think that you can replace the == with >= to improve performance. This does not change the correctness of the algorithm, but it does hurt performance, contrary to expectations. Why?

def fastLongestPalindromes(seq):
    """
    Behaves identically to naiveLongestPalindromes (see below), but
    runs in linear time.
    """
    seqLen = len(seq)
    l = []
    i = 0
    palLen = 0
    # Loop invariant: seq[(i - palLen):i] is a palindrome.
    # Loop invariant: len(l) >= 2 * i - palLen. The code path that
    # increments palLen skips the l-filling inner-loop.
    # Loop invariant: len(l) < 2 * i + 1. Any code path that
    # increments i past seqLen - 1 exits the loop early and so skips
    # the l-filling inner loop.
    while i < seqLen:
        # First, see if we can extend the current palindrome.  Note
        # that the center of the palindrome remains fixed.
        if i > palLen and seq[i - palLen - 1] == seq[i]:
            palLen += 2
            i += 1
            continue

        # The current palindrome is as large as it gets, so we append
        # it.
        l.append(palLen)

        # Now to make further progress, we look for a smaller
        # palindrome sharing the right edge with the current
        # palindrome.  If we find one, we can try to expand it and see
        # where that takes us.  At the same time, we can fill the
        # values for l that we neglected during the loop above. We
        # make use of our knowledge of the length of the previous
        # palindrome (palLen) and the fact that the values of l for
        # positions on the right half of the palindrome are closely
        # related to the values of the corresponding positions on the
        # left half of the palindrome.

        # Traverse backwards starting from the second-to-last index up
        # to the edge of the last palindrome.
        s = len(l) - 2
        e = s - palLen
        for j in range(s, e, -1):
            # d is the value l[j] must have in order for the
            # palindrome centered there to share the left edge with
            # the last palindrome.  (Drawing it out is helpful to
            # understanding why the - 1 is there.)
            d = j - e - 1

            # We check to see if the palindrome at l[j] shares a left
            # edge with the last palindrome.  If so, the corresponding
            # palindrome on the right half must share the right edge
            # with the last palindrome, and so we have a new value for
            # palLen.
            if l[j] == d: # *
                palLen = d
                # We actually want to go to the beginning of the outer
                # loop, but Python doesn't have loop labels.  Instead,
                # we use an else block corresponding to the inner
                # loop, which gets executed only when the for loop
                # exits normally (i.e., not via break).
                break

            # Otherwise, we just copy the value over to the right
            # side.  We have to bound l[j] because palindromes on the
            # left side could extend past the left edge of the last
            # palindrome, whereas their counterparts won't extend past
            # the right edge.
            l.append(min(d, l[j]))
        else:
            # This code is executed in two cases: when the for loop
            # isn't taken at all (palLen == 0) or the inner loop was
            # unable to find a palindrome sharing the left edge with
            # the last palindrome.  In either case, we're free to
            # consider the palindrome centered at seq[i].
            palLen = 1
            i += 1

    # We know from the loop invariant that len(l) < 2 * seqLen + 1, so
    # we must fill in the remaining values of l.

    # Obviously, the last palindrome we're looking at can't grow any
    # more.
    l.append(palLen)

    # Traverse backwards starting from the second-to-last index up
    # until we get l to size 2 * seqLen + 1. We can deduce from the
    # loop invariants we have enough elements.
    lLen = len(l)
    s = lLen - 2
    e = s - (2 * seqLen + 1 - lLen)
    for i in range(s, e, -1):
        # The d here uses the same formula as the d in the inner loop
        # above.  (Computes distance to left edge of the last
        # palindrome.)
        d = i - e - 1
        # We bound l[i] with min for the same reason as in the inner
        # loop above.
        l.append(min(d, l[i]))

    return l

And here is a naive quadratic version for comparison:

def naiveLongestPalindromes(seq):
    """
    Given a sequence seq, returns a list l such that l[2 * i + 1]
    holds the length of the longest palindrome centered at seq[i]
    (which must be odd), l[2 * i] holds the length of the longest
    palindrome centered between seq[i - 1] and seq[i] (which must be
    even), and l[2 * len(seq)] holds the length of the longest
    palindrome centered past the last element of seq (which must be 0,
    as is l[0]).

    The actual palindrome for l[i] is seq[s:(s + l[i])] where s is i
    // 2 - l[i] // 2. (// is integer division.)

    Example:
    naiveLongestPalindromes('ababa') -> [0, 1, 0, 3, 0, 5, 0, 3, 0, 1, 0]

    Runs in quadratic time.
    """
    seqLen = len(seq)
    lLen = 2 * seqLen + 1
    l = []

    for i in range(lLen):
        # If i is even (i.e., we're on a space), this will produce e
        # == s.  Otherwise, we're on an element and e == s + 1, as a
        # single letter is trivially a palindrome.
        s = i // 2
        e = s + i % 2

        # Loop invariant: seq[s:e] is a palindrome.
        while s > 0 and e < seqLen and seq[s - 1] == seq[e]:
            s -= 1
            e += 1

        l.append(e - s)

    return l

Note that this is not the only efficient solution to this problem; building a suffix tree is linear in the length of the input string, and you can use one to solve this problem, but as Johan also mentions, that is a much less direct and less efficient solution than this one.
