LeetCode 1044. Longest Duplicate Substring

Question

Given a string s, consider all duplicated substrings: (contiguous) substrings of s that occur 2 or more times. The occurrences may overlap.

Return any duplicated substring that has the longest possible length. If s does not have a duplicated substring, the answer is "".

Example 1:

1
2
Input: s = "banana"
Output: "ana"

Example 2:

1
2
Input: s = "abcd"
Output: ""

Constraints:

  • 2 <= s.length <= 3 * 104
  • s consists of lowercase English letters.

Source: https://leetcode.com/problems/longest-duplicate-substring/

Solution

One intuitive idea is 2D DP, which has a O(n^2) time complexity. Finding duplicate substring in one string can be regarded as a variant of the problem of finding common substrings between two strings, like LeetCode 718. Maximum Length of Repeated Subarray.

How to solve this problem faster? The key point is how to determine whether two substring are identical. To prove that two substring are identical, we have to compare their characters one by one. However, we do not have to do so to be sure that two substring are not identical. Instead, if the hash values of two substrings are not the same, these two substrings must be different.

Computing the hash value of each substring is still time consuming. We use rolling hash function, which treats character in a string as digits and takes the numeric value of these digits as the hash value. Thus, for a new string that only changes a few characters (in this problem, the first and the last), rolling hash function can compute the new hash value quickly.

In the rolling hash function, we use a large prime number 10^9+7 to avoid overflow.

A long duplicate string must contain shorter duplicate strings. Also, we only care about the longest duplicate substring. We can use binary search to find the largest length of duplicate substring.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
/**
* @param L length of duplicate strings
* @param base base of numeric strings
* @param N length of the original string
* @param nums numeric array of the original string
* @return the start index of a duplicate string, if not exist, return -1
*/
private int searchDuplicateStrings(int L, int base, long modulus, int N, int[] nums) {
long hash = 0; // hash code
long highest = 1; // base^(L-1)
for (int i = 0; i < L; i++) {
hash = (hash * base + nums[i]) % modulus;
}
for (int i = 1; i < L; i++) {
highest = (highest * base) % modulus;
}

// hash code -> a list of start indexes
Map<Long, List<Integer>> map = new HashMap<>();
// add the first substring
map.put(hash, new ArrayList<>());
map.get(hash).add(0);
for (int left = 1; left <= N - L; left++) {
// update hash value while avoiding overflow
hash = (hash - nums[left - 1] * highest % modulus + modulus) % modulus;
hash = (hash * base + nums[left + L - 1]) % modulus;
if (map.containsKey(hash)) {
List<Integer> starts = map.get(hash);
// compare substrings with the same hash code
for (int start : starts) {
boolean isSame = true;
for (int i = 0; i < L; i++) {
if (nums[i + left] != nums[i + start]) {
isSame = false;
break;
}
}
if (isSame) {
return left;
}
}
starts.add(left);
} else {
map.put(hash, new ArrayList<>());
map.get(hash).add(left);
}
}
return -1;
}

// average O(nlog(n))
public String longestDupSubstring(String s) {
int sLen = s.length();
int[] nums = new int[sLen];
// modulus for rolling hash function
// a large prime number that fits into 32-bit integer
final long modulus = (long) Math.pow(10, 9) + 7;
// base of numeric strings
final int a = 26;

for (int i = 0; i < sLen; i++) {
nums[i] = s.charAt(i) - 'a';
}
// binary search
int low = 1, high = sLen;
int resultLen = 0, resultIndex = -1;
while (low <= high) {
int mid = low + (high - low) / 2;
int index = searchDuplicateStrings(mid, a, modulus, sLen, nums);
if (index == -1) {
// no duplicate substrings with length of mid
high = mid - 1;
} else {
// duplicate substrings with length of mid exist
resultIndex = index;
resultLen = mid;
low = mid + 1;
}
}
return resultLen == 0 ? "" : s.substring(resultIndex, resultIndex + resultLen);
}
Author

Weihao Ye

Posted on

2022-01-10

Updated on

2022-01-10

Licensed under