C++ KMP Algorithm Implementation: Find Substring Efficiently
C++ Implementation of KMP Algorithm for Substring Search
The Knuth-Morris-Pratt (KMP) algorithm is a powerful technique for finding occurrences of a pattern within a text string. It's renowned for its efficiency, especially when dealing with repetitive patterns.
Here's a C++ implementation of the KMP algorithm:
#include <iostream>
#include <cstring>
using namespace std;
// Compute the next array
void getNext(char* p, int* next, int len) {
int i = 0, j = -1;
next[0] = -1;
while (i < len) {
if (j == -1 || p[i] == p[j]) {
i++;
j++;
next[i] = j;
} else {
j = next[j];
}
}
}
// KMP algorithm
int kmp(char* s, char* p) {
int slen = strlen(s);
int plen = strlen(p);
int* next = new int[plen];
getNext(p, next, plen);
int i = 0, j = 0;
while (i < slen && j < plen) {
if (j == -1 || s[i] == p[j]) {
i++;
j++;
} else {
j = next[j];
}
}
delete[] next;
if (j == plen) {
return i - j;
} else {
return -1;
}
}
int main() {
char s[] = 'hello world';
char p[] = 'world';
int pos = kmp(s, p);
if (pos == -1) {
cout << 'not found' << endl;
} else {
cout << 'found at position ' << pos << endl;
}
return 0;
}
Explanation:
-
getNext(char* p, int* next, int len): This function calculates thenextarray, which is crucial for the KMP algorithm. Thenextarray stores the length of the longest proper prefix ofpthat is also a suffix ofp[0...i]. This information helps to avoid unnecessary backtracking during the pattern matching process. -
kmp(char* s, char* p): This function implements the KMP algorithm itself. It iterates through the text stringsand the patternpusing thenextarray calculated previously. If a mismatch occurs, it uses thenextarray to efficiently move the pattern pointerjforward, thus optimizing the search.
Example:
In the main function, we search for the pattern 'world' within the text string 'hello world'. The output will be:
found at position 6
This means the pattern 'world' is found at position 6 (starting index) within the text string.
Key Benefits of KMP:
- Efficiency: KMP significantly reduces backtracking compared to naive string matching algorithms.
- Repetitive Patterns: It excels in handling patterns with repeating substrings.
Further Exploration:
- Explore the mathematical reasoning behind the
nextarray computation. - Implement KMP for other data types like strings with Unicode characters.
- Compare its performance with other substring search algorithms like Boyer-Moore.
原文地址: https://www.cveoy.top/t/topic/mOco 著作权归作者所有。请勿转载和采集!