C++ KMP Algorithm Implementation: Find Substring Efficiently

C++ Implementation of KMP Algorithm for Substring Search

The Knuth-Morris-Pratt (KMP) algorithm is a powerful technique for finding occurrences of a pattern within a text string. It's renowned for its efficiency, especially when dealing with repetitive patterns.

Here's a C++ implementation of the KMP algorithm:

#include <iostream>
#include <cstring>

using namespace std;

// Compute the next array
void getNext(char* p, int* next, int len) {
    int i = 0, j = -1;
    next[0] = -1;
    while (i < len) {
        if (j == -1 || p[i] == p[j]) {
            i++;
            j++;
            next[i] = j;
        } else {
            j = next[j];
        }
    }
}

// KMP algorithm
int kmp(char* s, char* p) {
    int slen = strlen(s);
    int plen = strlen(p);
    int* next = new int[plen];
    getNext(p, next, plen);
    int i = 0, j = 0;
    while (i < slen && j < plen) {
        if (j == -1 || s[i] == p[j]) {
            i++;
            j++;
        } else {
            j = next[j];
        }
    }
    delete[] next;
    if (j == plen) {
        return i - j;
    } else {
        return -1;
    }
}

int main() {
    char s[] = 'hello world';
    char p[] = 'world';
    int pos = kmp(s, p);
    if (pos == -1) {
        cout << 'not found' << endl;
    } else {
        cout << 'found at position ' << pos << endl;
    }
    return 0;
}

Explanation:

getNext(char* p, int* next, int len): This function calculates the next array, which is crucial for the KMP algorithm. The next array stores the length of the longest proper prefix of p that is also a suffix of p[0...i]. This information helps to avoid unnecessary backtracking during the pattern matching process.
kmp(char* s, char* p): This function implements the KMP algorithm itself. It iterates through the text string s and the pattern p using the next array calculated previously. If a mismatch occurs, it uses the next array to efficiently move the pattern pointer j forward, thus optimizing the search.

Example:

In the main function, we search for the pattern 'world' within the text string 'hello world'. The output will be:

found at position 6

This means the pattern 'world' is found at position 6 (starting index) within the text string.

Key Benefits of KMP:

Efficiency: KMP significantly reduces backtracking compared to naive string matching algorithms.
Repetitive Patterns: It excels in handling patterns with repeating substrings.

Further Exploration:

Explore the mathematical reasoning behind the next array computation.
Implement KMP for other data types like strings with Unicode characters.
Compare its performance with other substring search algorithms like Boyer-Moore.

C++ KMP Algorithm Implementation: Find Substring Efficiently