From 6014da5d896e69844bd3e49cbc9b9ce193e36769 Mon Sep 17 00:00:00 2001 From: V Pratheek Date: Mon, 7 Oct 2024 21:22:31 +0530 Subject: [PATCH 1/4] Added 5 New Algorithms under String Algorithms --- .../string-algorithms/ApostolicoGiancarlo.md | 100 +++++++++++++++ docs/algorithms/string-algorithms/BNDM.md | 108 ++++++++++++++++ docs/algorithms/string-algorithms/Bitap.md | 108 ++++++++++++++++ .../string-algorithms/CommentzWalter.md | 121 ++++++++++++++++++ docs/algorithms/string-algorithms/ShiftOr.md | 100 +++++++++++++++ .../string-algorithms/_category_.json | 8 ++ 6 files changed, 545 insertions(+) create mode 100644 docs/algorithms/string-algorithms/ApostolicoGiancarlo.md create mode 100644 docs/algorithms/string-algorithms/BNDM.md create mode 100644 docs/algorithms/string-algorithms/Bitap.md create mode 100644 docs/algorithms/string-algorithms/CommentzWalter.md create mode 100644 docs/algorithms/string-algorithms/ShiftOr.md create mode 100644 docs/algorithms/string-algorithms/_category_.json diff --git a/docs/algorithms/string-algorithms/ApostolicoGiancarlo.md b/docs/algorithms/string-algorithms/ApostolicoGiancarlo.md new file mode 100644 index 000000000..43ac87723 --- /dev/null +++ b/docs/algorithms/string-algorithms/ApostolicoGiancarlo.md @@ -0,0 +1,100 @@ +--- + +id: apostolico-giancarlo-algo +sidebar_position: 1 +title: Apostolico–Giancarlo Algorithm +sidebar_label: Apostolico–Giancarlo Algorithm + +--- + +### Definition: + +The Apostolico–Giancarlo algorithm is an advanced string matching algorithm designed for efficient searching of a pattern in a text by minimizing redundant comparisons. It utilizes the knowledge gained from previous mismatches to skip unnecessary character comparisons. + +### Characteristics: + +- **Efficient Skipping**: + - This algorithm reduces the number of comparisons by reusing information about previously matched characters and skipping over sections of text that cannot possibly match the pattern. 
+ +- **Text Scanning**: + - It processes the text in a left-to-right fashion, scanning characters and performing checks to see if the pattern matches. + +- **Optimal Shifts**: + - Apostolico–Giancarlo optimizes the pattern shifting process after mismatches by using suffix information, ensuring fewer comparisons in cases of repeated patterns. + +- **Suboptimal on Small Patterns**: + - While efficient for longer patterns, its performance may not be as significant for smaller ones compared to simpler algorithms like Knuth-Morris-Pratt (KMP). + +### Time Complexity: + +- **Best Case: O(n/m)** + In the best-case scenario, the algorithm performs optimally, making only a fraction of comparisons proportional to the length of the text divided by the length of the pattern. + +- **Average Case: O(n)** + On average, the Apostolico–Giancarlo algorithm makes approximately linear scans through the text, resulting in efficient performance for most practical use cases. + +- **Worst Case: O(n * m)** + In the worst case, if the pattern has repeated sections that align poorly with the text, the algorithm could degrade to quadratic time complexity, where `n` is the text length and `m` is the pattern length. + +### Space Complexity: + +- **Space Complexity: O(m + n)** + The algorithm requires additional space for storing suffix and shift tables, but the space overhead is linear with respect to both the pattern and the text size. 
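The suffix information that drives these shifts can be illustrated with a small sketch. The helper below is hypothetical (the name `suffixLengths` does not come from this patch) and uses a naive quadratic loop, assumed here only for clarity: for each index `i` it records the length of the longest substring ending at `i` that is also a suffix of the pattern.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: suff[i] is the length of the longest substring of
// `pattern` that ends at index i and is also a suffix of `pattern`.
// Naive O(m^2) version, for illustration only.
std::vector<int> suffixLengths(const std::string& pattern) {
    assert(!pattern.empty());
    const int m = static_cast<int>(pattern.size());
    std::vector<int> suff(m, 0);
    suff[m - 1] = m;  // the whole pattern is a suffix of itself
    for (int i = m - 2; i >= 0; --i) {
        int k = 0;
        // Extend the match leftwards from position i and from the pattern end.
        while (k <= i && pattern[i - k] == pattern[m - 1 - k]) {
            ++k;
        }
        suff[i] = k;
    }
    return suff;
}
```

For the pattern `GCAGAGAG`, this sketch yields `1 0 0 2 0 4 0 8`. A large value marks a position where a previously matched suffix can be reused instead of being compared again.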
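+
+### C++ Implementation:
+
+**Iterative Approach**
+```cpp
+#include <algorithm>
+#include <iostream>
+#include <string>
+#include <vector>
+using namespace std;
+
+// suffixArray[i] = length of the longest substring ending at i that is also a suffix of the pattern
+void computeSuffixArray(const string& pattern, vector<int>& suffixArray) {
+    int m = pattern.length();
+    suffixArray[m - 1] = m;
+    for (int i = m - 2; i >= 0; --i) {
+        int j = i;
+        while (j >= 0 && pattern[j] == pattern[m - 1 - (i - j)]) {
+            --j;
+        }
+        suffixArray[i] = i - j;
+    }
+}
+
+// Boyer-Moore good-suffix shifts derived from the suffix table (always >= 1)
+void computeGoodSuffix(const vector<int>& suffixArray, vector<int>& shift) {
+    int m = suffixArray.size();
+    fill(shift.begin(), shift.end(), m);
+    int j = 0;
+    for (int i = m - 1; i >= 0; --i) {
+        if (suffixArray[i] == i + 1) {
+            for (; j < m - 1 - i; ++j) {
+                if (shift[j] == m) shift[j] = m - 1 - i;
+            }
+        }
+    }
+    for (int i = 0; i <= m - 2; ++i) {
+        shift[m - 1 - suffixArray[i]] = m - 1 - i;
+    }
+}
+
+void apostolicoGiancarloSearch(const string& text, const string& pattern) {
+    int n = text.length();
+    int m = pattern.length();
+    if (m == 0 || m > n) return;
+
+    vector<int> suffixArray(m), shift(m);
+    computeSuffixArray(pattern, suffixArray);
+    computeGoodSuffix(suffixArray, shift);
+
+    // skip[j]: length of the pattern suffix already known to match the text ending at window offset j
+    vector<int> skip(m, 0);
+
+    int i = 0;
+    while (i <= n - m) {
+        int j = m - 1;
+        while (j >= 0) {
+            int k = skip[j], s = suffixArray[j];
+            if (k > 0) {
+                // Reuse an earlier match instead of comparing characters again
+                if (k > s) { j = (j + 1 == s) ? -1 : j - s; break; }
+                j -= k;
+                if (k < s) break;
+            } else if (pattern[j] == text[i + j]) {
+                --j;
+            } else {
+                break;
+            }
+        }
+
+        int step;
+        if (j < 0) {
+            cout << "Pattern found at index " << i << endl;
+            skip[m - 1] = m;
+            step = shift[0]; // Shift based on the good-suffix table
+        } else {
+            skip[m - 1] = m - 1 - j;
+            step = shift[j];
+        }
+        i += step;
+
+        // Slide the remembered matches along with the window
+        for (int t = 0; t < m; ++t) {
+            skip[t] = (t + step < m) ? skip[t + step] : 0;
+        }
+    }
+}
+
+int main() {
+    string text = "ABAAABCDABC";
+    string pattern = "ABC";
+
+    apostolicoGiancarloSearch(text, pattern);
+
+    return 0;
+}
+```
+
+### Summary:
+
+The Apostolico–Giancarlo algorithm is an advanced string matching algorithm that leverages optimal shifts and pattern reuse to efficiently find patterns within text. Though it offers significant performance advantages for large and repetitive patterns, it is not always the first choice for small or simple patterns.
\ No newline at end of file
diff --git a/docs/algorithms/string-algorithms/BNDM.md b/docs/algorithms/string-algorithms/BNDM.md
new file mode 100644
index 000000000..6adfc59b3
--- /dev/null
+++ b/docs/algorithms/string-algorithms/BNDM.md
@@ -0,0 +1,108 @@
+---
+
+id: bndm-algo
+sidebar_position: 2
+title: BNDM Algorithm
+sidebar_label: BNDM Algorithm
+
+---
+
+### Definition:
+
+The BNDM (Backward Nondeterministic Dawg Matching) algorithm is an efficient string matching algorithm derived from the Backward Dawg Matching (BDM) algorithm.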
It uses bitwise operations to simulate a nondeterministic automaton, matching the pattern in reverse order while scanning the text. + +### Characteristics: + +- **Bitwise Automaton Simulation**: + - BNDM represents the search pattern as a bitmask and simulates a nondeterministic automaton using bitwise operations. This reduces the number of character comparisons and enables efficient pattern matching. + +- **Reverse Pattern Matching**: + - The algorithm scans the pattern in reverse, comparing it against the text from right to left, which helps in faster identification of mismatches and skips. + +- **Efficient for Short Patterns**: + - BNDM is particularly efficient for short patterns, often outperforming other string matching algorithms like Boyer-Moore and Knuth-Morris-Pratt for small pattern sizes. + +- **Extension of BDM**: + - It improves upon the BDM algorithm by handling more general cases and providing better performance for non-trivial patterns. + +### Time Complexity: + +- **Best Case: O(n / w)** + In the best-case scenario, where `w` is the word size of the machine, the algorithm takes advantage of the word-level parallelism and makes few character comparisons. + +- **Average Case: O(n)** + On average, BNDM performs linear scans through the text, making it highly efficient for typical use cases, especially with short patterns. + +- **Worst Case: O(n * m)** + In the worst case, when the text and pattern have poor alignment, BNDM may require multiple full scans of the text, leading to quadratic complexity, where `n` is the text length and `m` is the pattern length. + +### Space Complexity: + +- **Space Complexity: O(m)** + The space complexity of BNDM is linear with respect to the pattern length, as the algorithm stores bitmasks and tables based on the pattern. 
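The bitmask representation described above can be sketched as follows. The helper name `buildBndmMasks` and the 64-bit word are assumptions made for illustration, and the bit layout (last pattern character in the lowest bit) is one common BNDM convention rather than the only possible one.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: builds one bitmask per byte value. Bit (m - 1 - i) of
// masks[c] is set when pattern[i] == c, so the last pattern character maps to
// the lowest bit.
std::array<std::uint64_t, 256> buildBndmMasks(const std::string& pattern) {
    const std::size_t m = pattern.size();
    assert(m >= 1 && m <= 64);  // the pattern must fit in one machine word
    std::array<std::uint64_t, 256> masks{};  // value-initialized to zero
    for (std::size_t i = 0; i < m; ++i) {
        const auto c = static_cast<unsigned char>(pattern[i]);
        masks[c] |= std::uint64_t{1} << (m - 1 - i);
    }
    return masks;
}
```

For `ABC`, this gives the binary masks `100`, `010`, and `001` for `A`, `B`, and `C`. ANDing the running state with one of these masks tests every pattern position in a single word operation, which is where the word-level parallelism comes from.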
+
+### C++ Implementation:
+
+**Iterative Approach**
+```cpp
+#include <cstdint>
+#include <iostream>
+#include <string>
+#include <vector>
+using namespace std;
+
+#define CHAR_SIZE 256 // Assuming extended ASCII
+
+// B[c] has bit (m - 1 - i) set when pattern[i] == c
+void preprocessPattern(const string& pattern, vector<uint64_t>& B) {
+    int m = pattern.length();
+    for (int i = 0; i < CHAR_SIZE; ++i) {
+        B[i] = 0;
+    }
+    for (int i = 0; i < m; ++i) {
+        B[(unsigned char)pattern[i]] |= (uint64_t(1) << (m - 1 - i));
+    }
+}
+
+void BNDMSearch(const string& text, const string& pattern) {
+    int n = text.length();
+    int m = pattern.length();
+
+    // The pattern must fit in one 64-bit word
+    if (m == 0 || m > n || m > 64) return;
+
+    vector<uint64_t> B(CHAR_SIZE);
+    preprocessPattern(pattern, B);
+
+    const uint64_t full = (m == 64) ? ~uint64_t(0) : ((uint64_t(1) << m) - 1);
+    const uint64_t top = uint64_t(1) << (m - 1);
+
+    for (int i = 0; i <= n - m; ) {
+        int j = m;         // Characters of the window not yet read
+        int last = m;      // Shift to apply once this window is abandoned
+        uint64_t D = full; // Every pattern position starts as a candidate
+
+        // Read the window right to left while some factor of the pattern still matches
+        while (D != 0) {
+            D &= B[(unsigned char)text[i + j - 1]];
+            --j;
+            if (D & top) {
+                if (j > 0) {
+                    last = j; // A pattern prefix starts here: remember it for the shift
+                } else {
+                    cout << "Pattern found at index " << i << endl;
+                }
+            }
+            D = (D << 1) & full;
+        }
+
+        // Shift the window to the last position where a pattern prefix was seen
+        i += last;
+    }
+}
+
+int main() {
+    string text = "ABCABCABCD";
+    string pattern = "ABC";
+
+    BNDMSearch(text, pattern);
+
+    return 0;
+}
+```
+
+### Summary:
+
+The BNDM (Backward Nondeterministic Dawg Matching) algorithm is an efficient and powerful string matching technique, especially for small patterns. It leverages bitwise operations and reverse pattern matching to minimize unnecessary character comparisons, making it highly suitable for short strings and quick searches. Its linear time complexity in average cases makes it a solid choice for string matching tasks in practical applications.
\ No newline at end of file diff --git a/docs/algorithms/string-algorithms/Bitap.md b/docs/algorithms/string-algorithms/Bitap.md new file mode 100644 index 000000000..95e446626 --- /dev/null +++ b/docs/algorithms/string-algorithms/Bitap.md @@ -0,0 +1,108 @@ +--- + +id: bitap-algo +sidebar_position: 3 +title: Bitap Algorithm +sidebar_label: Bitap Algorithm + +--- + +### Definition: + +The Bitap algorithm, also known as the **Shift-Or**, **Shift-And**, or **Bitap for Approximate String Matching**, is a string matching algorithm that efficiently finds patterns in a text with possible mismatches or errors. The algorithm leverages bitwise operations to perform both exact and approximate string matching, making it ideal for fuzzy searching. + +### Characteristics: + +- **Bitwise Matching**: + - The Bitap algorithm uses bitwise operations to compare the pattern against the text. Each bit represents whether a character in the text matches a position in the pattern. + +- **Approximate Matching**: + - It supports approximate matching, where the pattern may have a certain number of mismatches, insertions, or deletions. This is especially useful in fields like text retrieval or DNA sequence matching. + +- **Pattern Masking**: + - The pattern is preprocessed into bitmasks, which are then used during the text scan to track how much of the pattern has been matched, including the handling of allowed errors. + +- **Linear Search with Errors**: + - The algorithm scans the text linearly, and the number of allowed errors (insertions, deletions, substitutions) is parameterized, allowing for flexible search criteria. + +### Time Complexity: + +- **Best Case: O(n / w)** + The best-case complexity is linear, as the algorithm processes `w` characters in parallel per word size `w` of the machine. + +- **Average Case: O(n)** + On average, the algorithm performs in linear time with respect to the text size `n`, especially for small patterns or when only a few errors are allowed. 
+
+- **Worst Case: O(n * m)**
+  In the worst case, if the pattern is large or if there are many errors allowed, the time complexity can degrade to quadratic, where `m` is the pattern length.
+
+### Space Complexity:
+
+- **Space Complexity: O(m)**
+  The algorithm requires space proportional to the pattern length `m` for storing bitmasks, making it efficient in terms of memory usage.
+
+### C++ Implementation:
+
+**Approximate Matching with `k` Allowed Errors**
+```cpp
+#include <cstdint>
+#include <iostream>
+#include <string>
+#include <vector>
+using namespace std;
+
+#define CHAR_SIZE 256 // Extended ASCII
+
+// patternMask[c] has bit i cleared when pattern[i] == c (a 0 bit means "matches")
+void preprocessPattern(const string& pattern, vector<uint64_t>& patternMask) {
+    int m = pattern.size();
+    for (int i = 0; i < CHAR_SIZE; ++i) {
+        patternMask[i] = ~uint64_t(0);
+    }
+    for (int i = 0; i < m; ++i) {
+        patternMask[(unsigned char)pattern[i]] &= ~(uint64_t(1) << i);
+    }
+}
+
+void bitapSearch(const string& text, const string& pattern, int maxErrors) {
+    int n = text.size();
+    int m = pattern.size();
+
+    // The pattern must fit in one 64-bit word, and fewer than m errors may be allowed
+    if (m == 0 || m > 64 || maxErrors < 0 || maxErrors >= m) return;
+
+    vector<uint64_t> patternMask(CHAR_SIZE);
+    preprocessPattern(pattern, patternMask);
+
+    // R[d]: bit i is 0 when pattern[0..i] matches a suffix of the text read so far
+    // with at most d errors. Before any text is read, the first d pattern
+    // characters can be accounted for as deletions.
+    vector<uint64_t> R(maxErrors + 1);
+    for (int d = 0; d <= maxErrors; ++d) {
+        R[d] = ~uint64_t(0) << d;
+    }
+
+    const uint64_t matchBit = uint64_t(1) << (m - 1);
+    for (int i = 0; i < n; ++i) {
+        uint64_t charMask = patternMask[(unsigned char)text[i]];
+        uint64_t oldPrev = R[0]; // R[d - 1] as it was before this character
+        R[0] = (R[0] << 1) | charMask;
+        for (int d = 1; d <= maxErrors; ++d) {
+            uint64_t oldCur = R[d];
+            R[d] = ((oldCur << 1) | charMask) // Matching character
+                 & (oldPrev << 1)             // Substitution
+                 & oldPrev                    // Insertion (extra text character)
+                 & (R[d - 1] << 1);           // Deletion (missing pattern character)
+            oldPrev = oldCur;
+        }
+        if ((R[maxErrors] & matchBit) == 0) {
+            cout << "Match ending at index " << i << " with at most " << maxErrors << " errors." << endl;
+        }
+    }
+}
+
+int main() {
+    string text = "this is a simple example";
+    string pattern = "example";
+    int maxErrors = 1; // Allow 1 error (insertion, deletion, or substitution)
+
+    bitapSearch(text, pattern, maxErrors);
+
+    return 0;
+}
+```
+
+### Summary:
+
+The Bitap algorithm is a highly efficient string matching technique that supports approximate matching, making it ideal for applications requiring fuzzy search capabilities. Its use of bitwise operations allows for fast text scanning, while its flexibility in handling errors sets it apart from other exact matching algorithms. Despite its quadratic worst-case complexity, it performs well for small patterns and a limited number of errors.
\ No newline at end of file
diff --git a/docs/algorithms/string-algorithms/CommentzWalter.md b/docs/algorithms/string-algorithms/CommentzWalter.md
new file mode 100644
index 000000000..1e67b275b
--- /dev/null
+++ b/docs/algorithms/string-algorithms/CommentzWalter.md
@@ -0,0 +1,121 @@
+---
+
+id: commentz-walter-algo
+sidebar_position: 4
+title: Commentz-Walter Algorithm
+sidebar_label: Commentz-Walter Algorithm
+
+---
+
+### Definition:
+
+The Commentz-Walter algorithm is an efficient string matching algorithm that combines the ideas of the Boyer-Moore algorithm and the Aho-Corasick algorithm. It is designed for multi-pattern matching, making it useful when searching for multiple patterns within a large text. By employing efficient pattern shifts and automaton-based techniques, it minimizes unnecessary comparisons.
+
+### Characteristics:
+
+- **Multi-Pattern Matching**:
+  - The algorithm can handle multiple search patterns simultaneously, unlike other algorithms that focus on matching a single pattern. It builds an automaton for multiple patterns, enabling fast searching.
+ +- **Backward Matching**: + - Like the Boyer-Moore algorithm, the Commentz-Walter algorithm compares the pattern with the text from right to left, allowing it to skip over portions of the text when mismatches occur. + +- **Efficient Shift Table**: + - It utilizes a shift table, much like Boyer-Moore, to determine how far to skip when a mismatch occurs, improving the performance compared to simpler algorithms. + +- **Combines Automaton and Heuristic Approaches**: + - The algorithm uses an automaton (like in Aho-Corasick) to quickly match prefixes of the patterns and applies the Boyer-Moore heuristic to skip sections of the text, resulting in a powerful combination. + +### Time Complexity: + +- **Best Case: O(n / m)** + In the best case, the algorithm achieves sublinear performance due to the pattern skipping mechanism, where `n` is the text length and `m` is the minimum length of the patterns. + +- **Average Case: O(n)** + On average, the Commentz-Walter algorithm processes the text in linear time, making it highly efficient for most practical scenarios involving multiple patterns. + +- **Worst Case: O(n * m)** + The worst-case complexity occurs when the text and patterns have poor alignment, leading to more comparisons and resulting in quadratic time complexity. + +### Space Complexity: + +- **Space Complexity: O(m)** + The space complexity is linear with respect to the total length of the patterns, as the algorithm stores shift tables and automata structures for efficient pattern matching. 
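The best-case bound above follows from how far a single mismatch can move the search window. As a rough sketch, assuming a simplified Horspool-style table rather than Commentz-Walter's full shift functions, the hypothetical helper `buildSharedShiftTable` below computes one shift per byte value across all patterns, capped at the shortest pattern length so that no occurrence can be skipped.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: shifts[c] is the distance from the rightmost occurrence
// of c (ignoring each pattern's final character) to the pattern end, minimized
// over all patterns and capped at the shortest pattern length.
std::array<int, 256> buildSharedShiftTable(const std::vector<std::string>& patterns) {
    assert(!patterns.empty());
    std::size_t minLen = patterns.front().size();
    for (const auto& p : patterns) {
        minLen = std::min(minLen, p.size());
    }
    assert(minLen >= 1);

    std::array<int, 256> shifts;
    shifts.fill(static_cast<int>(minLen));  // default: jump a full shortest-pattern length
    for (const auto& p : patterns) {
        const std::size_t m = p.size();
        // Only the last minLen characters can produce a shift below the cap.
        for (std::size_t i = m - minLen; i + 1 < m; ++i) {
            const auto c = static_cast<unsigned char>(p[i]);
            shifts[c] = std::min(shifts[c], static_cast<int>(m - 1 - i));
        }
    }
    return shifts;
}
```

With the patterns `ABC` and `ABCD`, the table holds 2 for `A`, 1 for `B` and `C`, and the default 3 for every other byte. A text character that appears in no pattern therefore moves the window by the full shortest pattern length, which is the source of the sublinear best case.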
+
+### C++ Implementation:
+
+**Iterative Approach for Multi-Pattern Matching**
+```cpp
+#include <algorithm>
+#include <iostream>
+#include <string>
+#include <vector>
+using namespace std;
+
+#define CHAR_SIZE 256 // Extended ASCII
+
+// Preprocessing the patterns for building a shared bad character shift table.
+// Shifts are capped at the shortest pattern length so that no occurrence is skipped.
+void preprocessBadCharacterShift(const vector<string>& patterns, vector<int>& badCharShift, int minPatternLength) {
+    for (int i = 0; i < CHAR_SIZE; ++i) {
+        badCharShift[i] = minPatternLength; // Default shift is the length of the shortest pattern
+    }
+    for (const auto& pattern : patterns) {
+        int m = pattern.size();
+        for (int i = m - minPatternLength; i < m - 1; ++i) {
+            int c = (unsigned char)pattern[i];
+            badCharShift[c] = min(badCharShift[c], m - 1 - i);
+        }
+    }
+}
+
+// Simplified Commentz-Walter-style search: the window is aligned on pattern ends,
+// each pattern is compared right to left, and the shared table drives the shift.
+// A full implementation would walk a trie of the reversed patterns instead.
+void commentzWalterSearch(const string& text, const vector<string>& patterns) {
+    int n = text.size();
+    if (patterns.empty()) return;
+
+    // Find the minimum length of all patterns
+    int minPatternLength = patterns[0].size();
+    for (const auto& pattern : patterns) {
+        minPatternLength = min(minPatternLength, (int)pattern.size());
+    }
+    if (minPatternLength == 0) return;
+
+    // Preprocess bad character shift
+    vector<int> badCharShift(CHAR_SIZE);
+    preprocessBadCharacterShift(patterns, badCharShift, minPatternLength);
+
+    int end = minPatternLength - 1; // Text index aligned with the last pattern character
+    while (end < n) {
+        // Compare all patterns in reverse order
+        for (const auto& pattern : patterns) {
+            int m = pattern.size();
+            if (m > end + 1) continue; // Pattern would start before the text
+
+            int j = m - 1;
+            while (j >= 0 && pattern[j] == text[end - (m - 1 - j)]) {
+                --j;
+            }
+
+            if (j < 0) {
+                cout << "Pattern \"" << pattern << "\" found at index " << end - m + 1 << endl;
+            }
+        }
+
+        end += badCharShift[(unsigned char)text[end]]; // Shift based on the bad character rule
+    }
+}
+
+int main() {
+    string text = "ABCABCABCDABC";
+    vector<string> patterns = {"ABC", "ABCD"};
+
+    commentzWalterSearch(text, patterns);
+
+    return 0;
+}
+```
+
+### Summary:
+
+The Commentz-Walter algorithm is a hybrid string matching technique that efficiently handles multiple patterns by combining the
strengths of the Boyer-Moore heuristic and the Aho-Corasick automaton. It performs well in scenarios requiring the simultaneous search for multiple patterns within a large text. While its worst-case performance can be quadratic, its average case is linear, making it highly effective for practical applications. \ No newline at end of file diff --git a/docs/algorithms/string-algorithms/ShiftOr.md b/docs/algorithms/string-algorithms/ShiftOr.md new file mode 100644 index 000000000..cf21c879d --- /dev/null +++ b/docs/algorithms/string-algorithms/ShiftOr.md @@ -0,0 +1,100 @@ +--- + +id: shift-or-algo +sidebar_position: 5 +title: Shift-Or Algorithm +sidebar_label: Shift-Or Algorithm + +--- + +### Definition: + +The Shift-Or algorithm, also known as the **Bitap algorithm for exact matching**, is a string matching technique that uses bitwise operations to perform efficient pattern searching. It is particularly suitable for exact matching tasks and handles fixed-length patterns by representing them as bitmasks. The algorithm processes the text and the pattern in parallel, allowing for quick and efficient searches. + +### Characteristics: + +- **Bitwise Matching**: + - The Shift-Or algorithm encodes the search pattern as a set of bitmasks, where each bit represents whether a character in the text matches a position in the pattern. This allows multiple pattern positions to be checked simultaneously using bitwise operations. + +- **Exact Matching**: + - The algorithm is designed for exact string matching, where no mismatches, insertions, or deletions are allowed. It performs efficiently for small patterns. + +- **Compact Representation**: + - Shift-Or uses bitwise shifting to represent pattern states, providing a compact and efficient approach to handling pattern matching. 
+
+- **Linear Time Complexity**:
+  - The algorithm processes the input text linearly, making it highly efficient for exact matching tasks, particularly when the pattern length is small compared to the text length.
+
+### Time Complexity:
+
+- **Best Case: O(n / w)**
+  In the best case, where `w` is the word size of the machine, the algorithm processes multiple characters in parallel, leading to a faster search in practice.
+
+- **Average Case: O(n)**
+  On average, the algorithm performs in linear time with respect to the text length `n`, as it makes a single pass through the text.
+
+- **Worst Case: O(n)**
+  Even in the worst-case scenario, the Shift-Or algorithm maintains linear time complexity since it processes each character of the text once.
+
+### Space Complexity:
+
+- **Space Complexity: O(m)**
+  The algorithm requires space proportional to the pattern length `m` to store bitmasks, making it space-efficient for small patterns.
+
+### C++ Implementation:
+
+**Exact Matching**
+```cpp
+#include <cstdint>
+#include <iostream>
+#include <string>
+#include <vector>
+using namespace std;
+
+#define CHAR_SIZE 256 // Extended ASCII
+
+void preprocessPattern(const string& pattern, vector<uint64_t>& patternMask) {
+    int m = pattern.size();
+    for (int i = 0; i < CHAR_SIZE; ++i) {
+        patternMask[i] = ~uint64_t(0); // Initialize all bits to 1
+    }
+    for (int i = 0; i < m; ++i) {
+        // Clear bit i for the character at pattern position i
+        patternMask[(unsigned char)pattern[i]] &= ~(uint64_t(1) << i);
+    }
+}
+
+void shiftOrSearch(const string& text, const string& pattern) {
+    int n = text.size();
+    int m = pattern.size();
+
+    // The pattern must fit in one 64-bit word
+    if (m == 0 || m > n || m > 64) return;
+
+    vector<uint64_t> patternMask(CHAR_SIZE);
+    preprocessPattern(pattern, patternMask);
+
+    uint64_t R = ~uint64_t(0); // All bits are initially set to 1
+    uint64_t matchBit = uint64_t(1) << (m - 1); // The bit that will indicate a match
+
+    for (int i = 0; i < n; ++i) {
+        // Update the state
+        R = (R << 1) | patternMask[(unsigned char)text[i]];
+
+        // If the matchBit is 0, a match is found
+        if ((R & matchBit) == 0) {
+            cout << "Pattern found at index " << i - m + 1 << endl;
+        }
+    }
+}
+
+int main() {
+    string text = "abracadabra";
+    string pattern = "abra";
+
+    shiftOrSearch(text, pattern);
+
+    return 0;
+}
+```
+
+### Summary:
+
+The Shift-Or algorithm is a highly efficient and compact exact string matching technique, using bitwise operations to process the text and the pattern in parallel. Its linear time complexity makes it ideal for exact matching tasks, especially for small patterns. With its ability to perform pattern matching using simple bitwise operations, the Shift-Or algorithm offers both speed and simplicity, making it a solid choice for exact string matching problems.
\ No newline at end of file
diff --git a/docs/algorithms/string-algorithms/_category_.json b/docs/algorithms/string-algorithms/_category_.json
new file mode 100644
index 000000000..d87510339
--- /dev/null
+++ b/docs/algorithms/string-algorithms/_category_.json
@@ -0,0 +1,8 @@
+{
+  "label": "String Algorithms",
+  "position": 3,
+  "link": {
+    "type": "generated-index",
+    "description": "Learn about some String Algorithms."
+ } + } \ No newline at end of file From eb522c06d8ebc54ba941d35064686cd91d32321c Mon Sep 17 00:00:00 2001 From: V Pratheek Date: Tue, 8 Oct 2024 17:49:32 +0530 Subject: [PATCH 2/4] Updated the names and contents as mentioned --- ...licoGiancarlo.md => apostolico-giancarlo-algorithm.md} | 8 ++++---- .../string-algorithms/{Bitap.md => bitap-algorithm.md} | 8 ++++---- .../string-algorithms/{BNDM.md => bndm-algorithm.md} | 8 ++++---- .../{CommentzWalter.md => commentz-walter-algorithm.md} | 8 ++++---- .../string-algorithms/{ShiftOr.md => shift-or-algorithm} | 8 ++++---- 5 files changed, 20 insertions(+), 20 deletions(-) rename docs/algorithms/string-algorithms/{ApostolicoGiancarlo.md => apostolico-giancarlo-algorithm.md} (96%) rename docs/algorithms/string-algorithms/{Bitap.md => bitap-algorithm.md} (97%) rename docs/algorithms/string-algorithms/{BNDM.md => bndm-algorithm.md} (96%) rename docs/algorithms/string-algorithms/{CommentzWalter.md => commentz-walter-algorithm.md} (97%) rename docs/algorithms/string-algorithms/{ShiftOr.md => shift-or-algorithm} (96%) diff --git a/docs/algorithms/string-algorithms/ApostolicoGiancarlo.md b/docs/algorithms/string-algorithms/apostolico-giancarlo-algorithm.md similarity index 96% rename from docs/algorithms/string-algorithms/ApostolicoGiancarlo.md rename to docs/algorithms/string-algorithms/apostolico-giancarlo-algorithm.md index 43ac87723..409e06097 100644 --- a/docs/algorithms/string-algorithms/ApostolicoGiancarlo.md +++ b/docs/algorithms/string-algorithms/apostolico-giancarlo-algorithm.md @@ -27,18 +27,18 @@ The Apostolico–Giancarlo algorithm is an advanced string matching algorithm de ### Time Complexity: -- **Best Case: O(n/m)** +- **Best Case: $O(n/m)$** In the best-case scenario, the algorithm performs optimally, making only a fraction of comparisons proportional to the length of the text divided by the length of the pattern. 
-- **Average Case: O(n)** +- **Average Case: $O(n)$** On average, the Apostolico–Giancarlo algorithm makes approximately linear scans through the text, resulting in efficient performance for most practical use cases. -- **Worst Case: O(n * m)** +- **Worst Case: $O(n * m)$** In the worst case, if the pattern has repeated sections that align poorly with the text, the algorithm could degrade to quadratic time complexity, where `n` is the text length and `m` is the pattern length. ### Space Complexity: -- **Space Complexity: O(m + n)** +- **Space Complexity: $O(m + n)$** The algorithm requires additional space for storing suffix and shift tables, but the space overhead is linear with respect to both the pattern and the text size. ### C++ Implementation: diff --git a/docs/algorithms/string-algorithms/Bitap.md b/docs/algorithms/string-algorithms/bitap-algorithm.md similarity index 97% rename from docs/algorithms/string-algorithms/Bitap.md rename to docs/algorithms/string-algorithms/bitap-algorithm.md index 95e446626..671bbe59c 100644 --- a/docs/algorithms/string-algorithms/Bitap.md +++ b/docs/algorithms/string-algorithms/bitap-algorithm.md @@ -27,18 +27,18 @@ The Bitap algorithm, also known as the **Shift-Or**, **Shift-And**, or **Bitap f ### Time Complexity: -- **Best Case: O(n / w)** +- **Best Case: $O(n / w)$** The best-case complexity is linear, as the algorithm processes `w` characters in parallel per word size `w` of the machine. -- **Average Case: O(n)** +- **Average Case: $O(n)$** On average, the algorithm performs in linear time with respect to the text size `n`, especially for small patterns or when only a few errors are allowed. -- **Worst Case: O(n * m)** +- **Worst Case: $O(n * m)$** In the worst case, if the pattern is large or if there are many errors allowed, the time complexity can degrade to quadratic, where `m` is the pattern length. 
### Space Complexity: -- **Space Complexity: O(m)** +- **Space Complexity: $O(m)$** The algorithm requires space proportional to the pattern length `m` for storing bitmasks, making it efficient in terms of memory usage. ### C++ Implementation: diff --git a/docs/algorithms/string-algorithms/BNDM.md b/docs/algorithms/string-algorithms/bndm-algorithm.md similarity index 96% rename from docs/algorithms/string-algorithms/BNDM.md rename to docs/algorithms/string-algorithms/bndm-algorithm.md index 6adfc59b3..5dd6a1c8a 100644 --- a/docs/algorithms/string-algorithms/BNDM.md +++ b/docs/algorithms/string-algorithms/bndm-algorithm.md @@ -27,18 +27,18 @@ The BNDM (Backward Nondeterministic Dawg Matching) algorithm is an efficient str ### Time Complexity: -- **Best Case: O(n / w)** +- **Best Case: $O(n / w)$** In the best-case scenario, where `w` is the word size of the machine, the algorithm takes advantage of the word-level parallelism and makes few character comparisons. -- **Average Case: O(n)** +- **Average Case: $O(n)$** On average, BNDM performs linear scans through the text, making it highly efficient for typical use cases, especially with short patterns. -- **Worst Case: O(n * m)** +- **Worst Case: $O(n * m)$** In the worst case, when the text and pattern have poor alignment, BNDM may require multiple full scans of the text, leading to quadratic complexity, where `n` is the text length and `m` is the pattern length. ### Space Complexity: -- **Space Complexity: O(m)** +- **Space Complexity: $O(m)$** The space complexity of BNDM is linear with respect to the pattern length, as the algorithm stores bitmasks and tables based on the pattern. 
### C++ Implementation: diff --git a/docs/algorithms/string-algorithms/CommentzWalter.md b/docs/algorithms/string-algorithms/commentz-walter-algorithm.md similarity index 97% rename from docs/algorithms/string-algorithms/CommentzWalter.md rename to docs/algorithms/string-algorithms/commentz-walter-algorithm.md index 1e67b275b..717b6a2c8 100644 --- a/docs/algorithms/string-algorithms/CommentzWalter.md +++ b/docs/algorithms/string-algorithms/commentz-walter-algorithm.md @@ -27,18 +27,18 @@ The Commentz-Walter algorithm is an efficient string matching algorithm that com ### Time Complexity: -- **Best Case: O(n / m)** +- **Best Case: $O(n / m)$** In the best case, the algorithm achieves sublinear performance due to the pattern skipping mechanism, where `n` is the text length and `m` is the minimum length of the patterns. -- **Average Case: O(n)** +- **Average Case: $O(n)$** On average, the Commentz-Walter algorithm processes the text in linear time, making it highly efficient for most practical scenarios involving multiple patterns. -- **Worst Case: O(n * m)** +- **Worst Case: $O(n * m)$** The worst-case complexity occurs when the text and patterns have poor alignment, leading to more comparisons and resulting in quadratic time complexity. ### Space Complexity: -- **Space Complexity: O(m)** +- **Space Complexity: $O(m)$** The space complexity is linear with respect to the total length of the patterns, as the algorithm stores shift tables and automata structures for efficient pattern matching. 
### C++ Implementation: diff --git a/docs/algorithms/string-algorithms/ShiftOr.md b/docs/algorithms/string-algorithms/shift-or-algorithm similarity index 96% rename from docs/algorithms/string-algorithms/ShiftOr.md rename to docs/algorithms/string-algorithms/shift-or-algorithm index cf21c879d..ea374bd44 100644 --- a/docs/algorithms/string-algorithms/ShiftOr.md +++ b/docs/algorithms/string-algorithms/shift-or-algorithm @@ -27,18 +27,18 @@ The Shift-Or algorithm, also known as the **Bitap algorithm for exact matching** ### Time Complexity: -- **Best Case: O(n / w)** +- **Best Case: $O(n / w)$** In the best case, where `w` is the word size of the machine, the algorithm processes multiple characters in parallel, leading to a faster search in practice. -- **Average Case: O(n)** +- **Average Case: $O(n)$** On average, the algorithm performs in linear time with respect to the text length `n`, as it makes a single pass through the text. -- **Worst Case: O(n)** +- **Worst Case: $O(n)$** Even in the worst-case scenario, the Shift-Or algorithm maintains linear time complexity since it processes each character of the text once. ### Space Complexity: -- **Space Complexity: O(m)** +- **Space Complexity: $O(m)$** The algorithm requires space proportional to the pattern length `m` to store bitmasks, making it space-efficient for small patterns. 
 ### C++ Implementation:
From ef6c16a51bd2128a273a5af349cab8d7e6454fc2 Mon Sep 17 00:00:00 2001
From: V Pratheek
Date: Tue, 8 Oct 2024 18:36:09 +0530
Subject: [PATCH 3/4] Final updates and changes as mentioned

---
 .../string-algorithms/apostolico-giancarlo-algorithm.md | 4 ++--
 docs/algorithms/string-algorithms/bitap-algorithm.md | 4 ++--
 docs/algorithms/string-algorithms/bndm-algorithm.md | 4 ++--
 .../algorithms/string-algorithms/commentz-walter-algorithm.md | 4 ++--
 .../{shift-or-algorithm => shift-or-algorithm.md} | 2 +-
 5 files changed, 9 insertions(+), 9 deletions(-)
 rename docs/algorithms/string-algorithms/{shift-or-algorithm => shift-or-algorithm.md} (98%)

diff --git a/docs/algorithms/string-algorithms/apostolico-giancarlo-algorithm.md b/docs/algorithms/string-algorithms/apostolico-giancarlo-algorithm.md
index 409e06097..e490e1aa2 100644
--- a/docs/algorithms/string-algorithms/apostolico-giancarlo-algorithm.md
+++ b/docs/algorithms/string-algorithms/apostolico-giancarlo-algorithm.md
@@ -27,13 +27,13 @@ The Apostolico–Giancarlo algorithm is an advanced string matching algorithm de
 
 ### Time Complexity:
 
-- **Best Case: $O(n/m)$**
+- **Best Case: $O\left(\frac{n}{m}\right)$**
   In the best-case scenario, the algorithm performs optimally, making only a fraction of comparisons proportional to the length of the text divided by the length of the pattern.
 
 - **Average Case: $O(n)$**
   On average, the Apostolico–Giancarlo algorithm makes approximately linear scans through the text, resulting in efficient performance for most practical use cases.
 
-- **Worst Case: $O(n * m)$**
+- **Worst Case: $O(n \times m)$**
   In the worst case, if the pattern has repeated sections that align poorly with the text, the algorithm could degrade to quadratic time complexity, where `n` is the text length and `m` is the pattern length.
### Space Complexity: diff --git a/docs/algorithms/string-algorithms/bitap-algorithm.md b/docs/algorithms/string-algorithms/bitap-algorithm.md index 671bbe59c..fa2d59b5e 100644 --- a/docs/algorithms/string-algorithms/bitap-algorithm.md +++ b/docs/algorithms/string-algorithms/bitap-algorithm.md @@ -27,13 +27,13 @@ The Bitap algorithm, also known as the **Shift-Or**, **Shift-And**, or **Bitap f ### Time Complexity: -- **Best Case: $O(n / w)$** +- **Best Case: $O\left(\frac{n}{w}\right)$** The best-case complexity is linear, as the algorithm processes `w` characters in parallel per word size `w` of the machine. - **Average Case: $O(n)$** On average, the algorithm performs in linear time with respect to the text size `n`, especially for small patterns or when only a few errors are allowed. -- **Worst Case: $O(n * m)$** +- **Worst Case: $O(n \times m)$** In the worst case, if the pattern is large or if there are many errors allowed, the time complexity can degrade to quadratic, where `m` is the pattern length. ### Space Complexity: diff --git a/docs/algorithms/string-algorithms/bndm-algorithm.md b/docs/algorithms/string-algorithms/bndm-algorithm.md index 5dd6a1c8a..1401ec716 100644 --- a/docs/algorithms/string-algorithms/bndm-algorithm.md +++ b/docs/algorithms/string-algorithms/bndm-algorithm.md @@ -27,13 +27,13 @@ The BNDM (Backward Nondeterministic Dawg Matching) algorithm is an efficient str ### Time Complexity: -- **Best Case: $O(n / w)$** +- **Best Case: $O\left(\frac{n}{w}\right)$** In the best-case scenario, where `w` is the word size of the machine, the algorithm takes advantage of the word-level parallelism and makes few character comparisons. - **Average Case: $O(n)$** On average, BNDM performs linear scans through the text, making it highly efficient for typical use cases, especially with short patterns. 
-- **Worst Case: $O(n * m)$** +- **Worst Case: $O(n \times m)$** In the worst case, when the text and pattern have poor alignment, BNDM may require multiple full scans of the text, leading to quadratic complexity, where `n` is the text length and `m` is the pattern length. ### Space Complexity: diff --git a/docs/algorithms/string-algorithms/commentz-walter-algorithm.md b/docs/algorithms/string-algorithms/commentz-walter-algorithm.md index 717b6a2c8..de2b204b6 100644 --- a/docs/algorithms/string-algorithms/commentz-walter-algorithm.md +++ b/docs/algorithms/string-algorithms/commentz-walter-algorithm.md @@ -27,13 +27,13 @@ The Commentz-Walter algorithm is an efficient string matching algorithm that com ### Time Complexity: -- **Best Case: $O(n / m)$** +- **Best Case: $O\left(\frac{n}{m}\right)$** In the best case, the algorithm achieves sublinear performance due to the pattern skipping mechanism, where `n` is the text length and `m` is the minimum length of the patterns. - **Average Case: $O(n)$** On average, the Commentz-Walter algorithm processes the text in linear time, making it highly efficient for most practical scenarios involving multiple patterns. -- **Worst Case: $O(n * m)$** +- **Worst Case: $O(n \times m)$** The worst-case complexity occurs when the text and patterns have poor alignment, leading to more comparisons and resulting in quadratic time complexity. 
### Space Complexity: diff --git a/docs/algorithms/string-algorithms/shift-or-algorithm b/docs/algorithms/string-algorithms/shift-or-algorithm.md similarity index 98% rename from docs/algorithms/string-algorithms/shift-or-algorithm rename to docs/algorithms/string-algorithms/shift-or-algorithm.md index ea374bd44..0c8cf1def 100644 --- a/docs/algorithms/string-algorithms/shift-or-algorithm +++ b/docs/algorithms/string-algorithms/shift-or-algorithm.md @@ -27,7 +27,7 @@ The Shift-Or algorithm, also known as the **Bitap algorithm for exact matching** ### Time Complexity: -- **Best Case: $O(n / w)$** +- **Best Case: $O\left(\frac{n}{w}\right)$** In the best case, where `w` is the word size of the machine, the algorithm processes multiple characters in parallel, leading to a faster search in practice. - **Average Case: $O(n)$** From a99d024e8316e1ecfe84edf04ef1f82735ee85ac Mon Sep 17 00:00:00 2001 From: V Pratheek Date: Tue, 8 Oct 2024 18:41:20 +0530 Subject: [PATCH 4/4] This is the final changed and updated algorithms. --- .../string-algorithms/apostolico-giancarlo-algorithm.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/algorithms/string-algorithms/apostolico-giancarlo-algorithm.md b/docs/algorithms/string-algorithms/apostolico-giancarlo-algorithm.md index e490e1aa2..a3d51e73f 100644 --- a/docs/algorithms/string-algorithms/apostolico-giancarlo-algorithm.md +++ b/docs/algorithms/string-algorithms/apostolico-giancarlo-algorithm.md @@ -27,7 +27,7 @@ The Apostolico–Giancarlo algorithm is an advanced string matching algorithm de ### Time Complexity: -- **Best Case: $O\left(\frac{n}{w}\right)$** +- **Best Case: $O\left(\frac{n}{m}\right)$** In the best-case scenario, the algorithm performs optimally, making only a fraction of comparisons proportional to the length of the text divided by the length of the pattern. - **Average Case: $O(n)$**