Kadane's Algorithm, KMP algorithm, Quick select algorithm.
Here are tree important algorithms for interviews and coding contests.
Algorithms that every programmer should know -
Hey Coders! 👋 Solving a problem is happiness, right? 🤗 What if we not only find a way to solve the problem but also understand, analyze, and improvise it? I call it double happiness🤗🤗.
The problems we are gonna solve today are asked in the top tech company interviews like Google, Facebook, and Microsoft. Why wait? Let’s get started with a widespread programming problem, the maximum subarray sum👍.
Maximum subarray sum
The maximum subarray sum is the largest sum obtained by contiguous elements in an array. If we’re given an array of integers and asked to find the subarray with the maximum sum, we use Kadane’s algorithm.
Why not figuring out if we really need an algorithm to find the subarray with maximum sum.
The first thought that pops up by looking at the question might be, considering the entire array.
Case 1:- Sum of the complete array:- This method doesn’t work because the array contains negative numbers as well.
It seems there is a problem with negative numbers🤔 . Then let’s try ignoring all the negative numbers.
Case 2:- Considering the longest subarray containing only positive numbers.
Oh no! 🙃 This isn’t working. It must be so, as we ignored the larger number(s) present after the negative number.
How about finding all the possible subarrays ( brute force approach) to obtain the maximum sum? It definitely works 😁. But it's time-consuming⌛ ( time complexity would be O(n2) ). Implementing Kadane’s algorithm would be a better idea💡; let's see how it works.
Working of Kadane's algorithm:-
If the given array is:-
The subarray with maximum sum is:-
Kadane's algorithm aims to look for all contiguous segments of the array with a positive-sum and keeps track👁️ of the maximum sum among all positive sums.
Let's understand the approach in detail.
Kadane's algorithm uses two variables to track the current sum and maximum sum.
- 1.For current sum:- We start from 0 and keep adding array elements one by one from left to right ( it's the sum of elements till that index ). If the value becomes negative, we ignore all the elements till that index and make the current sum zero.
Here if we consider [ 7, -13] as their sum is negative, they will reduce the overall sum of the subarray in which they are included. Hence we avoid such segments and assign the sum to 0.
- 2.The maximum sum stores the maximum value of all the possible sums. ( it's updated when we encounter greater sum value).
Kadane's algorithm uses an iterative 🔄 dynamic programming approach. Although there is more about dynamic programming, we can understand it as the process of remembering the results of simpler subproblem to save computation time later.
Kadane's algorithm pseudo-code:-
Initialize the current sum to 0 and the max sum either to INT_MIN or the first element of the array.
Iterate through the array
Add each array element to the current sum.
Update max sum if the current sum is greater than the max sum.
Make sure the current sum is non-negative (assign to 0 if it becomes negative ).
The drawback of Kadane’s algorithm:- Kadane's algorithm requires at least one positive number, so a complete negative array is an invalid input.
How do we overcome this❓
After understanding Kadane’s algorithm, overcoming this drawback is a cakewalk🎂🚶♂️.
If the array contains only negative numbers, then the lowest negative number is the answer.
In order to check if the array is all negative, we maintain a flag 🚩. On encounter of a positive number, the flag becomes false. If the flag remains true till the end, we will return the lowest negative number.
Here is the code 👇 that implements the above 👆 logic.
###Improvised version of Kadane's algorithm for maximum subarray sum def maxSubArraySum(array , size): max_sum = array current_sum = 0 only_negative = True negative_result = array for i in range(0, size): current_sum = current_sum + array[i] if current_sum < 0: if(only_negative): negative_result = max(current_sum, negative_result) current_sum = 0 else: only_negative = False max_sum = max(current_sum, max_sum) return max_sum arr=list(map(int,input().split())) n=len(arr) print(maxSubArraySum(arr,n))
Time Complexity:- O(n)
Space Complexity:- O(1)
7 -13 4 6 22 -3 9 -8
Kadane's algorithm uses two variables, one to store the current non-negative sum and the other to store the maximum sum till that point (index).
Let's look at an example and sift through each iteration 👀.
The maximum sum is updated by comparing it with all the positive values ➕ of the current sum. After reaching the end of the array, we return↩️ the maximum sum.
As Kadane's algorithm is a simple method that tracks the maximum at each position using the previously obtained value, we can find the maximum subarray sum with O(n) runtime.
Wanna search🔍 for patterns in a text? Wanna find all the occurrences of the pattern? Scroll down 🏂.
String matching algorithm
We use Knuth-Morris-Pratt (KMP) algorithm to solve a classical problem, the string matching problem. String matching algorithms play a crucial role in solving real-world🌐 problems. Some of its masterful ⚡ applications are bioinformatics and DNA sequencing🧬, plagiarism detection, spell checkers✅, spam filtering, information retrieval systems🖥️, etc.
Why KMP algorithm?
KMP algorithm is the first linear time complexity algorithm for string matching. Hence it's preferred when the search string is larger or for processing large files📂.
The KMP algorithm guarantees linear worst-case complexity⏱️ ( fault resistant for accidental inputs).
To better understand the importance of the KMP, we will first look at the brute force approach.
Brute force approach:-
Brute-force string matching compares the pattern with all substrings of the text. Those comparisons between substring and pattern proceed character by character.
In the brute-force approach, a window with the same length as a pattern moves over the text. Though this algorithm requires no preprocessing and extra space, too many comparisons make it slow🦥, thereby making the worst-case time complexity O(m(n-m+1)).
( As the total number of comparisons are n-m+1 and each comparison checks for a pattern match, which is of length m, Hence the complexity turns out to be m(n-m+1) )
We are going to reduce the complexity from O(mn) to O(n) using the KMP algorithm, exciting right💃? Let's proceed.
Working of KMP algorithm:-
The idea of KMP algorithm:-
The KMP algorithm reduces the runtime by figuring out the text parts known to match, which helps skip⏩ a few comparisons.
KMP algorithm consists of 2 parts:-
We generate a prefix array to find matches within the given pattern. Here, finding a match means checking if the characters match the pattern's prefix.
Let's understand how to generate a prefix array with an example.
Initially, the prefix array starts with 0, so for ‘A’ at 1st position, the value is 0. Coming to the 2nd position, we just carry forward the previous prefix value since there is no match⛔. And for the 3rd position, we not only carry forward the previous value but also increment it as it matches with the prefix ( as the 3rd character matches with the 1st )
Basically, the prefix arrays store the number of characters that match the pattern's prefix till that point. This analysis of the pattern before processing⚙️ helps to reduce iterations.
Searching🔎 is to find the occurrences of the pattern in the text. The advantage of KMP is that we can skip ⏩ a few iterations by using data in the prefix array.
Here's a diagram to show the flow of searching in the KMP algorithm
Meaning of the pattern pointer ACC to prefix array:- For instance, if we are at index ‘i’ of the pattern, we assign prefixArray[i-1] to the pattern pointer.
Don't worry if all of this seems complicated😅. Let's understand by considering the example where the pattern is "ABA" and the text is "ABAABA." Here, we start by comparing the first three characters and find that it is a perfect match.
Since we already know that the 3rd character of the text and the 1st character of the pattern are the same ( both are 'A' ). Instead of comparing them again, we start comparing the next characters😎.
KMP algorithm pseudo-code:-
- Start the prefix array with 0.
- Store the number of matches in the pattern with its prefix.
- If a match occurs increment text, pattern pointers
- If the pattern reaches the end
- Count occurrence and change pattern pointer according to prefix array.
- Else, if a mismatch occurred
- If the pattern pointer is not in the beginning, change it according to the prefix array.
- If the pattern pointer is in the beginning, increment the text pointer.
For a clear and complete idea, let’s reconsider the pattern “ABA” whose prefix array is [0, 0, 1] in a text.
We keep incrementing the text and pattern pointers as the characters are matching. Once we reach the end of the pattern🔚, since the pattern is found, we increment occurrence and skip one iteration 😃 ( as prefixArray = 1 ).
A mismatch occurred at the 2nd position⚠️. We assign the pattern pointer to the previous prefix array value ( patternPointer = prefixArray ) . As the new pattern pointer value is 0, we can’t skip any iterations😐.
We increase the occurrence count by one as the pattern is found.
Once we reach the end of the text, we stop🛑 iterating. And return the number of occurrences (return 2).
KMP works based on the degenerating property of the pattern. Degenerating property means the detection of multiple occurrences of a pattern in a text. By using this property, this algorithm will help to 🕵️♂️ all the occurrences of the pattern. Here partial matching doesn’t count; the pattern should be a perfect match👌 ( the entire pattern should be present in the string ).
In preprocessing⏪, the analysis of the pattern is kept ready, which is helpful to reduce iterations while searching.
### python3 code for KMP algorithm text = input() pattern = input() m = len(pattern) n = len(text) prefixArray = [0 for i in range(m)] def preprocessing(pattern, n, prefixArray): i = 1 patternPointer = 0 while (i < n): if pattern[i] == pattern[patternPointer]: patternPointer += 1 prefixArray[i] = patternPointer i += 1 else: if patternPointer != 0: patternPointer = prefixArray[patternPointer-1] else: prefixArray[i] = 0 i += 1 def searching(pattern,text): textPointer = 0 patternPointer = 0 occur_count = 0 preprocessing(pattern, m, prefixArray) while (textPointer < n): if text[textPointer] == pattern[patternPointer]: textPointer += 1 patternPointer += 1 if (patternPointer == m): occur_count += 1 patternPointer = prefixArray[patternPointer - 1] elif (textPointer < n and text[textPointer] != pattern[patternPointer]): if (patternPointer != 0): patternPointer = prefixArray[patternPointer - 1] else: textPointer += 1 return occur_count print(searching(pattern,text))
Time Complexity:- O(n)
Space Complexity:- O(m)
The KMP algorithm uses preprocessing to find all the pattern occurrences in the text in linear time.
KMP algorithm is one of the most widely used search algorithms due to its worst-case fault-resistant nature and efficiency in handling large files.
Yay! We understood a challenging algorithm🏁. I guess mastering the upcoming, quick select algorithm would be effortless😇.
Quick select algorithm
K-th smallest element in an unordered list
We use the quick select algorithm to find the kth smallest element in an unordered list. Here the kth smallest element is the element present at the kth position after sorting the array.
To show you what I mean, here is an example:
Array = [ 4, 7, 2 ] here the 1st smallest is 2, 2nd smallest is 4, and 3rd smallest is 7.
The quick select algorithm is similar to the Quicksort algorithm. Before we jump into the quick select algorithm, let’s take a glance at quicksort.
Quicksort:- As the name suggests, quicksort is significantly faster in practice than other O(nlogn) algorithms. Quicksort is a divide-and-conquer algorithm. It separates the list of elements into two parts and then sorts them recursively.
The separation happens based on the value of 'pivot' ( the comparison measure⚖️ ), in such a way that all the elements on the left side are less than the pivot, and on the right side, elements are greater than the pivot.
The quick select algorithm takes its roots from the quicksort algorithm. It uses the same divide and conquer technique but selects and iterates only through the subarray containing the kth smallest element.
Importance of quick select algorithm:-p
Like quicksort, quick select is fast in practice⏳. ( though the worst-case performance is poor 😢).
It's an in-place algorithm ( doesn’t use extra space )
- Quicksort is used in solving problems like:-
- Finding the median.
- Finding kth minimum or maximum element.
- Pivoting:- Bring the pivot to its appropriate position.
- Select the part that contains the kth smallest element.
- Select left if the index of pivot is greater than k.
- Else select right.
- Repeat these steps till we find the kth smallest element.
Quick select repeats🔂 the two steps until the kth element is found. They are swapping ( placing pivot at correct location ), partitioning ( choosing one side of pivot element ).
Example of Quick select algorithm implementation.
This step is called pivoting. (placing pivot element at correct position)
We select the left part of pivot (left part of 18) as pivot index ( 6 ) is greater than k-1 ( 3 ).
After placing the pivot element ( 3 ) at the correct position ( by swapping with 7 ), we select the right part of the pivot (as pivot index (0) is less than k-1 (3) ). Again we assign the last element to pivot.
Now, we place the new pivot element (7) at the appropriate position. Since the pivot index is placed at the kth position ( at 4th position ), we return the pivot element (7 is the answer).
Problem Statement:- Given an array of integers, find the kth smallest element.
The idea of the quick select algorithm:-
Instead of recurring for both sides (after finding pivot), the quick select algorithm recurs only for the part that contains the kth smallest element. The logic is simple: We recur for the left part⬅️ if the partitioned element index is more than k. If the index is k, we return↩️ ( found kth element). If the index is less than k, then we recur for the right part➡️.
### Python3 program of quick select def partition(arr, l, r): pivot = arr[r] pivotIndex = l for j in range(l, r): if arr[j] <= pivot: arr[pivotIndex], arr[j] = arr[j], arr[pivotIndex] pivotIndex += 1 arr[pivotIndex], arr[r] = arr[r], arr[pivotIndex] return pivotIndex def kthSmallest(arr, l, r, k): if (k > 0 and k <= r - l + 1): pivotIndex = partition(arr, l, r) if (pivotIndex - l == k - 1): return arr[pivotIndex] if (pivotIndex - l > k - 1): return kthSmallest(arr, l, pivotIndex - 1, k) return kthSmallest(arr, pivotIndex + 1, r, k - pivotIndex + l - 1) print("Index out of bound") arr = list(map(int,input().split())) n = len(arr) k = int(input()) print(kthSmallest(arr, 0, n - 1, k))
Worst:- O(n ^2)
Space Complexity:- O(1)
7 13 4 9 24 6 3 18
The recurrence relation of quicksort is
T(n) = n + 2T(n/2)
making its complexity O(n logn). While that of quick select is
T(n) = n + T(n/2).
The quick select algorithm uses a linear number of comparisons (on average) thus is more efficient than the sorting method. Selection after pivoting makes the quick select algorithm faster and more efficient than quicksort.
I hope you found the blog helpful. Keep learning with us 👨🎓👩🎓. Check out our latest videos and projects on our YouTube channel.