Given n integers, output the smallest k numbers.
To get the smallest k numbers, the simplest way is, we sort the array, then output the first k numbers.
As for the sorting algorithms,I guess quick-sort is a good choice ( as we known, the average time complexity of Quick-Sort is n*logn
), then we just output the first k numbers. So the total time complexity is: O(n * log n) + O(k)=O(n * log n)
.
The problem does not require the smallest k numbers to be sorted, neither are the last n-k numbers. So we do not need to sort all the numbers. We can use Selection-Sort or Swap-Sort:
- Traverse the n numbers, put the first k numbers into a new array, assume they are the smallest k numbers;
- For these k numbers, use Selection-Sort or Swap-Sort to get the biggest element kmax (need to traverse these k numbers, the time complexity is
O(k)
); - Traverse the rest n-k numbers. Suppose each time the new element is x, compare x with kmax: if
x < kmax
, replace kmax with x, then jump to step 2 and find the biggest element kmax; ifx >= kmax
, just continue traversing.
In each traverse, the time complexity is O(k)
if we update the array, O(1)
if we do not. So totally the time complexity is n*O(k) = O(n*k)
.
A better way is maintaining a max-heap with capacity k, it is similiar with the solution two:
- Store the first k numbers in a max-heap with capacity k, assume they are the smallest k numbers;
- Sort the heap, make k1<k2<...<kmax (kmax is the biggest number in the heap);
- Traverse the rest n-k numbers. Suppose each time the new element is x, compare x with kmax: if
x < kmax
, replace kmax with x and reheapify the heap (O(logk)
); Otherwise do nothing.
In this case the time complexity is: O(k+(n-k)*logk)=O(n*logk)
. This is because in the heap we just need O(logk)
to operate a search or reheapify (In solution two, to find the biggest number in an array we need O(k)
).
The book <Date Structures and Algorithm Analysis in C>, in Chapter7, Section7.7.6, introduced a Quick-Selection algorithm. In average case, its time complexity is O(n)
:
- Suppose the number set is S, in S we choose a number v as the pivot. Divide S-{v} into S1 and S2, like the Quick-Sort.
- If k <= |S1|, the k-th smallest number is in S1. In this case, return QuickSelect(S1, k).
- If k = 1 + |S1|, the pivot is the k-th smallest number, return it.
- Otherwise, the k-th smallest number is in S2. It is the (k - |S1| - 1) smallest number in S2, so we return QuickSelect(S2, k - |S1| - 1).
The time complexity is O(n).
You can find the code below:
//Quick-Select, put the k-th smallest number in a[k-1]
void QuickSelect( int a[], int k, int left, int right )
{
int i, j;
int pivot;
if( left + cutoff <= right )
{
pivot = median3( a, left, right );
//set the median as the pivot, this can avoid the wort case, to some extent
i = left; j = right - 1;
for( ; ; )
{
while( a[ ++i ] < pivot ){ }
while( a[ --j ] > pivot ){ }
if( i < j )
swap( &a[ i ], &a[ j ] );
else
break;
}
//reset the pivot
swap( &a[ i ], &a[ right - 1 ] );
if( k <= i )
QuickSelect( a, k, left, i - 1 );
else if( k > i + 1 )
QuickSelect( a, k, i + 1, right );
}
else
InsertSort( a + left, right - left + 1 );
}
This Quick-Select algorithm is kind of like the partition method in Quick-Sort. Store the N numbers in array S, set the 'median of median' as the pivot. Then divide array into Sa and Sb, Sa<=X<=Sb. If there are more than k elements in Sa, return the smallest k numbers in Sa, otherwise return all the elements in Sa and the smallest k-|Sa| numbers in Sb. In average case the time complexity is O(n)
.
Furthermore, the book <Introduction to Algorithms>, in Chapter9 Section9.3, introduced a Select Algorithm whose time complexity is O(n) in the worst case. I recommend you to have a look.
1.Google interview: input two arrays consisting of integers, sum each two of them we can get a new array, how to find the k-th smallest numbers in this new array?
Analysis:
‘Suppose the two arrays are A and B, they all have N numbers, sum each two of them we can get a new array C, which has N^2 elements.
Then consider these as N sorted arrays:
A[1]+B[1] <= A[1]+B[2] <= A[1]+B[3] <=…
A[2]+B[1] <= A[2]+B[2] <= A[2]+B[3] <=…
…
A[N]+B[1] <= A[N]+B[2] <= A[N]+B[3] <=…
The problem turns into finding the smallest k numbers in these N^2 sorted array’
2.Suppose there are two increasing arrays A and B, A=(a1,a2,...,ak), B=(b1,b2,...,bk). For 1<=i, j<=k, find the smallest k (ai+bj). The algorithm is required to be efficient.
3.Given an array a1,a2,a3,...,an and m triples, each triple represents a query. For each query (i, j, k), output the k-th smallest number in ai,ai+1,...,aj.