## Linear Sorting and Order Statistics

# Linear Sorting and Order Statistics

**Bart Massey**

### Review: Comparison Sorting is n lg n

Recall that comparison sorting requires a minimum of n lg n comparisons

- Building a decision tree to distinguish n! possible
orderings requires
*lg (n!) >= lg (n^n) = n lg n*comparisons

- Building a decision tree to distinguish n! possible
orderings requires
We are only counting comparisons here

- Recall that selection sort requires
*O(n)*swaps - Maybe we can find a more powerful operation than comparison?

- Recall that selection sort requires
Motivation: Comparison sorting is not "natural"

- Normally consider values of items as well as pairwise comparisons

### Preliminary: "Counting Sort"

Suppose that we are sorting objects whose keys are small non-negative integers.

Specifically, suppose that an array of size n is guaranteed to have no more than

*k(n)*keys in it, where*k*is in*O(n)*"Sort" the keys in linear time by just counting the number of occurrences of each, then spitting them back out in order

Can then use dynamic programming to compute the number of keys less than

*k*in*O(n)*timeCan then swap the actual elements into place

Stable sort

Note: disingenuous assumption about

*lg k*costs (but also for comp sort)Does constant-time array allocation help here?

### Preliminary: "Bucket Sort"

Suppose that we are sorting objects drawn uniform-random from

*1..n*Bucket sort is the obvious

*O(n)*expected-case sort:- Allocate an array of n "buckets"
- Dump keys around
*i*in the*i*-th bucket - Sort each bucket's entries some
*n^2*way - Output everything in order

Expected complexity (p. 202-204) is

*O(n)*

### LSD Radix Sort

Let's examine these ideas one more time:

- Counting sort will let us stably sort on small sets
- Let's talk about the set of
*digits*of a number - If we counting-sort rightmost digits of a set of numbers:
- We can use stability to preserve this ordering
- We can start sorting on the next-left digit

- This is least-significant-digit radix sort

What radix to use? largest power of

*2 < floor(lg n)*Memory efficiency and stuff is terrible (

*d*-digit number =*d*passes)

### MSD Radix Sort

Not in the book?

Bucket sort recursively from leftmost to rightmost digit

Still a linear algorithm

Easy to "stop early" and use a conventional sort

### Linear Time Order Statistics

Order Statistic = value at position

*i*in sorted sequence- Obvious way to compute this is
- sort the sequence
- look at the
*k*-th position

- Obvious way to compute this is
For some order statistics, can do something better and/or simpler

Obvious

*O(n)*algorithms for minimum (first), maximum (last)- Cute variant for simultaneous min/max:
- compare pairs, then compare larger and smaller against max / min
- special case for equal pairs
- goes from
*2n-2*to*floor(3n/2)-2*comparisons

- Cute variant for simultaneous min/max:
For any constant

*k*, can find*k*-th in*O(n)*by doing at most*k*passes

### Randomized Select

Basically quicksort, but only has to explore the partition in which the

*i*-th statistic liesAchieves

*O(n)*expected time for any*i*, when elements are distinct- p. 217-219;
*understand this proof*

- p. 217-219;
Not very satisfying: want

*O(n)*worst-case

### Linear Select

Modification of randomized select for distinct elements

*O(n)*worst-caseFairly magic: instead of picking the pivot randomly, pick it to be the recursive median-of-medians of groups of 5 values

Why 5? Well, 5 is the minimum odd number that makes the math come out; 7, 9 etc work too

Basic analysis idea: at each step in the recursion, at most

*(7n/10)+6*elements in the partition- recursion converges on amortized
*O(1)*(p. 222) - thus select is linear

- recursion converges on amortized

### Detail: Linear Select

```
To find the i-th element in a one-based array of n elements:
if n ≤ 5
sort a
return a[i]
b ← new array of n′ = ceil(n/5) elements
j′ ← 1
c ← new array of 5 elements
for j ← 1 to n - 4 by 5s
c ← a[j]..a[j+4]
b[j′] ← select third element from c
j′ ← j′ + 1
m ← n mod 5
c ← a[j]..a[m]
b[j′] ← select element m div 2 from c
m ← n′ div 2
p ← select element m from b
partition a around p
if i = m
return a[i]
if i < m
p ← select element i from a[1]..a[k]
return p
p ← select element i - m from a[m+1]..a[n]
return p
```

### Analysis of Select

Because the elements are distinct, each median-of-medians must be greater than the overall median by at most

*(1/2)(3n/5) = 3n/10*and less than by at most the same amount, except some fiddling for the tailThus, for large enough n, we get

`T[n] <= T[7n/10 + 6] + O(n)`

The Master Theorem doesn't quite apply, because "+ 6", so essentially re-prove for this case

End up with

*T[n]*in*O(n)*