HW 3: String Matching, Amortized Algorithms

CS 584/684 Homework 3: String Matching, Amortized Algorithms

Bart Massey
Due Tuesday 13 May 2014 before class

Problems

The main way Rabin-Karp is used today is as a way of searching for multiple pattern strings in a given target string in parallel. The basic idea is to use a Bloom filter to check whether the current rolling hash value might be one of a set of target hashes. If the Bloom filter returns positive, the hash is compared directly with each hash in the set. If a hash matches, then the pattern string is checked against the target string.

a) Give pseudocode for the basic operations of a Bloom filter. (You will have to look Bloom filters up somewhere other than in the textbook.)

b) Give pseudocode for the multiple-pattern Rabin-Karp algorithm described above.

c) Assume that for a given search for p pattern strings of length m the Bloom filter has been chosen to give a false-positive probability of 0.01, and the rolling hash function's false-positive probability is 0.02. Discuss the efficiency of the algorithm of part (b) when searching a long target string.
For applications like dynamic programming, it is often necessary to make a large array, all of whose elements are initialized to some constant value k. This seems to need O(n) time for the initialization. However, it can be done in amortized time O(1).

The basic idea is to initialize elements on first reference. The elements are kept on a stack, and the array becomes an array of indices into that stack. If the stack index at an array location is invalid, or if the stack element at that index claims to be for a different array index, then we allocate and initialize a new stack entry.

There are three fundamental operations: allocate, reference, and store. The allocate operation sets up the data structure, given the maximum size n of the array and the constant k to be used for initialization. It returns the new array.

allocate(n, k):
    a.r ← new array of n uninited indexes
    a.s ← new array of n uninited (v, i) pairs
    a.k ← k
    a.sp ← 0
    return a

The reference operation takes an array a and an index i, and returns the value a[i].

reference(a, i):
    if a.r[i] ≥ 0 and a.r[i] < a.sp and a.s[a.r[i]].i = i
        return a.s[a.r[i]].v
    a.s[a.sp].v ← a.k
    a.s[a.sp].i ← i
    a.r[i] ← a.sp
    increment a.sp
    return a.k

The store operation take an array a, an index i, and a new value v, and ensures that a[i] = v.

store(a, i, v):
    reference(a, i)
    a.s[a.r[i]].v ← v

a) Show that the amortized time complexity of a sequence of operations is O(1) for an array of n elements. Be sure to describe what costs you are counting.

b) Briefly explain what would be necessary to remove the limit n on the array size while still maintaining O(1) amortized costs.

Format

Your writeup may be in plain text. It is preferable to produce PDF or HTML. Try to make your writeup clear and readable. Document formats such as Word or OpenOffice are unacceptable. LaTeX is awesome, and you should learn how to use it anyhow, so PDF produced from that is best of all.

Submission

Submission is through http://crossgrade.svcs.cs.pdx.edu.

Last modified: Tuesday, 29 April 2014, 9:58 AM