HW3 Sample Solution


1a) To create a Bloom filter f of n elements with k hashes:
    f.bits ← new array of n zero bits
    f.n ← n
    f.k ← k
    return f

To insert a value v in a Bloom filter f:
    for i ← 1 .. f.k
        f.bits[H[i](v) mod f.n] ← 1

To check for a value v in a Bloom filter f:
    for i ← 1 .. f.k
        if f.bits[H[i](v) mod f.n] ≠ 1
            return false
    return true
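A direct transcription of this pseudocode in Python (a sketch: the hash family H[i] is not specified above, so it is simulated here by salting SHA-256 with the hash index):

```python
import hashlib

class BloomFilter:
    def __init__(self, n, k):
        self.bits = [0] * n          # n zero bits
        self.n = n
        self.k = k

    def _hash(self, i, v):
        # H[i](v): one hash function per index i, simulated (an
        # assumption, not part of the original) by salting SHA-256.
        digest = hashlib.sha256(f"{i}:{v}".encode()).hexdigest()
        return int(digest, 16)

    def insert(self, v):
        for i in range(self.k):
            self.bits[self._hash(i, v) % self.n] = 1

    def check(self, v):
        # May return a false positive, but never a false negative.
        return all(self.bits[self._hash(i, v) % self.n] == 1
                   for i in range(self.k))
```

Note that `check` can only err in one direction: anything inserted is always found, while a value never inserted is reported present only when all k of its bit positions happen to have been set by other insertions.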

1b) Assume that all patterns are of length m.

To find the first occurrence of one of a list p of k pattern strings
(each of length m) in a target string t of length n:
    b ← new Bloom filter with appropriate parameters
    for j ← 1 .. k
        h'[j] ← H(a,n;p[j])
        insert h'[j] into b
    h ← H(a,n;t[1]..t[m])
    i ← m
    loop
        if h is in b
            for j ← 1 .. k
                if h'[j] = h
                    if t[i-m+1]..t[i] = p[j]
                        return i-m+1
        if i = n
            return "not found"
        h ← ((h - a^(m-1)·t[i-m+1])·a + t[i+1]) mod n
        i ← i + 1
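The same search in Python (a sketch under two assumptions not in the original: H(a,n;·) is taken to be the usual polynomial rolling hash with base a and modulus n, and a Python set stands in for the Bloom filter of part 1a — same interface, just no false positives):

```python
def rk_hash(s, a, n):
    """H(a, n; s): polynomial hash of string s, base a, modulus n."""
    h = 0
    for c in s:
        h = (h * a + ord(c)) % n
    return h

def find_first(patterns, t, a=256, n=(1 << 31) - 1):
    """Return the 0-based index of the first occurrence in t of any
    pattern (all of one length m), or -1 if none occurs."""
    k = len(patterns)
    m = len(patterns[0])
    if len(t) < m:
        return -1
    hp = [rk_hash(p, a, n) for p in patterns]    # h'[1..k]
    b = set(hp)                                  # Bloom-filter stand-in
    h = rk_hash(t[:m], a, n)                     # hash of first window
    am = pow(a, m - 1, n)                        # a^(m-1) mod n
    i = m - 1                                    # last index of window
    while True:
        if h in b:
            for j in range(k):
                if hp[j] == h and t[i - m + 1:i + 1] == patterns[j]:
                    return i - m + 1
        if i == len(t) - 1:
            return -1
        # Roll the hash: drop t[i-m+1], shift, append t[i+1].
        h = ((h - ord(t[i - m + 1]) * am) * a + ord(t[i + 1])) % n
        i += 1
```

The string comparison runs only when both the Bloom filter and a stored pattern hash agree, which is what keeps the expected cost near one comparison per position.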

1c) Assume that none of the patterns occurs in the target. In that case, the algorithm executes the body of the main loop beyond the Bloom filter test only on a false positive. When there is a false positive due to Bloom filter error rather than hash collision (1% of the time), the code executes k hash comparisons. When there is a false positive due to a hash collision (2% of the time), the colliding hash will almost always be unique: thus on average k/2 hash comparisons and one full string comparison of length m will be executed. Thus, the total number of comparisons will be something close to

     T = n(1 + 0.01k + 0.02(k/2 + m))

Thus this solution is O(n(k + m)) comparisons rather than O(nkm) (sort of: the terms we neglected matter to the asymptotic analysis, but their constants are vanishingly small).
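Plugging hypothetical sizes into the formula (the values n = 10^6, k = 100, m = 10 are illustrative, not from the assignment) shows how small the correction terms are in practice:

```python
def estimated_comparisons(n, k, m):
    # T = n(1 + 0.01k + 0.02(k/2 + m)) from the analysis above.
    return n * (1 + 0.01 * k + 0.02 * (k / 2 + m))

n, k, m = 10**6, 100, 10
print(estimated_comparisons(n, k, m))   # about 3.2 million, versus
print(n * k * m)                        # 1 billion for the naive search
```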

2a) To show that the amortized time complexity is constant it is sufficient to show that each of the operations on the array is itself constant. We count stores, references, array indexing and comparisons as O(1). It is easy to see that each of the fundamental operations on the data structure, being straight-line code with no recursion or loops, is the sum of O(1) terms and is therefore O(1).

2b) If you start the array at some initial size, you can double its size every time a reference or store is made "off the end". When doubling the array size, you will need to do O(n) work to set up the new array to point at the stack and to adjust the stack pointers to point back into the new array: both of these can be done with a stack scan. There is a pattern of exponentially increasing references that will break this, making accesses O(n²), but in practice you will run out of machine memory first.
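One way to realize this (a sketch only; the class and field names are mine, not from the assignment) is the classic two-array scheme for constant-time initialization, extended with doubling. An entry is "live" only if its back-pointer lands on a stack slot that points back at it; growth does the O(n) stack scan described above. Python lists are always initialized, so this merely simulates arrays whose fresh memory would be left as garbage in C:

```python
class LazyArray:
    """Array with O(1) operations, no up-front initialization pass,
    and doubling growth on any access "off the end"."""

    def __init__(self, size=4, default=0):
        self.default = default
        self.size = size
        self.data = [None] * size   # values: garbage until written
        self.back = [-1] * size     # back-pointers into stack: garbage
        self.stack = []             # indices written so far

    def _live(self, i):
        # i has really been written iff its back-pointer selects a
        # stack slot that points back at i.
        b = self.back[i]
        return 0 <= b < len(self.stack) and self.stack[b] == i

    def _grow(self, i):
        new_size = self.size
        while new_size <= i:
            new_size *= 2
        data = [None] * new_size
        back = [-1] * new_size
        # O(n) stack scan: re-home live entries and repoint the new
        # array at the stack.
        for b, idx in enumerate(self.stack):
            data[idx] = self.data[idx]
            back[idx] = b
        self.data, self.back, self.size = data, back, new_size

    def __getitem__(self, i):
        if i >= self.size:
            self._grow(i)           # a reference off the end also grows
        return self.data[i] if self._live(i) else self.default

    def __setitem__(self, i, v):
        if i >= self.size:
            self._grow(i)
        if not self._live(i):
            self.back[i] = len(self.stack)
            self.stack.append(i)
        self.data[i] = v
```

Every never-written slot reads as the default without any initialization loop, which is what makes each straight-line operation in part 2a genuinely O(1).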

Last modified: Friday, 23 May 2014, 7:49 PM