## HW3 Sample Solution

1a)

To create a Bloom filter f of n elements with k hashes:

```
f.bits ← new array of n zero bits
f.n ← n
f.k ← k
return f
```

To insert a value v in a Bloom filter f:

```
for i ← 1 .. f.k
    f.bits[H[i](v) mod f.n] ← 1
```

To check for a value v in a Bloom filter f:

```
for i ← 1 .. f.k
    if f.bits[H[i](v) mod f.n] ≠ 1
        return false
return true
```
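The pseudocode above can be sketched in Python. The hash family H[i] is left abstract in the solution; the version below derives the k hashes from one MD5 digest (an illustrative choice, not part of the assignment):

```python
import hashlib

class BloomFilter:
    """Bit-array Bloom filter following the pseudocode above."""

    def __init__(self, n, k):
        self.bits = [0] * n   # new array of n zero bits
        self.n = n
        self.k = k

    def _hash(self, i, v):
        # Illustrative stand-in for the abstract family H[i]: split one
        # MD5 digest into two 64-bit halves and combine them per index i.
        digest = hashlib.md5(str(v).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:], "big")
        return h1 + i * h2

    def insert(self, v):
        for i in range(1, self.k + 1):
            self.bits[self._hash(i, v) % self.n] = 1

    def check(self, v):
        # No false negatives: every inserted value passes this test.
        return all(self.bits[self._hash(i, v) % self.n] == 1
                   for i in range(1, self.k + 1))
```

Because insertion only ever sets bits, a membership check can err only on the side of false positives, which is the property part 1b relies on.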

1b) Assume that all patterns are of length *m*.

To find the first occurrence of one of a list p of k pattern strings (each of length m) in a target string t of length n:

```
b ← new Bloom filter with appropriate parameters
for j ← 1 .. k
    h'[j] ← H(a, n; p[j])
    insert h'[j] into b
h ← H(a, n; t[1]..t[m])
i ← m
repeat
    if h is in b
        for j ← 1 .. k
            if h'[j] = h
                if t[i-m+1]..t[i] = p[j]
                    return i-m+1
    if i = n
        fail
    h ← (a·(h − a^(m−1)·t[i−m+1]) + t[i+1]) mod n    -- rolling hash update
    i ← i + 1
```
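A runnable Python sketch of the search above. The base `a` and modulus `q` are illustrative choices, and a plain `set` stands in for the Bloom filter prefilter (it plays the same role, just with no false positives); indices are 0-based rather than the pseudocode's 1-based:

```python
def find_first(patterns, t, a=256, q=1_000_003):
    """Rabin-Karp search for the first occurrence of any of the
    equal-length patterns in t. Returns a 0-based index or None."""
    m = len(patterns[0])
    n = len(t)
    if n < m:
        return None

    def h(s):
        # Polynomial hash H(a, q; s) of a string s.
        val = 0
        for ch in s:
            val = (val * a + ord(ch)) % q
        return val

    pat_hashes = [h(p) for p in patterns]
    prefilter = set(pat_hashes)       # stand-in for the Bloom filter b
    pow_m1 = pow(a, m - 1, q)         # a^(m-1) mod q, for the rolling update

    hv = h(t[:m])
    i = m - 1                         # index of the window's last character
    while True:
        if hv in prefilter:
            for j, ph in enumerate(pat_hashes):
                if ph == hv and t[i - m + 1 : i + 1] == patterns[j]:
                    return i - m + 1
        if i == n - 1:
            return None
        # Roll the window: drop t[i-m+1], append t[i+1].
        hv = ((hv - ord(t[i - m + 1]) * pow_m1) * a + ord(t[i + 1])) % q
        i += 1
```

Note that a spurious hash match is harmless: the full string comparison still guards the return, exactly as in the pseudocode.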

1c) Assume that none of the patterns occurs in the target. In
that case the algorithm executes code in the main loop beyond
the Bloom filter test only on some kind of false positive.
When the false positive is due to Bloom filter error rather
than a hash collision (1% of the time), the code executes *p*
hash comparisons, where *p* is the number of patterns. When it
is due to a hash collision (2% of the time), the colliding
hash will almost always match only one pattern: on average
*p/2* hash comparisons and one full string comparison of
length *m* are executed. Thus the total number of comparisons
will be close to

```
T = n(1 + 0.01p + 0.02(p/2 + m))
```

This solution therefore makes roughly *O(n(p + m))*
comparisons rather than *O(npm)*. (Strictly speaking, the
neglected terms do matter to the asymptotic analysis, but
their constants are vanishingly small.)
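Plugging illustrative numbers into the formula (n, p, and m chosen arbitrarily, not taken from the assignment) shows how little the false-positive terms cost relative to the naive approach:

```python
# T = n(1 + 0.01p + 0.02(p/2 + m)), the comparison count from above,
# with illustrative values: a 1,000,000-character target, 10 patterns,
# 20 characters per pattern.
n, p, m = 1_000_000, 10, 20

T = n * (1 + 0.01 * p + 0.02 * (p / 2 + m))
naive = n * p * m   # rough O(npm) cost of comparing every pattern everywhere

print(T)            # about 1.6 comparisons per target character
print(naive / T)    # speedup factor over the naive scan
```

Even with these modest parameters the filtered scan does about 1.6 comparisons per character, versus 200 for the naive method.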

2a) To show that the amortized time complexity is constant,
it suffices to show that each operation on the array is
itself constant-time. We count stores, references, array
indexing, and comparisons as *O(1)*. Each of the fundamental
operations on the data structure is straight-line code with
no recursion or loops, so its cost is a sum of *O(1)* terms
and is therefore *O(1)*.
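Assuming the structure in question is the classic constant-time-initialized array (a data array plus a stack of written indices, where each live index and its stack entry certify each other), the straight-line operations look like this sketch. Note that Python cannot truly leave memory uninitialized, so the `[0] * size` fills here merely play the role of garbage:

```python
class ConstantInitArray:
    """Array with O(1) logical initialization: an index i holds a real
    value only if where[i] points into the stack and the stack points
    back at i. All operations below are straight-line, hence O(1)."""

    def __init__(self, size, default=0):
        self.default = default
        self.data = [None] * size   # values; garbage until written
        self.where = [0] * size     # claimed stack position; also garbage
        self.stack = []             # indices written so far, in order

    def _initialized(self, i):
        s = self.where[i]
        return s < len(self.stack) and self.stack[s] == i

    def store(self, i, v):
        if not self._initialized(i):
            self.where[i] = len(self.stack)   # certify i on first write
            self.stack.append(i)
        self.data[i] = v

    def ref(self, i):
        return self.data[i] if self._initialized(i) else self.default
```

Garbage in `where` cannot forge membership: even if `where[i]` happens to point at a valid stack slot, that slot records some other index, so the back-pointer check fails.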

2b) Start the array at some initial size and double its size
every time a reference or store is made "off the end". Each
doubling requires *O(n)* work to set up the new array to
point at the stack and to adjust the stack pointers to point
back into the new array; both adjustments can be done with a
single stack scan. There is a pattern of exponentially
increasing reference indices that breaks this scheme, making
accesses *O(n²)*, but in practice you will run out of machine
memory first.
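The doubling step described above can be sketched as a standalone function (array names and layout are assumptions matching the stack-certified scheme: `data` holds values, `stack` lists the live indices, and the returned `where` array points back into the stack):

```python
def grow(data, stack, new_size):
    """O(n) doubling step: allocate larger arrays and use one scan of
    the stack (the list of live indices) to copy values across and
    re-certify them in the new where array."""
    new_data = [None] * new_size    # fresh "uninitialized" storage
    new_where = [0] * new_size      # would be garbage in the real structure
    for pos, i in enumerate(stack):
        new_data[i] = data[i]       # carry the live value across
        new_where[i] = pos          # re-point i at its stack slot
    return new_data, new_where
```

Only live (certified) indices are touched, so the scan costs time proportional to the number of writes so far, never to the new capacity's dead space.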