Strings and For Loops

Bart Massey 2013-01-31

Review: Values and Expressions

  • Values can be given by constants: 0, -17.3, True, "", "hello".

  • Expressions can be made by combining values with operators: 1 + 3 * 5.0, False and (True or False)

  • Expressions have whatever value they evaluate to; can be used wherever a value is needed

  • Values can be stored in variables: x = 3 + 8, y = False or False, z = "hello"

  • Variables can be referenced in expressions: x, 1 + x, y and True, z

  • Functions are another kind of operator that combines values to produce a value: int("3"), input("prompt? ")

Review: Types and Statements

  • Values have types: int, float, string (spelled str in Python), bool

  • Operators usually require and produce values of specific type: 3 + "hello" doesn't work. This is often true of functions as well: sqrt("5") fails, for example.

  • Operators can produce values of different type than they accept: 2 < 3, float("3")

  • Sometimes a value's type is "promoted". Mostly, ints may be promoted to floats: sqrt(5) works, since it is interpreted as sqrt(5.0)

  • Control statements require values of specific type

    • if ⟨bool⟩:
    • elif ⟨bool⟩:
    • while ⟨bool⟩:
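
A minimal sketch of the points above, assuming Python 3. The variable names are made up for illustration, and note that Python spells the string type as str.

```python
from math import sqrt

# Every value has a type.
types_seen = [type(3), type(2.5), type("hello"), type(True)]

# A comparison accepts ints but produces a bool.
comparison = 2 < 3

# float() accepts a string and produces a float.
converted = float("3")

# An int is promoted to a float where a float is needed.
root = sqrt(4)

# Mixing incompatible types raises a TypeError.
try:
    3 + "hello"
    mixing_failed = False
except TypeError:
    mixing_failed = True
```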

The string Type

  • So far, int, float and bool seem to have lots of operators and functions; string doesn't seem to have so much.

  • It turns out that + can be a string operator as well, "concatenation": "hello" + "world" == "helloworld" .

  • Also, * can take a string on the left and an int on the right and produce a string that is a "repetition": "hello" * 3 == "hellohellohello" .

  • But strings seem to have "structure": they look visually like they are made of characters. Treating them as atomic blobs sometimes isn't what you want.

  • Actually, strings are "sequences" of characters, and thus structured values. Python has no separate character type: a "character" is just a one-character string.
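
A short sketch of the two string operators, assuming Python 3. The variable names are illustrative only.

```python
# + joins two strings end to end (concatenation).
greeting = "hello" + "world"

# * repeats a string a whole number of times (repetition).
echo = "hello" * 3

# A single character is an ordinary string of length one.
first = "h"
is_string = isinstance(first, str)
```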

strings As Sequences

  • How do we reference a specific character in a string? With the [] operator: "hello"[0] == 'h'

    • This, by the way, is the main reason why we start counting at 0 in Python.
  • Can we assign a specific character in a string? Nope. "hello"[0] = 'j' does not work.

  • The [] operator is more versatile than it appears:

    • Negative indices count from the right: "hello"[-1] == 'o'

    • "Slices" grab subsequences / substrings via [:] operator: "hello"[1:3] == "el", "hello"[1:-2] == "el", "hello"[1:] == "ello", "hello"[:2] == "he", "hello"[:] == "hello"
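
The indexing and slicing rules above can be tried out like this. A sketch assuming Python 3, with illustrative variable names; the try/except is only there so the failed assignment does not stop the program.

```python
word = "hello"

first = word[0]       # indexing starts at 0
last = word[-1]       # negative indices count from the right
middle = word[1:3]    # characters 1 and 2; the end index is excluded
tail = word[1:]       # from index 1 to the end
head = word[:2]       # from the start up to, not including, index 2
whole = word[:]       # the entire string

# Assigning to a position in a string is not allowed.
try:
    word[0] = 'j'
    assignment_worked = True
except TypeError:
    assignment_worked = False
```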

Chopping Up and Pasting Together Strings

  • How do we make "jello" from "hello"? We now have enough machinery to do it: 'j' + "hello"[1:] == "jello"

  • Pretty common to want to process strings "character at a time". For this it's sometimes handy to know the number of characters in the string: len("hello") == 5

  • Example: put dots between every character in a string

    s = input("string to dotify? ")
    dot_s = s[0]
    i = 1
    while i < len(s):
        dot_s = dot_s + '.' + s[i]
        i += 1
    print(dot_s)
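
The same loop can be wrapped in a function so it is easy to try on different inputs. This is only a sketch, assuming Python 3: the name dotify and the empty-string check are additions for illustration (the program above fails on empty input, because s[0] does not exist).

```python
def dotify(s):
    """Return s with a dot between every pair of adjacent characters."""
    # Guard: an empty string has no first character to start from.
    if len(s) == 0:
        return s
    dot_s = s[0]
    i = 1
    while i < len(s):
        dot_s = dot_s + '.' + s[i]
        i += 1
    return dot_s
```

For example, dotify("hello") gives "h.e.l.l.o".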
    

Aside: Character Codes

  • Characters are represented inside the computer using a code called Unicode, which gives a number to every possible character in the world (~100000).

    • Originally, was ASCII, which gave a number to every common typewriter character (~100).

    • Unicode is a superset of ASCII.

  • Can find out the code of a character with ord(): ord('h') == 104, ord('⁋') == 8267

  • Can make new characters with chr(): chr(8267) == '⁋'

  • Unicode is semi-sane: 'a' through 'z', 'A' through 'Z' and '0' through '9' are all together in order.

  • "Non-printing" characters (also "combining characters" etc.)
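
Because letters and digits sit together in order, character codes can be used for simple arithmetic on characters. A sketch assuming Python 3; the variable names are made up for illustration.

```python
code = ord('h')                      # the code for 'h'

# The letter after 'h' has the next code.
next_letter = chr(ord('h') + 1)

# Digits are in order, so subtracting the code of '0' gives the digit's value.
digit_value = ord('7') - ord('0')

# Upper- and lower-case letters are each in order, so the offset from 'a'
# can be reused starting from 'A'.
upper = chr(ord('h') - ord('a') + ord('A'))

# chr() and ord() undo each other.
round_trip = chr(ord('\u204b'))      # '\u204b' is the reversed pilcrow
```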

For Loops

  • The "pattern" of the previous program is really common: Set a loop control variable to starting value, then increase by one until it gets to ending value.

    • "iteration": many programs do little else.
  • The "for" loop captures this pattern in an easier to read and more reliable form:

    s = input("string to dotify? ")
    dot_s = s[0]
    for i in range(1, len(s)):
        dot_s = dot_s + '.' + s[i]
    print(dot_s)
    
    • Cannot forget to initialize the loop variable.
    • Cannot forget to increment the loop variable.
  • The syntax is a little weird (and unique to Python)

More About the range() Function

  • range(9) is the same as range(0, 9) .

  • range(0, 9, 2) hits 0, 2, 4, 6 and 8.

  • range(9, 0, -1) is sometimes quite useful, but watch for the boundary cases.

  • Note that range starts with the initial value, but finishes just before the final value: range(0, 3) hits 0, 1 and 2.
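
The boundary behavior is easiest to see by listing the values. A sketch assuming Python 3; list() is used here only to make the values visible, and the variable names are illustrative.

```python
up = list(range(9))             # 0 through 8; 9 itself is excluded
same = list(range(0, 9))        # identical to range(9)
evens = list(range(0, 9, 2))    # step of 2
down = list(range(9, 0, -1))    # counts down from 9 to 1; 0 is excluded
short = list(range(0, 3))       # 0, 1, 2
```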

The range() Function Produces a Sequence (sort of)

  • Strings are cool because sequences are cool.

  • It turns out that Python also lets you have sequences of other things; for example, ints.

    • list(range(1, 5)) == [1, 2, 3, 4]

    • Why list()? Well, because range() actually produces a wacky "range" object that computes its values on demand instead of storing them; list() turns it into an actual list. Ow.

  • All the operators you learned for strings (sequences of characters) work for sequences of ints.

    • [1, 2, 3, 4][1 : -1] == [2, 3]
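
A sketch of sequence operations applied to ints, assuming Python 3 (where the result of range() itself supports len() and indexing). The variable names are illustrative only.

```python
r = range(1, 5)
nums = list(r)            # turn the range into a list

length = len(r)           # number of values in the range
second = r[1]             # indexing works on the range directly

inner = nums[1:-1]        # slicing a list works just like slicing a string
joined = nums + [5, 6]    # + concatenates lists
repeated = [0] * 3        # * repeats a list
```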

for and Sequences

  • The for loop just sets the loop control variable to each element of a sequence in turn. We write

    s = input("string to dotify? ")
    dot_s = s[0]
    for c in s[1:]:
        dot_s = dot_s + '.' + c
    print(dot_s)
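
The same form of loop works over any sequence, not just strings. A small sketch assuming Python 3; the data and variable names are made up for illustration.

```python
# Loop over a list of ints: n takes each element in turn.
total = 0
for n in [1, 2, 3, 4]:
    total = total + n

# Loop over a string: c takes each character in turn.
o_count = 0
for c in "hello world":
    if c == 'o':
        o_count += 1
```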
    

More About sequences

  • Python doesn't actually care what type of values you put in its sequences: [1, 2.5, "hello", [1, 2, "goodbye"]] is a valid sequence.

    • However, these kinds of sequences tend to be less useful.
  • Can we change an element of an arbitrary sequence?

    x = ['a', 'b', 'c']
    x[0] = '!'
    
    • Yes, this works fine, and now x == ['!', 'b', 'c'] .

    • So why can't we change our strings? Because strings are magic sequences of characters (cause Python is stupid sometimes): they are "immutable".

      • There's a workaround, but it's too ugly to show.
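
A sketch of the difference, assuming Python 3, with illustrative variable names. A list can be changed in place; with a string, the usual approach is to build a new string from pieces of the old one, as in the "jello" example earlier.

```python
# Lists are mutable: assignment changes the list in place.
x = ['a', 'b', 'c']
x[0] = '!'

# Strings are immutable: build a new string instead.
s = "hello"
t = 'j' + s[1:]    # s itself is unchanged
```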

Tuples

  • Like lists, but immutable: once a tuple is made, its elements cannot be reassigned. Different syntax: (1, 2) etc.

  • Singleton tuple is (1,) because (1) was already taken.

  • Not terribly important yet, but you should know they exist.
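
A quick sketch of tuple behavior, assuming Python 3. The variable names are illustrative; the try/except is only there so the failed assignment does not stop the program.

```python
pair = (1, 2)
single = (1,)        # the trailing comma makes this a one-element tuple
not_a_tuple = (1)    # just the int 1 in parentheses

# Indexing and slicing work as for other sequences.
first = pair[0]
rest = pair[1:]

# Assigning to an element of a tuple is not allowed.
try:
    pair[0] = 5
    changed = True
except TypeError:
    changed = False
```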

Lots Of Material!

  • You need to carry these ideas home and try them out. Tonight!

  • Sample problems:

    • Obfuscation: Input a string and then print that string with every character with an even character code changed to '.' . Is the string still readable?

    • Input a string and then print all subsequences of three consecutive characters from that string. These are called "trigrams" and are used in cryptography.

Last modified: Friday, 1 February 2013, 12:42 AM