# Strings and For Loops

Bart Massey 2013-01-31

## Review: Values and Expressions

• Values can be given by constants: 0, -17.3, True, "", "hello".

• Expressions can be made by combining values with operators: 1 + 3 * 5.0, False and (True or False)

• Expressions have whatever value they evaluate to; can be used wherever a value is needed

• Values can be stored in variables: x = 3 + 8, y = False or False, z = "hello"

• Variables can be referenced in expressions: x, 1 + x, y and True, z

• Functions are another kind of operator that combines values to produce a value: int("3"), input("prompt? ")

## Review: Types and Statements

• Values have types: int, float, string, bool

• Operators usually require and produce values of specific type: 3 + "hello" doesn't work. This is often true of functions as well: sqrt("5") fails, for example.

• Operators can produce values of a different type than they accept: 2 < 3, float("3")

• Sometimes a value's type is "promoted". Mostly, ints may be promoted to floats: sqrt(5) works, since it is interpreted as sqrt(5.0)

• Control statements require values of specific type

• if ⟨bool⟩:
• elif ⟨bool⟩:
• while ⟨bool⟩:
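The points above can be tried out in a few lines. This is only a sketch: the slides write sqrt() without saying where it comes from, so importing it from the standard math module is an assumption here. Note also that Python itself spells the string type str.

```python
import math

# Every value has a type.
types = [type(3), type(3.0), type("3"), type(True)]

# Operators and functions can produce a different type than they accept.
is_less = 2 < 3          # two ints in, a bool out
as_float = float("3")    # a string in, a float out

# An int is promoted to a float where a float is expected.
root = math.sqrt(5)      # treated like math.sqrt(5.0)

# Mixing incompatible types raises a TypeError.
try:
    3 + "hello"
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(types, is_less, as_float, root, mixed_ok)
```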

## The string Type

• So far, int, float and bool seem to have lots of operators and functions; string doesn't seem to have so much.

• It turns out that + can be a string operator as well, "concatenation": "hello" + "world" == "helloworld" .

• Also, * can take a string on the left and an int on the right and produce a string that is a "repetition": "hello" * 3 == "hellohellohello" .

• But strings seem to have "structure": they look visually like they are made of characters. Treating them as atomic blobs sometimes isn't what you want.

• Actually, strings are "sequences" of characters, and thus structured values. Python has no separate character type: a character constant is just a one-character string constant.
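A quick sketch of the two string operators described above. The variable names are made up for illustration.

```python
greeting = "hello" + "world"        # concatenation: no space is added for you
spaced = "hello" + " " + "world"    # add the space yourself if you want one
echo = "hello" * 3                  # repetition: string on the left, int on the right

# A single character is just a string of length 1.
one_char_type = type('h')

print(greeting, spaced, echo, one_char_type)
```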

## strings As Sequences

• How do we reference a specific character in a string? With the [] operator, counting positions from 0: "hello"[0] == 'h'

• This, by the way, is the main reason why we start counting at 0 in Python.
• Can we assign a specific character in a string? Nope. "hello"[0] = 'j' does not work.

• The [] operator is more versatile than it appears:

• Negative indices count from the right: "hello"[-1] == 'o'

• "Slices" grab subsequences / substrings via [:] operator: "hello"[1:3] == "el", "hello"[1:-2] == "el", "hello"[1:] == "ello", "hello"[:2] == "he", "hello"[:] == "hello"
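The indexing and slicing rules above, collected into one small sketch. The name word is only for illustration.

```python
word = "hello"

first = word[0]        # indices start at 0
last = word[-1]        # negative indices count from the right

# A slice [a:b] starts at index a and stops just before index b.
middle = word[1:3]
also_middle = word[1:-2]
tail = word[1:]        # a missing end means "to the end"
head = word[:2]        # a missing start means "from the beginning"
copy = word[:]         # both missing: the whole string

print(first, last, middle, also_middle, tail, head, copy)
```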

## Chopping Up and Pasting Together Strings

• How do we make "jello" from "hello"? We now have enough machinery to do it: 'j' + "hello"[1:] == "jello"

• Pretty common to want to process strings "character at a time". For this it's sometimes handy to know the number of characters in the string: len("hello") == 5

• Example: put dots between every character in a string

    s = input("string to dotify? ")
    dot_s = s[0]    # start with the first character (assumes s is not empty)
    i = 1
    while i < len(s):
        dot_s = dot_s + '.' + s[i]    # add a dot, then the next character
        i += 1
    print(dot_s)


## Aside: Character Codes

• Characters are represented inside the computer using a code called Unicode, which gives a number to every possible character in the world (~100000).

• Originally there was ASCII, which gave a number to every common typewriter character (~100).

• Unicode is a superset of ASCII.

• Can find out the code of a character with ord(): ord('h') == 104, ord('⁋') == 8267

• Can make new characters with chr(): chr(8267) == '⁋'

• Unicode is semi-sane: 'a' through 'z', 'A' through 'Z' and '0' through '9' are all together in order.

• "Non-printing" characters (also "combining characters" etc.)
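Because letters and digits sit together in order, a little arithmetic on character codes goes a long way. A minimal sketch, with illustrative variable names:

```python
code_h = ord('h')                    # character -> code
next_letter = chr(ord('h') + 1)      # code -> character: the letter after 'h'

# Digits are in order, so subtracting ord('0') gives the digit's value.
digit_value = ord('7') - ord('0')

# Lowercase letters are in order, so a range check finds them.
c = 'q'
is_lower = ord('a') <= ord(c) <= ord('z')

print(code_h, next_letter, digit_value, is_lower)
```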

## For Loops

• The "pattern" of the previous program is really common: Set a loop control variable to starting value, then increase by one until it gets to ending value.

• "iteration": many programs do little else.
• The "for" loop captures this pattern in an easier to read and more reliable form:

    s = input("string to dotify? ")
    dot_s = s[0]    # start with the first character (assumes s is not empty)
    for i in range(1, len(s)):
        dot_s = dot_s + '.' + s[i]    # add a dot, then the next character
    print(dot_s)

• Cannot forget to initialize the loop variable.
• Cannot forget to increment the loop variable.
• The syntax is a little weird (and unique to Python)

## More About the range() Function

• range(9) is the same as range(0, 9) .

• range(0, 9, 2) hits 0, 2, 4, 6 and 8.

• range(9, 0, -1) is sometimes quite useful, but watch for the boundary cases.

• Note that range starts with the initial value, but finishes just before the final value: range(0, 3) hits 0, 1 and 2.
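One way to see exactly which values a range hits is to wrap it in list(), which is used here only to make the values visible. The sketch below also shows the boundary case for counting down: the final value is never included, so reaching 0 needs an end of -1.

```python
up = list(range(9))                    # same values as range(0, 9)
evens = list(range(0, 9, 2))           # step of 2
down = list(range(9, 0, -1))           # counts down, stops before 0
down_to_zero = list(range(9, -1, -1))  # use -1 as the end to include 0

print(up)
print(evens)
print(down)
print(down_to_zero)
```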

## The range() Function Produces a Sequence (sort of)

• Strings are cool because sequences are cool.

• It turns out that Python also lets you have sequences of other things; for example, ints.

• list(range(1, 5)) == [1, 2, 3, 4]

• Why list()? Well, because range() doesn't actually build a list: it produces a lazy "range" object that hands out its values on demand, and list() turns that into a real list. Ow.

• All the operators you learned for strings (sequences of characters) work for sequences of ints.

• [1, 2, 3, 4][1 : -1] == [2, 3]
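The string operators from earlier, tried on a list of ints. A small sketch with illustrative names:

```python
nums = list(range(1, 5))

first = nums[0]            # indexing
last = nums[-1]            # negative indexing
inner = nums[1:-1]         # slicing
count = len(nums)          # length
longer = nums + [5, 6]     # concatenation
zeros = [0] * 3            # repetition

print(nums, first, last, inner, count, longer, zeros)
```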

## for and Sequences

• The for loop just sets the loop control variable to each element of a sequence in turn. We write

    s = input("string to dotify? ")
    dot_s = s[0]    # start with the first character (assumes s is not empty)
    for c in s[1:]:
        dot_s = dot_s + '.' + c    # add a dot, then the next character
    print(dot_s)


• Python doesn't actually care what type of values you put in its sequences: [1, 2.5, "hello", [1, 2, "goodbye"]] is a valid sequence.

• However, these kinds of sequences tend to be less useful.
• Can we change an element of an arbitrary sequence?

    x = ['a', 'b', 'c']
    x[1] = '!'

• Yes, this works fine, and now x == ['a', '!', 'c'] .

• So why can't we change our strings? Because strings are magic sequences of characters (cause Python is stupid sometimes): they are "immutable".

• There's a workaround, but it's too ugly to show.
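A small sketch contrasting the two behaviors. It does not attempt the workaround mentioned above. It only reuses the slicing and concatenation idea from the "jello" example to build a new string and leave the old one alone.

```python
# Lists are mutable: assigning to an element changes the list in place.
x = ['a', 'b', 'c']
x[1] = '!'

# Strings are immutable: the same kind of assignment raises a TypeError.
s = "hello"
try:
    s[0] = 'j'
    changed = True
except TypeError:
    changed = False

# Build a new string instead of changing the old one.
new_s = 'j' + s[1:]

print(x, changed, s, new_s)
```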

## Tuples

• Like lists, but immutable: you cannot reassign their elements. Different syntax: (1, 2) etc.

• Singleton tuple is (1,) because (1) was already taken.

• Not terribly important yet, but you should know they exist.
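A brief sketch of the tuple points above, with illustrative names:

```python
pair = (1, 2)
single = (1,)        # the trailing comma makes this a one-element tuple
not_a_tuple = (1)    # plain parentheses: this is just the int 1

# Indexing works as it does for lists and strings.
first = pair[0]

# Assigning to an element raises a TypeError, as it does for strings.
try:
    pair[0] = 5
    changed = True
except TypeError:
    changed = False

print(pair, single, not_a_tuple, first, changed)
```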

## Lots Of Material!

• You need to carry these ideas home and try them out. Tonight!

• Sample problems:

• Obfuscation: Input a string and then print that string with every character with an even character code changed to '.' . Is the string still readable?

• Input a string and then print all subsequences of three consecutive characters from that string. These are called "trigrams" and are used in cryptography.