Working with Strings#
Learning Objectives#
- Explain the similarities between strings and lists in Python.
- Write code to retrieve a substring using slicing.
- Write code to join (concatenate) and repeat strings.
- Use the len function to retrieve the length of a string.
- Use the count, startswith, endswith, find, replace, lower, upper, strip, lstrip, rstrip, and split string methods.
Overview of Working with Strings#
You’re probably wondering why we are revisiting strings at the end of this unit. The reason is that, in many ways, Python strings behave like lists. You can slice them, join them using the + operator, duplicate them using the * operator – just like lists! We waited until after we had taught you about lists so that you could easily apply cool list features to strings.
We want you to become proficient with strings. Text data are very common in data analytics tasks and you will often find yourself cleaning textual data. The more comfortable you are working with strings, the easier your life will be.
Strings as Lists of Characters#
Conceptually, a string is just a list of characters. To see this, consider the following code and its output:
s = 'Beer is bread in a glass.'
print(list(s))
['B', 'e', 'e', 'r', ' ', 'i', 's', ' ', 'b', 'r', 'e', 'a', 'd', ' ', 'i', 'n', ' ', 'a', ' ', 'g', 'l', 'a', 's', 's', '.']
In the above, we used Python’s built-in list function to split an ordinary string into a list of single-character strings. Note that you don’t have to do this every time you work with a string. A string is already an ordered sequence of characters, so you can index, slice, and loop over it directly, just as you would a list.
In the rest of this section, we will show you more parallels between lists and strings.
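As a small sketch of that parallel (the variable names chars and first_word are ours, chosen only for illustration), the same indexing, length, and looping operations work on a string and on the list built from it:

```python
s = 'Beer is bread in a glass.'
chars = list(s)  # explicit list of single-character strings

# Indexing and len work the same way on both.
print(s[0], chars[0])      # B B
print(len(s), len(chars))  # 25 25

# Looping over a string visits one character at a time, like a list.
first_word = ''
for ch in s:
    if ch == ' ':
        break
    first_word = first_word + ch
print(first_word)  # Beer
```

One difference to keep in mind: a list can be changed in place (chars[0] = 'D' works), while a string cannot (s[0] = 'D' raises a TypeError).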
Extracting Pieces of Strings#
Extracting a single character from a string#
To extract a single character from a string, use square brackets [ ]. Inside square brackets, put a number indicating the index of the character you want. The index 0 corresponds to the first character, 1 to the second character, and so on. If the string has length \(n\), then the index \(n-1\) corresponds to the last character of the string. You can also use negative indexes: -1 is the index of the last character, -2 is the index of the second-to-last character, and so on.
Here is an example:
s = 'HELLO'
for i in range(len(s)):
    print(f's[{i}]: {s[i]}')
print()
for i in range(-1, -len(s)-1, -1):
    print(f's[{i}]: {s[i]}')
s[0]: H
s[1]: E
s[2]: L
s[3]: L
s[4]: O
s[-1]: O
s[-2]: L
s[-3]: L
s[-4]: E
s[-5]: H
Notice that, in this case, strings behave exactly like lists.
It is also possible to extract a character from a string without using a variable. For example:
'HELLO'[0]
'H'
Slicing strings#
In Excel, you can extract a piece of a string using the MID function (or LEFT and RIGHT). In Python, you can use slicing. We find that slicing is very easy and intuitive, and hope you will agree.
Here are some examples.
s = 'ACCY 570 is awesome!'
Notice that, when you slice a string, the return value is also a string. Lists behave identically! When you slice a list, the return value is a list.
# Extract the substring '570'
s[5:8]
'570'
# The last two characters of the string
s[-2:]
'e!'
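Here are a few more slicing patterns, sketched on the same string. We are assuming you have already seen omitted start and stop values and the optional step when slicing lists; the variable names are ours.

```python
s = 'ACCY 570 is awesome!'

# Omit the start to slice from the beginning of the string.
course_prefix = s[:4]
print(course_prefix)   # ACCY

# Omit the stop to slice through the end of the string.
rest = s[9:]
print(rest)            # is awesome!

# A third number is the step; a step of -1 walks backward.
reversed_s = s[::-1]
print(reversed_s)      # !emosewa si 075 YCCA
```

In every case the result is a new string, and s itself is left unchanged.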
String Operations#
Joining Strings#
Use the + operator to join (concatenate) two strings. Note that this works the same with lists.
s1 = 'Moo'
s2 = 'cow'
joinedString = s1 + ' ' + s2
joinedString
'Moo cow'
Repeating Strings#
Use the * operator to repeat a string. Note that this works the same with lists.
catNoise = 'Meow'
catNoise * 10
'MeowMeowMeowMeowMeowMeowMeowMeowMeowMeow'
That’s hard to read. Let’s add a space between the cat noises!
(catNoise + ' ') * 10
'Meow Meow Meow Meow Meow Meow Meow Meow Meow Meow '
Checking whether one string contains another string#
You can use the in operator to check whether one string contains a substring. The expression substr in s checks whether the string substr can be found anywhere in the string s. If it is found, the result will be True. If not, the result will be False.
Here are two examples:
# Check whether the string contains 'accy'
# It will be False because the string contains 'ACCY', not 'accy'
s = 'ACCY 570 is awesome!'
'accy' in s
False
# Check whether the string contains ' 57'
# It will be True because the string contains ' 57'
s = 'ACCY 570 is awesome!'
' 57' in s
True
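Because the in operator is case sensitive, one common workaround is to convert the string to lower case before checking. The sketch below uses the lower method, which is covered later in this section; the variable names are ours.

```python
s = 'ACCY 570 is awesome!'

# Exact check: fails because the case does not match.
has_accy_exact = 'accy' in s
print(has_accy_exact)       # False

# Case-insensitive check: compare against a lower-case copy of s.
has_accy_any_case = 'accy' in s.lower()
print(has_accy_any_case)    # True

# Use "not in" to test that a substring is absent.
is_missing = 'terrible' not in s
print(is_missing)           # True
```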
Some Useful String Functions and Methods#
Python has many useful functions for working with strings. We will only show you a few here. We will revisit some of these functions again when we learn the Pandas module later in the course.
Getting the length of a string#
Use the len function to get the length of a string, just like with lists.
s = '12345678'
len(s)
8
Other String Methods#
The following functions are technically “methods”. In Python, strings are “objects”. In programming, an object is a thing that stores values (called properties), and that has certain functions that belong to it (called methods).
Method calls have a different syntax than function calls. To see this, think about the len function. You call that by writing the function name and then putting arguments in parentheses (e.g. len(myvariable)). You call a method by typing the object, then a dot, then the method name, then arguments in parentheses (e.g. mystring.method()).
We realize this is complicated, and we have done our best to spare you such complications. For now, all you need to know is that:
- Methods are functions.
- Methods are called differently than ordinary functions.
In the rest of this section, you will see more examples of methods and how they are called.
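To make the difference in syntax concrete, here is a minimal side-by-side sketch. It uses the upper, strip, and lower methods that are introduced below; the variable names are ours.

```python
s = 'Hello'

# Function call: function name first, then the argument in parentheses.
n = len(s)
print(n)        # 5

# Method call: the object first, then a dot, then the method name.
shout = s.upper()
print(shout)    # HELLO

# A string method returns a new string, so you can call another
# method on the result right away (this is called chaining).
tidy = '  Hello  '.strip().lower()
print(tidy)     # hello
```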
lower and upper#
lower and upper convert a string to all lower case or all upper case, respectively.
s = "Hello y'all"
upper_s = s.upper()
lower_s = s.lower()
print(f"s is '{s}'")
print(f"s in all lower case is '{lower_s}'")
print(f"s in all upper case is '{upper_s}'")
s is 'Hello y'all'
s in all lower case is 'hello y'all'
s in all upper case is 'HELLO Y'ALL'
strip, lstrip, and rstrip#
We know what you may be thinking and shame on you. Let’s remain focused. :)
Blank spaces are pesky and we often want to remove them from the beginning and end of strings. This is a common problem when cleaning text data. Luckily, the strip method does that for you. strip removes all “white space” characters from the beginning and end of a string. White space includes space, tab, and newline characters.
s = ' Dirty string with unnecessary spaces at beginning and end. '
s.strip()
'Dirty string with unnecessary spaces at beginning and end.'
Notice that strip removed the spaces from the beginning and the end of the string.
lstrip stands for left strip and it removes white space from the beginning (left) of the string. rstrip does the same on the right.
# Notice this does not remove the white space at the end.
s.lstrip()
'Dirty string with unnecessary spaces at beginning and end. '
# Notice this does not remove the white space at the beginning.
s.rstrip()
' Dirty string with unnecessary spaces at beginning and end.'
By default, these methods remove white space, but you can also tell these methods which characters you want to strip! The argument is treated as a set of characters to remove, not as a whole word.
s = 'Hello.'
# Get rid of period at the end
s.rstrip('.')
'Hello'
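One thing that often surprises people is what happens when you pass more than one character to strip, lstrip, or rstrip. The sketch below uses a made-up file name to show the behavior, along with one possible way to remove an exact ending instead. It borrows the endswith method from the next subsection.

```python
filename = 'report.txt'   # hypothetical file name, for illustration only

# rstrip removes any trailing '.', 't', or 'x' characters, one at a time,
# so it also eats the final 't' of 'report'.
stripped = filename.rstrip('.txt')
print(stripped)   # repor

# To remove an exact ending, test for it and then slice it off.
if filename.endswith('.txt'):
    base = filename[:-len('.txt')]
else:
    base = filename
print(base)       # report

# The same character-set rule applies to strip and lstrip.
both_ends = 'xxhixx'.strip('x')
print(both_ends)  # hi
```

If you are using Python 3.9 or later, the removesuffix and removeprefix string methods do the exact-match version for you.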
startswith and endswith#
The startswith and endswith methods tell you whether your string starts or ends with a given substring. These are like the in operator, but they only test the beginning and end of your string.
s = '111 33'
s.startswith('111')
True
s.endswith('x')
False
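A typical use of these methods is filtering a list of names. The following is a minimal sketch; the file names are invented for illustration.

```python
# Hypothetical file names, for illustration only.
files = ['sales_2023.csv', 'sales_2024.csv', 'notes.txt', 'summary.csv']

# Keep only the names that start with 'sales_' and end with '.csv'.
sales_files = []
for name in files:
    if name.startswith('sales_') and name.endswith('.csv'):
        sales_files.append(name)

print(sales_files)   # ['sales_2023.csv', 'sales_2024.csv']
```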
replace#
The replace method searches your string for a substring and replaces it with a different substring. By default, it replaces all occurrences of your substring. It returns a new string and leaves the original string unchanged.
s = 'ACCY 570 is great.'
s.replace('570', '575')
'ACCY 575 is great.'
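Here is a short sketch of two details that are easy to miss: replace accepts an optional third argument that limits how many occurrences are replaced, and it never modifies the original string. The variable names are ours.

```python
s = 'pig dog pig pig cat'

# Default: every occurrence is replaced.
all_replaced = s.replace('pig', 'cow')
print(all_replaced)   # cow dog cow cow cat

# Optional third argument: replace at most this many occurrences.
first_only = s.replace('pig', 'cow', 1)
print(first_only)     # cow dog pig pig cat

# Replacing with an empty string deletes the substring.
no_spaces = s.replace(' ', '')
print(no_spaces)      # pigdogpigpigcat

# The original string is unchanged.
print(s)              # pig dog pig pig cat
```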
split#
This may be one of the most useful functions ever invented. It splits a string and returns a list of substrings. By default, it splits on white space (spaces, tabs, and newlines), but you can tell Python how you want it to split the string (the fancy word for the split character is delimiter).
s = 'Harare is the capital of Zimbabwe.'
s.split()
['Harare', 'is', 'the', 'capital', 'of', 'Zimbabwe.']
Isn’t that cool? It not only splits based on spaces, but it removes the spaces. You are left with a list of words!
What if you want to get rid of that period at the end?
s.rstrip('.').split()
['Harare', 'is', 'the', 'capital', 'of', 'Zimbabwe']
Now let’s split into sentences.
s = 'Harare is the capital of Zimbabwe. Ottawa is the capital of Canada. Andorra la Vella is the capital of Andorra.'
s.split('.')
['Harare is the capital of Zimbabwe',
' Ottawa is the capital of Canada',
' Andorra la Vella is the capital of Andorra',
'']
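Notice that splitting on '.' leaves a leading space on the later sentences and an empty string at the end. One possible way to tidy that up, sketched below, is to strip each piece and skip the empty ones. The second part shows split with a comma delimiter on a made-up row of data.

```python
s = 'Harare is the capital of Zimbabwe. Ottawa is the capital of Canada. Andorra la Vella is the capital of Andorra.'

# Strip white space from each piece and drop any empty pieces.
sentences = []
for piece in s.split('.'):
    piece = piece.strip()
    if piece != '':
        sentences.append(piece)

print(len(sentences))   # 3
print(sentences[1])     # Ottawa is the capital of Canada

# Splitting on a different delimiter (hypothetical comma-separated row).
row = 'Harare,Zimbabwe,Africa'
fields = row.split(',')
print(fields)           # ['Harare', 'Zimbabwe', 'Africa']
```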
Count a Substring#
The count method tells you how many times a substring occurs in a string.
s = 'pig dog pig pig cat'
s.count('pig')
3
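Two details of count are worth a quick sketch: it is case sensitive, and it counts occurrences that do not overlap. The example strings here are ours.

```python
s = 'Pig dog pig PIG cat'

# count is case sensitive, so only the lower-case 'pig' matches.
exact = s.count('pig')
print(exact)      # 1

# Convert to lower case first to count matches regardless of case.
any_case = s.lower().count('pig')
print(any_case)   # 3

# Occurrences are counted without overlapping.
overlap = 'aaaa'.count('aa')
print(overlap)    # 2
```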
find#
This is super useful. You can search for text within your string. If it exists, Python returns the index of the first occurrence. If it does not exist, Python returns -1.
s = 'hello everybody! How is everybody doing?'
# This returns 6 because that is the index of the first character of the first occurrence of 'every'
s.find('every')
6
# This returns -1 because 'dog' is not a substring of s
s.find('dog')
-1
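The index returned by find becomes more useful when you combine it with slicing. The sketch below also uses the optional second argument of find, which tells Python where to start searching. The variable names are ours.

```python
s = 'hello everybody! How is everybody doing?'

# Index of the first occurrence.
first = s.find('every')
print(first)    # 6

# Start searching just past the first match to find the next one.
second = s.find('every', first + 1)
print(second)   # 24

# Use the index with slicing to keep everything before the '!'.
# Check for -1 first, because -1 is also a valid (negative) index.
bang = s.find('!')
if bang != -1:
    greeting = s[:bang]
else:
    greeting = s
print(greeting)  # hello everybody
```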
Want more string functions? See here#
As we said previously, there are many string methods and we only showed you a few. If you want to see more, visit this website and scroll down to “Built-In String Methods”, or visit the official Python documentation, which has everything, and look for the heading “String Methods”.