Collections: Lists, Dictionaries, and Tuples
Learning Objectives
- Create a list, which may be empty or nonempty.
- Given a list on paper, identify the index of every element.
- Write code that retrieves a single element from a list.
- Write code that retrieves a single element from the end of a list.
- Write code that retrieves multiple consecutive elements from a list (slicing).
- Write code that retrieves multiple evenly-spaced elements from a list.
- Modify the value of a list element.
- Write code that joins two lists using the + operator.
- Write code that duplicates a list using the * operator.
- Explain the concept of shallow and deep copies of a list, and predict the output of code that relies on these concepts.
- Make a shallow copy of a list using list's copy method.
- Compute the length of a list using the len function.
- Write code to append, insert, or remove list elements.
- Use list's reverse method to reverse a list.
- Explain the difference between a list and a dictionary, and explain when dictionaries are more appropriate than lists.
- Write code to create a dictionary.
- Write code to retrieve a value from a dictionary, given its key.
- Write code to add a key-value pair to a dictionary.
- Identify code that silently overwrites a value because it repeats a key in a dictionary.
- Write code that modifies the value stored for a given dictionary key.
- Explain the difference between a tuple and a list.
- Write simple code to create a tuple.
Overview of Collections
Earlier in this unit we introduced variables and data types. So far, we have only shown you how to store single values in variables. When working with data analytics, you will typically work with multiple pieces of data. In this section, we will teach you how to work with multiple pieces of data.
Python provides many ways of storing collections of data. In this section, we will introduce you to three data structures for storing data collections: lists, dictionaries, and tuples. In this course, you will use lists and dictionaries very often, so the time you invest in learning them now will save you time and frustration later. Later in the course, we will introduce you to another data structure for storing collections of data, the Pandas data frame. Many of the ideas you learn now will apply to data frames. Also, much of the syntax for lists and dictionaries will apply to Pandas data frames, so again, we urge you to become proficient with lists and dictionaries!
Lists
In programming, a list is a collection of objects. Typically, you will create a list when you want to store a set of related data items. Once you create the list, you can retrieve list elements, modify them, or delete them. It is usually more efficient to work with a list than to create a separate variable for each piece of data.
Assume you want to store the names of the employees in your organization. You could do something like this:
employee0 = 'Kim Mendoza'
employee1 = 'Vic Anand'
employee2 = 'Josh Herbold'
You could do something like that, but we wouldn’t recommend it! It’s inefficient, it would be extremely difficult to deal with situations where the number of employees changes, and you might need to create hundreds or thousands of variables every time you want to work with your data. Lists make this process much more efficient.
Creating a List
Create a list by enclosing your data inside square brackets. Separate each data item with a comma. You may add spaces after the commas if you like.
Let’s recreate our previous example using a list:
employees = ['Kim Mendoza', 'Vic Anand', 'Josh Herbold']
This code created a new variable called employees. The type of this variable is list.
type(employees)
list
The list contains three items, all of which are text strings. You can print a list using the print function, just like you print any other variable.
print(employees)
['Kim Mendoza', 'Vic Anand', 'Josh Herbold']
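The learning objectives mention empty lists, which the examples here do not otherwise show. A minimal sketch: square brackets with nothing inside create a list with no elements, and the built-in list() with no arguments does the same. The variable names are hypothetical.

```python
# An empty list: square brackets with nothing inside
new_hires = []
print(new_hires)      # prints []

# The built-in list() with no arguments also creates an empty list
also_empty = list()
print(also_empty)     # prints []

# A list with a single element still needs the square brackets
one_hire = ['Kim Mendoza']
print(one_hire)       # prints ['Kim Mendoza']
```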
In Python, lists can hold elements of different types. Say I want to store multiple pieces of information about one employee, such as the employee’s name, employee ID, birthday, and home address. I could store all of them in one list:
employee_info = ['Kim Mendoza', 4573837, '1/1/1998', '616 E. Green St., Champaign, IL 61820']
This is useful as it allows me to store all information about a person in one location, but has many disadvantages. I have to remember the index of each piece of information. I have to assume that data is always entered in the same order; if it isn’t, errors are introduced into my work. And so on. Fortunately, there are better ways to store such data and we will show you one of those ways, dictionaries, later in this document.
It is also possible to store lists inside lists (nested lists). To do that, simply create a list inside a list. For example, I could store a matrix as follows:
identity_matrix_3 = [[1,0,0], [0,1,0], [0,0,1]]
You can think of a matrix as a list of rows. Each row is stored as a list, so the matrix is stored as a list of lists.
Retrieving List Elements
How List Elements are Numbered
In Python, each list element is assigned an index. List indexes begin at 0 and end at n-1, where n is the number of elements in the list. Remember that! It is a common source of bugs and errors for students new to programming.
Consider the code above where we created a list of employees:
employees = ['Kim Mendoza', 'Vic Anand', 'Josh Herbold']
Here’s a table showing you each list element and its index:
| Index | 0 | 1 | 2 |
|---|---|---|---|
| List Element | 'Kim Mendoza' | 'Vic Anand' | 'Josh Herbold' |
Note that the index of the first element is 0 and, since the list has 3 elements, the index of the last element is 2.
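The same table can be produced in code. This sketch loops over every valid index and prints it next to its element; it uses len, which returns the number of elements and is covered later in this section, so range(len(employees)) gives the indexes 0 through n-1.

```python
employees = ['Kim Mendoza', 'Vic Anand', 'Josh Herbold']

# range(len(employees)) produces the valid indexes 0, 1, 2
for i in range(len(employees)):
    print(i, employees[i])
```

This prints 0 Kim Mendoza, 1 Vic Anand, and 2 Josh Herbold on separate lines.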
Retrieving a Single List Element
To access a single element of a list, type the name of the list and then enter the list index in square brackets. Here are some examples:
employees = ['Kim Mendoza', 'Vic Anand', 'Josh Herbold']
employees[0]
'Kim Mendoza'
employees[1]
'Vic Anand'
employees[2]
'Josh Herbold'
Let’s repeat the matrix example from earlier. We’ll recreate the matrix and show you how to print it. In the process, we will use what we just learned about retrieving list elements.
identity_matrix_3 = [[1,0,0], [0,1,0], [0,0,1]]
# Print the matrix
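for i in range(3):
    for j in range(3):
        # Print one element followed by a space, staying on the same line
        print(f'{identity_matrix_3[i][j]} ', end='')
    # End the row by moving to a new line
    print()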
1 0 0
0 1 0
0 0 1
We want to ensure you understand one thing before we lay this example to rest. Let’s look at the expression identity_matrix_3[i][j]. Let’s assume the current value of i is 1 and the current value of j is 0. When Python sees that expression, it resolves it as follows:
identity_matrix_3[i][j]
# Substitute the values of i and j into the expression
identity_matrix_3[1][0]
# Python resolves identity_matrix_3[1]. This gives the second element of
# identity_matrix_3, which is [0,1,0]
[0,1,0][0]
# Python resolves [0,1,0][0]. This gives the first element of the list
# [0,1,0], which is zero.
0
Please ensure you understand these steps. If anything is unclear, ask one of the teaching staff!
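The same two steps can be run one at a time. This sketch stores the intermediate row in a variable (the names row and element are chosen only for illustration) so you can see each step's result.

```python
identity_matrix_3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
i = 1
j = 0

# Step 1: the first index picks a row, which is itself a list
row = identity_matrix_3[i]
print(row)                                  # prints [0, 1, 0]

# Step 2: the second index picks an element of that row
element = row[j]
print(element)                              # prints 0

# Chaining both indexes gives the same result in one expression
print(identity_matrix_3[i][j] == element)   # prints True
```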
Common error: Index out of range
Remember we said that list indices range from 0 to n-1, where n is the number of elements in the list? A common error is to forget that and enter an index that is too large. Many people think, “My list has 3 elements and I want the third element, so I’ll type the following…”
employees[3]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Cell In[9], line 1
----> 1 employees[3]
IndexError: list index out of range
Notice that Python tells you that “list index out of range”.
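One way to avoid this error is to compute the last valid index instead of guessing it. A minimal sketch, using len (covered later in this section) to count the elements:

```python
employees = ['Kim Mendoza', 'Vic Anand', 'Josh Herbold']

# The last valid index is always one less than the number of elements
last_index = len(employees) - 1
print(last_index)               # prints 2
print(employees[last_index])    # prints Josh Herbold

# Valid indexes run from 0 through last_index, so 3 is out of range
print(3 <= last_index)          # prints False
```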
(optional) Why do indexes begin at zero?
If you like nerdy stuff and you’re wondering why list indices begin at 0, read the following. If you don’t care, skip this part. :) In early programming languages, like C, programmers had to manage the computer’s memory. To write C code, you had to think like a computer. C stores a collection of values in an array, and the name of an array is essentially the memory address (called a “pointer”) of its first element. The index is an “offset”: it tells C to start at the memory address of the first element and move forward. Thus, in C, employees[2] tells the computer to locate the memory address of the first element of employees and go forward by 2 elements. That gives the memory address of the third element. If you want the first element, employees[0] tells the computer to locate the memory address of the first element and go forward by 0 elements.
Modern languages like Python use what are called “zero-based” indexes, consistent with these older languages. Old programmers like us faculty are used to this, but it sometimes takes students a while to get used to it.
Retrieving a Single List Element From the End of the List
Python makes it easy to retrieve elements from the end of a list. The list index -1 refers to the last element of the list, -2 is the second-to-last element, and so on. Here is an example:
myList = ['First', 'Second', 3, 'Fourth']
print(f'The last element of the list is: {myList[-1]}')
print(f'The second-to-last element is: {myList[-2]}')
The last element of the list is: Fourth
The second-to-last element is: 3
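A negative index is shorthand for counting back from the length of the list: index -k refers to the same element as index len(myList) - k. A small sketch checking that equivalence (len is covered later in this section):

```python
myList = ['First', 'Second', 3, 'Fourth']

# -1 and len(myList) - 1 both refer to the last element
print(myList[-1] == myList[len(myList) - 1])   # prints True

# On a 4-element list, -4 reaches all the way back to the first element
print(myList[-4])                              # prints First
```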
Retrieving Multiple List Elements (Slicing)
Sometimes, you may wish to retrieve more than one element of a list. Python makes it very easy to do so. The process of retrieving multiple list elements is called slicing a list (because it’s like taking a slice). In our opinion, list slicing is one of the coolest and most useful features of Python.
Retrieving Multiple Adjacent List Elements
Python allows you to retrieve more than one element from a list. You can think of this as retrieving a subset of your list. When you ask Python for multiple elements, it will return them as a list.
Let’s begin with an example:
mylist = [0,1,2,3,4,5,6,7,8,9]
mylist[0:4]
[0, 1, 2, 3]
We retrieved the first four elements of the list by typing 0:4 in square brackets after the list’s name. Python interprets this as follows:
Start at element with index 0 and retrieve all elements with indexes 0, 1, 2, and 3. In other words, 4 is the “exclusive upper bound”. Python will get all elements with indices from 0 up to, but not including 4.
In general, a slice has the form a:b. If b is less than or equal to a, Python will return an empty list since there are no indexes in the range. Here’s an example:
mylist = [0,1,2,3,4,5,6,7,8,9]
mylist[5:4]
[]
In the above example, we told Python to start at index 5 and stop just before index 4. Since 4 is not greater than 5, there are no indexes in that range, so Python returns an empty list.
There are some shortcuts you might find useful. If you omit the number after the colon, Python continues to the end of the list. The following says, “start at the element with index 5 and retrieve all the remaining elements in the list.”
mylist = [0,1,2,3,4,5,6,7,8,9]
mylist[5:]
[5, 6, 7, 8, 9]
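If you omit the number before the colon, Python starts at the beginning of the list. The following says, “start at the beginning of the list and retrieve all elements up to, but not including, the element with index 7” (that is, through the element with index 6).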
mylist = [0,1,2,3,4,5,6,7,8,9]
mylist[:7]
[0, 1, 2, 3, 4, 5, 6]
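Because the upper bound is exclusive, splitting a list at any index k with [:k] and [k:] gives two pieces that never overlap and together hold every element. This sketch checks that by rejoining them with + (covered later in this section) and also shows that negative indexes work inside slices. The split point k = 4 is an arbitrary choice.

```python
mylist = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
k = 4  # arbitrary split point

front = mylist[:k]   # indexes 0 through k-1
back = mylist[k:]    # indexes k through the end
print(front)                   # prints [0, 1, 2, 3]
print(back)                    # prints [4, 5, 6, 7, 8, 9]
print(front + back == mylist)  # prints True

# Negative indexes in a slice count from the end: the last three elements
print(mylist[-3:])             # prints [7, 8, 9]
```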
Retrieving Multiple List Elements Spaced Evenly
When slicing, you can give Python a third parameter which specifies a “step” to use. By default, Python uses a step size of 1. However, say you want every other element. You could use a step of 2. Here’s how you would do that.
mylist = [0,1,2,3,4,5,6,7,8,9]
mylist[0:10:2]
This would return [0,2,4,6,8]. When Python sees 0:10:2, it starts at element 0, adds 2 to the index and retrieves element 2. Then it adds 2 to the index and retrieves element 4. This continues as long as the index is less than 10.
Say we wanted the first and sixth elements. We could type:
mylist = [0,1,2,3,4,5,6,7,8,9]
mylist[0::5]
This would return [0,5]. Python starts with the first element (index 0), then adds 5 to get the element with index 5. It then adds 5 again; the resulting index is 10, which is not a valid index, so Python stops. Notice that we omitted the second parameter to tell Python to go to the end of the list. We could also have used ::5 as a slice to get the same thing.
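A few more step slices on the same list, as a sketch of how the start, stop, and step combine:

```python
mylist = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Start at index 1 and take every second element
print(mylist[1::2])    # prints [1, 3, 5, 7, 9]

# Every third element from the start: indexes 0, 3, 6, 9
print(mylist[::3])     # prints [0, 3, 6, 9]

# Start at 2, step by 3, stop before 8: indexes 2 and 5 (8 is excluded)
print(mylist[2:8:3])   # prints [2, 5]
```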
Modifying List Elements
To modify a single element of a list, simply treat it as you would any other variable. Here’s an example:
myList = ['a', 'b', 'c']
myList[1] = 'sheep'
myList
['a', 'sheep', 'c']
To modify multiple elements at the same time, you need to tell Python which list slice you want to modify and give Python a list.
myList = ['a', 'b', 'c', 'd', 'e']
myList[1:4:2] = ['hello', 'world']
myList
['a', 'hello', 'c', 'world', 'e']
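When the slice uses a step other than 1, the list on the right-hand side must have exactly as many elements as the slice selects. This sketch shows the ValueError Python raises otherwise; try/except, which this section does not otherwise cover, is used here only so the error can be displayed without stopping the program.

```python
myList = ['a', 'b', 'c', 'd', 'e']

# The slice 1:4:2 selects indexes 1 and 3, i.e. two elements
print(myList[1:4:2])        # prints ['b', 'd']

try:
    # Three replacement values for a two-element slice
    myList[1:4:2] = ['x', 'y', 'z']
except ValueError as err:
    print(f'ValueError: {err}')

# The failed assignment leaves the list unchanged
print(myList)               # prints ['a', 'b', 'c', 'd', 'e']
```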
Other List Operations
Python provides many useful functions for working with lists. In this section, we will show you a few that you are likely to use in this course. Keep in mind that there are many more.
Joining Two Lists (Concatenating)
The word ‘concatenate’ is a fancy word for join. We are teaching you this word since there is an Excel function, CONCAT, that you may find useful. Pandas also has a function with a similar name, concat.
To join two lists together, use the + operator. It’s that simple.
list1 = ['a', 'b', 'c']
list2 = [1, 2, 3]
list1 + list2
['a', 'b', 'c', 1, 2, 3]
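The + operator builds a brand-new list; neither original list is changed, and the order of the operands determines the order of the result. A short sketch (the name joined is illustrative):

```python
list1 = ['a', 'b', 'c']
list2 = [1, 2, 3]

joined = list1 + list2
print(joined)          # prints ['a', 'b', 'c', 1, 2, 3]

# The originals are unchanged
print(list1)           # prints ['a', 'b', 'c']
print(list2)           # prints [1, 2, 3]

# Order matters: list2 + list1 puts list2's elements first
print(list2 + list1)   # prints [1, 2, 3, 'a', 'b', 'c']
```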
Duplicating a List
The multiplication operator makes copies of a list and concatenates them into a new list. Check this out:
mylist = ['yo', 'dude']
mylist * 5
['yo', 'dude', 'yo', 'dude', 'yo', 'dude', 'yo', 'dude', 'yo', 'dude']
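A common use of * is to create a list of a fixed length filled with a starting value, such as zeros. A minimal sketch; the name totals and the idea of five regions are hypothetical:

```python
# Five zeros, e.g. starting totals for five hypothetical regions
totals = [0] * 5
print(totals)        # prints [0, 0, 0, 0, 0]

# Changing one element leaves the others alone
totals[2] = 10
print(totals)        # prints [0, 0, 10, 0, 0]
```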
Making a Copy of a List
You can make a copy of a list using the list's copy method. Here’s an example of why you might want to do this.
list1 = [1,2,3]
list2 = list1
list2[0] = 'cat'
print(f'list1 is {list1}')
print(f'list2 is {list2}')
list1 is ['cat', 2, 3]
list2 is ['cat', 2, 3]
Confused yet? :) The statement list2 = list1 did not make a true copy of the original list. Instead, the variable list2 “points” to the same list as the variable list1. Thus, when we modified list2, we also modified list1.
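If we want to make a separate copy of list1 and store it as list2, we can use the list copy method as follows. (Strictly speaking, copy makes a “shallow” copy: it creates a new list, but any lists nested inside it are shared with the original. For a flat list like this one, that makes no difference.)
list1 = [1,2,3]
# Notice the change here. list1.copy() makes a new list with the same elements as list1 and stores it as list2
list2 = list1.copy()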
list2[0] = 'cat'
print(f'list1 is {list1}')
print(f'list2 is {list2}')
list1 is [1, 2, 3]
list2 is ['cat', 2, 3]
Notice that we called the copy method differently than other functions. Based on what we have taught you so far, you probably would have expected a line of code that looks like this:
list2 = copy(list1)
Instead, we typed:
list2 = list1.copy()
Why? Well, a list is an “object” in Python. In programming, objects are things that contain variables (called fields) and functions (called methods) inside them. In this case, copy is a method of the list object. Methods are called differently than ordinary functions: you write the object’s name, a dot, and then the method name, as in list1.copy().
We will teach you more about objects later in the course. For now, just remember that there are two ways to call a function: as an ordinary function, like print(list1), or as a method of an object, like list1.copy().
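The difference between shallow and deep copies only shows up with nested lists. The list copy method creates a new outer list, but the inner lists are shared with the original; the standard library function copy.deepcopy also copies the inner lists. A minimal sketch (the variable names are illustrative):

```python
import copy

matrix = [[1, 2], [3, 4]]

shallow = matrix.copy()        # new outer list, same inner lists
deep = copy.deepcopy(matrix)   # new outer list AND new inner lists

# Replacing a whole row of the shallow copy does not affect the original...
shallow[0] = ['x', 'y']
print(matrix)    # prints [[1, 2], [3, 4]]

# ...but changing an element inside a shared inner list does
shallow[1][0] = 'cat'
print(matrix)    # prints [[1, 2], ['cat', 4]]

# The deep copy shares nothing with the original
deep[1][1] = 'dog'
print(matrix)    # prints [[1, 2], ['cat', 4]]
print(deep)      # prints [[1, 2], [3, 'dog']]
```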
Getting the Length of a List
Use the len function.
mylist = [1,2,3]
len(mylist)
3
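Two edge cases worth knowing: an empty list has length 0, and len counts only the top-level elements of a nested list. A short sketch using the matrix from earlier:

```python
print(len([]))                     # prints 0

identity_matrix_3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Three rows, not nine numbers
print(len(identity_matrix_3))      # prints 3
# Number of elements in the first row
print(len(identity_matrix_3[0]))   # prints 3
```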
Checking Whether a List Contains an Element
The in and not in operators can be used to check whether a list contains or does not contain an element.
mylist = [1,2,3]
print(f'Does mylist contain the number 2? {2 in mylist}')
Does mylist contain the number 2? True
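The example above uses only in; here is not in, plus a reminder that the check compares values exactly, so the text string '2' is not the same as the number 2:

```python
mylist = [1, 2, 3]

print(f'Does mylist lack the number 7? {7 not in mylist}')   # ... True

# A string is not equal to a number, even if it looks like one
print('2' in mylist)   # prints False
```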
Appending, Inserting, and Removing List Elements
Append is a fancy word for “add something to the end”. You will see this word a lot in programming and data analytics.
To append a single element to a list, use the append method of a list object.
mylist = [1,2,3]
mylist.append(-99)
mylist
[1, 2, 3, -99]
You can also insert an element at a specific location in a list by using the insert method of a list object. Specifically, l.insert(i, x) inserts x at index i of list l.
The following code inserts the word ‘chicken’ at index 3 of the list. It shifts the other list elements to the right.
mylist = [0,1,2,3,4]
mylist.insert(3, 'chicken')
mylist
[0, 1, 2, 'chicken', 3, 4]
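Two special positions for insert, sketched below: index 0 puts the new element at the front, and index len(mylist) puts it at the end, which has the same effect as append. The strings 'start' and 'end' are arbitrary.

```python
mylist = [0, 1, 2, 3, 4]

# Index 0: becomes the new first element
mylist.insert(0, 'start')
print(mylist)   # prints ['start', 0, 1, 2, 3, 4]

# Index len(mylist): same effect as append
mylist.insert(len(mylist), 'end')
print(mylist)   # prints ['start', 0, 1, 2, 3, 4, 'end']
```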
Finally, you can remove an element from a list by using the remove method. Specifically, l.remove(x) removes the first instance of x from list l.
mylist = ['cat', 'dog', 'pig', 'sheep', 'duck']
mylist.remove('pig')
mylist
['cat', 'dog', 'sheep', 'duck']
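Two details of remove, sketched below: it deletes only the first matching element, and it raises a ValueError if the value is not in the list. Checking with in first (covered earlier in this section) avoids that error.

```python
mylist = ['cat', 'dog', 'cat', 'duck']

# Only the first 'cat' (index 0) is removed
mylist.remove('cat')
print(mylist)   # prints ['dog', 'cat', 'duck']

# 'pig' is not in the list, so check first instead of risking a ValueError
if 'pig' in mylist:
    mylist.remove('pig')
print(mylist)   # prints ['dog', 'cat', 'duck']
```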
Reversing a List
Reverse a list by calling the reverse method of a list object.
mylist = [1,2,3,4,5]
mylist.reverse()
mylist
[5, 4, 3, 2, 1]
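The reverse method changes the list in place and returns None, so assigning its result to a variable is a common mistake. If you need to keep the original order, reverse a copy instead. A minimal sketch:

```python
mylist = [1, 2, 3, 4, 5]
result = mylist.reverse()
print(result)      # prints None (reverse does not return the list)
print(mylist)      # prints [5, 4, 3, 2, 1]

# Keep the original by reversing a copy
original = [1, 2, 3, 4, 5]
backwards = original.copy()
backwards.reverse()
print(original)    # prints [1, 2, 3, 4, 5]
print(backwards)   # prints [5, 4, 3, 2, 1]
```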
Dictionaries
What is a Dictionary and Why Use One?
When you think of a dictionary, you probably think of a big book of words and their meanings.
Computer scientists have generalized the idea of a dictionary. A computer scientist sees a dictionary as a collection of key-value pairs. So if you have a book with words and their meanings (what we normally think of as a dictionary), the computer scientist would say that the words are the keys and the meanings are the values.
In Python and other programming languages, dictionaries are very useful data structures since they allow you to assign data items (values) to keys of your choice. This makes it easier to store and retrieve your data.
Let’s say I want to store the following income statement in Python.
| Item | Amount |
|---|---|
| Revenue | 100 |
| COGS | 52 |
| Gross margin | 48 |
| SG&A | 40 |
| Net Income | 8 |
I could do so using a list:
[100, 52, 48, 40, 8]
But this is not recommended for many reasons. First, to retrieve a data item, you need to know its index in the list. Second, someone could accidentally insert an item into or remove an item from this list, which would screw up the ordering and meanings of the values. Third, it is difficult to enforce relationships in the data (e.g., gross margin equals revenue minus COGS).
A better way is to use a dictionary!
Creating a Dictionary
To create a dictionary, use curly braces { }. Within the curly braces, enter a key, followed by a colon, followed by the value. Separate each key-value pair with a comma.
Following is a sample dictionary for our income statement above.
income_stmt = {'Revenue': 100, 'COGS': 52, 'Gross margin': 48, 'SG&A': 40, 'Net Income': 8}
You can put the items on separate lines to increase readability.
income_stmt = {'Revenue': 100,
               'COGS': 52,
               'Gross margin': 48,
               'SG&A': 40,
               'Net Income': 8}
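Because each value is reached by a meaningful key rather than by a position, relationships between line items, such as gross margin equals revenue minus COGS, can be checked by name. A minimal sketch using the income statement above:

```python
income_stmt = {'Revenue': 100,
               'COGS': 52,
               'Gross margin': 48,
               'SG&A': 40,
               'Net Income': 8}

# Gross margin = Revenue - COGS
print(income_stmt['Revenue'] - income_stmt['COGS'] == income_stmt['Gross margin'])     # prints True

# Net income = Gross margin - SG&A
print(income_stmt['Gross margin'] - income_stmt['SG&A'] == income_stmt['Net Income'])  # prints True
```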
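Note on Dictionary Keys
Keys must be unique: a dictionary holds at most one value for each key. If you repeat a key, Python does not raise an error; the later value silently replaces the earlier one.
Keys must be values that cannot be changed, such as numbers, strings, or tuples (we’ll teach you about tuples in the next section). A list cannot be a key.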
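The sketch below shows what happens when a key is repeated in a dictionary literal, and how a tuple can serve as a key. The quarterly figures are made-up values for illustration only.

```python
# 'Revenue' appears twice; only the later value is kept
stmt = {'Revenue': 100, 'COGS': 52, 'Revenue': 999}
print(stmt)        # prints {'Revenue': 999, 'COGS': 52}
print(len(stmt))   # prints 2

# A tuple key, e.g. (year, quarter); the figures are hypothetical
quarterly_revenue = {(2018, 1): 20, (2018, 2): 25}
print(quarterly_revenue[(2018, 2)])   # prints 25
```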
Retrieving Data from a Dictionary
To retrieve a value from a dictionary, you will use a syntax similar to that for lists. However, instead of using a numerical index, you will use the key!
Continuing with the income statement example from above:
income_stmt['Revenue']
100
Common Errors
You attempt to use a key that is not in the dictionary. You will get a KeyError.
If you use strings as keys, they are case-sensitive. A common error is to get the spelling or capitalization wrong and get a KeyError.
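For a dictionary, the in operator checks whether a key exists, which lets you avoid a KeyError before retrieving a value. A minimal sketch using a shortened version of the income statement:

```python
income_stmt = {'Revenue': 100, 'COGS': 52}

# Keys are case-sensitive: 'revenue' is not the same key as 'Revenue'
print('revenue' in income_stmt)   # prints False
print('Revenue' in income_stmt)   # prints True

# Check before retrieving to avoid a KeyError
key = 'revenue'
if key in income_stmt:
    print(income_stmt[key])
else:
    print(f"No key named '{key}'")   # prints No key named 'revenue'
```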
Adding a Key-Value Pair to a Dictionary
Python makes this really easy. Simply assign the value to the new key and Python will add the key and value to the dictionary.
Let’s add a fiscal year to our income statement dictionary:
income_stmt['Fiscal Year'] = 2018
print(income_stmt)
{'Revenue': 100, 'COGS': 52, 'Gross margin': 48, 'SG&A': 40, 'Net Income': 8, 'Fiscal Year': 2018}
Modifying a Value in the Dictionary
Simply set the new value using your key. Let’s modify the fiscal year that we just added.
income_stmt['Fiscal Year'] = 1998
print(income_stmt)
{'Revenue': 100, 'COGS': 52, 'Gross margin': 48, 'SG&A': 40, 'Net Income': 8, 'Fiscal Year': 1998}
Common Error
Say you want to modify a value in your dictionary. If you misspell or mistype its key, Python will think you want to add a new key. It will add that misspelled key and its value to the dictionary.
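A short sketch of this mistake: a lowercase 'y' in 'Fiscal year' creates a second key instead of updating the existing one, which shows up as an unexpected increase in the number of keys.

```python
income_stmt = {'Revenue': 100, 'Fiscal Year': 2018}

# Typo: lowercase 'y', so Python adds a new key
income_stmt['Fiscal year'] = 1998
print(income_stmt)        # prints {'Revenue': 100, 'Fiscal Year': 2018, 'Fiscal year': 1998}
print(len(income_stmt))   # prints 3
```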
Other Dictionary Operations
Python provides support for many built-in dictionary operations. You can get the length of the dictionary (number of key-value pairs); delete dictionary keys (and their values); clear all keys and values; copy the dictionary; retrieve only the keys; retrieve only the values; etc.
We will not show you all of these here. If you are interested, check out this webpage.
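For reference, here is a brief sketch of the operations listed above on a shortened income statement: len, keys, values, copy, del, and clear. The keys and values methods return special view objects, so the sketch wraps them in list() for display.

```python
income_stmt = {'Revenue': 100, 'COGS': 52, 'Gross margin': 48}

print(len(income_stmt))             # prints 3 (number of key-value pairs)
print(list(income_stmt.keys()))     # prints ['Revenue', 'COGS', 'Gross margin']
print(list(income_stmt.values()))   # prints [100, 52, 48]

backup = income_stmt.copy()         # a separate (shallow) copy of the dictionary

del income_stmt['COGS']             # delete one key and its value
print(income_stmt)                  # prints {'Revenue': 100, 'Gross margin': 48}

income_stmt.clear()                 # remove all keys and values
print(income_stmt)                  # prints {}
print(backup)                       # prints {'Revenue': 100, 'COGS': 52, 'Gross margin': 48}
```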
Tuples
A tuple is like a list, but it cannot be changed after it is created. The syntax for tuples is nearly identical to that for lists. The main difference is that tuples use parentheses ( ) whereas lists use square brackets [ ].
We want to make you aware of tuples because many Python functions take tuples as arguments. For example, one plotting function allows you to specify the size of the figure you want to create; its argument, figsize, is given as a tuple containing the width and height. Some functions and objects also return tuples. For example, the Pandas DataFrame is a table of data. Pandas will tell you the “shape” of a DataFrame: its shape attribute is a tuple containing the number of rows and the number of columns.
Again, we do not expect you to learn the details of tuples. We just want you to be aware of their existence.
One quick example with tuples:
my_fig_size = (500, 300)
print(f'My figure will have a width of {my_fig_size[0]} and height of {my_fig_size[1]}.')
My figure will have a width of 500 and height of 300.
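Since a tuple cannot be changed, assigning to one of its elements raises a TypeError; to "change" a tuple, you create a new one. A minimal sketch; try/except is used only so the error can be displayed without stopping the program:

```python
my_fig_size = (500, 300)
print(len(my_fig_size))   # prints 2

try:
    my_fig_size[0] = 800  # not allowed: tuples cannot be modified
except TypeError as err:
    print(f'TypeError: {err}')   # prints TypeError: 'tuple' object does not support item assignment

# Build a new tuple instead, reusing the old height
my_fig_size = (800, my_fig_size[1])
print(my_fig_size)        # prints (800, 300)
```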