Anatomy of a DataFrame#
Learning Objectives#
- Create a DataFrame from a list of lists.
- Create a DataFrame from a dictionary.
- Define a row index.
- Explain the parallels between Pandas' row and column indexes and cell addresses in Excel.
- Give an example of a situation where it might be useful to use data values as the index.
- Retrieve and set the value in an individual cell using the at and iat methods.
- Extract the column names using the column property of DataFrame.
- Set the column names using the column property of DataFrame.
- Extract the row index values using the index property of DataFrame.
- Set the row index using the index property of DataFrame.
- Get the dimensions of a DataFrame using the shape property of DataFrame.
Overview#
In this chapter, we will view some simple DataFrames and learn about their indexes. Just like in Excel, where each cell has an “address”, e.g. A1, B4, Pandas has row and column indexes. However, Pandas provides much more flexibility and power in choosing indexes. That, however, means you need to learn how these indexes work and how to use them to retrieve data.
Creating a DataFrame from a List of Lists#
Typically, you will load data from an Excel or CSV file into Pandas. However, it is possible to construct a DataFrame from existing lists or dictionaries. In this chapter, we will construct simple DataFrames so that we can focus on their indexes.
Let’s begin by importing the necessary libraries. Run the cell below.
import numpy as np
import pandas as pd
Now, let’s create a really simple DataFrame in which we list some companies, their stock ticker symbol, and their closing stock price on October 18, 2019. Run the following cell and then take a look at it to see if you can figure out how it works.
dfStocks = pd.DataFrame([['Walmart, Inc.' ,'WMT', 119.14],
['Target Corporation', 'TGT', 112.81],
['Best Buy Co., Inc.', 'BBY', 70.52]])
dfStocks
0 | 1 | 2 | |
---|---|---|---|
0 | Walmart, Inc. | WMT | 119.14 |
1 | Target Corporation | TGT | 112.81 |
2 | Best Buy Co., Inc. | BBY | 70.52 |
In the above cell, we created a DataFrame from a list of lists (sort of like a matrix). Each sublist contains the company name, ticker symbol, and a stock price. Notice that Pandas converted each sublist into a row of the new DataFrame.
We used the function pd.DataFrame
. This is a special function called a “constructor”. It takes some input data, constructs a DataFrame object, and returns the DataFrame. We stored the return value in the variable dfStocks
.
Row and Column Indexes#
Default Index Labels#
In the code above, we gave some data to the function pd.DataFrame
and told it to construct a DataFrame. We did not give that function any information about the row and column names, so it assigned them automatically. Along each axis, it assigned the value 0 to the first item, then 1, then 2. You can think of these index values as addresses. There is one value located at each address. In our simple DataFrame, the addresses go from (0, 0), which has the value 'Walmart, Inc.'
, to (2, 2), which has the value '70.52'
. Later, you will see how to use these indexes to retrieve and set values.
Parallels to Excel#
Notice the parallels to Excel. Excel numbers the rows 1, 2, … and it labels the columns A, B, … Each cell in Excel has a unique address, e.g. C8. Most data-oriented software will behave similarly and assign indexes to each cell in a table.
Working with Column Indexes#
Our sample DataFrame currently has 0, 1, and 2 as column labels. This isn’t very useful, especially since we know that each column has specific information. We know that the first column contains company names, the second column contains tickers, and the third column contains stock prices. Can we assign custom labels so that our data is more meaningfully displayed? Of course we can!
If you already have a DataFrame and want to change the column names, there are two ways:
Pass a list to the columns property of DataFrame.
Use the
rename
method of DataFrame.
Let’s discuss each of these in turn. Before we do, let’s retrieve the current columns of our DataFrame.
Retrieving the Columns of a DataFrame#
To get or set the current column labels, use the columns
property of DataFrame, e.g.
dfStocks.columns
RangeIndex(start=0, stop=3, step=1)
The output of the above code tells us that Pandas assigned a RangeIndex
. This is like the range
function you learned about when we learned about for
loops. The above output tells us that the column labels start at 0 and go to 2 in steps of 1. Just like with the range
function, the last value is one less than the stop parameter.
Often, it’s just easier to work with a list. If we want the columns as a list, we can do it one of two ways:
dfStocks.columns.tolist()
[0, 1, 2]
list(dfStocks.columns)
[0, 1, 2]
Setting the Columns of a DataFrame Using a List#
This is easy. All you need to do is set the columns equal to a list. See below.
dfStocks.columns = ['Company Name', 'Ticker', 'Closing Price']
dfStocks
Company Name | Ticker | Closing Price | |
---|---|---|---|
0 | Walmart, Inc. | WMT | 119.14 |
1 | Target Corporation | TGT | 112.81 |
2 | Best Buy Co., Inc. | BBY | 70.52 |
See how easy that was? :) And now our DataFrame has meaningful column names.
Column names can be numbers, dates, or strings. I typically see strings as column names, but there’s no right or wrong way. It really depends on your data.
What changed under the hood? Let’s take a look at the columns
property of our DataFrame once again:
dfStocks.columns
Index(['Company Name', 'Ticker', 'Closing Price'], dtype='object')
So now, instead of a RangeIndex
, our columns are an Index
that consists of strings.
Setting the Columns of a DataFrame Using the rename
method#
You won’t always want to change all column names. And if your DataFrame has hundreds of column names, there should be an easier way to change the name of one or two columns without passing a list containing all the column names. Pandas provides the rename
method for this purpose. To use rename
to change specific columns, you will use the columns
keyword argument. The argument should be a dictionary containing the old and new names.
Let’s illustrate with an example. Say I want to change the column name Ticker to TICKER. Say that I also want to change Closing Price to Price. I would call the rename function as follows:
dfStocks = dfStocks.rename(columns = {'Ticker' : 'TICKER',
'Closing Price' : 'Price'})
dfStocks
Company Name | TICKER | Price | |
---|---|---|---|
0 | Walmart, Inc. | WMT | 119.14 |
1 | Target Corporation | TGT | 112.81 |
2 | Best Buy Co., Inc. | BBY | 70.52 |
Let’s take a look at that code. We passed it a dictionary. In the dictionary, each key was an existing column name. Each value was a new column name. The rename
method looked through the columns for a column named 'Ticker'
. When it found one, it replaced it with 'TICKER'
. It did the same for 'Closing Price'
.
Notice also that I had to assign the return value of the rename
method to dfStocks. That’s because, by default, the rename
method create a whole new DataFrame! If I just call the rename method and do not assign its return value to anything, the new DataFrame will not be saved anywhere.
If you want the rename
method to rename the columns of a DataFrame, and not create a new DataFrame, you can set the optional keyword argument inplace
to True. See the code below:
dfStocks.rename(columns={'Company Name': 'Name'}, inplace=True)
dfStocks
Name | TICKER | Price | |
---|---|---|---|
0 | Walmart, Inc. | WMT | 119.14 |
1 | Target Corporation | TGT | 112.81 |
2 | Best Buy Co., Inc. | BBY | 70.52 |
In summary, the argument inplace=True
tells the rename
method to modify the existing DataFrame and not create a new one.
Working with Row Indexes#
You can retrieve the existing row indexes with the index
property of DataFrame. For example, see below:
dfStocks.index
RangeIndex(start=0, stop=3, step=1)
dfStocks.index.tolist()
[0, 1, 2]
You can also set the row indexes by passing a list, just like we did with the columns. It is not as common to set row indexes as it is to set column indexes. Usually, it is done with time series data. We’ll show you a quick example of that, just to show you how it looks, but we won’t focus on custom row indexes in this class.
Let’s say I have closing stock prices for Altria Group (NYSE: MO). Since I will have one price per day, and I want to look up prices by date, I might want to use the dates as my row indexes. Here’s how that would look. Don’t worry if this code isn’t clear; we’re not going to hold you responsible for it.
dates = ["10/14/2019","10/15/2019","10/16/2019","10/17/2019","10/18/2019"]
prices = [42.52,42.75,43.45,43.9,44.28]
dfMO = pd.DataFrame(data=prices,
columns=['Closing Price'],
index=pd.to_datetime(dates))
dfMO
Closing Price | |
---|---|
2019-10-14 | 42.52 |
2019-10-15 | 42.75 |
2019-10-16 | 43.45 |
2019-10-17 | 43.90 |
2019-10-18 | 44.28 |
Look at the resulting DataFrame. The row indexes are dates. Who cares, you ask? Well if we want to plot the price over time, and we want to see the date on the x-axis, it’s very easy if the dates are the index. For example:
plt = dfMO.plot()

Creating a DataFrame from a Dictionary#
Sometimes, you already have your data in list form. Often, you will have one list per column. When that is the case, it is easy to create a DataFrame from a dictionary. Let’s recreate the sample DataFrame from above using a dictionary.
dfStocks = pd.DataFrame({'Company Name': ['Walmart, Inc.', 'Target Corporation', 'Best Buy Co., Inc.'],
'Ticker': ['WMT', 'TGT', 'BBY'],
'Closing Price': [119.14, 112.81, 70.52]
})
dfStocks
Company Name | Ticker | Closing Price | |
---|---|---|---|
0 | Walmart, Inc. | WMT | 119.14 |
1 | Target Corporation | TGT | 112.81 |
2 | Best Buy Co., Inc. | BBY | 70.52 |
The above code constructed a dictionary. Each key was a column name, and each value was a list containing the data for a column. This is a common and simple way to construct a DataFrame. It has the advantage of allowing you to conveniently specify the column names. Notice that, as before, the row indexes were assigned automatically using a zero-based index.
Retrieving and Setting Individual Values (Cells) in a DataFrame#
Pandas provides two methods for accessing (“getting”) and setting individual values in a DataFrame. These are the at
and iat
methods.
The at
Method#
Retrieving a Single Value with the at
method#
The at
method accesses a single value. It requires you to specify the row and column index of the value you want. Here are some examples:
dfStocks.at[0, 'Ticker']
'WMT'
dfStocks.at[1, 'Closing Price']
np.float64(112.81)
dfStocks.at[2, 'Company Name']
'Best Buy Co., Inc.'
Notice that the at
method takes square brackets, not parentheses. That’s because it is using indexing, just like you did with lists. Also notice that the first argument is a row label, and the second argument is a column label. Also notice that the data type has to be correct. The row indexes are integers, so I have to provide an integer.
To illustrate this point, let’s recreate our stocks DataFrame, this time without column labels:
dfStocks = pd.DataFrame([['Walmart, Inc.' ,'WMT', 119.14],
['Target Corporation', 'TGT', 112.81],
['Best Buy Co., Inc.', 'BBY', 70.52]])
dfStocks
0 | 1 | 2 | |
---|---|---|---|
0 | Walmart, Inc. | WMT | 119.14 |
1 | Target Corporation | TGT | 112.81 |
2 | Best Buy Co., Inc. | BBY | 70.52 |
In this DataFrame, the column labels are integers (because they were automatically assigned). Therefore, I need to pass integers to the at
method.
dfStocks.at[1, 1]
'TGT'
If I try to pass a column name as a string, I will get an error. For example, in the following code, there’s no column labeled '1'
, so I will get an error.
dfStocks.at[1, '1']
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In[20], line 1
----> 1 dfStocks.at[1, '1']
File /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/pandas/core/indexing.py:2575, in _AtIndexer.__getitem__(self, key)
2572 raise ValueError("Invalid call for scalar access (getting)!")
2573 return self.obj.loc[key]
-> 2575 return super().__getitem__(key)
File /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/pandas/core/indexing.py:2527, in _ScalarAccessIndexer.__getitem__(self, key)
2524 raise ValueError("Invalid call for scalar access (getting)!")
2526 key = self._convert_key(key)
-> 2527 return self.obj._get_value(*key, takeable=self._takeable)
File /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/pandas/core/frame.py:4214, in DataFrame._get_value(self, index, col, takeable)
4211 series = self._ixs(col, axis=1)
4212 return series._values[index]
-> 4214 series = self._get_item_cache(col)
4215 engine = self.index._engine
4217 if not isinstance(self.index, MultiIndex):
4218 # CategoricalIndex: Trying to use the engine fastpath may give incorrect
4219 # results if our categories are integers that dont match our codes
4220 # IntervalIndex: IntervalTree has no get_loc
File /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/pandas/core/frame.py:4638, in DataFrame._get_item_cache(self, item)
4633 res = cache.get(item)
4634 if res is None:
4635 # All places that call _get_item_cache have unique columns,
4636 # pending resolution of GH#33047
-> 4638 loc = self.columns.get_loc(item)
4639 res = self._ixs(loc, axis=1)
4641 cache[item] = res
File /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/pandas/core/indexes/range.py:417, in RangeIndex.get_loc(self, key)
415 raise KeyError(key) from err
416 if isinstance(key, Hashable):
--> 417 raise KeyError(key)
418 self._check_indexing_error(key)
419 raise KeyError(key)
KeyError: '1'
Setting a Single Value with the at
method#
To set a value, simply use an equals statement to assign the value. For example:
dfStocks = pd.DataFrame({'Company Name': ['Walmart, Inc.', 'Target Corporation', 'Best Buy Co., Inc.'],
'Ticker': ['WMT', 'TGT', 'BBY'],
'Closing Price': [119.14, 112.81, 70.52]
})
dfStocks
Company Name | Ticker | Closing Price | |
---|---|---|---|
0 | Walmart, Inc. | WMT | 119.14 |
1 | Target Corporation | TGT | 112.81 |
2 | Best Buy Co., Inc. | BBY | 70.52 |
dfStocks.at[0, 'Company Name'] = 'WAL-MART'
dfStocks
Company Name | Ticker | Closing Price | |
---|---|---|---|
0 | WAL-MART | WMT | 119.14 |
1 | Target Corporation | TGT | 112.81 |
2 | Best Buy Co., Inc. | BBY | 70.52 |
The iat
Method#
The iat
method performs “integer-based lookups”. It accesses a single value for a row/column pair by integer position. It ignores any custom indexes you have created and simply counts the rows and columns. For example, if I want to access the first row, third column, I will go to position [0, 2]. That’s because 0 is the integer index of the first row (remember, everything starts at zero), and 2 is the integer index of the third column.
Here are some examples of getting and setting:
dfStocks = pd.DataFrame({'Company Name': ['Walmart, Inc.', 'Target Corporation', 'Best Buy Co., Inc.'],
'Ticker': ['WMT', 'TGT', 'BBY'],
'Closing Price': [119.14, 112.81, 70.52]
})
dfStocks
Company Name | Ticker | Closing Price | |
---|---|---|---|
0 | Walmart, Inc. | WMT | 119.14 |
1 | Target Corporation | TGT | 112.81 |
2 | Best Buy Co., Inc. | BBY | 70.52 |
dfStocks.iat[1, 2]
np.float64(112.81)
dfStocks.iat[0, 0]
'Walmart, Inc.'
dfStocks.iat[2, 2]
np.float64(70.52)
To set a value using iat
, simply assign with an equals sign. For example:
dfStocks.iat[2, 2] = -9999
dfStocks
Company Name | Ticker | Closing Price | |
---|---|---|---|
0 | Walmart, Inc. | WMT | 119.14 |
1 | Target Corporation | TGT | 112.81 |
2 | Best Buy Co., Inc. | BBY | -9999.00 |
Getting the Dimensions of a DataFrame#
Use the shape
method to get the dimensions (number of rows and number of columns) of your DataFrame. The shape
method will return a tuple. Here’s an example:
df = pd.DataFrame({'Col1': [1,2,3,4,5,6], 'Col2': list('abcdef'), 'Col3': list(range(100,700,100))})
df
Col1 | Col2 | Col3 | |
---|---|---|---|
0 | 1 | a | 100 |
1 | 2 | b | 200 |
2 | 3 | c | 300 |
3 | 4 | d | 400 |
4 | 5 | e | 500 |
5 | 6 | f | 600 |
Let’s get the dimensions of our new DataFrame:
df.shape
(6, 3)
We got the tuple (6, 3)
indicating that there are 6 rows and 3 columns. Remember that you can retrieve values from a tuple just like you did from a list. See below:
print(f'The DataFrame df has {df.shape[0]} rows and {df.shape[1]} columns.')
The DataFrame df has 6 rows and 3 columns.