Creating New Columns

Learning Objectives#

Create a new column in a Pandas DataFrame that contains the same value in every row.
Create a new column in a Pandas DataFrame from a list of values.
Recognize that, when creating a column from a list, the list length must exactly equal the number of rows in the DataFrame.
Create a new column in a Pandas DataFrame that is a transformation of an existing column:

    Do this "in place", where you replace an existing column with new values.
    Create a new column while preserving the original column.
    Use Pandas built-in string methods.
    Use arithmetic.

Create a column as a combination of multiple existing columns. The existing columns can be numeric or string.
Use numpy.where to create a new column using if-then-else logic.

Overview#

In this chapter, you will learn how to create new columns in a Pandas DataFrame. After you load, clean, and explore your data, creating new columns is typically the next step as it is a prerequisite for analysis. We will show you how to create a column from a single value (i.e. you want every value in the column to be the same thing). We will show you how to create a column from a list. We will then show you how to combine existing columns to create a new column. Sometimes, the combinations are mathematical. For example, you can compute return on assets as net income divided by total assets; you would want to do this for each row of the data. Sometimes, the combinations involve strings.

The key idea in this chapter is that Pandas works with Series. Each column of a DataFrame is a Series, so when you create a new column, you are creating a Series. Pandas makes it easy to work with Series objects, which in turn makes it easy to create new columns.
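To make the "each column is a Series" idea concrete, here is a minimal sketch. The DataFrame df and its values are made up for illustration; they are not part of the Microsoft dataset.

```python
import pandas as pd

# Hypothetical toy data, not the Microsoft dataset
df = pd.DataFrame({'Year': [2020, 2021, 2022],
                   'Sales': [10.0, 12.5, 15.0]})

col = df['Sales']      # single brackets return one column as a Series
sub = df[['Sales']]    # double brackets return a (one-column) DataFrame

print(type(col).__name__)
print(type(sub).__name__)
```

The first print shows Series and the second shows DataFrame, which is why single brackets are the natural way to refer to one column.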

Imports and Loading the Data#

import numpy as np
import pandas as pd

In this chapter, we will work with accounting (balance sheet and income statement) and finance (stock price) data for Microsoft Corporation (NASDAQ: MSFT). The cell below loads the dataset that we have already cleaned for you. The dataset has one row per fiscal year. The data go back to 1986, the first year for which data are available for Microsoft. The table below describes the columns in the dataset.

Column	Meaning
Fiscal Year	The fiscal year for the row of data.
Assets - Total	Total assets as reported on Microsoft’s balance sheet at the close of the fiscal year. In millions.
Common Equity	Common stockholders’ equity as reported on Microsoft’s balance sheet at the close of the fiscal year. In millions.
Income Before Extraordinary Items	Income before extraordinary items as reported on Microsoft’s income statement at the close of the fiscal year. In millions.
Net Income	Net Income as reported on Microsoft’s income statement at the close of the fiscal year. In millions.
Adjusted Price Close - Fiscal	The stock price as reported on Yahoo! Finance. The price is adjusted for stock splits.
Address	Microsoft’s street address
ZIP Code	Microsoft’s ZIP (postal) code
City	The city in which Microsoft’s headquarters is located.
State	The state in which Microsoft’s headquarters is located.

dfMSFT = pd.read_csv('data/MSFT.csv')
dfMSFT.head()

	Fiscal Year	Assets - Total	Common Equity	Net Income	Adjusted Price Close - Fiscal	Address	ZIP Code	City	State
0	1986	170.739	139.332	39.254	0.068926	One Microsoft Way	98052	Redmond	wa
1	1987	287.754	239.105	71.878	0.228631	One Microsoft Way	98052	Redmond	wa
2	1988	493.019	375.498	123.908	0.300359	One Microsoft Way	98052	Redmond	wa
3	1989	720.598	561.780	170.538	0.237597	One Microsoft Way	98052	Redmond	wa
4	1990	1105.349	918.563	279.186	0.681410	One Microsoft Way	98052	Redmond	wa

dfMSFT.shape

(34, 9)

A Little Data Exploration (I couldn’t resist)#

Run the cells below. They will generate graphs of Microsoft’s net income and stock price over time. As you can see, Microsoft has done really well, especially in the last few years!

from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource
from bokeh.layouts import row
from bokeh.io import output_notebook
output_notebook()

source = ColumnDataSource(dfMSFT)

p1 = figure(width=400, height=400, 
            x_axis_label='Fiscal Year', y_axis_label='$ millions',
            title='Net Income')
p1.line(x='Fiscal Year', y='Net Income', source=source)

p2 = figure(width=400, height=400,
            x_axis_label='Fiscal Year', y_axis_label='Stock Price, adjusted for splits',
            title='Stock Price (Adj.)')
p2.line(x='Fiscal Year', y='Adjusted Price Close - Fiscal', source=source)

show(row(p1, p2))

Creating a Column from a Scalar#

Look at the cell below, which shows the first few rows of the data.

dfMSFT.head()

	Fiscal Year	Assets - Total	Common Equity	Net Income	Adjusted Price Close - Fiscal	Address	ZIP Code	City	State
0	1986	170.739	139.332	39.254	0.068926	One Microsoft Way	98052	Redmond	wa
1	1987	287.754	239.105	71.878	0.228631	One Microsoft Way	98052	Redmond	wa
2	1988	493.019	375.498	123.908	0.300359	One Microsoft Way	98052	Redmond	wa
3	1989	720.598	561.780	170.538	0.237597	One Microsoft Way	98052	Redmond	wa
4	1990	1105.349	918.563	279.186	0.681410	One Microsoft Way	98052	Redmond	wa

Notice that the data is missing Microsoft’s stock ticker symbol, MSFT. Let’s add that as a new column. Run the cell below.

dfMSFT['Ticker'] = 'MSFT'
dfMSFT.head()

	Fiscal Year	Assets - Total	Common Equity	Net Income	Adjusted Price Close - Fiscal	Address	ZIP Code	City	State	Ticker
0	1986	170.739	139.332	39.254	0.068926	One Microsoft Way	98052	Redmond	wa	MSFT
1	1987	287.754	239.105	71.878	0.228631	One Microsoft Way	98052	Redmond	wa	MSFT
2	1988	493.019	375.498	123.908	0.300359	One Microsoft Way	98052	Redmond	wa	MSFT
3	1989	720.598	561.780	170.538	0.237597	One Microsoft Way	98052	Redmond	wa	MSFT
4	1990	1105.349	918.563	279.186	0.681410	One Microsoft Way	98052	Redmond	wa	MSFT

Look at the rightmost column, Ticker. Notice that it has the string ‘MSFT’ in every row. It’s that simple! That’s all that’s needed to create a new column with a single value.

Even though the above line of code, dfMSFT['Ticker'] = 'MSFT', appears really simple, there’s a lot going on behind the scenes. You need to understand a little of that before we proceed. The expression dfMSFT['Ticker'] refers to a single column, and a single column is a Series. Remember that single brackets mean Series. Pandas looked into the DataFrame dfMSFT and found no existing column named Ticker, so it created one. The DataFrame dfMSFT has 34 rows, but you gave Pandas only one value, so Pandas repeated the string ‘MSFT’ in all 34 rows (Pandas and NumPy call this “broadcasting”; the language R calls it “recycling”).

So to summarize, to create a new column from a single value:

Type the DataFrame name followed by single brackets
Inside the single brackets, type the name of the new column as a string
After the brackets, type an equals sign, and then the value you want.
This value will be repeated in every row. It can be numeric or string.
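The steps above can be sketched on a small, made-up DataFrame. The Units column and its value are hypothetical; they are only there to show that a numeric scalar behaves the same way as a string.

```python
import pandas as pd

# Hypothetical three-row DataFrame standing in for dfMSFT
df = pd.DataFrame({'Fiscal Year': [1986, 1987, 1988]})

df['Ticker'] = 'MSFT'      # string scalar, repeated in every row
df['Units'] = 1_000_000    # numeric scalar, repeated in every row

print(df)
```

Both new columns have three rows, one copy of the scalar per existing row.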

Creating a Column from a List#

It is possible to create a new column from a list. That is because Pandas can easily convert a list to a Series. The only catch is that the list must have the same length as the DataFrame! If the DataFrame has n rows, the list length must be exactly n.

When might you want to do this? Here’s a silly example. Many years ago, I read that Bill Gates used to eat M&M’s and drink Diet Coke. I think I heard that when Bill Gates was a guest on the Tonight Show with Jay Leno. But eventually he got married and his wife likely put a stop to that. Let’s say that for the first 10 years, Bill liked Diet Coke and M&M’s, but afterwards, he liked rice cakes. Let’s create a new column in the dataset called Bill Gates likes. We’ll populate that column using a list.

Take a look at the code cells below.

# Create a list of length 10
bill_likes = ['Diet Coke and M&Ms'] * 10
# Add 24 values to the list
bill_likes += ['Rice cakes'] * 24

print(bill_likes)
print()
print(f'The list length is: {len(bill_likes)}')

['Diet Coke and M&Ms', 'Diet Coke and M&Ms', 'Diet Coke and M&Ms', 'Diet Coke and M&Ms', 'Diet Coke and M&Ms', 'Diet Coke and M&Ms', 'Diet Coke and M&Ms', 'Diet Coke and M&Ms', 'Diet Coke and M&Ms', 'Diet Coke and M&Ms', 'Rice cakes', 'Rice cakes', 'Rice cakes', 'Rice cakes', 'Rice cakes', 'Rice cakes', 'Rice cakes', 'Rice cakes', 'Rice cakes', 'Rice cakes', 'Rice cakes', 'Rice cakes', 'Rice cakes', 'Rice cakes', 'Rice cakes', 'Rice cakes', 'Rice cakes', 'Rice cakes', 'Rice cakes', 'Rice cakes', 'Rice cakes', 'Rice cakes', 'Rice cakes', 'Rice cakes']

The list length is: 34

Now let’s create a new column in our dataset and populate it with our list. Look at the code cell below:

dfMSFT['Bill Gates likes'] = bill_likes
dfMSFT

	Fiscal Year	Assets - Total	Common Equity	Net Income	Adjusted Price Close - Fiscal	Address	ZIP Code	City	State	Ticker	Bill Gates likes
0	1986	170.739	139.332	39.254	0.068926	One Microsoft Way	98052	Redmond	wa	MSFT	Diet Coke and M&Ms
1	1987	287.754	239.105	71.878	0.228631	One Microsoft Way	98052	Redmond	wa	MSFT	Diet Coke and M&Ms
2	1988	493.019	375.498	123.908	0.300359	One Microsoft Way	98052	Redmond	wa	MSFT	Diet Coke and M&Ms
3	1989	720.598	561.780	170.538	0.237597	One Microsoft Way	98052	Redmond	wa	MSFT	Diet Coke and M&Ms
4	1990	1105.349	918.563	279.186	0.681410	One Microsoft Way	98052	Redmond	wa	MSFT	Diet Coke and M&Ms
5	1991	1644.184	1350.831	462.743	0.916206	One Microsoft Way	98052	Redmond	wa	MSFT	Diet Coke and M&Ms
6	1992	2639.903	2192.958	708.060	1.412133	One Microsoft Way	98052	Redmond	wa	MSFT	Diet Coke and M&Ms
7	1993	3805.000	3242.000	953.000	1.775253	One Microsoft Way	98052	Redmond	wa	MSFT	Diet Coke and M&Ms
8	1994	5363.000	4450.000	1146.000	2.082897	One Microsoft Way	98052	Redmond	wa	MSFT	Diet Coke and M&Ms
9	1995	7210.000	5333.000	1453.000	3.646332	One Microsoft Way	98052	Redmond	wa	MSFT	Diet Coke and M&Ms
10	1996	10093.000	6908.000	2195.000	4.846645	One Microsoft Way	98052	Redmond	wa	MSFT	Rice cakes
11	1997	14387.000	9797.000	3454.000	10.197618	One Microsoft Way	98052	Redmond	wa	MSFT	Rice cakes
12	1998	22357.000	15647.000	4490.000	17.490292	One Microsoft Way	98052	Redmond	wa	MSFT	Rice cakes
13	1999	37156.000	27458.000	7785.000	29.110128	One Microsoft Way	98052	Redmond	wa	MSFT	Rice cakes
14	2000	52150.000	41368.000	9421.000	25.821878	One Microsoft Way	98052	Redmond	wa	MSFT	Rice cakes
15	2001	59257.000	47289.000	7346.000	23.562454	One Microsoft Way	98052	Redmond	wa	MSFT	Rice cakes
16	2002	67646.000	52180.000	7829.000	17.655706	One Microsoft Way	98052	Redmond	wa	MSFT	Rice cakes
17	2003	79571.000	61020.000	9993.000	16.605047	One Microsoft Way	98052	Redmond	wa	MSFT	Rice cakes
18	2004	92389.000	74825.000	8168.000	18.599861	One Microsoft Way	98052	Redmond	wa	MSFT	Rice cakes
19	2005	70815.000	48115.000	12254.000	18.196650	One Microsoft Way	98052	Redmond	wa	MSFT	Rice cakes
20	2006	69597.000	40104.000	12599.000	17.294693	One Microsoft Way	98052	Redmond	wa	MSFT	Rice cakes
21	2007	63171.000	31097.000	14065.000	22.178057	One Microsoft Way	98052	Redmond	wa	MSFT	Rice cakes
22	2008	72793.000	36286.000	17681.000	21.002312	One Microsoft Way	98052	Redmond	wa	MSFT	Rice cakes
23	2009	77888.000	39558.000	14569.000	18.585894	One Microsoft Way	98052	Redmond	wa	MSFT	Rice cakes
24	2010	86113.000	46175.000	18760.000	18.340149	One Microsoft Way	98052	Redmond	wa	MSFT	Rice cakes
25	2011	108704.000	57083.000	23150.000	21.223999	One Microsoft Way	98052	Redmond	wa	MSFT	Rice cakes
26	2012	121271.000	66363.000	16978.000	25.651567	One Microsoft Way	98052	Redmond	wa	MSFT	Rice cakes
27	2013	142431.000	78944.000	21863.000	29.846550	One Microsoft Way	98052	Redmond	wa	MSFT	Rice cakes
28	2014	172384.000	89784.000	22074.000	37.096779	One Microsoft Way	98052	Redmond	wa	MSFT	Rice cakes
29	2015	176223.000	80083.000	12193.000	40.314255	One Microsoft Way	98052	Redmond	wa	MSFT	Rice cakes
30	2016	193694.000	71997.000	16798.000	48.023003	One Microsoft Way	98052	Redmond	wa	MSFT	Rice cakes
31	2017	241086.000	72394.000	21204.000	66.308907	One Microsoft Way	98052	Redmond	wa	MSFT	Rice cakes
32	2018	258848.000	82718.000	16571.000	96.712257	One Microsoft Way	98052	Redmond	wa	MSFT	Rice cakes
33	2019	286556.000	102330.000	39240.000	133.515411	One Microsoft Way	98052	Redmond	wa	MSFT	Rice cakes

If you look at the entire DataFrame, you will notice that the last column contains our list. Under the hood, Pandas checked that the list length matches the number of rows, converted the list to an array of values, and stored it as a new column of the DataFrame. The result behaves just like a Series you would build yourself with pd.Series.
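One way to avoid counting rows by hand is to build the list from an existing column, so its length always matches. The sketch below uses a made-up DataFrame and a hypothetical cutoff year of 1988.

```python
import pandas as pd

# Hypothetical toy data; the 1988 cutoff is an assumption for illustration
df = pd.DataFrame({'Fiscal Year': [1986, 1987, 1988, 1989]})

# One list element per row, so the lengths match by construction
snacks = ['Diet Coke and M&Ms' if year < 1988 else 'Rice cakes'
          for year in df['Fiscal Year']]

df['Bill Gates likes'] = snacks

print(len(snacks) == len(df))
print(type(df['Bill Gates likes']).__name__)
```

The new column comes back as a Series even though we assigned a plain Python list.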

As we mentioned before, the list length must be exactly equal to the number of rows of the DataFrame. If it differs, you will get an error. To see this, look at the code below.

dfMSFT['Bill Gates likes'] = ['Diet Coke and M&Ms', 'Rice cakes']

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[10], line 1
----> 1 dfMSFT['Bill Gates likes'] = ['Diet Coke and M&Ms', 'Rice cakes']

File /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/pandas/core/frame.py:4311, in DataFrame.__setitem__(self, key, value)
   4308     self._setitem_array([key], value)
   4309 else:
   4310     # set column
-> 4311     self._set_item(key, value)

File /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/pandas/core/frame.py:4524, in DataFrame._set_item(self, key, value)
   4514 def _set_item(self, key, value) -> None:
   4515     """
   4516     Add series to DataFrame in specified column.
   4517 
   (...)
   4522     ensure homogeneity.
   4523     """
-> 4524     value, refs = self._sanitize_column(value)
   4526     if (
   4527         key in self.columns
   4528         and value.ndim == 1
   4529         and not isinstance(value.dtype, ExtensionDtype)
   4530     ):
   4531         # broadcast across multiple columns if necessary
   4532         if not self.columns.is_unique or isinstance(self.columns, MultiIndex):

File /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/pandas/core/frame.py:5266, in DataFrame._sanitize_column(self, value)
   5263     return _reindex_for_setitem(value, self.index)
   5265 if is_list_like(value):
-> 5266     com.require_length_match(value, self.index)
   5267 arr = sanitize_array(value, self.index, copy=True, allow_2d=True)
   5268 if (
   5269     isinstance(value, Index)
   5270     and value.dtype == "object"
   (...)
   5273     # TODO: Remove kludge in sanitize_array for string mode when enforcing
   5274     # this deprecation

File /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/pandas/core/common.py:573, in require_length_match(data, index)
    569 """
    570 Check the length of data matches the length of the index.
    571 """
    572 if len(data) != len(index):
--> 573     raise ValueError(
    574         "Length of values "
    575         f"({len(data)}) "
    576         "does not match length of index "
    577         f"({len(index)})"
    578     )

ValueError: Length of values (2) does not match length of index (34)

Look at the last line of the error message. It tells you that the list has 2 values, but the DataFrame’s index has 34 rows.
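If you would rather handle the mismatch than let the notebook stop, you can catch the ValueError. This is a minimal sketch on a made-up DataFrame; the column name Letters is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Fiscal Year': [1986, 1987, 1988]})
too_short = ['a', 'b']     # 2 values for a 3-row DataFrame

try:
    df['Letters'] = too_short
except ValueError as err:
    message = str(err)
    print(f'Assignment failed: {message}')

# The failed assignment did not add the column
print('Letters' in df.columns)
```

Because the length check fails before anything is stored, the DataFrame is left unchanged.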

Creating a Column as a Transformation of an Existing Column#

It is very common to transform a column. By transform, we mean do something to it, like strip out white space, or perform a mathematical transformation. In this section, we will provide some examples that show you how to do it. We will:

Strip whitespace from all string columns.
Convert state abbreviations to uppercase.
Convert all balance sheet and income statement columns from millions of dollars to actual dollars.

Often, when cleaning data like this, you want to clean the data “in place”. That means that you don’t want to create new columns. You want to transform the data and store it in the existing columns. We’ll show you how to do that with the string columns. For the numeric columns, we’ll create new columns.

Example: Strip whitespace from string columns; do it “in place”#

Let’s take a look at the values in the City column. If you look closely, you’ll find that there are unneeded whitespace characters. We put those there to dirty the data. See the next cell:

dfMSFT.at[0, 'City']

'  Redmond '

The following code cell strips whitespace from the City column and saves the result in the same column:

dfMSFT['City'] = dfMSFT['City'].str.strip()

Let’s analyze the previous code, specifically the right-hand side of the equals sign. There, we wrote dfMSFT['City'].str.strip(). That tells Pandas to take the column City as a Series and run the string method strip. With no arguments, that method removes leading and trailing whitespace characters from every value in the Series and returns a new Series. We then save that new Series in the existing column dfMSFT['City']. Let’s check whether it worked by running the code below:

dfMSFT.at[0, 'City']

'Redmond'
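Pandas also offers str.lstrip and str.rstrip if you only want to clean one side. The sketch below compares the three methods on a small made-up Series.

```python
import pandas as pd

# Hypothetical values with stray spaces on different sides
cities = pd.Series(['  Redmond ', 'Seattle  ', ' Tacoma'])

print(cities.str.strip().tolist())    # remove leading and trailing whitespace
print(cities.str.lstrip().tolist())   # remove leading whitespace only
print(cities.str.rstrip().tolist())   # remove trailing whitespace only
```

Each method returns a new Series and leaves the original untouched, which is why we assign the result back to the column.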

Now let’s repeat this for all string columns. The easiest way to do that is with a for loop. Notice that we are using the for loop to iterate over columns. That’s fine. Do not iterate over rows with a for loop unless absolutely necessary! Iterating over rows in Python is very slow. Pandas methods such as str.strip process the whole column at once, which is often orders of magnitude faster on large datasets.

for col in ['Address', 'City', 'State', 'Ticker', 'Bill Gates likes']:
    dfMSFT[col] = dfMSFT[col].str.strip()

Notice how compact the code is! For each column, it strips the whitespace and saves the new, cleaned column in its original location.
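To see what "iterating over rows" looks like, and why we avoid it, the sketch below strips whitespace both ways on a small made-up DataFrame. The two approaches give the same answer; the row loop is shown only for comparison.

```python
import pandas as pd

df = pd.DataFrame({'City': ['  Redmond ', 'Seattle  ', ' Tacoma']})

# Row-by-row loop: works, but slow on large data
looped = []
for _, row in df.iterrows():
    looped.append(row['City'].strip())

# Column-at-once: a single call handles every row
vectorized = df['City'].str.strip()

print(looped == vectorized.tolist())
```

On three rows the difference in speed is invisible, but the column-at-once version is shorter and scales much better.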

Example: Convert a string column to uppercase; do it “in place”#

Notice that the State column contains the string ‘wa’. In the U.S., we typically write state abbreviations using uppercase. Let’s convert that column. The code is very similar to what we did in the previous section.

dfMSFT['State'] = dfMSFT['State'].str.upper()
dfMSFT.head(3)

	Fiscal Year	Assets - Total	Common Equity	Net Income	Adjusted Price Close - Fiscal	Address	ZIP Code	City	State	Ticker	Bill Gates likes
0	1986	170.739	139.332	39.254	0.068926	One Microsoft Way	98052	Redmond	WA	MSFT	Diet Coke and M&Ms
1	1987	287.754	239.105	71.878	0.228631	One Microsoft Way	98052	Redmond	WA	MSFT	Diet Coke and M&Ms
2	1988	493.019	375.498	123.908	0.300359	One Microsoft Way	98052	Redmond	WA	MSFT	Diet Coke and M&Ms

Notice that we called the built-in string method upper on the column and saved the result in the column State. The code took the column State, which is a Series, and ran a string method on it. That method returned a new Series, which we saved as the column State. Thus, the new Series overwrote the existing Series.

Link to Documentation for Pandas Built-In String Methods#

Pandas provides many useful methods for working with columns of strings. Here’s a link to the documentation. Check it out! We usually learn something new every time we visit this page.
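Here is a small sampler of other string methods, run on a made-up Series. All of them follow the same pattern as strip and upper: call the method after .str and get a new Series back.

```python
import pandas as pd

s = pd.Series(['One Microsoft Way', 'redmond', 'wa'])

print(s.str.title().tolist())                 # capitalize each word
print(s.str.len().tolist())                   # number of characters
print(s.str.contains('Microsoft').tolist())   # True/False per row
print(s.str.replace('Way', 'Wy', regex=False).tolist())
print(s.str.strip().str.upper().tolist())     # methods can be chained
```

Chaining, as in the last line, lets you apply several clean-up steps in one statement.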

Example: Convert numeric columns from millions to actual; create new columns#

Sometimes, you wish to transform your data but preserve the existing data. Let’s say we want to convert the balance sheet and income statement amounts from millions of dollars to actual dollars. We can do this by multiplying each value by one million. Here’s some code to do it:

for col in ['Assets - Total', 'Common Equity', 'Net Income']:
    dfMSFT[f"{col} ACTUAL"] = dfMSFT[col] * 1000000

Let’s walk through that code for the first column, ‘Assets - Total’. When the for loop executed for this value, the variable col was set to 'Assets - Total'. The code that was executed was:

dfMSFT[f"{col} ACTUAL"] = dfMSFT[col] * 1000000

The right-hand side took the Series dfMSFT['Assets - Total'] and multiplied each value by one million. It saved the result in a new Series. Notice that we didn’t have to write a for loop over the rows. Pandas automatically did the multiplication for every value in the Series. We then stored the result in a new column ‘Assets - Total ACTUAL’. Notice we used an f-string to create the new column name.

Let’s take a look at the first few rows of the data, for only the Assets columns:

dfMSFT[['Assets - Total', 'Assets - Total ACTUAL']].head()

	Assets - Total	Assets - Total ACTUAL
0	170.739	1.707390e+08
1	287.754	2.877540e+08
2	493.019	4.930190e+08
3	720.598	7.205980e+08
4	1105.349	1.105349e+09

Notice that, in each row, the amount in the ‘Assets - Total ACTUAL’ column is exactly one million times the value in the ‘Assets - Total’ column.
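If you want to confirm the relationship in code rather than by eye, a tolerance-based check is the safer choice, because floating-point arithmetic can introduce tiny rounding differences. The sketch below uses the first three asset values from the table above in a made-up DataFrame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Assets - Total': [170.739, 287.754, 493.019]})
df['Assets - Total ACTUAL'] = df['Assets - Total'] * 1_000_000

# Compare with a tolerance instead of testing for exact equality
ratio = df['Assets - Total ACTUAL'] / df['Assets - Total']
print(np.allclose(ratio, 1_000_000))

# The original column is preserved
print(df['Assets - Total'].tolist())
```

Writing the multiplier as 1_000_000 is optional; Python ignores the underscores, and they make the number of zeros easier to read.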

Creating a Column as a Combination of Multiple Columns#

Often, you want to create a column that is a combination of multiple columns. Sometimes, you want to combine the columns using math. In other situations, you want to combine strings. In this section, we’ll show you examples of both.

Combining Numeric Columns Using Math#

Let’s compute return on assets (ROA) in each year. ROA is defined as net income divided by average total assets during the year. For simplicity, let’s just use total assets at the end of the year. Thus, ROA in year t will be computed as net income for year t divided by assets at the end of year t.

The code is really simple:

dfMSFT['ROA'] = dfMSFT['Net Income'] / dfMSFT['Assets - Total']
dfMSFT[['Net Income', 'Assets - Total', 'ROA']].head()

	Net Income	Assets - Total	ROA
0	39.254	170.739	0.229906
1	71.878	287.754	0.249790
2	123.908	493.019	0.251325
3	170.538	720.598	0.236662
4	279.186	1105.349	0.252577

First, notice that we do not need to adjust for millions. Both net income and total assets are denominated in millions, so we can simply divide the two.

Second, look at the line of code:

dfMSFT['ROA'] = dfMSFT['Net Income'] / dfMSFT['Assets - Total']

We divided a Series by a Series. When we do that, Pandas matches the values by row label (the index) and divides row by row. Because both Series are columns of the same DataFrame, they share the same index and have exactly the same length, so every row lines up with itself.
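Things are different when the two Series do not come from the same DataFrame. The sketch below uses two made-up Series whose indexes only partly overlap, to show that Pandas lines rows up by label, not by position.

```python
import pandas as pd

# Hypothetical Series with different indexes
income = pd.Series([10.0, 20.0, 30.0], index=[0, 1, 2])
assets = pd.Series([100.0, 400.0], index=[1, 2])

roa = income / assets
print(roa)
```

Label 0 has no match in assets, so the result there is NaN (missing). Labels 1 and 2 are divided as expected. This is worth remembering whenever you combine Series from different sources.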

Combining String Columns#

Our data has separate columns for the street address, city, state, and ZIP code. Let’s combine them into a complete postal address. Take a look at the code below. The string '\n' means “newline character”.

dfMSFT['Full Address'] = (
    dfMSFT['Address'] + '\n'
    + dfMSFT['City'] + ', ' + dfMSFT['State'] + ' '
    + dfMSFT['ZIP Code'].astype(str)
)

The above code adds multiple Series together. As with numeric columns, Pandas assumes you want to combine values in each row. Also notice that we can add a single string value to a Series and Pandas figures out you want to add that value to every row.

Finally, notice that we had to convert the ZIP Code column to string before we could add it to the other string columns.

Let’s take a look at one of the final address values:

print(dfMSFT.at[0, 'Full Address'])

One Microsoft Way
Redmond, WA 98052
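One caveat when converting ZIP codes: if a ZIP code that starts with zero was loaded as a number, the leading zero is gone. A possible fix is to pad the string back to five characters with str.zfill. The sketch below uses a made-up two-row DataFrame; the Cambridge row and the City Line column name are assumptions added for illustration.

```python
import pandas as pd

# Hypothetical data: 02139 was read as the integer 2139
df = pd.DataFrame({'City': ['Redmond', 'Cambridge'],
                   'State': ['WA', 'MA'],
                   'ZIP Code': [98052, 2139]})

# Convert to string, then pad with zeros on the left to 5 characters
zip_text = df['ZIP Code'].astype(str).str.zfill(5)
df['City Line'] = df['City'] + ', ' + df['State'] + ' ' + zip_text

print(df['City Line'].tolist())
```

Microsoft's ZIP code does not start with zero, so the chapter's code works without this step.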

You might think this is a trivial example because the address is the same in every year. However, companies sometimes move their headquarters. For example, Boeing moved its headquarters from Seattle to Chicago in 2001.

Conditionals: Creating a Column Using an if-statement#

Say you want to create a column whose values are based on a condition. Let’s say that in years where Microsoft’s ROA exceeds 20%, you were impressed. In other years, you were underwhelmed. You want a new column called Performance that contains the string “Wow!” or “Meh”, based on the ROA in that year.

There are many ways to do this. We recommend you use the NumPy function numpy.where. Here’s a link to the official documentation for that function.

Take a look at the code cell below:

dfMSFT['Performance'] = np.where(dfMSFT['ROA'] > 0.2, "Wow!", "Meh")

The documentation for numpy.where tells us that the first argument is a condition that can be “array like”. That means we can pass it a Series of True/False values, which we did. We passed it a Series that contains True for values that exceed 20%, and False otherwise. The next argument is “x”, which is the value returned for True values. The final argument is “y”, the value returned for False values.

numpy.where returned a NumPy array, which is similar to a list but holds values of a single type. Since Pandas is built on top of NumPy, Pandas accepted that array and stored it as a new column called ‘Performance’.
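Two edge cases are worth seeing: a value exactly at the threshold and a missing value. The sketch below runs the same numpy.where call on a made-up ROA column that contains both.

```python
import numpy as np
import pandas as pd

# Hypothetical ROA values, including a missing value and the exact threshold
df = pd.DataFrame({'ROA': [0.25, 0.09, np.nan, 0.20]})

df['Performance'] = np.where(df['ROA'] > 0.2, 'Wow!', 'Meh')
print(df)
```

Only the first row gets “Wow!”. The value 0.20 is not strictly greater than 0.2, and any comparison with a missing value is False, so both of those rows fall into the “Meh” branch.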

Let’s take a look at 5 randomly chosen rows below. Notice that the rows with ROA greater than 20% (fiscal years 1990 and 2008) have “Wow!” in the new column. The rows that don’t meet that condition (fiscal years 2017, 2004, and 2003) have “Meh”.

dfMSFT[['Fiscal Year', 'ROA', 'Performance']].sample(n=5, random_state=3)

	Fiscal Year	ROA	Performance
4	1990	0.252577	Wow!
31	2017	0.087952	Meh
18	2004	0.088409	Meh
22	2008	0.242894	Wow!
17	2003	0.125586	Meh
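numpy.where handles a two-way choice. If you ever need more than two outcomes, one option is numpy.select, which takes a list of conditions and a matching list of values. The sketch below is an illustration only: the 10% threshold and the “OK” label are assumptions we made up, not part of the example above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ROA': [0.25, 0.15, 0.05]})

# Conditions are checked in order; the first one that is True wins
conditions = [df['ROA'] > 0.2, df['ROA'] > 0.1]
labels = ['Wow!', 'OK']

df['Performance'] = np.select(conditions, labels, default='Meh')
print(df['Performance'].tolist())
```

Rows that match none of the conditions receive the default value, which plays the role of the final "else".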