Viewing a DataFrame#

Learning Objectives#

  • Write the import statements needed to work with Pandas, in their most common form.
  • View an entire DataFrame in a Jupyter notebook.
  • Use the head and tail methods to view the first or last rows of a DataFrame.
  • Use the sample method to retrieve random rows from a DataFrame.
  • View a DataFrame in Variable Inspector.

Overview#

Before we teach you how to load data, we’re going to show you how to view data that has been loaded into a Pandas DataFrame. After you load your data, the first thing you will do is view it. Therefore, it makes sense to know how to view the data before learning how to load it.

In this chapter and the next (chapter 3-6), we’ll show you different ways to view data stored in Pandas DataFrames. We’ll also teach you the “anatomy” of a DataFrame. That way, after you learn how to load data, you can immediately start working with your data.

Required Import Statements#

Before you can work with Pandas, you need to import the Pandas library. The most common way to write the import statement is:

import pandas as pd

Most people worldwide use the alias pd for the Pandas library. We recommend you do the same. That way, when you search online for help with Pandas, you won’t be surprised to see pd. all over the place, and you can easily copy and paste the online code into your notebook.

The Pandas library is built on top of a numerical analysis library called NumPy. When working with DataFrames, you will often find yourself calling functions from NumPy. Therefore, most people include the following two import statements at the top of all of their notebooks:

import numpy as np
import pandas as pd

Let’s load these libraries now. Run the cell below.

import numpy as np
import pandas as pd
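
As a small illustration of why the two libraries are usually imported together, the sketch below applies NumPy functions to a DataFrame column. The DataFrame `scores` and its values are made up for this example; they are not part of the sample data used in this chapter.

```python
import numpy as np
import pandas as pd

# Made-up values, used only for illustration.
scores = pd.DataFrame({"score": [1.0, 4.0, 9.0, 16.0]})

# NumPy functions accept a DataFrame column and work element by element.
scores["root"] = np.sqrt(scores["score"])

# NumPy can also summarize a whole column.
average = np.mean(scores["score"])

print(scores)
print(average)
```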

Loading a Sample DataFrame#

Run the code below to load some sample data. For now, don’t worry about how the code works. The code does assume, however, that the folder where this notebook is saved contains a subfolder named data, and that this subfolder contains a file named Admission_Predict.csv. This file contains data on graduate admissions that we got from Kaggle (see here).

df = pd.read_csv('data/Admission_Predict.csv')
type(df)
pandas.core.frame.DataFrame
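
If you do not have the CSV file at hand, the following sketch builds a small stand-in DataFrame from text typed directly into the cell. It assumes only the first five rows shown in this chapter, and the names `csv_text` and `small_df` are our own choices for illustration. It is not a replacement for the full 400-row file.

```python
import io

import pandas as pd

# The first five rows of the data shown in this chapter, typed in as CSV text.
csv_text = """Serial No.,GRE Score,TOEFL Score,University Rating,SOP,LOR,CGPA,Research,Chance of Admit
1,337,118,4,4.5,4.5,9.65,1,0.92
2,324,107,4,4.0,4.5,8.87,1,0.76
3,316,104,3,3.0,3.5,8.00,1,0.72
4,322,110,3,3.5,2.5,8.67,1,0.80
5,314,103,2,2.0,3.0,8.21,0,0.65
"""

# read_csv accepts a file-like object as well as a file path.
small_df = pd.read_csv(io.StringIO(csv_text))

print(type(small_df))
print(small_df.shape)
```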

Viewing the Entire DataFrame#

The code above loaded the data from a CSV file and stored it in a Pandas DataFrame. The output of the call to type tells us that the variable df is of type pandas.core.frame.DataFrame.

How do we view this data? Well, the simplest way to see the value of any variable is to type the variable’s name into an empty cell and run the cell. If we do that, here’s what we see:

df
     Serial No.  GRE Score  TOEFL Score  University Rating  SOP  LOR  CGPA  Research  Chance of Admit
  0           1        337          118                  4  4.5  4.5  9.65         1             0.92
  1           2        324          107                  4  4.0  4.5  8.87         1             0.76
  2           3        316          104                  3  3.0  3.5  8.00         1             0.72
  3           4        322          110                  3  3.5  2.5  8.67         1             0.80
  4           5        314          103                  2  2.0  3.0  8.21         0             0.65
...         ...        ...          ...                ...  ...  ...   ...       ...              ...
395         396        324          110                  3  3.5  3.5  9.04         1             0.82
396         397        325          107                  3  3.0  3.5  9.11         1             0.84
397         398        330          116                  4  5.0  4.5  9.45         1             0.91
398         399        312          103                  3  3.5  4.0  8.78         0             0.67
399         400        333          117                  4  5.0  4.0  9.66         1             0.95

400 rows × 9 columns

Let’s examine the output of the last code cell. You can see something that looks like an Excel worksheet. The column names are bolded, and at the bottom we are told that there are 400 rows and 9 columns. However, notice that Jupyter didn’t show us all the rows. By default, for a long DataFrame it shows only the first five and last five rows, and it uses “…” to indicate that the rows in between weren’t displayed. This is the default behavior because many DataFrames contain millions of rows, and displaying all of them would be time-consuming and of little use to a data analyst.

It is possible to change the maximum number of displayed rows by changing one of Pandas’ options (e.g. see here), but we don’t recommend that. There are better ways to view your data.
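
For completeness, here is a minimal sketch of what changing that option can look like. It assumes the option in question is display.max_rows and uses pd.option_context so that the change is only temporary. The DataFrame `long_df` is a made-up stand-in with 100 rows.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for a long DataFrame: 100 rows, one column.
long_df = pd.DataFrame({"value": np.arange(100)})

default_max = pd.get_option("display.max_rows")

# Temporarily raise the limit; the previous value is restored on exit.
with pd.option_context("display.max_rows", 200):
    full_text = repr(long_df)       # every row is included

truncated_text = repr(long_df)      # back to the shortened view
restored_max = pd.get_option("display.max_rows")

print(len(full_text.splitlines()), len(truncated_text.splitlines()))
```

Because the option is restored automatically, the rest of the notebook keeps the default display behavior.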

Viewing the First or Last Rows of a DataFrame#

You can usually get a sense of your data by looking at only a few rows. For example, if your data set didn’t load correctly, or if your CSV file was dirty, you will usually see that in the first few rows.

The head Method of DataFrame#

To see the first 5 rows of your dataset, use the head method of DataFrame. We show you this in the code cell below.

df.head()
     Serial No.  GRE Score  TOEFL Score  University Rating  SOP  LOR  CGPA  Research  Chance of Admit
  0           1        337          118                  4  4.5  4.5  9.65         1             0.92
  1           2        324          107                  4  4.0  4.5  8.87         1             0.76
  2           3        316          104                  3  3.0  3.5  8.00         1             0.72
  3           4        322          110                  3  3.5  2.5  8.67         1             0.80
  4           5        314          103                  2  2.0  3.0  8.21         0             0.65

The head method has one optional argument, n. By default, n is 5 and the head method will show the first 5 rows. If you want a different number, just pass a single, integer argument to head. See the code cells below for examples:

df.head(2)
     Serial No.  GRE Score  TOEFL Score  University Rating  SOP  LOR  CGPA  Research  Chance of Admit
  0           1        337          118                  4  4.5  4.5  9.65         1             0.92
  1           2        324          107                  4  4.0  4.5  8.87         1             0.76
df.head(n=6)
     Serial No.  GRE Score  TOEFL Score  University Rating  SOP  LOR  CGPA  Research  Chance of Admit
  0           1        337          118                  4  4.5  4.5  9.65         1             0.92
  1           2        324          107                  4  4.0  4.5  8.87         1             0.76
  2           3        316          104                  3  3.0  3.5  8.00         1             0.72
  3           4        322          110                  3  3.5  2.5  8.67         1             0.80
  4           5        314          103                  2  2.0  3.0  8.21         0             0.65
  5           6        330          115                  5  4.5  3.0  9.34         1             0.90
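
The three ways of calling head can be compared side by side in a short, self-contained sketch. The DataFrame `demo` below is a made-up six-row frame that reuses two columns from the sample data, so the cell runs without the CSV file.

```python
import pandas as pd

# Made-up six-row frame that reuses two columns from the sample data.
demo = pd.DataFrame({
    "Serial No.": [1, 2, 3, 4, 5, 6],
    "GRE Score": [337, 324, 316, 322, 314, 330],
})

first_five = demo.head()       # n defaults to 5
first_two = demo.head(2)       # n passed by position
all_rows = demo.head(n=100)    # asking for more rows than exist is fine

print(len(first_five), len(first_two), len(all_rows))
```

Note that head returns a new DataFrame and leaves the original untouched, so you can call it as often as you like.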

The tail Method of DataFrame#

The tail method works just like head, but it shows the last rows of your data instead of the first. It accepts the same optional argument n, which also defaults to 5.
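
Here is a short sketch of tail in action. It uses a made-up ten-row DataFrame named `demo` (row numbers 0 through 9) rather than the admissions data, so you can run it on its own.

```python
import pandas as pd

# Made-up ten-row frame; the row numbers run from 0 to 9.
demo = pd.DataFrame({"Serial No.": list(range(1, 11))})

last_five = demo.tail()     # n defaults to 5
last_three = demo.tail(3)   # the last 3 rows

print(last_five.index.tolist())
print(last_three.index.tolist())
```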

Viewing a Random Sample of the Rows of a DataFrame#

Sometimes you want to see randomly chosen rows from your DataFrame. Pandas makes this very easy. Simply call the sample method.

df.sample()
     Serial No.  GRE Score  TOEFL Score  University Rating  SOP  LOR  CGPA  Research  Chance of Admit
376         377        297           96                  2  2.5  2.0  7.43         0             0.34

By default, sample returns one randomly chosen row from the data. If you want more, pass the optional argument n.

df.sample(5)
     Serial No.  GRE Score  TOEFL Score  University Rating  SOP  LOR  CGPA  Research  Chance of Admit
181         182        305          107                  2  2.5  2.5  8.42         0             0.71
  2           3        316          104                  3  3.0  3.5  8.00         1             0.72
231         232        319          106                  3  3.5  2.5  8.33         1             0.74
297         298        320          120                  3  4.0  4.5  9.11         0             0.86
197         198        310          106                  2  3.5  2.5  8.33         0             0.73

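Notice the row numbers in the left-hand column. Each sampled row keeps the row number it had in the original DataFrame, so the numbers are neither consecutive nor in order.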
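
The sketch below illustrates the point about row numbers using a made-up ten-row DataFrame named `demo`. It also uses two things not covered in this chapter: the random_state argument of sample, which fixes the random seed so that repeated runs return the same rows, and the sort_index method, which puts the sampled rows back in their original order.

```python
import pandas as pd

# Made-up ten-row frame; the row numbers run from 0 to 9.
demo = pd.DataFrame({"Serial No.": list(range(1, 11))})

# random_state fixes the seed, so both calls pick the same rows.
picked = demo.sample(n=3, random_state=0)
picked_again = demo.sample(n=3, random_state=0)

# Sampled rows keep their original row numbers; sort_index reorders them.
in_order = picked.sort_index()

print(picked.index.tolist())
print(in_order.index.tolist())
```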

The sample method has many powerful features. For example, you can sample columns instead of rows, and you can sample with or without replacement. We will not describe those features at this time. If you’re interested, type the following into an empty cell and run it (or click here):

?pd.core.frame.DataFrame.sample
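
If you would like a quick taste before reading the documentation, here is a hedged sketch of two of those features. It relies on the replace and axis arguments of sample, and it uses a made-up three-row DataFrame named `demo`. The random_state argument is there only to make the result repeatable.

```python
import pandas as pd

# Made-up three-row frame that reuses values from the sample data.
demo = pd.DataFrame({
    "Serial No.": [1, 2, 3],
    "GRE Score": [337, 324, 316],
    "CGPA": [9.65, 8.87, 8.00],
})

# With replacement, a row may be drawn more than once,
# so n is allowed to exceed the number of rows.
with_replacement = demo.sample(n=5, replace=True, random_state=1)

# axis="columns" samples columns instead of rows.
two_columns = demo.sample(n=2, axis="columns", random_state=1)

print(with_replacement)
print(two_columns)
```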