Functions

Functions#

Learning Objectives#

Overview of Functions

Define a function (in programming) and explain why functions are useful.
Explain why a function can be thought of as a black box, and define the terms argument and return value.

Writing Functions

Given some input and output specifications, write a simple function that takes an arbitrary number of arguments.
Identify valid and invalid function names.

Variable Scope and Functions

Explain the concept of masking when it comes to variables inside functions.
Given sample code, identify the values of variables at different points in the code execution.

Using Predefined Functions

Write import statements, with and without aliases.
Write from statements to import specific functions from a module.

Define the terms positional argument and keyword argument.
Call an imported function using positional arguments.
Call an imported function using keyword arguments.

Reading Function Documentation

Read and interpret simple function signatures from the Python documentation.
Retrieve the help for a function inside a Jupyter notebook.

Overview of Functions#

Functions are reusable blocks of code.

Let’s think about that statement for a minute. Why might you want a reusable block of code? Well, when doing data analytics, you will typically use your computer to perform the same task more than once. Your life will be much easier if you do not have to rewrite the same code again and again.

Imagine auditing a list of a million transactions. One of your aims will be to find transactions that meet certain criteria. In the old days, this was done by hand and therefore auditors developed sampling procedures. Nowadays, computers can analyze all transactions very quickly and flag suspicious ones. How will you tell a computer to find transactions that meet your criteria? You will call a function! Without a function, you would have to write a for loop to iterate over all the transactions. Within each iteration, you would check your criteria and, if they are met, copy the transaction to a list of suspicious transactions. That’s a lot of code to write and you are likely to make errors. A better way is to use pre-built functions for searching that have been thoroughly tested. Later in the course, we will show you the Pandas library which has built-in functions for this purpose.

Functions are one of the most important concepts in programming. In this section, you will learn how functions work and write your own simple functions. You will also learn to read function “signatures” so that you will better understand the documentation for pre-built functions.

Functions as Black Boxes#

Figure: Function as a black box

A “black box” is a system that takes inputs, does something with them, and provides some output. You can think of mathematical functions and programming functions in this way. The inputs are called arguments and the output is called the return value.

When you use a function that someone else has written, you are treating it as a black box. You often will not know what is inside the black box and how it works. You will simply know what arguments to give your function and what output to expect. The documentation for a given function will usually describe the arguments, the return value, and the relationship between the return value and the arguments. The documentation will describe implementation details only when necessary.

To understand why you often do not need to know what’s inside the black box, let’s consider a built-in Python function called max. If we go to the official Python documentation for this function, we learn that the function takes two (or more) arguments and returns the largest of them. It does not tell us how it calculates this.

There are multiple ways to compute the maximum of two numbers, x and y. One is to use an if statement to check which number is larger.

if x > y:
    return x
else:
    return y

Another way to compute the maximum is to use the formula \(\frac{x + y + |x - y|}{2}\).

return (x + y + abs(x-y)) / 2

There may be other ways to compute the maximum of two numbers. The point is that you do not need to care how this function works. As long as you are confident that the function does what it claims to do, you don’t need to worry about its implementation details. You can treat the function as a black box.
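As a minimal sketch of the point above, the two approaches can be wrapped in functions and compared with the built-in max. The names max_with_if and max_with_formula are made up for this illustration; they are not part of Python.

```python
def max_with_if(x, y):
    # Approach 1: compare the two numbers directly.
    if x > y:
        return x
    else:
        return y


def max_with_formula(x, y):
    # Approach 2: (x + y + |x - y|) / 2.
    # Note: the / operator always produces a float.
    return (x + y + abs(x - y)) / 2


# Both "black boxes" agree with the built-in max for these inputs.
pairs = [(3, 7), (7, 3), (-2, -5), (4, 4)]
for x, y in pairs:
    print(x, y, max(x, y), max_with_if(x, y), max_with_formula(x, y))
```

From the caller's point of view the results are the same number, although the formula version returns a float (for example 7.0 rather than 7).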

Writing your own functions#

In this section, we will show you how to write your own functions. This will help you understand how functions work and how to use functions that other people wrote.

A Simple Function#

Let’s write a very simple function that returns the average of two numbers.

def MyAverage(x, y):
    return (x + y) / 2

The above code defines a function named MyAverage. A function definition begins with the keyword def, which stands for define. Following the def keyword is a name for the function. Following the function name is an argument list: the argument names, separated by commas and enclosed in parentheses. Our MyAverage function takes two arguments, x and y. These arguments are variables, but they live only inside our function. The first line of the definition ends with a colon. The next lines are the body of the function and they must be indented. Python uses the indentation to determine where the function ends. When the function is called, Python executes the body, line by line, until it reaches a return statement (or the end of the body). It evaluates whatever is given to the return statement and “returns” it to the line of code that called the function.

Let’s work through an example before explaining the subtleties.

revenue_2017 = 100
revenue_2018 = 200
avgRevenue = MyAverage(revenue_2017, revenue_2018)

print(f'The average revenue over the last two years was {avgRevenue}')

The above code snippet does the following. It creates two variables, revenue_2017 and revenue_2018, and assigns them values. It then creates a new variable called avgRevenue. However, when Python sees the line that creates avgRevenue, it does not immediately know the value of the right-hand side of the assignment. It has to call the function MyAverage, passing the variables revenue_2017 and revenue_2018 as arguments. MyAverage expects two arguments, x and y, and we gave it two, so Python assigns the value of the first argument, revenue_2017, to x and the value of the second, revenue_2018, to y. Python then executes the body of the function, the code that follows the line def MyAverage(x, y). It evaluates the statement return (x + y) / 2 by first doing the arithmetic, which gives 150.0 (the / operator always produces a float). It then returns this value to the calling statement, avgRevenue = MyAverage(revenue_2017, revenue_2018), and the value 150.0 is assigned to avgRevenue. The final statement prints “The average revenue over the last two years was 150.0”.

Syntax for Defining Functions#

To define a function, use the following syntax:

def FunctionName(arg1, arg2, ...):
    do_stuff_here
    return return_value

Notes:

You may define a function with zero arguments.
When you call a function with zero arguments, you must use empty parentheses.
A function does not have to return anything. If your function returns nothing, you may use a return statement with no return value, or you may omit the return statement altogether.

Here’s an example of a function that takes no arguments and returns nothing.

def MakeAnimalNoises():
    print('Cluck')
    print('Moo')
    print('Woof')

Notice that we omitted the return statement. To call this function, you would type MakeAnimalNoises().
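To see what “returns nothing” means in practice, here is a small sketch. It assumes nothing beyond the MakeAnimalNoises function defined above (repeated here so the cell runs on its own). A function without a return statement hands back Python's special value None.

```python
def MakeAnimalNoises():
    print('Cluck')
    print('Moo')
    print('Woof')


# The empty parentheses are required to actually call the function.
result = MakeAnimalNoises()

# Because the function has no return statement, result is None.
print(result is None)
```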

Rules for Naming Functions#

The rules for naming functions are the same as the rules for naming variables. Do you remember the rules? :) If not, here’s a refresher. The rules are:

Function names can contain:
- Uppercase letters (A-Z)
- Lowercase letters (a-z)
- Digits (0-9)
- The underscore character ( _ ). Note: this is different from the hyphen, or dash (-).
Function names cannot contain spaces.
A function name cannot begin with a digit.
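If you are unsure whether a name is valid, you can ask Python. The sketch below uses the string method isidentifier and the standard library's keyword module; the helper name is_valid_function_name is our own invention for this example. It also checks one rule not listed above: a name cannot be a reserved word such as def or return. (Python additionally accepts many non-English letters, which the list above does not cover.)

```python
import keyword


def is_valid_function_name(name):
    # str.isidentifier() checks the character rules (letters, digits,
    # underscores, no leading digit); keyword.iskeyword() rejects
    # reserved words such as 'def' and 'return'.
    return name.isidentifier() and not keyword.iskeyword(name)


candidates = ['MyAverage', 'my_average2', '2nd_average',
              'my-average', 'my average', 'return']
for candidate in candidates:
    print(candidate, is_valid_function_name(candidate))
```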

Arguments are Local Variables#

Earlier in these notes, we talked about environments and state. When you call a function, Python creates a new environment. Code in the new environment can see the variables of the environment where the function was defined (in these examples, the top level of the notebook, which is also where we call the function), and it creates some new variables on top of that. This concept, and the associated rules, can quickly become confusing, so let’s work through some simple examples.

x = 15

def SampleFunction():
    print(f'x is {x}')
    
SampleFunction()

x is 15

The above code did the following:

Created a variable called x and set its value to 15.
Created a function SampleFunction that takes no arguments and prints x.
Called SampleFunction.

Notice that SampleFunction can see x. That’s because the environment for SampleFunction can read variables from the environment in which the function was defined, which here is also the environment that calls it. Within SampleFunction, x is considered a “global variable”. It’s global because it was created outside the function, at the top level. Variables created inside the function are called “local variables”.

What happens if we try to change x inside the function?

x = 15

def SampleFunction2():
    x = 0
    print(f'Inside SampleFunction2, x is {x}')
    
SampleFunction2()
print(f'Outside SampleFunction2, x is {x}')

Inside SampleFunction2, x is 0
Outside SampleFunction2, x is 15

Confused? When we called SampleFunction2, Python created a new environment that inherited x and its value, 15. However, Python does not want you to accidentally change a global variable inside a function; that might screw up other parts of your program that rely on the global variable. To prevent this, when we set x to 0 inside SampleFunction2, Python created a new local variable called x that masked the global variable with the same name. When the function finished, its environment was destroyed and local variable x was destroyed.

The exact same thing happens if a function argument has the same name as a variable in the calling environment. The argument will create a local variable that masks the global variable in the calling environment. When the function terminates, its local variable will be destroyed. See the following example:

x = 3
y = -4

def SampleFunction3(x, y):
    print(f'Inside SampleFunction3, x is {x} and y is {y}')
    
SampleFunction3(0, 1)
print(f'Outside SampleFunction3, x is {x} and y is {y}')

Inside SampleFunction3, x is 0 and y is 1
Outside SampleFunction3, x is 3 and y is -4

In the above code, x and y are created and initialized to 3 and -4, respectively. The function SampleFunction3 is defined and then called with arguments 0 and 1. When the function is called, a new environment is created. Local variables x and y are created and assigned values 0 and 1, respectively, because those were the arguments given in the function call. When the function terminates, the local variables are destroyed.
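The same rules suggest how to get a result out of a function: rather than trying to change a global variable, return the new value and assign it in the calling code. The following sketch uses a made-up function, DoublePrice, and made-up numbers.

```python
price = 50


def DoublePrice(price):
    # This 'price' is a local variable that masks the global one.
    price = price * 2
    return price


new_price = DoublePrice(price)

print(f'Outside DoublePrice, price is still {price}')
print(f'The returned value is {new_price}')
```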

Summary of User-Defined Functions#

A function is a reusable block of code. Its inputs are called arguments and its output is called a return value.

To define a function, use the following syntax:

def FunctionName(arg1, arg2, ...):
    do_stuff_here
    return return_value

Finally, note that a function does not require arguments and that it does not have to return anything.

Using Functions that Other People Wrote#

When doing data analytics, you will mostly work with functions that other people wrote. In this section, we will show you how to work with such functions. We will show you:

How functions are organized into modules.
How to use a function from a module.
How to call a function using:
- Positional arguments
- Keyword arguments
- Optional arguments
How to read documentation for a function.

Modules and Import Statements#

Since Python is so popular, people have written thousands of Python functions that you can download and use in your work. Without organizing those functions, many problems would result. For example, say I want to write a function to filter some data. A natural name for my function is filter. But chances are that somebody else has already written a function named filter. How do I tell Python that I want to use my filter function and not another one? Also, what if I write a set of related functions and I want to store them in the same place? The solution to these problems is to package related functions into a module (sometimes called a package or a library). A module is simply a group of functions, variables, and classes (we will define these later in the course) that reside in the same file.

When you open Python, it loads some “built-in” functions into your environment. You can find the full list on the Built-in Functions page of the official documentation, https://docs.python.org/3/library/functions.html. Python also provides a “standard library” that contains many modules with useful functions. If you want to use one of those modules, you have to tell Python by using an import statement. Say we want to use Python’s math library. We would type:

import math

That statement tells Python to load its math module and make it available in your environment under the name math. After you execute this import statement, you can use any of the module’s functions. Here’s some sample code that computes the factorial of a number.

import math

x = 5
fact_x = math.factorial(x)

print(f'{x}! = {fact_x}')

5! = 120

Let’s examine this code, line-by-line. The first line tells Python to import its math module. All that does is add the name math to your current environment; everything inside the module is then reachable through that name. The documentation page for the math module, https://docs.python.org/3/library/math.html, lists everything that’s in it.

The second line, x = 5, creates a variable x and initializes it to 5. The third line calls the function math.factorial and computes the factorial of x. Note that the function’s name is math.factorial, not factorial. The fourth line prints some output.

You might be wondering why Python requires you to type math. to use something from the math module. That’s because it does not want to hide a function that you may have written with the same name. Also, you may have imported another library with a function named factorial, and Python needs a way to distinguish between the two functions.
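Here is a small sketch of why the math. prefix is helpful. We define our own (hypothetical) function named factorial that does something different, and both functions can coexist because the module's version is reached through its prefix.

```python
import math


def factorial(n):
    # Our own, deliberately different function with the same name:
    # it just describes the calculation instead of performing it.
    return f'{n}!'


print(factorial(5))       # our function
print(math.factorial(5))  # the function from the math module
```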

If you think it’s a pain in the ass to have to type math. before every function, you have two options:

Import the functions you want from a module directly into your environment.
Define an alias for the module.

Option 1: Import the functions directly#

If you do not want to type math. before every function name, you can use something like the following:

from math import factorial, log, sin

That code imports specific functions from the math library, and you will not have to prefix your functions with “math.”. If you want to import everything from the math library this way, you can type:

from math import *

We do not recommend importing everything from a module because it is very likely you will hide some other function in your environment.
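As a brief illustration of the from form, the sketch below imports two functions and calls them without a prefix. It then shows the hiding problem on a small scale: the function named log that we define ourselves is an assumption for this example, and importing log from math afterwards replaces it.

```python
from math import factorial, sin

print(factorial(5))
print(sin(0))


def log(message):
    # Our own (hypothetical) helper that happens to be named log.
    return f'LOG: {message}'


print(log('hello'))

# Importing a function with the same name hides our helper.
from math import log

print(log(1))  # now this is math.log
```

With a star import the same replacement can happen for every name in the module at once, without any of those names appearing in your code.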

Option 2: Define an alias for the module#

Say you want to use a library with a long name, like the statistics module from the standard library. Say that you want to use the full names of the statistics functions, like statistics.median, but do not want to have to type out statistics. before every function call. Python allows you to create an “alias” for the library name. Here’s how:

import statistics as st

mydata = [1,3,5,7,9]
my_median = st.median(mydata)

print(f'The median of my data is {my_median}')

The median of my data is 5

In the above code, we tell Python that we want to use “st” as a shortcut for the statistics module. We can choose any valid variable name for our alias.

Calling Functions#

In the previous section, we showed you how to call simple functions. Soon, you will work with more complicated functions for data analysis. In order to call those functions, you need to learn a little more about passing arguments to functions in Python. We will teach you those details in this subsection.

In the remainder of this section, we will work with a function, square, from the popular plotting library Bokeh. We will (probably) teach you more about this library later in the course so do not worry if some of the details are confusing at this point. In the following examples, simply focus on the arguments to the function square.

Let’s begin by creating a simple scatter plot. Do that by executing the code cell below.

from bokeh.io import output_notebook
from bokeh.plotting import figure, show
output_notebook()

p = figure(title="Simple Plot Example", x_axis_label='x', y_axis_label='y', width=300, height=300)

x_coords = [-4,-3,-2,-1,0,1,2,3,4]
y_coords = [16,9,4,1,0,1,4,9,16]

p.scatter(x_coords, y_coords, 7, marker='square')

show(p)

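Let’s focus on one line of code from the code cell above, the call that draws the squares. The cell spells that call p.scatter(x_coords, y_coords, 7, marker='square'). Written with the square function that this section discusses (available in Bokeh versions that still provide it), the same call is: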

p.square(x_coords, y_coords, 7)

We passed three arguments to that function. The first argument was a list of x-coordinates, the second argument was a list of y-coordinates, and the third argument was a number indicating the size of the squares. When we called that function, Bokeh plotted a square at each (x,y) coordinate and set the square’s size to 7.

Positional Arguments#

If you do not tell it otherwise, square assumes that the first argument is the x-coordinates, the second argument is the y-coordinates, and the third argument is a marker size. When we call functions in this manner, we rely on positional arguments. The position of the argument in the function call has meaning. You can only know the correct order of arguments if you look up the documentation for the function. We will show you more about documentation soon. For now, look at this “function signature” from the Bokeh documentation:

Signature: square(x, y, size=4, angle=0.0, **kwargs)
Args:
x : The x-axis coordinates for the center of the markers.
y : The y-axis coordinates for the center of the markers.
size : The size values for the markers in screen space units.

This “signature” tells us that the first argument is the x-coordinates, the second is the y-coordinates, the third is the size, and so on. It tells us more, but let’s ignore that for a moment.

If you want, you can use positional arguments almost every time you call a function, but we do not recommend it. You will see that some functions accept dozens of arguments, many of them optional, making it difficult to remember which argument is which. Therefore, we recommend you use a feature of Python called keyword arguments. This feature allows you to tell the function which arguments you are giving it by passing in their names. Let’s do that by running the above code again. This time, notice that the call that draws the squares is different.

p.scatter(x=x_coords, y=y_coords, size=6, marker='square')

show(p)

Keyword Arguments#

Let’s examine the new call, again written with square:

p.square(x=x_coords, y=y_coords, size=6)

This time, we explicitly told Python that the value of the x argument is x_coords, the value of y is y_coords, and the value of size is 6. A function called this way relies on keyword arguments; we tell Python the names of the arguments we want to pass.

Keyword arguments have some advantages:

Your code is more readable. You can clearly see what the arguments are.
You do not have to pass the arguments in order! You could have typed p.square(size=6, y=y_coords, x=x_coords) and you would have gotten the same result!
You are less likely to make an error by putting the arguments in the wrong order.

To drive this point home, let’s revisit a line of code from a previous cell:

p = figure(title="Simple Plot Example", x_axis_label='x', y_axis_label='y', width=300, height=300)

This line of code relied on keyword arguments. Notice how easy it is to read! You can understand what the arguments mean without referring to the documentation. Also, it is common to write code and then revisit it months (or years) later. If you use keyword arguments, it will be much easier to remember how your code works.
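Keyword arguments work with any Python function, including ones you write yourself. The sketch below uses a made-up function, ComputeInterest, with made-up numbers, to show that the order matters for positional arguments but not for keyword arguments.

```python
def ComputeInterest(principal, rate_percent, years):
    # Simple (non-compounding) interest.
    return principal * rate_percent * years / 100


# Positional arguments: the order determines which value is which.
a = ComputeInterest(1000, 5, 3)

# Keyword arguments: the names determine which value is which,
# so the order does not matter.
b = ComputeInterest(years=3, principal=1000, rate_percent=5)

# You may mix the two, but positional arguments must come first.
c = ComputeInterest(1000, years=3, rate_percent=5)

print(a, b, c)
```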

Optional Arguments and Default Values#

The last thing you need to know about calling functions is that many functions have optional arguments. You do not have to specify values for these arguments. If you omit these optional arguments, default values are used.

Let’s take a look at the documentation for the square function from above. For brevity, we only show you a portion of the documentation. In the next section, we will teach you more about reading function documentation.

Signature: p.square(x, y, size=4, angle=0.0, **kwargs)

Notice the function signature says size=4, angle=0.0. When you see that, it means that the size argument is optional and if you do not specify it, it will automatically receive the value 4. The same logic applies to the angle argument.
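You can give your own functions optional arguments in the same way, by writing name=default in the definition. The function DescribeMarker below is a made-up example that mimics the shape of the signature above; it is not part of Bokeh.

```python
def DescribeMarker(x, y, size=4, angle=0.0):
    # x and y are required; size and angle are optional.
    return f'marker at ({x}, {y}) with size {size} and angle {angle}'


print(DescribeMarker(1, 2))              # both defaults are used
print(DescribeMarker(1, 2, size=7))      # override size only
print(DescribeMarker(1, 2, angle=45.0))  # override angle only
```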

Reading function documentation#

Determining which functions to look up#

Many former students of ACCY 570 and ACCY 575 went through the following steps every time they wanted to perform an analytics task in Python:

Wonder whether there are functions that do what you want.
Do Google searches.
Read web pages, often at StackOverflow, a site where people ask and answer questions about coding.
Copy and paste code into your notebook.
Get weird results or errors.
Say bad words.

We advocate a different approach that will be less frustrating in the long run. The first thing you should do is to plan your code. You should make an outline of the steps that you want Python (or Excel or whatever) to take. That should be step one. Once you have developed an outline, then you can determine whether there are already functions to do what you want. And guess what? We’re here to make your life easier so we will often tell you which functions to use. You can then look up the official documentation for those functions. If, after you read the documentation, you are still confused, then either ask the teaching staff for help, or do a specific Google search for the function you want to use.

In the remainder of this section, we will teach you about the official Python documentation. Later in the course, we will show you the documentation for other analytics-related modules, like Pandas.

Official Python Documentation#

Overview#

The official documentation for the Python programming language, and its standard libraries, is located at https://docs.python.org/3/. Please make sure you are reading the documentation for Python version 3.7!!

Before we discuss the specifics of this documentation, we want to warn you that the official documentation, well, sucks. It’s absolutely awful. It is written by professional programmers for other professional programmers. Very few, if any, examples are provided. In our opinion, this is one of the major shortcomings of working with Python. The good news is that, with what you have already learned in this unit, it will be easier to understand the documentation. Also, please ask us for help.

Go to the official documentation website, https://docs.python.org/3/. If possible, arrange your windows so you can see that webpage while reading this notebook. The heading should be Python 3.7.4rc1 documentation. It’s okay if it’s slightly different, as long as it’s Python 3.7. There is only one section that you need at this time, Library reference. If you wish, you can use the Tutorial section to learn more about specific topics covered in this notebook. Also, the Language reference, sections 7 and 8, provides the official rules for things like if and while statements, but you probably will not find it very readable.

Library Reference#

Go to the official Library Reference. This page provides a bulleted outline. Let’s examine the sections that may be useful to you.

Built-in Functions#

Click on the link Built-in Functions. Here, you will see a list of all functions that are available to you in Python without the need for an import statement. Let’s examine a few in detail. Click on abs. When you do, you will see something that looks like this:

abs(x)
Return the absolute value of a number. The argument may be an integer or a floating point number. If the argument is a complex number, its magnitude is returned.

From this, we learn that abs is a function that takes one argument, x, and returns the absolute value of x. We learn that x must be an int or a float (ignore the complex number stuff).

Let’s look at one more built-in function, max. Its documentation says:

max(iterable, *[, key, default])
max(arg1, arg2, *args[, key])
Return the largest item in an iterable or the largest of two or more arguments…

The documentation for max provides two function signatures. The first function signature is for “iterables”, which are things like lists. You will learn about lists later in this unit. The second function signature is for individual arguments. The documentation begins by saying that max returns “the largest item in an iterable or the largest of two or more arguments.” So if we pass max a list, it will return the largest item in the list. If we pass max some numbers, it will return the largest of those numbers. Simple enough, right?

The documentation then says:

If one positional argument is provided, it should be an iterable. The largest item in the iterable is returned.

So that stuff on positional arguments comes in handy! If one positional argument is provided, it should be an iterable (like a list). What that tells you is that max([1,2,3,4]) is meaningful, but max(5) is not: 5 is a single number, not an iterable, so Python raises a TypeError. That makes sense, because there is no reason to compute the maximum of a single number.

The signatures also tell you that max allows optional arguments. These are enclosed in square brackets [ ]. Let’s ignore those for now.

Finally, notice that the *args in the second function signature, max(arg1, arg2, *args[, key]), tells us that you can pass any number of additional arguments after the first two. So max(1,2,-7,9) is valid.
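The calls below try out what the signatures describe. This is a quick sketch using only the built-in max; the sample values are arbitrary.

```python
print(max([1, 2, 3, 4]))   # one iterable
print(max(1, 2, -7, 9))    # several individual arguments

# The optional key argument changes how items are compared.
# Here the largest item is the one with the largest absolute value.
print(max(1, 2, -7, 4, key=abs))

# The optional default argument is returned when the iterable is empty.
print(max([], default=0))

# A single number is not an iterable, so this call fails.
try:
    max(5)
    failed = False
except TypeError as err:
    failed = True
    print(f'TypeError: {err}')
```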

Takeaways: We do not want you to become experts on the max function. :) What we want to show you is that the documentation can be helpful if you know what you are looking for. Typically, you want to know what the function does. Usually the documentation tells you that right away. Here, you are told that max “returns the largest item in an iterable or the largest of two or more arguments.” You want to know which arguments are valid. You want to know if you can pass optional arguments. Once you have learned that, stop reading and do not worry about the other details!

Other useful pages#

On the library reference page, there are some other useful libraries (e.g. math, random, statistics). There is documentation on the built-in string functions (we will teach you about those later in this unit). There are libraries for working with files on your hard drive, zipping files, working with dates and times, and so on.

We do not recommend that you invest time in these right now. For now, focus on the functions we teach you. You can always explore these in the future if your work demands it.

Getting help inside a Jupyter notebook#

You can get help on a function within your Jupyter notebook! In a code cell, just type a question mark followed by the function name and press CTRL+ENTER. Here’s an example:

?abs

Signature:  abs(x, /)
Docstring: Return the absolute value of the argument.
Type:      builtin_function_or_method
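The question-mark syntax only works in Jupyter (and IPython). In any Python environment you can get similar information with the built-in help function, and the standard library's inspect module can show a signature. Below is a short sketch that reuses the MyAverage function from earlier. The docstring (the string on the first line of the function body) is added here for illustration.

```python
import inspect


def MyAverage(x, y):
    """Return the average of two numbers."""
    return (x + y) / 2


# help() prints the signature and the docstring.
help(MyAverage)

# The same pieces are available individually.
print(inspect.signature(MyAverage))
print(MyAverage.__doc__)
print(abs.__doc__)
```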

Summary#

When you need help on a built-in Python function or library, go to https://docs.python.org/3/. Go to the Library Reference section and find what you need.
We will teach you about other useful documentation later in the course.
Before you write any code, make an outline or plan for your code.
For now, do not begin with Google searches when you need help.
The most useful things in a function’s documentation are:
- Function signature
- Description of the function - what it does
- Return value and its type
- Description of required arguments
- Description of optional arguments
You can get help within your Jupyter notebook, e.g. ?print