Error Stories

Objectives

To practice documenting code for clarity and reuse
To troubleshoot Python errors/exceptions
To use conditional logic to prevent errors
To write code to test Python code

Introduction

To err is human. And since programming is a human activity, it’s no less error-prone than anything else. Just as human stories tend to dramatize the protagonists’ mishaps, missteps, and mistakes en route to their triumphs and successes, the stories we tell in code necessarily traffic in errors.

Of course, we want our code to work, which usually means “to work without obvious errors or bugs.” But the important word there is obvious. A lot of work goes into making code appear free from errors.

An important part of that work doesn’t actually involve writing code. It involves writing documentation to communicate to others – including our future selves – the intentions behind a given piece of code: how it’s supposed to work and why.

Another part involves writing code in such a way that it can respond to errors, and/or respond to situations that might cause errors and avert them.

A third part involves testing code to confirm that it works as intended in a variety of scenarios.

We’ll look at each of these strategies below.

How to Use this Notebook

This notebook is intended for you to work through independently, in order to review and clarify the concepts introduced on Python Camp Day 3, and to lay the groundwork for the activities on Python Camp Day 4. However, feel free to collaborate with others in working through it. It is also intended to serve as a resource you can return to for review as necessary.

Read the documentation above each cell containing code and run the cell (Ctrl+Enter or Cmd+Return) to view the output.
Follow the prompts labeled Try it out! that ask you to write your own code in the provided blank cells.
(Hidden) solutions to these exercises follow the blank cells; click the toggle bar to expand the solution to compare with your approach.
Some prompts include alternative exercises (Parsons Problems) that will be linked from the prompt. These alternatives may help clarify concepts (especially if you find yourself struggling to keep up with all the syntax).
Optional annotations (labeled For the curious...) provide additional explanation and/or context for those who want them. Feel free to skip these sections if you like. As a beginner, it’s important to maintain a balanced cognitive load: taking in too much information all at once can impede your progress toward understanding. This balance looks different for everyone, but we have tried to keep the main content focused on a few key concepts, tools, and techniques, while providing that additional context for those who might benefit from it.

I. Writing & Reading Documentation

While good documentation won’t technically make your code work better, it can make you work better. Good documentation makes clear – to you or anyone else who might want to use your code – how code is intended to be used.

Conversely, not having documentation in your code is a recipe for frustration. Writing code is always a matter of choosing one path over many other possible paths, and your future self is not likely to remember why you chose a particular path in every case (nor even necessarily what you were trying to accomplish).

In the exercise below, you’ll practice documenting some code that has already been written. The code uses a logical pattern that we’ve seen before but in a novel way.

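For this exercise, we’re using the bookstore dataset, so the first step is to download it and load it into Python.

from urllib.request import urlretrieve
import json

# Download the dataset and save it locally as bookstore-data.json
urlretrieve('https://go.gwu.edu/pythoncampdata', 'bookstore-data.json')
# Open the saved file and parse its JSON contents into Python objects
with open('bookstore-data.json') as f:
    bkst_data = json.load(f)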

I.1 Commenting on Code

The code below loops through the bookstore data and counts the total number of textbooks where the type of the text is digital (as opposed to print).

Above each line of code is an empty comment: a line containing only the hash symbol (#). In Python, any text that follows a # on a line is a comment. The Python interpreter ignores comments when executing code, so they are present purely for the programmer’s benefit.

Try it out!

For each comment, write (in your own words) an explanation of what the line of code below the comment is doing. Your comment text can be anything that makes sense to you. Just make sure that your text follows the hash symbol. (If you want to make a comment that spans multiple lines, just create an extra line below the first and begin that new line with the hash symbol.)

#
digital_count = 0
#
for course in bkst_data:
    #
    for text in course['texts']:
        #
        if text['item_type'] == 'digital':
            #
            digital_count += 1
#
print('Number of digital textbooks:', digital_count)

Expand the cell below to see one possible way of documenting this code.
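Here is one possible way of filling in the comments, offered as a sketch rather than the only right answer. So that it runs on its own, it replaces bkst_data with a small hypothetical sample_data list, assuming the same structure as the real dataset: a list of course dictionaries, each with a 'texts' list whose items have an 'item_type' key.

```python
# Hypothetical stand-in for bkst_data: a list of course dictionaries,
# each with a list of text dictionaries under the 'texts' key
sample_data = [
    {"texts": [{"item_type": "digital"}, {"item_type": "print"}]},
    {"texts": [{"item_type": "digital"}]},
    {"texts": []},
]

# Start a counter at zero before looking at any textbooks
digital_count = 0
# Visit each course in the dataset, one at a time
for course in sample_data:
    # Visit each text listed for the current course
    for text in course['texts']:
        # Check whether this text is a digital (not print) item
        if text['item_type'] == 'digital':
            # If so, add one to the running total
            digital_count += 1
# Report the final count once both loops have finished
print('Number of digital textbooks:', digital_count)
```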

I.2 Using the Python documentation

Lucky for us, documentation consists of much more than lines of comments on code. Both the Python standard library and a wide array of third-party Python libraries come with extensive documentation.

Learning how to read and navigate this documentation is a skill in itself.

The official Python documentation – for the core language and the standard library – resides at docs.python.org/3. This site will often appear in Google results when searching for documentation on specific functions, methods, etc.

Try it out!

In a previous homework, we used the str.split() method to separate a single string on its white space into a list of substrings.

The code

'CHEM 1001 10'.split()

returns the output

['CHEM', '1001', '10']

Here is the documentation for str.split().

Reading that documentation, can you tell how to use the str.split() method to separate a string on something other than white space?

For example, our bookstore dataset indicates whether a text is for sale or rental, new or used, by the following strings:

BUY_NEW
BUY_USED
RENTAL_NEW
RENTAL_USED

Write some code that will split such a string on the underscore character (_), so that we can separate each of these strings into two data points.

Expand the hidden cell below to see an explanation and a solution.

# Your code here

Hint

The crucial line of the Python documentation for methods is the first, called the method signature. For str.split() the method signature looks like this:

str.split(sep=None, maxsplit=-1)

The str here refers to any Python value of type string (str). In other words, you can call the split method on any text enclosed in single or double quotes, or on any variable whose value is a string.
The part between parentheses defines the method’s arguments.
Each argument is given a default value, which means (in this case) that these arguments are optional.
- The sep argument defaults to None.
- The maxsplit argument defaults to -1.

If the user of the method does not supply a given argument, the default value will be used. Reading further in the documentation, we see that

If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace.

That’s a little dense, but basically, it describes the behavior we’ve seen when using str.split() (with nothing between the parentheses): the string is split on the white space.
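To see what that quoted rule means in practice, compare splitting the same string with no separator and with a single space passed as the separator. The sample string is made up for illustration.

```python
# A made-up string with leading, trailing, and doubled spaces
text = " CHEM  1001 "

# No separator: runs of whitespace count as one separator,
# and leading/trailing whitespace produces no empty strings
print(text.split())     # ['CHEM', '1001']

# Explicit single-space separator: every space is a separator,
# so doubled and edge spaces produce empty strings
print(text.split(" "))  # ['', 'CHEM', '', '1001', '']
```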

To split on something else, we need to provide a value for the sep argument. We can do that in one of two ways:

"RENTAL_NEW".split(sep="_")

or

"RENTAL_NEW".split('_')

Either of those will yield the result we want: ["RENTAL", "NEW"]. Note that the underscore character is enclosed in quotation marks when passing it as an argument to str.split().
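Applied to all four codes, the same call separates each one into two parts that can be unpacked into separate variables. The variable names transaction and condition are illustrative choices, not fields in the dataset.

```python
# The four sale/rental codes used in the bookstore dataset
codes = ["BUY_NEW", "BUY_USED", "RENTAL_NEW", "RENTAL_USED"]

pairs = []
for code in codes:
    # Split on the underscore; each code yields exactly two parts
    transaction, condition = code.split("_")
    pairs.append((transaction, condition))
    print(transaction, condition)
```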

II. Debugging

Debugging is the process of troubleshooting errors in code.

Apart from syntax errors, which will become less frequent as you grow more comfortable with Python’s syntactic rules, most errors in Python arise from a mismatch between the logic of the code and the structure of the data or environment on which the code is run.

A common source of errors lies in inconsistencies or unexpected elements within a dataset. In the code below, we’ll attempt to modify our bookstore data by separating the name in the instructor field into a first name and last name. (That could make it easier to look up a given course by instructor, for instance.)

Note that as written, this code will not run without error.

# Loop over courses in the dataset
for course in bkst_data:
    # Split each instructor name on white space
    first_name, last_name = course["instructor"].split()
    # Assign each part of the name to a new key in the dictionary course
    course["first_name"] = first_name
    course["last_name"] = last_name

Notes

Running this code produces an AttributeError. Until you gain some familiarity with Python exceptions, the name of the exception will be less useful than the message that follows it:

'NoneType' object has no attribute 'split'

Note also the green arrow pointing to the line of code that reads

first_name, last_name = course["instructor"].split()

The AttributeError tells us that something went wrong with the call to the split() method. The phrase 'NoneType' object may seem opaque. But recall what we’ve learned about the split() method: it’s defined to work with strings. That’s the meaning of the “str” when we write str.split(): it tells us that Python strings have access to this method. Other Python types do not.

An AttributeError occurs when we try to use a method on a type that doesn’t “have” that method. But the important thing is to find out when and why the value of course["instructor"] might not be a string.
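To see where the name NoneType comes from, and that strings have split() while None does not, you can inspect the values directly. The sketch uses the built-in hasattr(), which reports whether an object has a given attribute or method.

```python
instructor = None

# None is the only value of the type NoneType
print(type(instructor).__name__)        # NoneType

# Strings have a split() method; None does not
print(hasattr("Dolsy Smith", "split"))  # True
print(hasattr(instructor, "split"))     # False
```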

II.1 Debugging with print()

One of the best ways to debug code is also one of the simplest: using the print() function.

If we know that an error is occurring inside a for loop, but we don’t know what data element triggered the error, we can take advantage of the fact that the loop will stop as soon as the error occurs.

By using print() to display the element we’re interested in each time through the loop, we can observe where the loop stops: if we’ve put our print() in the right place, the last value printed should be one that triggered the error.

Try it out!

Copy the buggy code from above and paste it into the cell below. Add one or more calls to the print() function and see if you can identify the source of the AttributeError.

# Your code here

Hint

One approach is to print the value of the 'instructor' key each time through the loop:

# Loop over courses in the dataset
for course in bkst_data:
    print(course["instructor"])
    # Split each instructor name on white space
    first_name, last_name = course["instructor"].split()
    # Assign each part of the name to a new key in the dictionary course
    course["first_name"] = first_name
    course["last_name"] = last_name

Doing so reveals that the last value printed before the error is not a name but None.

Note that None is not a string; it’s a special Python value (written without quotation marks) that stands for a null value. Its type is NoneType, which explains the wording of the error message.

As the AttributeError informs us, we cannot use the split method on a None value. (There’s not much you can do with None; that’s because it’s a null: used to designate the absence of a value.)

II.2 Preventing errors with if

There are more sophisticated ways of handling errors in Python, but one of the most straightforward is to include an if statement to check for the condition that causes the error.

In this case, the error is caused when course["instructor"] is None. Python provides a concise way to guard against this: we can simply write if value_to_test:, where value_to_test is a variable or other expression that may be null. Because None counts as false in an if statement, the indented block is skipped whenever the value is None.
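Strictly speaking, if value_to_test: tests whether a value is "truthy." None is falsy, but so are other empty values such as the empty string "" and 0. The sketch below, using made-up sample values, compares that check with the stricter value_to_test is not None, which rejects only None. For this dataset the truthiness check is arguably the better fit, since an empty string would also fail to split into two names.

```python
# Made-up sample values for an 'instructor' field
values = ["Dolsy Smith", "", None]

for value in values:
    # bool(value) is what `if value:` checks;
    # `value is not None` rejects only None
    print(repr(value), bool(value), value is not None)
```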

In the code below, we’ve incorporated this check into our code.

Note that the code still fails, but this time we get a different error: a ValueError.

# Loop over courses in the dataset
for course in bkst_data:
    # Check for null values
    if course["instructor"]:
        # Split each instructor name on white space
        first_name, last_name = course["instructor"].split()
        # Assign each part of the name to a new key in the dictionary course
        course["first_name"] = first_name
        course["last_name"] = last_name

Try it out!

Can you use our print() debugging technique to identify the cause of this ValueError?

# Your code here

Hint

The line

first_name, last_name = course["instructor"].split()

expects a string for the value of course["instructor"] that follows a familiar pattern: "first_name last_name", where white space separates the first and last name.

In other words, this code works only if the result of the split() method is a list with exactly two elements. And str.split() (with no arguments) will produce a list with two elements only if the string consists of exactly two groups of non-whitespace characters separated by white space.
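Here is a minimal reproduction of that ValueError, using a made-up three-part name. The try/except is there only to keep the sketch running after the error; since the exact wording of the message can vary between Python versions, the sketch keeps the exception object rather than relying on its text.

```python
name = "Mary Ann Smith"   # made-up name with three space-separated parts
parts = name.split()
print(parts, len(parts))  # ['Mary', 'Ann', 'Smith'] 3

caught = None
try:
    # Two variables on the left, three values on the right
    first_name, last_name = parts
except ValueError as err:
    caught = err
    print("ValueError:", err)
```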

Try it out!

Reading the documentation for str.split(), can you identify an argument to the method that could prevent this error? How can we force split to return only two elements?

# Your code here

Expand the cell below to see a possible solution.
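One possible solution, sketched with a small hypothetical sample_courses list in place of bkst_data: passing maxsplit=1 tells split() to stop after the first run of white space, so the result never has more than two elements.

```python
# Hypothetical stand-in for bkst_data, with made-up instructor names
sample_courses = [
    {"instructor": "Dolsy Smith"},
    {"instructor": None},
    {"instructor": "Ludwig van Beethoven"},
    {"instructor": "Mary Ann Smith"},
]

for course in sample_courses:
    # Skip courses with no instructor listed
    if course["instructor"]:
        # Split only on the first run of white space: at most two parts
        first_name, last_name = course["instructor"].split(maxsplit=1)
        course["first_name"] = first_name
        course["last_name"] = last_name
```

After the loop, "Ludwig van Beethoven" yields the last name "van Beethoven", while "Mary Ann Smith" yields "Ann Smith". A one-word value (for instance "Staff") would still raise a ValueError, because split() would then return only one element.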

Notes

Note that the code in the provided solution runs without errors, but it doesn’t necessarily solve the problem posed by the instructors’ names in our dataset.

This code will handle names where the last name (surname) contains spaces. But where a middle name or middle initial is given, or where the first name (the given name) contains spaces, it will assign the wrong values to the "last_name" key.

This fact illustrates an important point.

Code that runs without errors is not necessarily code that works as intended.

III. Testing code

Because of that fact, it’s useful to create tests that can confirm that our code works as intended. We obviously can’t test for every eventuality. But we can identify test cases that represent how our code should run under optimal conditions or in optimal scenarios.

Identifying the test cases allows us to specify with confidence the kinds of situations for which our code is expected to work, as well as to flag conditions where it doesn’t work (and that we might want to address in future work).

In what follows, we’ll write a test for our name-parsing code above. You’ll see tests of this sort in the final, submitted homework for Python Camp.

III.1 Writing testable code

Code tends to be easier to test when it’s organized into discrete units. Writing functions is a great way to organize code in units that can be easily tested. (Functions also make our code more readable and easier to modify in response to new user stories, new datasets, etc.)

Try it out!

The following cell contains the function signature (the def line) and the return statement for a function called extract_instructor_names.

This function takes as its argument a single dictionary, checks whether the dictionary has a non-null value for the key "instructor", and if so, adds "first_name" and "last_name" entries to the dictionary.

The function should do the same thing as the body of the for loop above, but without the loop: in other words, we want to be able to use our function to process a single course (not a list of courses).

See if you can fill out the body of the function below.

def extract_instructor_names(course_dict):
    # Given a dictionary with an 'instructor' entry, 
    # splits the associated string into two parts
    # and assigns them as separate entries to the dictionary
    
    # Your code here
    
    return course_dict
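One way to fill in the body, as a sketch that relies on the maxsplit=1 argument to str.split(). It also uses dict.get(), which returns None instead of raising a KeyError when the key is missing, so a dictionary without an "instructor" key is simply returned unchanged.

```python
def extract_instructor_names(course_dict):
    # Given a dictionary with an 'instructor' entry,
    # splits the associated string into two parts
    # and assigns them as separate entries to the dictionary

    # .get() returns None if there is no 'instructor' key at all
    instructor = course_dict.get("instructor")
    # Skip missing, None, or empty instructor values
    if instructor:
        # maxsplit=1 guarantees at most two parts
        first_name, last_name = instructor.split(maxsplit=1)
        course_dict["first_name"] = first_name
        course_dict["last_name"] = last_name

    return course_dict
```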

III.2 Writing tests

Now that we have a function that works for a single course dictionary, we can write some tests.

We’ll be using the assert keyword, which is used mostly in writing tests. Like an if statement, an assert statement evaluates a given condition. Unlike an if statement, however, it is not followed by a block of code: if the condition is True, assert does nothing; if the condition is False, it raises an AssertionError.

Try it out!

Modify x in the code below so that the assert statement produces an error. Note that following the condition, we can provide (as a Python string) a message; this message will appear in the AssertionError itself.

x = 5
assert x > 4, 'x should be greater than 4.'
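For reference, here is what happens when the condition is false, with x changed to 3. The try/except is there only so the sketch can capture and display the AssertionError’s message instead of stopping. Note also that Python skips assert statements entirely when run with the -O (optimize) flag, which is why assert is generally not used for checks that must always run.

```python
x = 3
message = None
try:
    assert x > 4, 'x should be greater than 4.'
except AssertionError as err:
    # The string after the comma becomes the error's message
    message = str(err)

print(message)  # x should be greater than 4.
```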

Below, we use assert to create some tests for extract_instructor_names.

Each assert statement represents a different condition we want to test for.

# Define some test data
test_data = {"instructor": "Dolsy Smith"}
# Obtain a result by running the function
test_result = extract_instructor_names(test_data)
# Test the result against a condition
assert test_result["first_name"] == "Dolsy", "first_name should be 'Dolsy'"
assert test_result["last_name"] == "Smith", "last_name should be 'Smith'"

Here’s another test, to make sure our function works when the value for instructor is None.

Note that our test data is not an example of a full course as represented in the bkst_data dataset. Our function only deals with the value of the "instructor" key, so that’s the only key our test_data dictionary needs to contain. (We could add other keys, like "department" and "course_num" and "section", but in this case they would not add anything to the test.)

# Define some test data
test_data = {"instructor": None}
# Obtain a result by running the function
test_result = extract_instructor_names(test_data)
# Test the result against a condition
assert "first_name" not in test_result, 'first_name should not be present'
assert "last_name" not in test_result, 'last_name should not be present'
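As the number of tests grows, it can help to wrap each scenario in its own function, so that test data from one case cannot leak into another. The sketch below does that and adds a third case for a surname containing a space. So that it runs on its own, it includes a sketch implementation of extract_instructor_names (using maxsplit=1); the test function names and the sample name "Ludwig van Beethoven" are illustrative. Test runners such as pytest automatically collect functions whose names start with test_, but here they are simply called directly.

```python
def extract_instructor_names(course_dict):
    # Sketch implementation: split a non-null instructor name in two
    instructor = course_dict.get("instructor")
    if instructor:
        first_name, last_name = instructor.split(maxsplit=1)
        course_dict["first_name"] = first_name
        course_dict["last_name"] = last_name
    return course_dict


def test_two_part_name():
    result = extract_instructor_names({"instructor": "Dolsy Smith"})
    assert result["first_name"] == "Dolsy", "first_name should be 'Dolsy'"
    assert result["last_name"] == "Smith", "last_name should be 'Smith'"


def test_missing_instructor():
    result = extract_instructor_names({"instructor": None})
    assert "first_name" not in result, "first_name should not be present"
    assert "last_name" not in result, "last_name should not be present"


def test_surname_with_space():
    # Made-up example: the surname itself contains a space
    result = extract_instructor_names({"instructor": "Ludwig van Beethoven"})
    assert result["first_name"] == "Ludwig", "first_name should be 'Ludwig'"
    assert result["last_name"] == "van Beethoven", "last_name should be 'van Beethoven'"


# Run each test; any failure raises an AssertionError
test_two_part_name()
test_missing_instructor()
test_surname_with_space()
print("All tests passed.")
```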

Wrap up

In this homework, you practiced debugging errors with the print() function, and you used if statements to avert possible errors due to inconsistencies in the data. Finally, you worked with the assert keyword to construct tests for code (to ensure that your code is working as expected).
