Describing the Team
Objectives
To reflect on the utility of different kinds of data structures
To use Python dictionaries to capture information in a structured way
To compose complex data structures and interact with them using loops
I. Structured data
Like most programming languages, Python gives us various tools for organizing data, each of which involves certain tradeoffs.
In the homework you worked with the following data types:
floats and integers
strings
lists
The real power of these tools lies in our ability to compose them together in order to reflect both the complexity of the data we’re working with, and our goals (or those of our end users).
But it can be challenging to decide which Python types to use in a given circumstance. Before we dive into more code, take a few minutes to discuss the following questions with your team.
For discussion
If you’re going to the store to buy groceries, how do you generally keep track of the items you need? What pieces of information are relevant to this task?
What about keeping track of the courses you’re taking in a given semester?
Many of us now rely on our phones to store our contacts (friends, family, etc.). What if you had to give up your phone for a month – how would you organize your contacts? What would that look like?
If you were going to record some information about the members of your team, what information would you capture? Make a list of fields that the team can agree on.
I.1 Creating a list
In the homework, you split a single string containing some course information into a list of three strings. For example, by executing the code "CHEM 1001 10".split(), we get back the list ["CHEM", "1001", "10"]. In Python, we can put any valid Python values, or variables pointing to those values, inside of a list.
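Here is a minimal sketch of both ideas. The variable names (course_parts, department, mixed) are made up for illustration and do not come from the homework.

```python
# Splitting a string on whitespace gives back a list of strings.
course_parts = "CHEM 1001 10".split()
print(course_parts)

# A list can hold literal values, variables pointing to values,
# and even another list.
department = "CHEM"
mixed = [department, 1001, 10.5, course_parts]
print(mixed[0])
```

Note that split() always produces strings: "1001" in course_parts is a string, while the 1001 typed directly into mixed is an integer.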
Try it out!
In the cell below, assign a variable called my_team to a new list containing the names of the members of your team.
Then using that variable, print the first name in the list.
# Your code here
I.2 Lists or dictionaries?
The examples above used lists of strings to hold some information. But note the difference between the two kinds of lists. The list you created holds a list of names, e.g.,
my_team = ["Alex", "Emily", "Marcus", "Max"]
In this list, each of the elements conveys the same kind of information; each element is a person’s name. On the other hand, the list we created by splitting the course string contains three different kinds of information:
my_course = ["CHEM", "1001", "10"]
The first element is the department code, the second is the course number, and the third is the section number. But there’s nothing in the list itself that tells us which is which. We just have to remember that the department code comes first, followed by the course number and then the section number. For this kind of data, it would be great if we could label each element, such that instead of writing
my_course[1]
to access the course number, we could write something like this:
my_course["course_num"]
As you might have guessed, Python has just the thing for this kind of situation. It’s called a dictionary.
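As a preview, here is a sketch of the same course stored both ways. The key names "dept_code" and "section_num" are my own choices for illustration; only "course_num" appears in the text above.

```python
# The same course information as a list and as a dictionary.
my_course_list = ["CHEM", "1001", "10"]
my_course_dict = {"dept_code": "CHEM", "course_num": "1001", "section_num": "10"}

# With the list, we have to remember that position 1 holds the course number.
print(my_course_list[1])

# With the dictionary, the label says what we are asking for.
print(my_course_dict["course_num"])
```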
I.3 Creating a dictionary to describe yourself
In this activity, you’ll create a more complex data structure to hold information about your team, and you’ll write Python code to update it.
First, create a Python dictionary to hold information about yourself. Follow the template below, but feel free to add any additional fields that your team agreed upon in the first activity (Question 4 above).
Dictionary template
dolsy = {"name": "Dolsy Smith",
"email_address": "dsmith@gwu.edu",
"team_role": "advocate",
"years_at_gw": 18}
Include your current role on your team, i.e., notetaker, reporter, or advocate.
Try it out!
Create your dictionary in the cell below and run the cell to make sure the code works.
# Your code here
Notes
Note the curly braces ({}) at the start and end of the dictionary. Python uses square brackets for lists and curly braces for dictionaries.
The values on the left side of the colons are called keys.
The values on the right side of the colons are called values.
Keys are usually (but not always) Python strings.
Values can be strings, floats, integers, or any other valid Python data type.
Note the commas at the end of each line. These separate key-value pairs. It’s not required to break the dictionary into multiple lines, but if you do, the convention is to break after each comma.
In the template, I assigned this dictionary of data about me to a variable called dolsy. Assign your dictionary to whatever variable name you like.
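To make a couple of these notes concrete, here is a small sketch. The names and numbers in it are invented for illustration.

```python
# A dictionary written on a single line...
one_line = {"name": "Alex", "years_at_gw": 2}

# ...and the same dictionary broken across lines after the comma.
multi_line = {"name": "Alex",
              "years_at_gw": 2}

# Both forms create the same dictionary.
print(one_line == multi_line)

# Keys do not have to be strings. Here the keys are integers
# (hypothetical section numbers) and the values are class sizes.
section_sizes = {10: 24, 11: 30}
print(section_sizes[10])
```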
I.4 Using dictionaries
If your dictionary was successfully created and assigned to a variable, you should be able to access the values by using their corresponding keys. Keys are simply labels that point to values – like the labels on a file folder, for instance.
Keys allow us to access information within dictionaries according to the label (the key) we have assigned to it. Think about a file cabinet full of folders: such a structure would be pretty hard to use if we had to remember the position of every folder. But by affixing labels, we can easily find the folder we want, as long as the label makes sense as a guide to the folder’s contents.
The dictionary is a similar concept. To retrieve and print my email address from the dolsy dictionary created above, I write the following code:
print(dolsy["email_address"])
Note that here we surround the key with square brackets rather than curly braces. The syntax is meant to remind you of how we access individual items in a list.
Try it out!
Try printing a few of the values in your dictionary.
# Your code here
Keys do more than make it easy to access values in a dictionary: we can also use them to add new values. To add a key called "programming_languages" to my dolsy dictionary, I can write the following:
dolsy["programming_languages"] = "Python"
Try it out!
Add the "programming_languages" key to your dictionary. If you know more than one programming language, instead of assigning the key to a string, go ahead and assign it to a list of strings, e.g.,
["Python", "R", "JavaScript"]
For discussion
The following table summarizes the key similarities and differences between lists and dictionaries in Python.
| | List | Dictionary |
|---|---|---|
| Holds multiple values | ✓ | ✓ |
| Access values by position | ✓ | |
| Access values by key (e.g., label) | | ✓ |
| Can retrieve multiple values by slicing | ✓ | |
| Add new values by appending (adding to the end) | ✓ | |
| Add new values by inserting a new key/value pair | | ✓ |
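The rows of the table can be sketched in code as follows. The sample data and the key names are assumptions made for the sake of the example.

```python
names = ["Alex", "Emily", "Marcus", "Max"]
course = {"dept_code": "CHEM", "course_num": "1001"}

# Both types hold multiple values.
print(len(names), len(course))

# Lists: access by position, retrieve several values by slicing,
# and add to the end by appending.
print(names[0])
print(names[1:3])
names.append("Debbie")

# Dictionaries: access by key, and add a new key/value pair.
print(course["dept_code"])
course["section_num"] = "10"
```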
Question: What kinds of data do you think are more suitable for storing in lists? What kinds might be better suited to dictionaries? Discuss with your team and record your answers.
II. Lists and dictionaries together
We unlock the real power of a programming language like Python when we start to compose more complex structures out of the basic data types provided by the language. If lists and dictionaries in Python were mutually exclusive, we would quickly encounter the limits of those types.
A thought experiment
Think about the few thousand courses taught every semester at GW.
There are multiple pieces of information about each course that we might want to keep track of, like who’s teaching it, what department it belongs to, etc. That structure might lend itself to a dictionary.
But dictionaries have an important constraint: each key in the dictionary must be unique.
How would we structure this course data as a dictionary? What would we use for keys?
Alternately, if we were to use a list, what would the data structure look like?
Discuss your thoughts with your team.
II.1 Nesting data
There are many different ways to put lists and dictionaries together. When handling datasets like our list of courses, a common approach is to create a list of dictionaries.
You can think of this structure as similar to a spreadsheet.
Each dictionary is like a single row of the spreadsheet.
Each dictionary has the same keys but different values.
The keys correspond to the column names in the spreadsheet.
The values correspond to the data in the cells under those columns.
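A minimal sketch of the spreadsheet analogy, using a few invented courses (the key names and values are placeholders, not real GW data):

```python
# Each dictionary is one "row"; the shared keys act as the column names.
courses = [
    {"dept_code": "CHEM", "course_num": "1001", "section_num": "10"},
    {"dept_code": "CHEM", "course_num": "1001", "section_num": "11"},
    {"dept_code": "HIST", "course_num": "2010", "section_num": "10"},
]

# The length of the list is the number of rows.
print(len(courses))

# Displaying one element shows a single row.
print(courses[0])
```

Notice that the first two rows share a department code and a course number. That is not a problem here, because the uniqueness rule applies to the keys within one dictionary, not to the values across a list.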
Try it out!
Just as you did for yourself, create one dictionary for each member of your team.
For each person, the keys will be the same, but the values will be different.
Assign each dictionary to a different variable.
Create as many new code cells below as you need, using the plus button in the toolbar above.
# Your code here
Now you should have multiple variables, each of which represents a dictionary of data about one of your team members.
Try it out!
Create a new variable to store a list of those dictionaries. For instance, if I have variables dolsy, marcus, debbie, and alex, to add them to a list, I could write
my_team = [dolsy, marcus, debbie, alex]
Note that because I’m putting variables in this list, not strings, I don’t use quotation marks. (I don’t want to create a list of strings, but a list of dictionaries, each of which is represented by one of these variables.)
To see the final result, run a code cell with just the name of your team variable in it, like so:
my_team
# Your code here
II.2 Working with nested data
The my_team variable is a list of dictionaries, so it’s a nested data structure. But the outer layer is still a Python list, so it has all the list behaviors you learned about in the lessons and homework for Day 1.
We can access a single element (a dictionary) by index:
my_team[0]
We can even access multiple elements by slicing:
my_team[:2]
But what if we want to work with the values inside one of these dictionaries? We can access a value by key from the first dictionary like so:
my_team[0]["email_address"]
And we can add or update a value using the same syntax:
my_team[-1]["team_role"] = "reporter"
Try it out!
Try updating a few of the keys in a few of the dictionaries in your team list. The diagram below might help you conceptualize the syntax you need.
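One way to read the chained syntax is as two separate steps. The sketch below uses a small stand-in list called demo_team with invented names and addresses, so that it doesn't touch your own my_team variable.

```python
demo_team = [
    {"name": "Alex", "email_address": "alex@example.edu", "team_role": "notetaker"},
    {"name": "Max", "email_address": "max@example.edu", "team_role": "advocate"},
]

# Step 1: the index picks one dictionary out of the list.
first_member = demo_team[0]

# Step 2: the key picks one value out of that dictionary.
print(first_member["email_address"])

# Chaining both steps together gives the same result.
print(demo_team[0]["email_address"])

# The same chain on the left of an equals sign updates a value.
demo_team[-1]["team_role"] = "reporter"
print(demo_team[-1])
```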
II.3 Loops, lists, and dictionaries
What if we want to add the same value to all of the dictionaries in our list? Let’s say we want to add a "programming_languages" key to every team member’s dictionary and assign it to "Python".
The following code might seem like it should work:
my_team["programming_languages"] = "Python"
But it doesn’t. The variable my_team points to a list, and lists do not allow access by key, only by position. It doesn’t matter that my_team points to a list of dictionaries; Python doesn’t let us take that shortcut.
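If you're curious what the failure looks like, here is a sketch that triggers it on a stand-in list (demo_team, with invented data). It uses try and except, which this lesson hasn't covered, only to catch the error and print its message instead of stopping the cell.

```python
demo_team = [{"name": "Alex"}, {"name": "Max"}]

error_message = None
try:
    # A list only accepts integer positions (or slices) inside the brackets.
    demo_team["programming_languages"] = "Python"
except TypeError as error:
    error_message = str(error)
    print("TypeError:", error_message)

# The dictionaries inside the list were left unchanged.
print(demo_team)
```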
So what do we do? We probably don’t want to manually add the same key/value pair to every dictionary in our list. The following would work, but it kind of defeats the point of programming:
my_team[0]["programming_languages"] = "Python"
my_team[1]["programming_languages"] = "Python"
my_team[2]["programming_languages"] = "Python"
# and so on
For-tunately, we can reach for one of the most powerful tools in the programmer’s toolkit, a tool that unlocks the superpower of the computer – its capacity for mindless iteration – with a humble, three-letter word.
Set it and for-get it
The for loop is the most common looping construct in Python. It lets us loop over any collection – that is, any data type that can contain multiple elements – and work with each element in turn. Probably the most common use of the for loop is for looping over lists.
In the code below, we use a for loop to print each team member’s email address.
for member in my_team:
    email = member["email_address"]
    print(email)
Loop variables
Note the variable member in the code immediately above. That’s not a variable we’ve used before. In creating your my_team list, you probably used other variables, like this:
my_team = [dolsy, marcus, debbie, alex]
A loop variable is a special variable used in the context of a for loop. Its job is to point to each element in the list (or other collection) sequentially.
Imagine that you’re baking a cake that requires several ingredients, and you have a single measuring cup. You put each ingredient into the cup in order to measure it – the cup is like a loop variable, because it gets reused with each ingredient.
The body of the for loop – the code underneath the line with for, the code that forms an indented block – will be executed as many times as there are elements in the list. Usually, the loop variable is used somehow within that indented block – as in the code above, where we use the member variable to access the value of the "email_address" key in each dictionary, and pass that value as the argument to print().
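The measuring-cup analogy can be sketched directly in code. The list of ingredients and the variable names are invented for illustration.

```python
ingredients = ["flour", "sugar", "eggs"]
measured = []

for ingredient in ingredients:
    # On each pass, `ingredient` points to the next element of the list.
    print("Measuring", ingredient)
    measured.append(ingredient)

# The body ran once per element, so both lists have the same contents.
print(measured)

# After the loop ends, the loop variable still points to the last element.
print(ingredient)
```

The name of the loop variable is up to you. Python does not connect ingredient to ingredients; a singular noun is just a convention that makes the loop easier to read.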
Try it out!
Write a for loop to add a "programming_languages" key, and the value ["Python"], to each dictionary in your my_team list.
Enclose the string “Python” inside a list (by wrapping it in square brackets): that will let us add other languages later, should we need to.
Then display your list to confirm that the operation was successful.
If you want additional help, consult this Parsons’ Problem.
# Your code here
III. From Code to Data
Congratulations! You’ve created a small dataset with information about your team. Now we’ll save this dataset to your JupyterHub account so that we can use it later.
We’ll use the json module and the open function, just as we did in a previous lesson when loading the GW Bookstore dataset, except that this time we’ll be writing to the file, not reading from it.
First we import the json library.
import json
Next we open the file for writing, and use the json.dump() function to write our list of dictionaries to the file.
In the code below, I’ve used the variable my_team to refer to the list of dictionaries created above. If you named your list something different, use that name in place of my_team.
Note that we’re saving the data to a file called team-dataset.json. The initial ./ indicates that this file will live in the same directory as this notebook.
with open('./team-dataset.json', 'w') as f:
    json.dump(my_team, f, indent=4)
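Another way to check that the data was written is to read it back with json.load(). The sketch below is a self-contained round trip on a stand-in list; the file name demo-dataset.json and the temporary folder are assumptions chosen so that the example doesn't overwrite your real team-dataset.json.

```python
import json
import os
import tempfile

demo_team = [{"name": "Alex", "programming_languages": ["Python"]}]

# Write to a scratch file in a temporary folder.
demo_path = os.path.join(tempfile.mkdtemp(), "demo-dataset.json")
with open(demo_path, "w") as f:
    json.dump(demo_team, f, indent=4)

# Open the same file for reading ("r") and load it back into Python.
with open(demo_path, "r") as f:
    loaded_team = json.load(f)

# The loaded data is again a list of dictionaries, equal to the original.
print(loaded_team == demo_team)
```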
Try it out!
To confirm, look at the JSON file you just created in your browser.
Note: This will only work if you are using JupyterHub. If you’re using Google Colab or another coding environment, please ask a facilitator for help with this step.
Open a new tab.
Copy the URL from this tab and paste it in the new tab.
Remove the name of the notebook, 2_1_describing_the_team.ipynb, and replace it with the name of the JSON file: team-dataset.json.
You should see the data displayed in your browser, just as when we first looked at the bookstore dataset.
One more thing
We’re going to re-use these team datasets on Day 3. To prepare for that, your team’s reporter should copy the data from the JSON as displayed in your browser and paste it into the Python Camp shared notes document.
Syntax review: Square brackets
Square brackets ([]) are used in Python in a few different situations. Learning to distinguish among them will help you read and write Python more efficiently.
Lists
When square brackets stand by themselves, i.e., not attached to the name of a variable or to a string, they delimit a list. A list may contain:
Zero elements: []
One element: ["Python Camp"] (a list containing a single string)
Multiple elements: [1, 2, 3] (a list containing three integers)
When there is more than one element in a list, the elements are separated by commas. List elements may be any mix of Python data types: integers, floats, strings, even other lists!
Think of the square brackets enclosing lists as the brackets that hold up a shelf. Like a shelf, a list allows us to arrange things sequentially (i.e., in order, one after the other) but provides no additional structure.
Indexing and slicing
Square brackets “attached” to a Python string or list are used for indexing or slicing. A single integer between brackets retrieves a single element (by position), and two integers, separated by a colon (:), retrieve multiple adjacent elements, e.g.,
"Python"[1] –> "y", because the index 1 corresponds to the second element. (Python indexing starts from 0.)
"Python Camp"[0:6] –> "Python", starting with the first element (index 0) and going up to, but not including, the 7th (the space). We could also write this as "Python Camp"[:6], without the 0, since we want a slice starting from the beginning of the string.
The square brackets used for indexing/slicing are kind of like a window. They allow us to peek at one or more elements that are adjacent to one another.
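The same examples, written as code you can run. The variable names and the languages list are illustrative additions.

```python
camp = "Python Camp"

# A single integer retrieves one element by position (counting from 0).
second_char = camp[1]
print(second_char)

# Two integers separated by a colon retrieve a slice.
first_word = camp[0:6]
print(first_word)

# Leaving out the 0 means "start from the beginning".
print(camp[:6] == first_word)

# Indexing and slicing work the same way on lists.
languages = ["Python", "R", "JavaScript"]
print(languages[0])
print(languages[1:3])
```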
Dictionary access
Square brackets “attached” to a Python dictionary should enclose a key to be found within the dictionary. Accessing a dictionary by key returns the value associated with that key. It’s not possible to retrieve values for more than one key at a time. We also use this notation to add keys and values to dictionaries.
team_member["email_address"] retrieves the value associated with the "email_address" key in the team_member dictionary.
team_member["programming_language"] = "Python" assigns the string "Python" to the key "programming_language". If the key already exists, its value will be overwritten.
With dictionaries, the square brackets might remind you of the tab at the top of a file folder to which you can affix a label.
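A short sketch of retrieving, adding, and overwriting by key. The team_member data here is invented for illustration.

```python
team_member = {"name": "Alex", "email_address": "alex@example.edu"}

# Retrieve a value by its key.
email = team_member["email_address"]
print(email)

# Assigning to a key that doesn't exist yet adds a new key/value pair.
team_member["programming_language"] = "Python"
print(len(team_member))

# Assigning to a key that already exists overwrites its value.
team_member["programming_language"] = "R"
print(team_member["programming_language"])
```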