Lab 01 assignment (20 pts)

UW Geospatial Data Analysis
CEE467/CEWA567
David Shean, Eric Gagliano, Quinn Brencher

Introduction

Lab 1 focuses on understanding the potential of geospatial data, navigating the command line, getting comfortable with using Python in Jupyter notebooks, and using Git and GitHub to manage your work.

In the first section of this lab, we will see examples of geospatial data and show off some of the applications we will get to later in the quarter. In the second section, we’ll get more familiar with navigating the command line. In the third and fourth sections, we’ll practice the basics of using Python in Jupyter notebooks. Finally, we will practice using Git and GitHub to submit your work.

Instructions

  1. Please go through all cells in the notebook sequentially, making sure to complete all instructions.

  2. Some answers are left in as a guide: you still need to fill in the code to produce the output on your own!

  3. Answers should be produced using code unless otherwise noted by “Written response”.

  4. Follow the submission instructions closely, making sure you save and submit your notebook with all your outputs preserved.

Part 2: Navigating the command line (2 pts)

We’ve got a couple of repositories to keep track of, so let’s make sure we’re in the right place!

  • In the terminal, type cd ~ to return home. Type ls to see what’s in your home directory. You should see your two repositories of interest…

    • GDA_Wi25_jupyterbook

      • This contains our class materials; it is helpful for following along with the demos during lecture

    • labs

      • This is your labs folder. Each week, a new folder within labs will be created when you git clone that week’s lab assignment

        • Make sure when you are using git clone, you are using it from within labs!

      • The path to this notebook will look something like labs/01_github_python-egagli/01_lab.ipynb

  • Navigate from your home directory to the directory containing this notebook. On the way, try using tab completion so you don’t have to type as much!

2a) Written response: Use the command line to find out how big each file in your 01 lab folder is. What command did you use, and how big is each file in MB?

STUDENT WRITTEN RESPONSE HERE

Unzip words

  • Fortunately, the basic Linux operating system we’ve provided includes command-line utilities to unzip files. Navigate to the directory containing the data, and then run unzip gda_2025_data.zip

    • This should create a new data subdirectory with three new files

2b) Written response: Using the command line, check out the top of the words file. What command did you use? Please paste the output.

STUDENT WRITTEN RESPONSE HERE

Part 3: Python time! A play on words (8 pts)

3a) Define a variable to store the path to the words file

  • Can be an absolute or relative path (try both!): https://www.geeksforgeeks.org/absolute-relative-pathnames-unix/

  • Note: You can use %pwd (print working directory, similar to the pwd shell command) to get the current directory path.

  • When defining paths in IPython, use /home/jovyan instead of the ~ shortcut for your home directory

  • The path should be a string, enclosed in single quotes '/path/to/some/file.txt'
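A minimal sketch of the two path styles. It assumes the unzipped words file sits at data/words relative to the notebook; that location is hypothetical, so substitute whatever %pwd and ls show on your system. os.path.abspath anchors a relative path at the current working directory, and os.path.expanduser shows the expansion Python needs before it can use the ~ shortcut.

```python
import os

# Hypothetical location of the words file, relative to the notebook's directory
rel_path = 'data/words'

# Absolute version: os.path.abspath joins the relative path onto os.getcwd()
abs_path = os.path.abspath(rel_path)

# open() does not understand '~'; expanduser swaps it for the home directory
# (/home/jovyan in the course environment, per the note above)
home_path = os.path.expanduser('~/data/words')

print(os.getcwd())
print(rel_path, os.path.isabs(rel_path))  # relative -> False
print(abs_path, os.path.isabs(abs_path))  # absolute -> True
print(home_path)
```

Either string can then be passed to open; the relative one only works while the notebook's working directory stays the same.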

# STUDENT CODE HERE

3b) Use Python to read this file and populate a list of strings containing all words

  • Use the basic Python open function here, even if you know how to do this with other modules

  • Note: you will need to handle the newline character '\n' at the end of each word
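One way to do this with the built-in open function, sketched under the assumption that the file holds one word per line. So the example runs anywhere, it first writes a tiny stand-in file with the tempfile module; in the lab you would open the path you defined in 3a instead.

```python
import os
import tempfile

# Write a small stand-in for the real words file (one word per line)
tmpdir = tempfile.mkdtemp()
words_fn = os.path.join(tmpdir, 'words')
with open(words_fn, 'w') as f:
    f.write('A\na\naa\naal\naalii\n')

# Read it back with the built-in open(); the with-block closes the file for us
words = []
with open(words_fn) as f:
    for line in f:
        words.append(line.rstrip('\n'))  # drop only the trailing newline

print(words)  # ['A', 'a', 'aa', 'aal', 'aalii']
```

rstrip('\n') removes just the newline at the end of each line. f.read().splitlines() is a common alternative that splits and strips in one step.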

# STUDENT CODE HERE

3c) How many words are there in the list? How many characters are in the first word of the list?

# STUDENT CODE HERE
235886
# STUDENT CODE HERE
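Both 3c answers come from len, which works on lists (number of items) and on strings (number of characters). A sketch on a short stand-in list rather than the real file:

```python
# Stand-in for the full list built in 3b
words = ['A', 'a', 'aa', 'aal', 'aalii']

n_words = len(words)       # number of items in the list
first_len = len(words[0])  # number of characters in the first string
print(n_words, first_len)  # 5 1
```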

3d) What is the total number of characters for all words in the list?

  • You can use a list comprehension here to loop through all words

# STUDENT CODE HERE
2257223
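A sketch of the list comprehension approach on a stand-in list: build a list of per-word lengths, then add them up with sum. This only counts letters correctly if the newlines were stripped in 3b; otherwise every word contributes one extra character.

```python
# Stand-in for the full list built in 3b
words = ['A', 'a', 'aa', 'aal', 'aalii']

# List comprehension: one length per word, then add them up
lengths = [len(w) for w in words]
total_chars = sum(lengths)
print(lengths, total_chars)  # [1, 1, 2, 3, 5] 12
```

If you don't need the intermediate list, sum(len(w) for w in words) gives the same total with a generator expression.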

3e) What is the longest word? And how many characters are in the longest word?

# STUDENT CODE HERE
# STUDENT CODE HERE
24
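max accepts a key function, so passing key=len ranks words by length instead of alphabetically. A sketch on a stand-in list:

```python
# Stand-in for the full list built in 3b
words = ['A', 'a', 'aa', 'aal', 'aalii']

# Compare words by their length rather than by alphabetical order
longest = max(words, key=len)
print(longest, len(longest))  # aalii 5
```

If several words tie for the longest length, max returns the first one it encounters.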

3f) Print the first 3 words and the last 3 words

  • Use relative list indices for slicing: https://stackoverflow.com/questions/509211/understanding-slice-notation

  • Note that the output is still a list object

# STUDENT CODE HERE
['A', 'a', 'aa']
['zythum', 'Zyzomys', 'Zyzzogeton']
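A slicing sketch that uses only the six words from the expected output above as a stand-in list: words[:3] takes indices 0 through 2, and words[-3:] counts back three from the end. Both results are new lists.

```python
# Stand-in list built from the expected output above
words = ['A', 'a', 'aa', 'zythum', 'Zyzomys', 'Zyzzogeton']

first3 = words[:3]   # indices 0, 1, 2
last3 = words[-3:]   # negative indices count back from the end
print(first3)  # ['A', 'a', 'aa']
print(last3)   # ['zythum', 'Zyzomys', 'Zyzzogeton']
```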

3g) Define a function that will concatenate an input list of strings. Run your function four separate times below (use indexing on the words list; don’t copy/paste strings from the list)

  • Your function should return a single string (with no spaces)

  • This function should accept an input list of arbitrary length as an argument

    • So return inlist[0]+inlist[1]+inlist[2] won’t work

  • Example input: ['Geospatial', 'Data', 'Analysis']

  • Example output: 'GeospatialDataAnalysis'

# STUDENT CODE HERE
  • Pass in a list of the first 3 words

# STUDENT CODE HERE
'Aaaa'
  • Pass in a list of the first 5 words

# STUDENT CODE HERE
Aaaaaalaalii
  • Pass in a list of the last 3 words

# STUDENT CODE HERE
zythumZyzomysZyzzogeton
  • Pass in a list of the 1st, 3rd, 5th, and 7th word (don’t pass the words in separately!)

# STUDENT CODE HERE
AaaaaliiAani
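A minimal sketch using str.join, which concatenates any number of strings with the separator it is called on (here an empty string, so no spaces). The function name concat_words and the short stand-in list are hypothetical. A step in the slice, such as words[0:7:2] on the full list, selects the 1st, 3rd, 5th, and 7th words as a single list.

```python
def concat_words(inlist):
    """Concatenate a list of strings of any length into one string with no spaces."""
    return ''.join(inlist)

# Stand-in for the first few entries of the full list
words = ['A', 'a', 'aa', 'aal', 'aalii']

print(concat_words(['Geospatial', 'Data', 'Analysis']))  # GeospatialDataAnalysis
print(concat_words(words[:3]))     # Aaaa
print(concat_words(words[0:5:2]))  # every other word: A + aa + aalii -> Aaaaalii
```

Accumulating with += in a loop would also work; join is the more idiomatic choice.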

3h) Does your list contain the nickname for the UW mascot? If so, what is the numerical index for that word?

  • If you don’t know our mascot, ask a neighbor! Be careful about case.

  • This should be a simple boolean statement

  • Double check your index by printing the word at that index

# STUDENT CODE HERE
True
# STUDENT CODE HERE
58209
dubs
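Membership and position are both one-liners: the in operator returns a boolean, and list.index returns the position of the first match (raising ValueError if the word is absent). The stand-in list below is hypothetical; note that string comparison is case-sensitive.

```python
# Hypothetical stand-in list; the real list is much longer
words = ['dub', 'dubious', 'dubs', 'duck']

target = 'dubs'            # case matters: 'Dubs' would not match
found = target in words    # simple boolean membership test
print(found)               # True

idx = words.index(target)  # position of the first match
print(idx, words[idx])     # 2 dubs
```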

Part 4: Letter counter (6 pts)

4a) How many words begin with each letter of the alphabet (case-insensitive)?

  • Hint: Python has a built-in string of all lowercase letters, string.ascii_lowercase (it lives in the string module, so you need to import it first!). Also, all string objects have methods that can change the case: https://docs.python.org/2.5/lib/string-methods.html

  • Hint: One possible approach could use nested loops:

    • Loop through each letter

      • Initialize some count variable or empty list

      • Loop through each word in the list of words

        • Check to see if the word starts with the letter (careful about case!)

        • If it does, increment your counter or append the word to your list

      • Print out the letter and the total count of words that met your criterion

  • Another possible approach could use a dictionary:

    • Create a new dictionary with a key for each lowercase letter

    • Initialize a counter (starting at zero) as the value for each key

    • Loop through words and increment the appropriate counter

If you want, try implementing both. Which one is faster?

# STUDENT CODE HERE
# STUDENT CODE HERE
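A sketch of both hinted approaches wrapped in functions (count_nested and count_dict are hypothetical names), plus a rough timing with time.perf_counter. The stand-in list is tiny, so the timings here mean little; on the full list the dictionary version should usually win, because it scans the words once instead of once per letter.

```python
import string
import time

def count_nested(words):
    """Approach 1: loop over letters, and for each letter loop over every word."""
    counts = {}
    for letter in string.ascii_lowercase:
        count = 0
        for word in words:
            if word.lower().startswith(letter):  # case-insensitive check
                count += 1
        counts[letter] = count
    return counts

def count_dict(words):
    """Approach 2: one pass over the words, incrementing a per-letter counter."""
    counts = {letter: 0 for letter in string.ascii_lowercase}
    for word in words:
        first = word[:1].lower()  # '' for an empty string, so it is skipped below
        if first in counts:       # ignore words that don't start with a-z
            counts[first] += 1
    return counts

# Hypothetical stand-in list with mixed case
words = ['A', 'a', 'aa', 'Bb', 'bat', 'cat', 'Zyzzogeton']

for func in (count_nested, count_dict):
    start = time.perf_counter()
    counts = func(words)
    print(func.__name__, f'{time.perf_counter() - start:.6f} s')

print(counts['a'], counts['b'], counts['z'])  # 3 2 1
```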

4b) What is the most common first letter? Use string formatting to print your answer

  • While it is possible to just look at the output counts above, try to do this with code

  • If the above results are stored in a dictionary or lists, this should only require 1-2 lines of code, with no need for additional loops

  • Output should be something like: “The most common first letter in words is ‘a’ with 17096 occurrences”

    • Note that ‘a’ is not the correct answer: there are only 25 other possibilities to consider!

# STUDENT CODE HERE
"The most common first letter is 's' with 25162 occurrences"

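With the 4a results in a dictionary, max with key=counts.get returns the key with the largest value, and an f-string handles the formatting. The counts below are made-up stand-ins, not the real answer.

```python
# Made-up counts standing in for the 4a results
counts = {'a': 3, 'b': 2, 's': 5, 'z': 1}

# Rank the keys by their values and keep the largest
top = max(counts, key=counts.get)
print(f"The most common first letter is '{top}' with {counts[top]} occurrences")
# The most common first letter is 's' with 5 occurrences
```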
Challenge question: Create a plot of letter counts (GS: Required / UG: +1 pts)

  • We haven’t talked about matplotlib or other plotting libraries yet, but if you already feel comfortable plotting, create a visualization of your output counts. A bar plot of the count for each letter (similar in spirit to a histogram) might be a good choice

# STUDENT CODE HERE
[Figure: example plot of letter counts]
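A minimal matplotlib sketch of one possible bar plot, assuming the 4a counts are in a dictionary keyed by letter. The counts here are made-up placeholders. Passing the letters as strings to Axes.bar gives one labeled bar per letter; in a Jupyter notebook the figure should display when the cell finishes.

```python
import string

import matplotlib.pyplot as plt

# Placeholder counts standing in for the 4a results (every letter gets a bar)
counts = {letter: 0 for letter in string.ascii_lowercase}
counts.update({'a': 3, 'b': 2, 's': 5, 'z': 1})

fig, ax = plt.subplots(figsize=(10, 4))
bars = ax.bar(list(counts.keys()), list(counts.values()))  # one bar per letter
ax.set_xlabel('First letter')
ax.set_ylabel('Number of words')
ax.set_title('Words by first letter')
```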

Submit your work

  1. Save this notebook with all code and output (make sure that when you save the notebook, all cells show their outputs).

  2. Use the terminal to stage, commit, and push your notebook to your GitHub repository. It should look something like this…

    • git add 01_lab.ipynb

    • git commit -m "Completed Lab 01 exercises"

    • git push

  3. Verify that your notebook appears in your GitHub repository. Double-check to make sure all the outputs are visible!