Lab03 assignment (20 pts)#
UW Geospatial Data Analysis
CEE467/CEWA567
David Shean
modified by Eric Gagliano and Quinn Brencher
Introduction#
Objectives#
Review some fundamental concepts that are common to most geospatial analysis
Learn basic operations with Geopandas
Explore coordinate systems, projections and transformations, geometry types
Understand how different projections can distort measurements and visualizations
Create more sophisticated visualizations involving multiple layers and data types
Instructions#
For each question or task below, write some code in the empty cell and execute to preserve your output
If you are in the graduate section of the class, please complete the challenge questions
Work together, consult resources we’ve discussed, post on slack!
Part 1: Exploring CRS and Projections (3 pts)#
Import necessary modules#
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point, Polygon
import fiona
import pyproj
#%matplotlib widget
%matplotlib inline
Load sample vector data: world polygons#
See example (and lots of relevant info): https://geopandas.org/projections.html
This GeoDataFrame containing attributes and Polygon geometries for all countries is conveniently bundled with Geopandas
world = gpd.read_file("https://naciscdn.org/naturalearth/110m/cultural/ne_110m_admin_0_countries.zip")
/tmp/ipykernel_5512/3926210268.py:1: FutureWarning: The geopandas.dataset module is deprecated and will be removed in GeoPandas 1.0. You can get the original 'naturalearth_lowres' data from https://www.naturalearthdata.com/downloads/110m-cultural-vectors/.
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
Inspect the world GeoDataFrame#
Review the columns
Note geometry types: both Polygon and MultiPolygon - why the difference? (no written answer needed)
world.head()
| pop_est | continent | name | iso_a3 | gdp_md_est | geometry | |
|---|---|---|---|---|---|---|
| 0 | 889953.0 | Oceania | Fiji | FJI | 5496 | MULTIPOLYGON (((180.00000 -16.06713, 180.00000... |
| 1 | 58005463.0 | Africa | Tanzania | TZA | 63177 | POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... |
| 2 | 603253.0 | Africa | W. Sahara | ESH | 907 | POLYGON ((-8.66559 27.65643, -8.66512 27.58948... |
| 3 | 37589262.0 | North America | Canada | CAN | 1736425 | MULTIPOLYGON (((-122.84000 49.00000, -122.9742... |
| 4 | 328239523.0 | North America | United States of America | USA | 21433226 | MULTIPOLYGON (((-122.84000 49.00000, -120.0000... |
Check the coordinate reference system (crs)#
world.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
Look up this EPSG code online#
See top of
crsoutputBurn this code in your brain
Written response: What are units? Would this be an appropriate crs to do area calculations? Why or why not?
STUDENT RESPONSE HERE
Plot the GeoDataFrame using the convenient plot method with default settings#
Note that this is a 2D representation of geographic coordinates (lon,lat), known as Equirectangular, Equidistant Cylindrical, and “Plate Carrée” (flat square in French) projection
https://en.wikipedia.org/wiki/Equirectangular_projection
# STUDENT CODE HERE

Change the world!#
Reproject the world GeoDataFrame to another common, global projection: Web Mercator
This is one of the most common projections used for online maps (e.g., Google Maps) and tiled basemaps
https://en.wikipedia.org/wiki/Web_Mercator_projection
Look up (or recall from memory) the appropriate EPSG code
Use the GeoPandas
to_crs()function to reproject the world GeoDataFrame to Web Mercator and create a new plotIf you didn’t review it earlier, now might be a good time to take a look at this documentation: https://geopandas.org/en/stable/docs/user_guide/projections.html#re-projecting
# STUDENT CODE HERE

Uh oh! Why didn’t this work? Let’s try a Robinson projection instead…
#Robinson - no issues, goes all the way to -90 to +90° latitude
world.to_crs('ESRI:54030').plot();
Workaround for Antarctica (more discussion later)#
world
| pop_est | continent | name | iso_a3 | gdp_md_est | geometry | |
|---|---|---|---|---|---|---|
| 0 | 889953.0 | Oceania | Fiji | FJI | 5496 | MULTIPOLYGON (((180.00000 -16.06713, 180.00000... |
| 1 | 58005463.0 | Africa | Tanzania | TZA | 63177 | POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... |
| 2 | 603253.0 | Africa | W. Sahara | ESH | 907 | POLYGON ((-8.66559 27.65643, -8.66512 27.58948... |
| 3 | 37589262.0 | North America | Canada | CAN | 1736425 | MULTIPOLYGON (((-122.84000 49.00000, -122.9742... |
| 4 | 328239523.0 | North America | United States of America | USA | 21433226 | MULTIPOLYGON (((-122.84000 49.00000, -120.0000... |
| ... | ... | ... | ... | ... | ... | ... |
| 172 | 6944975.0 | Europe | Serbia | SRB | 51475 | POLYGON ((18.82982 45.90887, 18.82984 45.90888... |
| 173 | 622137.0 | Europe | Montenegro | MNE | 5542 | POLYGON ((20.07070 42.58863, 19.80161 42.50009... |
| 174 | 1794248.0 | Europe | Kosovo | -99 | 7926 | POLYGON ((20.59025 41.85541, 20.52295 42.21787... |
| 175 | 1394973.0 | North America | Trinidad and Tobago | TTO | 24269 | POLYGON ((-61.68000 10.76000, -61.10500 10.890... |
| 176 | 11062113.0 | Africa | S. Sudan | SSD | 11998 | POLYGON ((30.83385 3.50917, 29.95350 4.17370, ... |
177 rows × 6 columns
#https://tipsfordev.com/how-to-remove-shapes-that-cause-a-problem-upon-reprojection-from-a-geopandas-dataframe
from shapely.geometry import box
crs = pyproj.CRS.from_epsg(3857)
bounds = crs.area_of_use.bounds
gpd.clip(world, box(*bounds)).to_crs('EPSG:3857').plot();
Take a moment to consider the two plots#
Note the range of axes tick labels and units
Written response:
Summarize the issue with our first attempt projecting to web mercator.
What is most distorted in this map projection?
STUDENT RESPONSE HERE
Define a custom projection#
Set the origin of your projection on the (self-proclaimed) “Center of the Universe” – Fremont, Seattle, WA, Earth
https://www.atlasobscura.com/places/center-of-the-universe-sign
You’ll probably need to look up some coordinates
Let’s start by creating a proj string (make sure you use sufficient precision for decimal degrees)
https://proj.org/usage/quickstart.html
Choose a simple projection that accepts a center latitude and center longitude (e.g., orthographic)
See reference: https://proj.org/operations/projections/ortho.html
Your string should look something like
'+proj=ortho +lon_0=YY.YYYYYYY +lat_0=XXX.XXXXXXX'Make sure the order and sign of your latitude and longitude values make sense (confidence check!)
# STUDENT CODE HERE
# STUDENT CODE HERE
Reproject and plot#
Use GeoPandas
to_crs()method to reproject the world GeoDataFrame using your local projection, reducing our beautiful multidimensional universe to a 2D plotThe
plot()method returns a matplotlib axes object - store this as new variable calledaxModify this axes object to add thin, black horizontal lines where y=0 and a vertical line where x=0, producing “crosshairs” on the origin
See the matplotlib
axvline()andaxhline()methodsThe “line width” keyword
lwmay be useful
# STUDENT CODE HERE

Try more projections!#
Let’s experiment with a few additional projections, creating plots like the one above with crosshairs for the origin
This time, use some EPSG codes. You can pass a string like
'EPSG:XXXX'to theto_crs()method!Plot the following:
EPSG:3031 - South Polar Stereographic (see if you can find Antarctica, might need to zoom in)
The EPSG code for the UTM zone on WGS84 ellipsoid that contains Seattle
see https://en.wikipedia.org/wiki/Universal_Transverse_Mercator_coordinate_system
Note: If you search for “UTM 10N” in the EPSG registry, you may get several returns: https://epsg.io/?q=UTM+10N. These are all valid, but they use different ellipsoids/datums. You want the code for the WGS84 ellipsoid definition.
US National Atlas Equal area (EPSG:2163)
Two more additional EPSG codes of your choosing
Since we have to do this several times, define a function that accepts a geodataframe and an EPSG code as arguments, then does the reprojection and plotting (no need to return anything at this point). Then we could use a simple for loop to call the function for each EPSG code! Nice and clean, and better than copying/pasting.
Note: you may see some strange issues with “lines” across the map, or an apparently empty map (try zooming in on 0,0). You didn’t do anything wrong, see next section…
# STUDENT CODE HERE
# STUDENT CODE HERE

OK, something isn’t right with some of these maps#
Note some polygons (countries) cross the antimeridian (-180°/+180° lon) or one of the poles (+90° or -90° lat, like, say Antarctica). Or polygons could extend beyond the valid extent for the target projection. These polygons won’t render correctly.
If using a regional projection for local or regional analysis (e.g., UTM or state plane coordinate system), you probably don’t care about polygons from the other side of the planet anyway.
The solution is to isolate or clip polygons of interest before reprojecting.
Isolate North American polygons#
Start with a quick inspection of the
worldGeoDataFrameYou should be able to use the standard Pandas “selection by label” approach (
.loc) to isolate records for countries in North AmericaThis can be done using a conditional statement, which will return a boolean array for selection
Store the output in a new GeoDataFrame
Regenerate the same plots (with your handy function!) using this new GeoDataFrame
# STUDENT CODE HERE

# STUDENT CODE HERE

That looks better!#
Note the location of the origin, apparent size and distortion of polygons (e.g., Mexico, Alaska and Greenland)
Part 2: Loading volcano locations into a GeoDataFrame (3 pts)#
Let’s explore a new dataset containing Holocene volcanoes (those active in the last 10,000 year) in the contiguous US.
These data are a subset of the data provided by the Global Volcanism Program, Smithsonian Institution
Global Volcanism Program, 2024. [Database] Volcanoes of the World (v. 5.2.3; 20 Sep 2024). Distributed by Smithsonian Institution, compiled by Venzke, E. https://doi.org/10.5479/si.GVP.VOTW5-2024.5.2
Load the existing csv into a Pandas DataFrame#
Define the relative path to the csv as in previous labs
Use the amazing Pandas
read_csv()function to load as a Pandas DataFrameRun a quick
head()on your DataFrame to make sure everything looks right
volcanoes_fn = './data/conus_holocene_volcanoes.csv'
volcanoes_df = pd.read_csv(volcanoes_fn)
volcanoes_df.head()
| name | location | latitude | longitude | elevation | morphology | status | country | geometry | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Crater Lake | Oregon | 42.930 | -122.120 | 2487 | Caldera | Radiocarbon | United States | POINT (-122.12 42.93) |
| 1 | Craters of the Moon | Idaho | 43.420 | -113.500 | 2005 | Cinder cone | Radiocarbon | United States | POINT (-113.5 43.42) |
| 2 | Davis Lake | Oregon | 43.570 | -121.820 | 2163 | Volcanic field | Radiocarbon | United States | POINT (-121.82 43.57) |
| 3 | Devils Garden | Oregon | 43.512 | -120.861 | 1698 | Volcanic field | Holocene | United States | POINT (-120.861 43.512) |
| 4 | Diamond Craters | Oregon | 43.100 | -118.750 | 1435 | Volcanic field | Holocene | United States | POINT (-118.75 43.1) |
Convert the Pandas DataFrame to a GeoPandas GeoDataFrame#
See documentation here: https://geopandas.readthedocs.io/en/latest/gallery/create_geopandas_from_pandas.html
Careful about lon and lat order!
Store in a variable named
volcanoes_gdf(needed for sample code later)Run a quick
head()to make sure everything looks goodYou should have a new
geometrycolumn cointaining shapely Point objects
# STUDENT CODE HERE
| name | location | latitude | longitude | elevation | morphology | status | country | geometry | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Crater Lake | Oregon | 42.930 | -122.120 | 2487 | Caldera | Radiocarbon | United States | POINT (-122.12000 42.93000) |
| 1 | Craters of the Moon | Idaho | 43.420 | -113.500 | 2005 | Cinder cone | Radiocarbon | United States | POINT (-113.50000 43.42000) |
| 2 | Davis Lake | Oregon | 43.570 | -121.820 | 2163 | Volcanic field | Radiocarbon | United States | POINT (-121.82000 43.57000) |
| 3 | Devils Garden | Oregon | 43.512 | -120.861 | 1698 | Volcanic field | Holocene | United States | POINT (-120.86100 43.51200) |
| 4 | Diamond Craters | Oregon | 43.100 | -118.750 | 1435 | Volcanic field | Holocene | United States | POINT (-118.75000 43.10000) |
A note on Point geometry objects#
The coordiantes in the special
geometrycolumn of aGeoDataFrameare actuallyPointobjects from the Shapely libraryhttps://shapely.readthedocs.io/en/latest/manual.html
https://shapely.readthedocs.io/en/latest/manual.html#points
We will revisit geometry objects, and explore
Polygonobjects in more detail during the Vector2 lab
Set the GeoDataFrame CRS#
https://geopandas.org/en/latest/docs/user_guide/projections.html#setting-a-projection
Note that you can also define this during the initial GeoDataFrame creation, passing the appropriate EPSG code as an argument for the
crskeyword (gpd.GeoDataFrame(pandas_df, crs='EPSG:XXXX', geometry=...)
# STUDENT CODE HERE
Run a quick check to make sure CRS is set correctly!#
Inspect the
crsfor your GeoDataFrame
# STUDENT CODE HERE
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
Get bounding box (extent) and center (lon, lat) of volcano points#
See GeoPandas API reference. In this case, you want the
total_boundsattribute: https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoSeries.total_bounds.htmlTry to avoid copying/pasting and hardcoding values - store the output from
total_boundsas a new variable, then use the array indices to get the corresponding min/max valuesNote that
boundswill return the bounds of each record in the GeoDataFrame (in this case, just the point coordinates), whiletotal_boundsreturns the union for all records
Center can be calculated from the min/max extent values in each dimension
# STUDENT CODE HERE
array([-122.77 , 33.197, -105.93 , 48.777])
# STUDENT CODE HERE
array([-114.35 , 40.987])
#Alternative approach using convex hull
print(volcanoes_gdf.unary_union.convex_hull.centroid)
POINT (-115.1148366870538 39.67205108451101)
Plot the points using the convenience plot method of the GeoDataFrame#
Note that unlike the Pandas scatterplot function, you no longer need to specify the DataFrame columns containing the x and y values - GeoPandas will assume you want to use x and y values in Geometry objects
Color points by ‘elevation’ values
Note: GeoPandas syntax for this is slightly different than Pandas
Set point size appropriately
Add a colorbar
https://geopandas.org/en/latest/docs/user_guide/mapping.html#creating-a-legend
Note: if desired, you can pass a dictionary of colorbar properties (e.g.,
{'label':'Elevation (m)'}) tolegend_kwdsEven better, specify the vertical datum of the elevation values, in this case height above the WGS84 ellipsoid
Don’t specify a
figsizefor this plot (though fine to do this elsewhere), just use the default figure size.
# STUDENT CODE HERE

Geographic coordinate confidence check#
OK, great, GeoPandas makes creating this plot a little easier than last week.
Now, let’s dig a little deeper and think about this map.
Note that the default aspect ratio for the above map is ‘equal’, so the x and y scaling is the same
Check this by quickly using a ruler or piece of paper to measure the distance spanned by 10° on each axis on your screen (in mm)
Written response: Answer in the cell below: Are there any potential issues with this scaling for our geographic coordinates (latitude and longitude in decimal degrees)?#
STUDENT RESPONSE HERE
Do the following quick calculations for a spherical Earth (or attempt some more sophisticated geodetic distance calculations, if desired). Show your work! Drawing a quick sketch is likely useful as you think through this (no need to reproduce your sketch here).
Answer the following in the cell below:
What is the length (in km) of a degree of latitude at:
0° latitude (equator)
90° latitude (pole)
35° latitude? 49° latitude? (these are the approximate min and max latitude values of the volcanoes point data)
What is the length (in km) of a degree of longitude at:
0° latitude (equator)
90° latitude (pole)
35° latitude? 49° latitude?
Based on these values, does your map above have an equal aspect ratio in terms of true distance (in km)?
STUDENT RESPONSE HERE
# STUDENT RESPONSE HERE
Challenge Question (GS required/UG +0.5)#
Create a plot showing the relationship between the length of a degree of longitude in km as a function of latitude in decimal degrees for a range of 0 to 90 degrees latitude. Add two red points for 35 and 49. Fine to assume a spherical earth with average radius.
# STUDENT RESPONSE HERE

Hmmm. So how do we deal with these scaling issues?
Part 3: Using projected coordinate systems (3 pts)#
We need to choose a map projection that is appropriate for our data and objectives. This decision is important for visualization, but is also critical for precise quantitative analysis. For example, if you want to analyze area or volume change, you should use an equal-area projection. If you want to analyze distances between points, you should use an equidistant projection.
https://www.axismaps.com/guide/general/map-projections/
Sadly, there is no “perfect” projection. https://en.wikipedia.org/wiki/Map_projection#Which_projection_is_best?
You, as the mapmaker or data analyst, are responsible for choosing a projection with the right characteristics for your purposes. Let’s explore a bit further, and we’ll revisit some general guidelines later.
Use GeoPandas to reproject your GeoDataFrame#
Start by reprojecting the points to a Universal Transverse Mercator (UTM), Zone 11N on the WGS84 Ellipsoid
You’ll have to look up the appropriate EPSG code
Store the output as a new GeoDataFrame
volcanoes_gdf_utmDo a quick
head()and note the new values in thegeometrycolumn
# STUDENT CODE HERE
| name | location | latitude | longitude | elevation | morphology | status | country | geometry | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Crater Lake | Oregon | 42.930 | -122.120 | 2487 | Caldera | Radiocarbon | United States | POINT (82163.231 4765774.760) |
| 1 | Craters of the Moon | Idaho | 43.420 | -113.500 | 2005 | Cinder cone | Radiocarbon | United States | POINT (783338.410 4813408.933) |
| 2 | Davis Lake | Oregon | 43.570 | -121.820 | 2163 | Volcanic field | Radiocarbon | United States | POINT (110757.528 4835413.815) |
| 3 | Devils Garden | Oregon | 43.512 | -120.861 | 1698 | Volcanic field | Holocene | United States | POINT (187909.330 4824919.698) |
| 4 | Diamond Craters | Oregon | 43.100 | -118.750 | 1435 | Volcanic field | Holocene | United States | POINT (357590.415 4773405.966) |
Create a new plot of the reprojected points#
Note that the plot may look very similar to the original lat/lon points, but there are some subtle differences (read ahead for more details).
Note the axes labels of the new coordinate system - units, origin (0,0), aspect ratio
To confuse things further, the UTM projections include a “False Easting” of 500 km, so the x origin (0 on your plot) does not coincide with the center longitude of the projection (-117°W for 11N)! It is actually placed outside (west) of the 6°-wide zone. UTM zones in the southern hemisphere also include a “False Northing” of 10000 km. These offsets are introduced to avoid negative coordinates, which avoids sign errors and enables simpler calculations (more relevant in the old days with pen/paper or mental math). See http://wiki.gis.com/wiki/index.php/False_origin.
# STUDENT CODE HERE

OK, what did we just do?#
Under the hood, GeoPandas used the pyproj library (a Python API for PROJ) to transform each point from one coordinate system to another coordinate system.
I guarantee that you’ve all done this kind of thing before, you may just not remember it or recognize it in this context. See: https://en.wikipedia.org/wiki/List_of_common_coordinate_transformations
Transforming coordinates on the surface of the same reference ellipsoid is pretty straightforward. Things start to get more complicated when you include different ellipsoid models, horizontal/vertical datums, etc. Oh, also the Earth’s surface is not static - plate tectonics make everything more complicated, as time becomes important, some coordinate systems remain “fixed” relative to a moving plate (e.g., NAD83), and transformations must include a “kinematic” component.
Fortunately, the PROJ library (https://proj.org/about.html) has generalized much of the complicated math for geodetic coordinate transformations. It’s been under development since the 1980s, and our understanding of the Earth’s shape and plate motions has changed dramatically in that time period (thanks to GNSS like GPS and other satellite observations). So, still pretty complicated, and there are different levels of support/sophistication in the tools/libraries that use PROJ (like GeoPandas).
Define a custom projection for Western U.S#
The UTM projection we used above is not the best choice for our data, which actually span 4 UTM zones: https://en.wikipedia.org/wiki/Universal_Transverse_Mercator_coordinate_system#/media/File:Utm-zones-USA.svg. Each UTM zone is a north-south “slice” of the Earth’s surface covering 6° of longitude around a central meridian.
We used UTM Zone 11N, and while distortion should be limited around the -117°W central meridian, it will increase with distance beyond the -120° to -114°W zone boundaries.
Let’s instead use a custom Albers Equal Area (AEA) projection to minimize area distoration over the full spatial extent of our volcano points.
To do this, we’ll define a PROJ string (https://proj.org/usage/quickstart.html?highlight=definition), which can be interpreted by most Python geopackages (like pyproj).
This interactive tool might be useful to explore some options for a user-defined bounding box: https://projectionwizard.org/
The Albers Equal Area projection definition requires two standard parallels: https://proj.org/operations/projections/aea.html. Here, we will also specify the center latitude and center longitude for the coordinate system origin.
Define a custom Albers Equal Area proj string
'+proj=aea...'https://en.wikipedia.org/wiki/Albers_projection
PROJ reference, with example: https://proj.org/operations/projections/aea.html
Use the center longitude and latitude of the volcano points you calculated earlier to set the
lon_0andlat_0parameters.Note that the PROJ documentation doesn’t list the
lat_0parameter, but this will properly set the (0,0) origin of your new projection.
Define the two standard parallels (lines of latitude) based on the latitude range of the points
Map scale will be true along these parallels, with distortion increases as you move north or south
One simple approach would be to use min and max latitude from the
total_boundsextent computed earlierIf this seems too easy, figure out how to place them slightly inside your min and max latitude to minimize distortion across the entire latitude range
if you attempt this, check out https://www.sciencedirect.com/science/article/pii/S009830041630053X (one of the “Existing Recommendations” is fine)
Use Python f string formatting to dynamically create your proj string (don’t just hardcode your values, but substitute variables in the string)
Print the final proj string
# STUDENT CODE HERE
# STUDENT CODE HERE
# STUDENT CODE HERE
Reproject the volcano points using your custom equal-area projection#
Store the output as a new GeoDataFrame and check that the
geometrywas updated.Confidence check with a scatter plot
Origin should be near center of volcano points
# STUDENT CODE HERE
Create scatter plots for comparison#
4 subplots (WGS84, UTM, custom AEA, and Web Mercator)
Might be useful to add gridlines with constant spacing (somewhere between ~200-500 km might work) to each plot for projected data using matplotlib
grid()
from matplotlib.ticker import MultipleLocator
def add_grid(ax, tick_interval=500000):
ax.xaxis.set_major_locator(MultipleLocator(tick_interval))
ax.yaxis.set_major_locator(MultipleLocator(tick_interval))
ax.grid()
# hint: sample starter code
# f, axa = plt.subplots(1,4, figsize=(15,3)) define your figure and axes!
# volcanoes_gdf.plot(ax=axa[0], column='volcanoes_z', cmap='inferno', markersize=1, legend=True) plot the gdf on the correct axis
# axa[0].set_title('WGS84') # set title
# axa[0].grid() # add grid
# volcanoes_gdf_utm........ plot the gdf on the correct axis
# set title
# add grid for the next three plots using add grid function in the cell above... example usage: add_grid(axa[1])
# and the rest :)
# STUDENT CODE HERE

These look similar, so why are we spending time on this?#
Note the location of the origin for each coordinate system
The (0,0) should be near the center of your points for the AEA projection
Note subtle distortion of points near the margins
Let’s dive into some analysis of distances, azimuths and areas to evaluate projection distortion
Part 4: Distance and azimuth distortion analysis (4 pts)#
We will now explore how these projections affect distances and angles
Let’s start by defining two points from our dataset separated by a large distance - we can use points corresponding to minimum and maximum longitude values. We will store these points in a new geodataframe
volcanoes_gdf_mmlon
min_idx = volcanoes_gdf['longitude'].argmin()
max_idx = volcanoes_gdf['longitude'].argmax()
volcanoes_gdf_mmlon = volcanoes_gdf.iloc[[min_idx, max_idx]]
volcanoes_gdf_mmlon
| name | location | latitude | longitude | elevation | morphology | status | country | geometry | |
|---|---|---|---|---|---|---|---|---|---|
| 31 | Clear Lake | California | 38.97 | -122.77 | 1439 | Volcanic field | Holocene | United States | POINT (-122.77000 38.97000) |
| 29 | Carrizozo | New Mexico | 33.78 | -105.93 | 1731 | Cinder cone | Holocene | United States | POINT (-105.93000 33.78000) |
volcanoes_gdf_mmlon.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
Create a quick plot to visualize#
f, ax = plt.subplots()
volcanoes_gdf.plot(ax=ax)
volcanoes_gdf_mmlon.plot(ax=ax, color='r');
Compute Geodetic distance and azimuth between these points#
This is our “truth” - the geodetic distance along the 3D surface of the ellipsoid
Turns out calculating this is not as straightforward as it may sound:
https://en.wikipedia.org/wiki/Geographical_distance
https://en.wikipedia.org/wiki/Geodesics_on_an_ellipsoid
This is one of the main reasons that we use 2D planar projections! Geometric calculations become much simpler in 2D compared to the curved 3D surface of an ellipsoid.
Fortunately, there are approximations and several mature tools/libraries with code to do this efficiently
We will use the pyproj functionality here
https://pyproj4.github.io/pyproj/stable/api/geod.html
from pyproj import Geod
#Define the ellipsoid to use for calculations
geod = Geod(ellps="WGS84")
#Compute geodetic distance between first and last point in a GeoDataFrame
#Return the distance and azimuth and back azimuth (degrees clockwise from north)
def geodetic_az_dist(gdf):
#Extract the points
p0 = gdf.iloc[0]
p1 = gdf.iloc[-1]
#Compute the geodesic azimuth, back azimuth and distance
az, backaz, dist = geod.inv(p0['longitude'], p0['latitude'], p1['longitude'], p1['latitude'])
#print('Distance: {:0.2f} km, Azimuth: {:0.2f}°, Back Azimuth: {:0.2f}°'.format(dist/1000., az, backaz))
#return {'dist':dist, 'az':az, 'backaz':backaz}
print('Distance: {:0.2f} km, Azimuth: {:0.2f}°'.format(dist/1000., az))
return {'dist':dist, 'az':az}
geo_da = geodetic_az_dist(volcanoes_gdf_mmlon)
Distance: 1614.09 km, Azimuth: 105.70°
Compute Euclidian distance and azimuth between these points for different projections#
Use the Pythagorean theorem in a cartesian coordinate system
https://en.wikipedia.org/wiki/Euclidean_distance
I’ve prepared some sample code here
You will need to reproject the
volcanoes_gdf_mmlondataframe and pass to theeuclidian_az_dist()function defined belowPlease do this for:
UTM Zone 11N
Web Mercator
Your AEA projection
Challenge Question (GS required/UG +0.5): an azimuthal equidistant projection with projection center defined using the first point in the GeoDataFrame (you’ll need to create a new proj string here, see https://proj.org/operations/projections/aeqd.html for reference)
Note that the
euclidian_az_distfunction prints out values, but you may want to store the returned dictionary, and then use for later calculations
def euclidian_az_dist(gdf):
unit = gdf.crs.axis_info[0].unit_name
if unit != 'metre':
print('Input CRS has units of {}, expected projected CRS'.format(unit))
else:
#Compute difference in x coordinates
dx = gdf.iloc[-1].geometry.x - gdf.iloc[0].geometry.x
#Compute difference in y coordinates
dy = gdf.iloc[-1].geometry.y - gdf.iloc[0].geometry.y
#Compute azimuth using x and y offsets
az = np.degrees(np.arctan2(dx,dy))
#Compute distance using GeoPandas
dist = gdf.distance(gdf.iloc[-1].geometry).iloc[0]
#Same as this:
#dist = np.sqrt(dx**2 + dy**2)
print('Distance: {:0.2f} km, Azimuth: {:0.2f}°'.format(dist/1000., az))
return {'dist':dist, 'az':az}
#Note that GeoPandas is smart enough to raise exception when inputs are geographic (lat,lon)
euclidian_az_dist(volcanoes_gdf_mmlon)
Input CRS has units of degree, expected projected CRS
# STUDENT CODE HERE
# STUDENT CODE HERE
# STUDENT CODE HERE
# STUDENT CODE HERE
Analysis: which of these projections is closest to our true distance?#
Use the sample function to compute percent difference from the “true” geodetic distance values
#Compute the percent difference
#Input should be two dictionaries, each containing 'dist' and 'az' values for a given projection
def percdiff(d1, d2):
dist_diff_perc = 100 * abs(d1['dist'] - d2['dist'])/d2['dist']
az_diff_perc = 100 * abs(d1['az'] - d2['az'])/d2['az']
print(f'Distance diff: {dist_diff_perc:.2f}%, Azimuth diff: {az_diff_perc:.2f}%')
#Compare web mercator values to the geodesic values
print("Web Mercator:", wm_da)
print("Geodetic (truth):", geo_da)
percdiff(wm_da, geo_da)
Web Mercator: {'dist': 2007444.577723981, 'az': 110.95940332739855}
Geodetic (truth): {'dist': 1614090.3484979372, 'az': 105.69597799717235}
Distance diff: 24.37%, Azimuth diff: 4.98%
# STUDENT CODE HERE
# STUDENT CODE HERE
# STUDENT CODE HERE
Written response: which of these projections is closest to our true distance?
STUDENT RESPONSE HERE
So what is going on here?#
You are seeing different types of distortion (distance, direction) for each projection, compared to the “true” geodetic values on the surface of the Ellipsoid.
In this case, the distance distortion for some projections is <1%, but it’s very possible that <1% will introduce unacceptable (and unnecessary) error for precise engineering applications. This distortion will increase as distances increase.
It’s important to pick a projection that is well-suited for your application. If you care about accurate representation of distances, you should use an equidistant projection. If you care about accurate representation of areas, use an equal-area projection.
Remember, there’s no “perfect” projection, so it’s on you to account for this properly. You would be surprised at how often errors due to projection decisions end up in published literature.
Part 5: Save the projected points to a GIS-ready file (1 pt)#
Often you want to preserve your awesome work and save a GeoDataFrame to use in some other software, or maybe to share with a colleague.
There are many potential options here for formats. For geospatial data, you often want a GIS-ready file format (like a shapefile). And in some cases, you just want a CSV with coordinates, which can be opened by most text readers (and many 3rd-party applications).
Try to create the following output files using Geopandas and the underlying Pandas functionality:
https://geopandas.org/en/stable/docs/user_guide/io.html#writing-spatial-data
https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-store-in-csv
Just explore the I/O functionality, don’t add/commit and push to your github repo with solutions. Just keep them on the hub for use in a future lab.
Export GeoDataFrame to CSV#
For convenience, let’s add new columns for the x and y coordinates in the utm coordinates
This should be pretty simple - no need for loops or a custom function here (hint: use the
xandyattributes of the specialgeometrycolumn). Can store the projected x and y coordinates as new columns in your GeoDataFrame)Note: don’t need to do this, as the geometry column already stores this information, but sometimes you want to preserve projected coordinates (alongside lat/lon) if you’re going to export as a csv and analyze with other tools (e.g., AutoCAD)
Warning: no information about projection is preserved in a standard CSV, just the coordinate values. If you choose to store things as a CSV, it’s on you to document projection in metadata before distributing
A better option is to use a GIS-ready file format (below) that embeds the CRS metadata along with the coordinates
def gdf_add_xy(gdf):
gdf['proj_x'] = gdf['geometry'].x
gdf['proj_y'] = gdf['geometry'].y
gdf_add_xy(volcanoes_gdf_utm)
volcanoes_gdf_utm.head()
| name | location | latitude | longitude | elevation | morphology | status | country | geometry | proj_x | proj_y | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Crater Lake | Oregon | 42.930 | -122.120 | 2487 | Caldera | Radiocarbon | United States | POINT (82163.231 4765774.760) | 82163.230799 | 4.765775e+06 |
| 1 | Craters of the Moon | Idaho | 43.420 | -113.500 | 2005 | Cinder cone | Radiocarbon | United States | POINT (783338.410 4813408.933) | 783338.409673 | 4.813409e+06 |
| 2 | Davis Lake | Oregon | 43.570 | -121.820 | 2163 | Volcanic field | Radiocarbon | United States | POINT (110757.528 4835413.815) | 110757.528144 | 4.835414e+06 |
| 3 | Devils Garden | Oregon | 43.512 | -120.861 | 1698 | Volcanic field | Holocene | United States | POINT (187909.330 4824919.698) | 187909.330168 | 4.824920e+06 |
| 4 | Diamond Craters | Oregon | 43.100 | -118.750 | 1435 | Volcanic field | Holocene | United States | POINT (357590.415 4773405.966) | 357590.415274 | 4.773406e+06 |
out_fn='./data/conus_holocene_volcanoes_utm.csv'
volcanoes_gdf_utm.to_csv(out_fn)
Open the csv from Jupyter lab file browser to verify output looks good, with new columns#
Save GIS-ready files to disk#
Use fiona to get a list of available file type drivers for output
Note: the ‘r’ means fiona/geopandas can read this file type, ‘w’ means it can write this file type, ‘a’ means it can append to an existing file.
https://fiona.readthedocs.io/en/latest/manual.html#writing-vector-data
fiona.supported_drivers
How to choose a file format?#
I’m guessing that most of you have used ESRI shapefiles in the past.
Please stop 🛑. This is a legacy format, though it is still widely used.
http://switchfromshapefile.org/
Better options these days are Geopackage (GPKG) when spatial index is required, and simple GeoJSON for other cases
Both should be supported by any respectable GIS (including QGIS, ArcGIS, etc)
Note that GeoJSON is typically only used for geographic coordinates (EPSG:4326), not projected coordinates
https://feed.terramonitor.com/shapefile-vs-geopackage-vs-geojson/
Now that you’ve made an informed decision for an output format, let’s use the Geopandas
to_file()method to create this fileLet’s export to Geopackage for this exercise
Make sure you properly specify filename with extension and the
driveroptionNote: Writing out may take a minute, and may produce an intermediate ‘.gpkg-journal’ file
# STUDENT CODE HERE
# STUDENT CODE HERE
You can now directly load this gpkg file in any GIS, without defining a coordinate system or dealing with your original csv. You can also load this file directly into geopandas in the future using the read_file() method, without having to repeat the processing above. We’ll do this for lab exercises in a few weeks.
Optional: Download and open in QGIS or ArcGIS#
Try it! Right-click on file in the file browser on the left side of the JupyterLab interface, then select Download and pick a location on your local computer (e.g., your Downloads folder).
Then open this file in QGIS or ArcGIS on your local machine!
Check out your geopackage using gdal: gdalsrsinfo#
We covered basic shell and command line usage during in Week01. I included this because there are many powerful command-line utilities out there that can batch data processing and quick geospatial data inspection. The GDAL/OGR command-line utilities are worth exploring - we will discuss more during raster module. Some of these have equivalent Python functions, while others do not.
One of these utilties, gdalsrsinfo, is great for quickly inspecting dataset CRS, converting between different CRS formats (WKT, proj strings, EPSG codes, etc), and exploring transformations between different CRS.
Try it!#
Open a new terminal, navigate to the directory where you just created the output file. Alternatively, you can run these shell commands from a notebook cell. Run the following (substituting your filename):
gdalsrsinfo EPSG:32611gdalsrsinfo yourfilename.gpkg
By default, this will export a proj string and WKT for the crs. You can also output different formats (e.g., EPSG code). See gdalsrsinfo -h for usage. Try the -o all option.
# STUDENT CODE HERE
# STUDENT CODE HERE
# STUDENT CODE HERE
Part 6: Area Distortion (4pts)#
Get polygons for US States#
Hmmm, let’s see. Two choices:
We could go to ESRI or the U.S. Census website, identify and download a shapefile, unzip 4+ files, copy/paste the appropriate *.shp filename into the notebook. Wait, how can I download on a remote server? OK, maybe run something like
wget http://..., unzip, provide absolute path
- OR -Give geopandas a url string that points to a GeoJSON file somewhere on the web, and read dynamically
#2 is easier :)
Let’s use the US States 5M GeoJSON here: http://eric.clst.org/tech/usgeojson/
We’ve talked about JSON as the text format used by Jupyter notebooks. GeoJSON extends this format to include geospatial information. It’s pretty great. If you are unfamiliar, take a moment to read about GeoJSON: https://en.wikipedia.org/wiki/GeoJSON
Quick inspection#
Take a look at the 5M GeoJSON file (digitized to preserve details at a scale of 1:5000000) contents in your browser or download and open with a text editor.
Note organization structure. Think about how this compares to, say, a Python dictionary object.
Read directly with GeoPandas#
Read the file using GeoPandas by passing the url to gpd.read_file().
#states_url = 'http://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_040_00_500k.json'
states_url = 'http://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_040_00_5m.json'
#states_url = 'http://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_040_00_20m.json'
# STUDENT CODE HERE
Inspect the GeoDataFrame#
Note the columns and geometry type(s) in the
geometrycolumnsDo a quick
plot()to preview
states_gdf.plot();
Note: The extent of the Aleutian islands (Alaska) crosses the -180/+180°W antimeridian, which is why the extent spans the full longitude range.
Check the CRS#
Note that this was automatically defined during the
read_file()step. Thanks GeoPandas!
# STUDENT CODE HERE
Compute the area (in $km^2$) for all states#
Our GeoDataFrame of states is still in geographic coordinates (decimal degrees)
We could compute area in decimal degrees, but remember our analysis above about how the length of a degree of longitude varies? This will also affect area calculations.
We want polygon area values in km2
What might you need to do to the GeoDataFrame before you proceed…
Answer: reproject the polygons!#
Let’s start by reprojecting the state polygons using the same UTM 11N projection we explored earlier
Store output as a new GeoDataFrame with unique name (e.g., states_gdf_utm)
Do a quick
head()on the output to verify a change in geometry coordinatesDo a quick
plot(), noting the new coordinate values.
# STUDENT CODE HERE
# STUDENT CODE HERE

Compute the area of the polygons in the projected GeoDataFrame#
Compute the area for each state in km^2
Don’t use a loop - this should be straightforward to do using geopandas: https://geopandas.org/en/latest/docs/reference/api/geopandas.GeoSeries.area.html
Compare your values with the values in the
CENSUSAREAcolumnCalcuate an area difference between your values and the
CENSUSAREAvaluesNote:
CENSUSAREAvalues are in square miles (sigh), so you’ll need to convert to square kmConvert the area difference to a percent difference
#Sample function
def gdf_area_analysis(gdf):
#Convert area from GeoPandas (m^2) to km^2
gdf['myarea'] = gdf.area/1E6
#Convert area from source data (mi^2) to km^2
gdf['CENSUSAREA_km2'] = gdf['CENSUSAREA']*2.58999
#Compute difference and store in new column
gdf['area_diff'] = gdf['myarea'] - gdf['CENSUSAREA_km2']
#Compute percent difference
gdf['area_diff_perc'] = 100 * gdf['area_diff']/gdf['CENSUSAREA_km2']
#Return a new dataframe with index sorted by percent difference
return gdf.sort_values(by='area_diff_perc', ascending=False)
# STUDENT CODE HERE
Written response:
Do your area percent differences agree? If not, any guess about why not?
What do you notice about the states with largest difference - how far are they from the UTM zone center longitude?
Hint: try sorting your GeoDataFrame by the percent difference column
STUDENT RESPONSE HERE
Repeat for your Albers Equal Area projection#
# STUDENT CODE HERE
# STUDENT CODE HERE

Repeat for Web Mercator#
# STUDENT CODE HERE
For each projection, create a choropleth plot showing the percent area difference for all states#
See https://geopandas.org/en/stable/docs/user_guide/mapping.html#choropleth-maps
Specify common vmin and vmax values for intercomparison (e.g., 0 and 100%)
Note: You may need to
ax.set_xlim(right=-5000000)for the web mercator plot, as the Aleutian Islands west of mainland Alaska actually cross the antimeridian (-180° longitude) and plot on the other side of the map
# STUDENT CODE HERE

Written response#
Consider how the area distortion differences vary for the different projections, and answer the following in the cell below:
Are they spatially uniform?
How do they vary with distance from the projection center or latitude/longitude of true scale?
Hopefully this helps to reinforce what touched on earlier with distances and the area - choice of projection is important depending on the intended analysis. If you intend to do area analysis, you want to use an equal-area projection
STUDENT RESPONSE HERE
Note from David…
If you look closely, there are some residual area differences between CENSUSAREA and our equal-area polygons. This is likely related to coastlines, inland water bodies, mapping scale, and how the CENSUSAREA is computed. Using the 20M or 500K polygons doesn’t make much of a difference.
This page says not to use the Census shapefiles for area analysis: https://www.census.gov/programs-surveys/geography/technical-documentation/naming-convention/cartographic-boundary-file.html.
This document outlines the methodology for Census area calculations, and describes issues for different bodies of water: https://www2.census.gov/geo/pdfs/reference/GARM/Ch15GARM.pdf.
Makes sense for states like Louisiana, Florida and Maryland, but need to look at this further for other states with complex coastlines.
🤷 They’re close enough to illustrate advantages of an equal-area projection for now.

Part 7: Combine Points and Polygons (2 pts)#
Create a combined plot of state outlines and volcano points#
You already reprojected both the volcano points GeoDataFrame and the states GeoDataFrame to the same UTM projected coordinate system with units of meters. Let’s add both to the same plot, so we have better context for our points!
See documentation here: https://geopandas.org/en/latest/docs/user_guide/mapping.html#maps-with-layers
Use the matplotlib object-oriented interface to plot on the same axes:
Can start with
f, myax = plt.subplots()to create a matplotlib.axes object, then passmyaxto the states GeoDataFrame plot() call:states_gdf_utm.plot(ax=myax,...).I recommend using
facecolor='white'andedgecolor='black'
Alternatively, remember that the GeoDataFrame
plot()function returns a matplotlib.axes object by default, which you can store as a new variable namedmyaxmyax = states_gdf_utm.plot(...)Note that you no longer see
<matplotlib.axes._subplots.AxesSubplot at 0x7f0da85f5a58>output in the notebook
Now, in the same cell, plot the reprojected volcano point GeoDataFrame on the same axes, passing
myaxto theaxargument of thevolcano_gdf_utm.plot()callYou should see your points drawn over the state polygons!
Make sure you get the plotting order correct, or appropriately set plotting arguments for transparency
Note that you can continue to update/modify the axes (e.g., add a title) by modifying your
myaxobject in the same cell
# STUDENT CODE HERE

OK, that looks good, but how do we limit the map extent to the Western US?#
Get the total bounding box (or extent) of the reprojected volcano points#
Hint: you did this earlier for the original lat/lon GeoDataFrame, should be easy to repeat for your projected GeoDataFrame.
# STUDENT CODE HERE
Set the x and y axes limits to your projected bounds to update your plot#
See
set_xlimandset_ylimYou should see your points with state borders overlaid for context
# STUDENT CODE HERE

Challenge Question: create a function to pad the bounds by user-specified distance (GS required/UG +0.5)#
Assume the user knows the units of their projection, and specifies this distance appropriately (e.g., 30000 meters, not 30000 degrees)
Play with padding distances and replot for a visually pleasing extent (i.e., don’t chop off California)
# STUDENT CODE HERE
# STUDENT CODE HERE
# STUDENT CODE HERE

Summary#
OK, we covered a lot of ground with this one. These concepts are fundemental to all aspects of geospatial data analysis, and we’ll see them throughout the rest of the quarter. So if some aspects are still a bit fuzzy, know that there will be more opportunities to revisit and discuss.
We covered basic operations, plotting, Geopandas GeoDataFrames and associated methods/attributes - these are great for most geospatial vector data and analysis.
Caveat: the standard GeoPandas tools are not ideally suited for workflows involving billions of points - these cases will require optimization (e.g., https://github.com/geopandas/dask-geopandas) and potentially a different suite of tools.
Hopefully you got a sense of the most common approaches to define projections (EPSG codes, proj strings, WKT), and now understand some of the tradeoffs between different projections for different objectives involving visualization or quantitative analysis.
For most applications, best to avoid using a geographic projection (lat/lon) for analysis (or visualization) due to scaling issues
In general, for local studies or smaller areas (<500x500 km), you can use the appropriate UTM zone, which has acceptable distortion in terms of distance, angles and areas
For larger areas, probably want to define a custom projection (remember https://projectionwizard.org/) or use a standard regional projection designed for your intended purpose
Be careful with Web Mercator - in the coming weeks we’ll be using public raster tile datasets prepared with this projection, so try to remember some of the issues highlighted here
When plotting and doing anlaysis, all of your input datasets must share the same coordinate system, which may require reprojecting one (or more) datasets to match. You can then plot them on the same axes. We will do more of this in coming weeks to combine raster and vector data.
Submission#
Save the completed notebook (make sure to fully run the notebook and check all cell output is visible)
Use the
git add; git commit -m 'message'; git pushworkflow to push your work to the remote repositoryideally you’ve been using add / commit / push as you make progress on this notebook
Check the remote repository to check all of the files you want to submit have been pushed
Scroll through your jupyter notebook on your remote repository and make sure all output and plots are visible
When you have completed your last push, submit the url pointing to your Github repository to the corresponding Canvas assignment