Lab03 assignment (20 pts)

Contents

Lab03 assignment (20 pts)#

UW Geospatial Data Analysis
CEE467/CEWA567
David Shean

modified by Eric Gagliano and Quinn Brencher

Introduction#

Objectives#

  1. Review some fundamental concepts that are common to most geospatial analysis

  2. Learn basic operations with Geopandas

  3. Explore coordinate systems, projections and transformations, geometry types

  4. Understand how different projections can distort measurements and visualizations

  5. Create more sophisticated visualizations involving multiple layers and data types

Instructions#

  • For each question or task below, write some code in the empty cell and execute to preserve your output

  • If you are in the graduate section of the class, please complete the challenge questions

  • Work together, consult resources we’ve discussed, post on slack!

Part 1: Exploring CRS and Projections (3 pts)#

Import necessary modules#

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point, Polygon
import fiona
import pyproj
#%matplotlib widget
%matplotlib inline

Load sample vector data: world polygons#

  • See example (and lots of relevant info): https://geopandas.org/projections.html

  • This GeoDataFrame containing attributes and Polygon geometries for all countries is conveniently bundled with Geopandas

world = gpd.read_file("https://naciscdn.org/naturalearth/110m/cultural/ne_110m_admin_0_countries.zip")
/tmp/ipykernel_5512/3926210268.py:1: FutureWarning: The geopandas.dataset module is deprecated and will be removed in GeoPandas 1.0. You can get the original 'naturalearth_lowres' data from https://www.naturalearthdata.com/downloads/110m-cultural-vectors/.
  world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))

Inspect the world GeoDataFrame#

  • Review the columns

  • Note geometry types: both Polygon and MultiPolygon - why the difference? (no written answer needed)

world.head()
pop_est continent name iso_a3 gdp_md_est geometry
0 889953.0 Oceania Fiji FJI 5496 MULTIPOLYGON (((180.00000 -16.06713, 180.00000...
1 58005463.0 Africa Tanzania TZA 63177 POLYGON ((33.90371 -0.95000, 34.07262 -1.05982...
2 603253.0 Africa W. Sahara ESH 907 POLYGON ((-8.66559 27.65643, -8.66512 27.58948...
3 37589262.0 North America Canada CAN 1736425 MULTIPOLYGON (((-122.84000 49.00000, -122.9742...
4 328239523.0 North America United States of America USA 21433226 MULTIPOLYGON (((-122.84000 49.00000, -120.0000...

Check the coordinate reference system (crs)#

world.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

Look up this EPSG code online#

  • See top of crs output

  • Burn this code in your brain

  • Written response: What are units? Would this be an appropriate crs to do area calculations? Why or why not?

STUDENT RESPONSE HERE

Plot the GeoDataFrame using the convenient plot method with default settings#

  • Note that this is a 2D representation of geographic coordinates (lon,lat), known as Equirectangular, Equidistant Cylindrical, and “Plate Carrée” (flat square in French) projection

  • https://en.wikipedia.org/wiki/Equirectangular_projection

# STUDENT CODE HERE

(image

Change the world!#

  • Reproject the world GeoDataFrame to another common, global projection: Web Mercator

    • This is one of the most common projections used for online maps (e.g., Google Maps) and tiled basemaps

    • https://en.wikipedia.org/wiki/Web_Mercator_projection

  • Look up (or recall from memory) the appropriate EPSG code

  • Use the GeoPandas to_crs() function to reproject the world GeoDataFrame to Web Mercator and create a new plot

    • If you didn’t review it earlier, now might be a good time to take a look at this documentation: https://geopandas.org/en/stable/docs/user_guide/projections.html#re-projecting

# STUDENT CODE HERE

img

Uh oh! Why didn’t this work? Let’s try a Robinson projection instead…

#Robinson - no issues, goes all the way to -90 to +90° latitude
world.to_crs('ESRI:54030').plot();
../../_images/54176327f43b7a170c7a0539562f7c04ac24487a72a3012bbd07a3b17f90e38b.png

Workaround for Antarctica (more discussion later)#

world
pop_est continent name iso_a3 gdp_md_est geometry
0 889953.0 Oceania Fiji FJI 5496 MULTIPOLYGON (((180.00000 -16.06713, 180.00000...
1 58005463.0 Africa Tanzania TZA 63177 POLYGON ((33.90371 -0.95000, 34.07262 -1.05982...
2 603253.0 Africa W. Sahara ESH 907 POLYGON ((-8.66559 27.65643, -8.66512 27.58948...
3 37589262.0 North America Canada CAN 1736425 MULTIPOLYGON (((-122.84000 49.00000, -122.9742...
4 328239523.0 North America United States of America USA 21433226 MULTIPOLYGON (((-122.84000 49.00000, -120.0000...
... ... ... ... ... ... ...
172 6944975.0 Europe Serbia SRB 51475 POLYGON ((18.82982 45.90887, 18.82984 45.90888...
173 622137.0 Europe Montenegro MNE 5542 POLYGON ((20.07070 42.58863, 19.80161 42.50009...
174 1794248.0 Europe Kosovo -99 7926 POLYGON ((20.59025 41.85541, 20.52295 42.21787...
175 1394973.0 North America Trinidad and Tobago TTO 24269 POLYGON ((-61.68000 10.76000, -61.10500 10.890...
176 11062113.0 Africa S. Sudan SSD 11998 POLYGON ((30.83385 3.50917, 29.95350 4.17370, ...

177 rows × 6 columns

#https://tipsfordev.com/how-to-remove-shapes-that-cause-a-problem-upon-reprojection-from-a-geopandas-dataframe
from shapely.geometry import box
crs = pyproj.CRS.from_epsg(3857)
bounds = crs.area_of_use.bounds
gpd.clip(world, box(*bounds)).to_crs('EPSG:3857').plot();
../../_images/0363e19c1cd337211c86bfe4b2b17e77aeb3ad8afc867f252c8c6e6ed487b97c.png

Take a moment to consider the two plots#

  • Note the range of axes tick labels and units

  • Written response:

    • Summarize the issue with our first attempt projecting to web mercator.

    • What is most distorted in this map projection?

STUDENT RESPONSE HERE

Define a custom projection#

  • Set the origin of your projection on the (self-proclaimed) “Center of the Universe” – Fremont, Seattle, WA, Earth

    • https://www.atlasobscura.com/places/center-of-the-universe-sign

    • You’ll probably need to look up some coordinates

  • Let’s start by creating a proj string (make sure you use sufficient precision for decimal degrees)

    • https://proj.org/usage/quickstart.html

    • Choose a simple projection that accepts a center latitude and center longitude (e.g., orthographic)

      • See reference: https://proj.org/operations/projections/ortho.html

    • Your string should look something like '+proj=ortho +lon_0=YY.YYYYYYY +lat_0=XXX.XXXXXXX'

    • Make sure the order and sign of your latitude and longitude values make sense (confidence check!)

# STUDENT CODE HERE
# STUDENT CODE HERE

Reproject and plot#

  • Use GeoPandas to_crs() method to reproject the world GeoDataFrame using your local projection, reducing our beautiful multidimensional universe to a 2D plot

  • The plot() method returns a matplotlib axes object - store this as new variable called ax

  • Modify this axes object to add thin, black horizontal lines where y=0 and a vertical line where x=0, producing “crosshairs” on the origin

    • See the matplotlib axvline() and axhline() methods

    • The “line width” keyword lw may be useful

# STUDENT CODE HERE

img

Try more projections!#

  • Let’s experiment with a few additional projections, creating plots like the one above with crosshairs for the origin

  • This time, use some EPSG codes. You can pass a string like 'EPSG:XXXX' to the to_crs() method!

  • Plot the following:

    • EPSG:3031 - South Polar Stereographic (see if you can find Antarctica, might need to zoom in)

    • The EPSG code for the UTM zone on WGS84 ellipsoid that contains Seattle

      • see https://en.wikipedia.org/wiki/Universal_Transverse_Mercator_coordinate_system

      • Note: If you search for “UTM 10N” in the EPSG registry, you may get several returns: https://epsg.io/?q=UTM+10N. These are all valid, but they use different ellipsoids/datums. You want the code for the WGS84 ellipsoid definition.

    • US National Atlas Equal area (EPSG:2163)

    • Two more additional EPSG codes of your choosing

  • Since we have to do this several times, define a function that accepts a geodataframe and an EPSG code as arguments, then does the reprojection and plotting (no need to return anything at this point). Then we could use a simple for loop to call the function for each EPSG code! Nice and clean, and better than copying/pasting.

  • Note: you may see some strange issues with “lines” across the map, or an apparently empty map (try zooming in on 0,0). You didn’t do anything wrong, see next section…

# STUDENT CODE HERE
# STUDENT CODE HERE

img img img img

OK, something isn’t right with some of these maps#

  • Note some polygons (countries) cross the antimeridian (-180°/+180° lon) or one of the poles (+90° or -90° lat, like, say Antarctica). Or polygons could extend beyond the valid extent for the target projection. These polygons won’t render correctly.

    • If using a regional projection for local or regional analysis (e.g., UTM or state plane coordinate system), you probably don’t care about polygons from the other side of the planet anyway.

    • The solution is to isolate or clip polygons of interest before reprojecting.

Isolate North American polygons#

  • Start with a quick inspection of the world GeoDataFrame

  • You should be able to use the standard Pandas “selection by label” approach (.loc) to isolate records for countries in North America

    • This can be done using a conditional statement, which will return a boolean array for selection

    • Store the output in a new GeoDataFrame

  • Regenerate the same plots (with your handy function!) using this new GeoDataFrame

# STUDENT CODE HERE

img

# STUDENT CODE HERE 

img img img img

That looks better!#

  • Note the location of the origin, apparent size and distortion of polygons (e.g., Mexico, Alaska and Greenland)

Part 2: Loading volcano locations into a GeoDataFrame (3 pts)#

  • Let’s explore a new dataset containing Holocene volcanoes (those active in the last 10,000 year) in the contiguous US.

  • These data are a subset of the data provided by the Global Volcanism Program, Smithsonian Institution

  • Global Volcanism Program, 2024. [Database] Volcanoes of the World (v. 5.2.3; 20 Sep 2024). Distributed by Smithsonian Institution, compiled by Venzke, E. https://doi.org/10.5479/si.GVP.VOTW5-2024.5.2

Load the existing csv into a Pandas DataFrame#

  • Define the relative path to the csv as in previous labs

  • Use the amazing Pandas read_csv() function to load as a Pandas DataFrame

  • Run a quick head() on your DataFrame to make sure everything looks right

volcanoes_fn = './data/conus_holocene_volcanoes.csv'
volcanoes_df = pd.read_csv(volcanoes_fn)
volcanoes_df.head()
name location latitude longitude elevation morphology status country geometry
0 Crater Lake Oregon 42.930 -122.120 2487 Caldera Radiocarbon United States POINT (-122.12 42.93)
1 Craters of the Moon Idaho 43.420 -113.500 2005 Cinder cone Radiocarbon United States POINT (-113.5 43.42)
2 Davis Lake Oregon 43.570 -121.820 2163 Volcanic field Radiocarbon United States POINT (-121.82 43.57)
3 Devils Garden Oregon 43.512 -120.861 1698 Volcanic field Holocene United States POINT (-120.861 43.512)
4 Diamond Craters Oregon 43.100 -118.750 1435 Volcanic field Holocene United States POINT (-118.75 43.1)

Convert the Pandas DataFrame to a GeoPandas GeoDataFrame#

  • See documentation here: https://geopandas.readthedocs.io/en/latest/gallery/create_geopandas_from_pandas.html

  • Careful about lon and lat order!

  • Store in a variable named volcanoes_gdf (needed for sample code later)

  • Run a quick head() to make sure everything looks good

  • You should have a new geometry column cointaining shapely Point objects

# STUDENT CODE HERE
name location latitude longitude elevation morphology status country geometry
0 Crater Lake Oregon 42.930 -122.120 2487 Caldera Radiocarbon United States POINT (-122.12000 42.93000)
1 Craters of the Moon Idaho 43.420 -113.500 2005 Cinder cone Radiocarbon United States POINT (-113.50000 43.42000)
2 Davis Lake Oregon 43.570 -121.820 2163 Volcanic field Radiocarbon United States POINT (-121.82000 43.57000)
3 Devils Garden Oregon 43.512 -120.861 1698 Volcanic field Holocene United States POINT (-120.86100 43.51200)
4 Diamond Craters Oregon 43.100 -118.750 1435 Volcanic field Holocene United States POINT (-118.75000 43.10000)

A note on Point geometry objects#

  • The coordiantes in the special geometry column of a GeoDataFrame are actually Point objects from the Shapely library

    • https://shapely.readthedocs.io/en/latest/manual.html

    • https://shapely.readthedocs.io/en/latest/manual.html#points

  • We will revisit geometry objects, and explore Polygon objects in more detail during the Vector2 lab

Set the GeoDataFrame CRS#

  • https://geopandas.org/en/latest/docs/user_guide/projections.html#setting-a-projection

  • Note that you can also define this during the initial GeoDataFrame creation, passing the appropriate EPSG code as an argument for the crs keyword (gpd.GeoDataFrame(pandas_df, crs='EPSG:XXXX', geometry=...)

# STUDENT CODE HERE

Run a quick check to make sure CRS is set correctly!#

  • Inspect the crs for your GeoDataFrame

# STUDENT CODE HERE
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

Get bounding box (extent) and center (lon, lat) of volcano points#

  • See GeoPandas API reference. In this case, you want the total_bounds attribute: https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoSeries.total_bounds.html

    • Try to avoid copying/pasting and hardcoding values - store the output from total_bounds as a new variable, then use the array indices to get the corresponding min/max values

    • Note that bounds will return the bounds of each record in the GeoDataFrame (in this case, just the point coordinates), while total_bounds returns the union for all records

  • Center can be calculated from the min/max extent values in each dimension

# STUDENT CODE HERE
array([-122.77 ,   33.197, -105.93 ,   48.777])
# STUDENT CODE HERE
array([-114.35 ,   40.987])
#Alternative approach using convex hull
print(volcanoes_gdf.unary_union.convex_hull.centroid)
POINT (-115.1148366870538 39.67205108451101)

Plot the points using the convenience plot method of the GeoDataFrame#

  • Note that unlike the Pandas scatterplot function, you no longer need to specify the DataFrame columns containing the x and y values - GeoPandas will assume you want to use x and y values in Geometry objects

  • Color points by ‘elevation’ values

    • Note: GeoPandas syntax for this is slightly different than Pandas

  • Set point size appropriately

  • Add a colorbar

    • https://geopandas.org/en/latest/docs/user_guide/mapping.html#creating-a-legend

    • Note: if desired, you can pass a dictionary of colorbar properties (e.g., {'label':'Elevation (m)'}) to legend_kwds

      • Even better, specify the vertical datum of the elevation values, in this case height above the WGS84 ellipsoid

  • Don’t specify a figsize for this plot (though fine to do this elsewhere), just use the default figure size.

# STUDENT CODE HERE

img

Geographic coordinate confidence check#

OK, great, GeoPandas makes creating this plot a little easier than last week.

Now, let’s dig a little deeper and think about this map.

Note that the default aspect ratio for the above map is ‘equal’, so the x and y scaling is the same

  • Check this by quickly using a ruler or piece of paper to measure the distance spanned by 10° on each axis on your screen (in mm)

Written response: Answer in the cell below: Are there any potential issues with this scaling for our geographic coordinates (latitude and longitude in decimal degrees)?#

STUDENT RESPONSE HERE

Do the following quick calculations for a spherical Earth (or attempt some more sophisticated geodetic distance calculations, if desired). Show your work! Drawing a quick sketch is likely useful as you think through this (no need to reproduce your sketch here).

Answer the following in the cell below:

  • What is the length (in km) of a degree of latitude at:

    • 0° latitude (equator)

    • 90° latitude (pole)

    • 35° latitude? 49° latitude? (these are the approximate min and max latitude values of the volcanoes point data)

  • What is the length (in km) of a degree of longitude at:

    • 0° latitude (equator)

    • 90° latitude (pole)

    • 35° latitude? 49° latitude?

  • Based on these values, does your map above have an equal aspect ratio in terms of true distance (in km)?

STUDENT RESPONSE HERE

# STUDENT RESPONSE HERE

Challenge Question (GS required/UG +0.5)#

Create a plot showing the relationship between the length of a degree of longitude in km as a function of latitude in decimal degrees for a range of 0 to 90 degrees latitude. Add two red points for 35 and 49. Fine to assume a spherical earth with average radius.

# STUDENT RESPONSE HERE

img

Hmmm. So how do we deal with these scaling issues?

Part 3: Using projected coordinate systems (3 pts)#

We need to choose a map projection that is appropriate for our data and objectives. This decision is important for visualization, but is also critical for precise quantitative analysis. For example, if you want to analyze area or volume change, you should use an equal-area projection. If you want to analyze distances between points, you should use an equidistant projection.

https://www.axismaps.com/guide/general/map-projections/

Sadly, there is no “perfect” projection. https://en.wikipedia.org/wiki/Map_projection#Which_projection_is_best?

You, as the mapmaker or data analyst, are responsible for choosing a projection with the right characteristics for your purposes. Let’s explore a bit further, and we’ll revisit some general guidelines later.

Use GeoPandas to reproject your GeoDataFrame#

  • Start by reprojecting the points to a Universal Transverse Mercator (UTM), Zone 11N on the WGS84 Ellipsoid

    • You’ll have to look up the appropriate EPSG code

  • Store the output as a new GeoDataFrame volcanoes_gdf_utm

  • Do a quick head() and note the new values in the geometry column

# STUDENT CODE HERE
name location latitude longitude elevation morphology status country geometry
0 Crater Lake Oregon 42.930 -122.120 2487 Caldera Radiocarbon United States POINT (82163.231 4765774.760)
1 Craters of the Moon Idaho 43.420 -113.500 2005 Cinder cone Radiocarbon United States POINT (783338.410 4813408.933)
2 Davis Lake Oregon 43.570 -121.820 2163 Volcanic field Radiocarbon United States POINT (110757.528 4835413.815)
3 Devils Garden Oregon 43.512 -120.861 1698 Volcanic field Holocene United States POINT (187909.330 4824919.698)
4 Diamond Craters Oregon 43.100 -118.750 1435 Volcanic field Holocene United States POINT (357590.415 4773405.966)

Create a new plot of the reprojected points#

  • Note that the plot may look very similar to the original lat/lon points, but there are some subtle differences (read ahead for more details).

  • Note the axes labels of the new coordinate system - units, origin (0,0), aspect ratio

    • To confuse things further, the UTM projections include a “False Easting” of 500 km, so the x origin (0 on your plot) does not coincide with the center longitude of the projection (-117°W for 11N)! It is actually placed outside (west) of the 6°-wide zone. UTM zones in the southern hemisphere also include a “False Northing” of 10000 km. These offsets are introduced to avoid negative coordinates, which avoids sign errors and enables simpler calculations (more relevant in the old days with pen/paper or mental math). See http://wiki.gis.com/wiki/index.php/False_origin.

# STUDENT CODE HERE

img

OK, what did we just do?#

Under the hood, GeoPandas used the pyproj library (a Python API for PROJ) to transform each point from one coordinate system to another coordinate system.

I guarantee that you’ve all done this kind of thing before, you may just not remember it or recognize it in this context. See: https://en.wikipedia.org/wiki/List_of_common_coordinate_transformations

Transforming coordinates on the surface of the same reference ellipsoid is pretty straightforward. Things start to get more complicated when you include different ellipsoid models, horizontal/vertical datums, etc. Oh, also the Earth’s surface is not static - plate tectonics make everything more complicated, as time becomes important, some coordinate systems remain “fixed” relative to a moving plate (e.g., NAD83), and transformations must include a “kinematic” component.

Fortunately, the PROJ library (https://proj.org/about.html) has generalized much of the complicated math for geodetic coordinate transformations. It’s been under development since the 1980s, and our understanding of the Earth’s shape and plate motions has changed dramatically in that time period (thanks to GNSS like GPS and other satellite observations). So, still pretty complicated, and there are different levels of support/sophistication in the tools/libraries that use PROJ (like GeoPandas).

Define a custom projection for Western U.S#

The UTM projection we used above is not the best choice for our data, which actually span 4 UTM zones: https://en.wikipedia.org/wiki/Universal_Transverse_Mercator_coordinate_system#/media/File:Utm-zones-USA.svg. Each UTM zone is a north-south “slice” of the Earth’s surface covering 6° of longitude around a central meridian.

We used UTM Zone 11N, and while distortion should be limited around the -117°W central meridian, it will increase with distance beyond the -120° to -114°W zone boundaries.

Let’s instead use a custom Albers Equal Area (AEA) projection to minimize area distoration over the full spatial extent of our volcano points.

To do this, we’ll define a PROJ string (https://proj.org/usage/quickstart.html?highlight=definition), which can be interpreted by most Python geopackages (like pyproj).

This interactive tool might be useful to explore some options for a user-defined bounding box: https://projectionwizard.org/

The Albers Equal Area projection definition requires two standard parallels: https://proj.org/operations/projections/aea.html. Here, we will also specify the center latitude and center longitude for the coordinate system origin.

  • Define a custom Albers Equal Area proj string '+proj=aea...'

    • https://en.wikipedia.org/wiki/Albers_projection

    • PROJ reference, with example: https://proj.org/operations/projections/aea.html

  • Use the center longitude and latitude of the volcano points you calculated earlier to set the lon_0 and lat_0 parameters.

    • Note that the PROJ documentation doesn’t list the lat_0 parameter, but this will properly set the (0,0) origin of your new projection.

  • Define the two standard parallels (lines of latitude) based on the latitude range of the points

    • Map scale will be true along these parallels, with distortion increases as you move north or south

    • One simple approach would be to use min and max latitude from the total_bounds extent computed earlier

      • If this seems too easy, figure out how to place them slightly inside your min and max latitude to minimize distortion across the entire latitude range

        • if you attempt this, check out https://www.sciencedirect.com/science/article/pii/S009830041630053X (one of the “Existing Recommendations” is fine)

  • Use Python f string formatting to dynamically create your proj string (don’t just hardcode your values, but substitute variables in the string)

  • Print the final proj string

# STUDENT CODE HERE
# STUDENT CODE HERE
# STUDENT CODE HERE

Reproject the volcano points using your custom equal-area projection#

  • Store the output as a new GeoDataFrame and check that the geometry was updated.

  • Confidence check with a scatter plot

    • Origin should be near center of volcano points

# STUDENT CODE HERE

Create scatter plots for comparison#

  • 4 subplots (WGS84, UTM, custom AEA, and Web Mercator)

  • Might be useful to add gridlines with constant spacing (somewhere between ~200-500 km might work) to each plot for projected data using matplotlib grid()

from matplotlib.ticker import MultipleLocator

def add_grid(ax, tick_interval=500000):
    ax.xaxis.set_major_locator(MultipleLocator(tick_interval))
    ax.yaxis.set_major_locator(MultipleLocator(tick_interval))
    ax.grid()
# hint: sample starter code

# f, axa = plt.subplots(1,4, figsize=(15,3)) define your figure and axes!

# volcanoes_gdf.plot(ax=axa[0], column='volcanoes_z', cmap='inferno', markersize=1, legend=True) plot the gdf on the correct axis
# axa[0].set_title('WGS84') # set title
# axa[0].grid() # add grid

# volcanoes_gdf_utm........ plot the gdf on the correct axis
# set title
# add grid for the next three plots using add grid function in the cell above... example usage: add_grid(axa[1])

# and the rest :)
# STUDENT CODE HERE

img

These look similar, so why are we spending time on this?#

  • Note the location of the origin for each coordinate system

    • The (0,0) should be near the center of your points for the AEA projection

  • Note subtle distortion of points near the margins

  • Let’s dive into some analysis of distances, azimuths and areas to evaluate projection distortion

Part 4: Distance and azimuth distortion analysis (4 pts)#

  • We will now explore how these projections affect distances and angles

  • Let’s start by defining two points from our dataset separated by a large distance - we can use points corresponding to minimum and maximum longitude values. We will store these points in a new geodataframe volcanoes_gdf_mmlon

min_idx = volcanoes_gdf['longitude'].argmin()
max_idx = volcanoes_gdf['longitude'].argmax()
volcanoes_gdf_mmlon = volcanoes_gdf.iloc[[min_idx, max_idx]]
volcanoes_gdf_mmlon
name location latitude longitude elevation morphology status country geometry
31 Clear Lake California 38.97 -122.77 1439 Volcanic field Holocene United States POINT (-122.77000 38.97000)
29 Carrizozo New Mexico 33.78 -105.93 1731 Cinder cone Holocene United States POINT (-105.93000 33.78000)
volcanoes_gdf_mmlon.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

Create a quick plot to visualize#

f, ax = plt.subplots()
volcanoes_gdf.plot(ax=ax)
volcanoes_gdf_mmlon.plot(ax=ax, color='r');
../../_images/fa58ab545ebd76198e2c37f010402c73c972fad75367690c479a2b49c401b19e.png

Compute Geodetic distance and azimuth between these points#

  • This is our “truth” - the geodetic distance along the 3D surface of the ellipsoid

  • Turns out calculating this is not as straightforward as it may sound:

    • https://en.wikipedia.org/wiki/Geographical_distance

    • https://en.wikipedia.org/wiki/Geodesics_on_an_ellipsoid

    • This is one of the main reasons that we use 2D planar projections! Geometric calculations become much simpler in 2D compared to the curved 3D surface of an ellipsoid.

  • Fortunately, there are approximations and several mature tools/libraries with code to do this efficiently

  • We will use the pyproj functionality here

    • https://pyproj4.github.io/pyproj/stable/api/geod.html

from pyproj import Geod
#Define the ellipsoid to use for calculations
geod = Geod(ellps="WGS84")

#Compute geodetic distance between first and last point in a GeoDataFrame
#Return the distance and azimuth and back azimuth (degrees clockwise from north)
def geodetic_az_dist(gdf):
    #Extract the points
    p0 = gdf.iloc[0]
    p1 = gdf.iloc[-1]
    #Compute the geodesic azimuth, back azimuth and distance
    az, backaz, dist = geod.inv(p0['longitude'], p0['latitude'], p1['longitude'], p1['latitude'])
    #print('Distance: {:0.2f} km, Azimuth: {:0.2f}°, Back Azimuth: {:0.2f}°'.format(dist/1000., az, backaz))
    #return {'dist':dist, 'az':az, 'backaz':backaz}
    print('Distance: {:0.2f} km, Azimuth: {:0.2f}°'.format(dist/1000., az))
    return {'dist':dist, 'az':az}
geo_da = geodetic_az_dist(volcanoes_gdf_mmlon)
Distance: 1614.09 km, Azimuth: 105.70°

Compute Euclidian distance and azimuth between these points for different projections#

  • Use the Pythagorean theorem in a cartesian coordinate system

    • https://en.wikipedia.org/wiki/Euclidean_distance

  • I’ve prepared some sample code here

  • You will need to reproject the volcanoes_gdf_mmlon dataframe and pass to the euclidian_az_dist() function defined below

  • Please do this for:

    • UTM Zone 11N

    • Web Mercator

    • Your AEA projection

    • Challenge Question (GS required/UG +0.5): an azimuthal equidistant projection with projection center defined using the first point in the GeoDataFrame (you’ll need to create a new proj string here, see https://proj.org/operations/projections/aeqd.html for reference)

  • Note that the euclidian_az_dist function prints out values, but you may want to store the returned dictionary, and then use for later calculations

def euclidian_az_dist(gdf):
    unit = gdf.crs.axis_info[0].unit_name
    if unit != 'metre':
        print('Input CRS has units of {}, expected projected CRS'.format(unit))
    else:
        #Compute difference in x coordinates
        dx = gdf.iloc[-1].geometry.x - gdf.iloc[0].geometry.x
        #Compute difference in y coordinates
        dy = gdf.iloc[-1].geometry.y - gdf.iloc[0].geometry.y
        #Compute azimuth using x and y offsets
        az = np.degrees(np.arctan2(dx,dy))
        #Compute distance using GeoPandas
        dist = gdf.distance(gdf.iloc[-1].geometry).iloc[0]
        #Same as this:
        #dist = np.sqrt(dx**2 + dy**2)
        print('Distance: {:0.2f} km, Azimuth: {:0.2f}°'.format(dist/1000., az))
        return {'dist':dist, 'az':az}
#Note that GeoPandas is smart enough to raise exception when inputs are geographic (lat,lon)
euclidian_az_dist(volcanoes_gdf_mmlon)
Input CRS has units of degree, expected projected CRS
# STUDENT CODE HERE
# STUDENT CODE HERE
# STUDENT CODE HERE
# STUDENT CODE HERE

Analysis: which of these projections is closest to our true distance?#

  • Use the sample function to compute percent difference from the “true” geodetic distance values

#Compute the percent difference
#Input should be two dictionaries, each containing 'dist' and 'az' values for a given projection
def percdiff(d1, d2):
    dist_diff_perc = 100 * abs(d1['dist'] - d2['dist'])/d2['dist']
    az_diff_perc = 100 * abs(d1['az'] - d2['az'])/d2['az']
    print(f'Distance diff: {dist_diff_perc:.2f}%, Azimuth diff: {az_diff_perc:.2f}%')
#Compare web mercator values to the geodesic values
print("Web Mercator:", wm_da)
print("Geodetic (truth):", geo_da)
percdiff(wm_da, geo_da)
Web Mercator: {'dist': 2007444.577723981, 'az': 110.95940332739855}
Geodetic (truth): {'dist': 1614090.3484979372, 'az': 105.69597799717235}
Distance diff: 24.37%, Azimuth diff: 4.98%
# STUDENT CODE HERE
# STUDENT CODE HERE
# STUDENT CODE HERE

Written response: which of these projections is closest to our true distance?

STUDENT RESPONSE HERE

So what is going on here?#

  • You are seeing different types of distortion (distance, direction) for each projection, compared to the “true” geodetic values on the surface of the Ellipsoid.

  • In this case, the distance distortion for some projections is <1%, but it’s very possible that <1% will introduce unacceptable (and unnecessary) error for precise engineering applications. This distortion will increase as distances increase.

  • It’s important to pick a projection that is well-suited for your application. If you care about accurate representation of distances, you should use an equidistant projection. If you care about accurate representation of areas, use an equal-area projection.

  • Remember, there’s no “perfect” projection, so it’s on you to account for this properly. You would be surprised at how often errors due to projection decisions end up in published literature.

Part 5: Save the projected points to a GIS-ready file (1 pt)#

Often you want to preserve your awesome work and save a GeoDataFrame to use in some other software, or maybe to share with a colleague.

There are many potential options here for formats. For geospatial data, you often want a GIS-ready file format (like a shapefile). And in some cases, you just want a CSV with coordinates, which can be opened by most text readers (and many 3rd-party applications).

Try to create the following output files using Geopandas and the underlying Pandas functionality:

  • https://geopandas.org/en/stable/docs/user_guide/io.html#writing-spatial-data

  • https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-store-in-csv

Just explore the I/O functionality, don’t add/commit and push to your github repo with solutions. Just keep them on the hub for use in a future lab.

Export GeoDataFrame to CSV#

  • For convenience, let’s add new columns for the x and y coordinates in the utm coordinates

  • This should be pretty simple - no need for loops or a custom function here (hint: use the x and y attributes of the special geometry column). Can store the projected x and y coordinates as new columns in your GeoDataFrame)

  • Note: don’t need to do this, as the geometry column already stores this information, but sometimes you want to preserve projected coordinates (alongside lat/lon) if you’re going to export as a csv and analyze with other tools (e.g., AutoCAD)

    • Warning: no information about projection is preserved in a standard CSV, just the coordinate values. If you choose to store things as a CSV, it’s on you to document projection in metadata before distributing

    • A better option is to use a GIS-ready file format (below) that embeds the CRS metadata along with the coordinates

def gdf_add_xy(gdf):
    gdf['proj_x'] = gdf['geometry'].x
    gdf['proj_y'] = gdf['geometry'].y
gdf_add_xy(volcanoes_gdf_utm)
volcanoes_gdf_utm.head()
name location latitude longitude elevation morphology status country geometry proj_x proj_y
0 Crater Lake Oregon 42.930 -122.120 2487 Caldera Radiocarbon United States POINT (82163.231 4765774.760) 82163.230799 4.765775e+06
1 Craters of the Moon Idaho 43.420 -113.500 2005 Cinder cone Radiocarbon United States POINT (783338.410 4813408.933) 783338.409673 4.813409e+06
2 Davis Lake Oregon 43.570 -121.820 2163 Volcanic field Radiocarbon United States POINT (110757.528 4835413.815) 110757.528144 4.835414e+06
3 Devils Garden Oregon 43.512 -120.861 1698 Volcanic field Holocene United States POINT (187909.330 4824919.698) 187909.330168 4.824920e+06
4 Diamond Craters Oregon 43.100 -118.750 1435 Volcanic field Holocene United States POINT (357590.415 4773405.966) 357590.415274 4.773406e+06
out_fn='./data/conus_holocene_volcanoes_utm.csv'
volcanoes_gdf_utm.to_csv(out_fn)

Open the csv from Jupyter lab file browser to verify output looks good, with new columns#

Save GIS-ready files to disk#

  • Use fiona to get a list of available file type drivers for output

  • Note: the ‘r’ means fiona/geopandas can read this file type, ‘w’ means it can write this file type, ‘a’ means it can append to an existing file.

    • https://fiona.readthedocs.io/en/latest/manual.html#writing-vector-data

fiona.supported_drivers

How to choose a file format?#

  • I’m guessing that most of you have used ESRI shapefiles in the past.

  • Please stop 🛑. This is a legacy format, though it is still widely used.

    • http://switchfromshapefile.org/

  • Better options these days are Geopackage (GPKG) when spatial index is required, and simple GeoJSON for other cases

    • Both should be supported by any respectable GIS (including QGIS, ArcGIS, etc)

    • Note that GeoJSON is typically only used for geographic coordinates (EPSG:4326), not projected coordinates

    • https://feed.terramonitor.com/shapefile-vs-geopackage-vs-geojson/

  • Now that you’ve made an informed decision for an output format, let’s use the Geopandas to_file() method to create this file

    • Let’s export to Geopackage for this exercise

    • Make sure you properly specify filename with extension and the driver option

    • Note: Writing out may take a minute, and may produce an intermediate ‘.gpkg-journal’ file

# STUDENT CODE HERE
# STUDENT CODE HERE

You can now directly load this gpkg file in any GIS, without defining a coordinate system or dealing with your original csv. You can also load this file directly into geopandas in the future using the read_file() method, without having to repeat the processing above. We’ll do this for lab exercises in a few weeks.

Optional: Download and open in QGIS or ArcGIS#

Try it! Right-click on file in the file browser on the left side of the JupyterLab interface, then select Download and pick a location on your local computer (e.g., your Downloads folder).

Then open this file in QGIS or ArcGIS on your local machine!

Check out your geopackage using gdal: gdalsrsinfo#

We covered basic shell and command line usage during in Week01. I included this because there are many powerful command-line utilities out there that can batch data processing and quick geospatial data inspection. The GDAL/OGR command-line utilities are worth exploring - we will discuss more during raster module. Some of these have equivalent Python functions, while others do not.

One of these utilties, gdalsrsinfo, is great for quickly inspecting dataset CRS, converting between different CRS formats (WKT, proj strings, EPSG codes, etc), and exploring transformations between different CRS.

Try it!#

Open a new terminal, navigate to the directory where you just created the output file. Alternatively, you can run these shell commands from a notebook cell. Run the following (substituting your filename):

  • gdalsrsinfo EPSG:32611

  • gdalsrsinfo yourfilename.gpkg

By default, this will export a proj string and WKT for the crs. You can also output different formats (e.g., EPSG code). See gdalsrsinfo -h for usage. Try the -o all option.

# STUDENT CODE HERE
# STUDENT CODE HERE
# STUDENT CODE HERE

Part 6: Area Distortion (4pts)#

Get polygons for US States#

Hmmm, let’s see. Two choices:

  1. We could go to ESRI or the U.S. Census website, identify and download a shapefile, unzip 4+ files, copy/paste the appropriate *.shp filename into the notebook. Wait, how can I download on a remote server? OK, maybe run something like wget http://..., unzip, provide absolute path
    - OR -

  2. Give geopandas a url string that points to a GeoJSON file somewhere on the web, and read dynamically

#2 is easier :)

Let’s use the US States 5M GeoJSON here: http://eric.clst.org/tech/usgeojson/

  • We’ve talked about JSON as the text format used by Jupyter notebooks. GeoJSON extends this format to include geospatial information. It’s pretty great. If you are unfamiliar, take a moment to read about GeoJSON: https://en.wikipedia.org/wiki/GeoJSON

Quick inspection#

Take a look at the 5M GeoJSON file (digitized to preserve details at a scale of 1:5000000) contents in your browser or download and open with a text editor.

Note organization structure. Think about how this compares to, say, a Python dictionary object.

Read directly with GeoPandas#

Read the file using GeoPandas by passing the url to gpd.read_file().

#states_url = 'http://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_040_00_500k.json'
states_url = 'http://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_040_00_5m.json'
#states_url = 'http://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_040_00_20m.json'
# STUDENT CODE HERE

Inspect the GeoDataFrame#

  • Note the columns and geometry type(s) in the geometry columns

  • Do a quick plot() to preview

states_gdf.plot();
../../_images/9ab1f419cdd7688cec4bd16466a319085a6ccd55889f76bf66d59aebcb03b5ee.png

Note: The extent of the Aleutian islands (Alaska) crosses the -180/+180°W antimeridian, which is why the extent spans the full longitude range.

Check the CRS#

  • Note that this was automatically defined during the read_file() step. Thanks GeoPandas!

# STUDENT CODE HERE

Compute the area (in $km^2$) for all states#

  • Our GeoDataFrame of states is still in geographic coordinates (decimal degrees)

    • We could compute area in decimal degrees, but remember our analysis above about how the length of a degree of longitude varies? This will also affect area calculations.

    • We want polygon area values in km2

    • What might you need to do to the GeoDataFrame before you proceed…

Answer: reproject the polygons!#

  • Let’s start by reprojecting the state polygons using the same UTM 11N projection we explored earlier

  • Store output as a new GeoDataFrame with unique name (e.g., states_gdf_utm)

  • Do a quick head() on the output to verify a change in geometry coordinates

  • Do a quick plot(), noting the new coordinate values.

# STUDENT CODE HERE
# STUDENT CODE HERE

img

Compute the area of the polygons in the projected GeoDataFrame#

  • Compute the area for each state in km^2

    • Don’t use a loop - this should be straightforward to do using geopandas: https://geopandas.org/en/latest/docs/reference/api/geopandas.GeoSeries.area.html

  • Compare your values with the values in the CENSUSAREA column

    • Calcuate an area difference between your values and the CENSUSAREA values

    • Note: CENSUSAREA values are in square miles (sigh), so you’ll need to convert to square km

    • Convert the area difference to a percent difference

#Sample function
def gdf_area_analysis(gdf):
    #Convert area from GeoPandas (m^2) to km^2
    gdf['myarea'] = gdf.area/1E6
    #Convert area from source data (mi^2) to km^2
    gdf['CENSUSAREA_km2'] = gdf['CENSUSAREA']*2.58999
    #Compute difference and store in new column
    gdf['area_diff'] = gdf['myarea'] - gdf['CENSUSAREA_km2']
    #Compute percent difference
    gdf['area_diff_perc'] = 100 * gdf['area_diff']/gdf['CENSUSAREA_km2']
    #Return a new dataframe with index sorted by percent difference
    return gdf.sort_values(by='area_diff_perc', ascending=False)
# STUDENT CODE HERE

Written response:

  • Do your area percent differences agree? If not, any guess about why not?

  • What do you notice about the states with largest difference - how far are they from the UTM zone center longitude?

  • Hint: try sorting your GeoDataFrame by the percent difference column

STUDENT RESPONSE HERE

Repeat for your Albers Equal Area projection#

# STUDENT CODE HERE
# STUDENT CODE HERE

img

Repeat for Web Mercator#

# STUDENT CODE HERE

For each projection, create a choropleth plot showing the percent area difference for all states#

  • See https://geopandas.org/en/stable/docs/user_guide/mapping.html#choropleth-maps

  • Specify common vmin and vmax values for intercomparison (e.g., 0 and 100%)

  • Note: You may need to ax.set_xlim(right=-5000000) for the web mercator plot, as the Aleutian Islands west of mainland Alaska actually cross the antimeridian (-180° longitude) and plot on the other side of the map

# STUDENT CODE HERE

img

Written response#

  • Consider how the area distortion differences vary for the different projections, and answer the following in the cell below:

    • Are they spatially uniform?

    • How do they vary with distance from the projection center or latitude/longitude of true scale?

  • Hopefully this helps to reinforce what touched on earlier with distances and the area - choice of projection is important depending on the intended analysis. If you intend to do area analysis, you want to use an equal-area projection

STUDENT RESPONSE HERE

Note from David…

If you look closely, there are some residual area differences between CENSUSAREA and our equal-area polygons. This is likely related to coastlines, inland water bodies, mapping scale, and how the CENSUSAREA is computed. Using the 20M or 500K polygons doesn’t make much of a difference.

  • This page says not to use the Census shapefiles for area analysis: https://www.census.gov/programs-surveys/geography/technical-documentation/naming-convention/cartographic-boundary-file.html.

  • This document outlines the methodology for Census area calculations, and describes issues for different bodies of water: https://www2.census.gov/geo/pdfs/reference/GARM/Ch15GARM.pdf.

  • Makes sense for states like Louisiana, Florida and Maryland, but need to look at this further for other states with complex coastlines.

  • 🤷 They’re close enough to illustrate advantages of an equal-area projection for now.

img

Part 7: Combine Points and Polygons (2 pts)#

Create a combined plot of state outlines and volcano points#

You already reprojected both the volcano points GeoDataFrame and the states GeoDataFrame to the same UTM projected coordinate system with units of meters. Let’s add both to the same plot, so we have better context for our points!

See documentation here: https://geopandas.org/en/latest/docs/user_guide/mapping.html#maps-with-layers

  • Use the matplotlib object-oriented interface to plot on the same axes:

    • Can start with f, myax = plt.subplots() to create a matplotlib.axes object, then pass myax to the states GeoDataFrame plot() call: states_gdf_utm.plot(ax=myax,...).

      • I recommend using facecolor='white' and edgecolor='black'

    • Alternatively, remember that the GeoDataFrame plot() function returns a matplotlib.axes object by default, which you can store as a new variable named myax

      • myax = states_gdf_utm.plot(...)

      • Note that you no longer see <matplotlib.axes._subplots.AxesSubplot at 0x7f0da85f5a58> output in the notebook

    • Now, in the same cell, plot the reprojected volcano point GeoDataFrame on the same axes, passing myax to the ax argument of the volcano_gdf_utm.plot() call

    • You should see your points drawn over the state polygons!

      • Make sure you get the plotting order correct, or appropriately set plotting arguments for transparency

    • Note that you can continue to update/modify the axes (e.g., add a title) by modifying your myax object in the same cell

# STUDENT CODE HERE

img

OK, that looks good, but how do we limit the map extent to the Western US?#

Get the total bounding box (or extent) of the reprojected volcano points#

  • Hint: you did this earlier for the original lat/lon GeoDataFrame, should be easy to repeat for your projected GeoDataFrame.

# STUDENT CODE HERE

Set the x and y axes limits to your projected bounds to update your plot#

  • See set_xlim and set_ylim

  • You should see your points with state borders overlaid for context

# STUDENT CODE HERE

img

Challenge Question: create a function to pad the bounds by user-specified distance (GS required/UG +0.5)#

  • Assume the user knows the units of their projection, and specifies this distance appropriately (e.g., 30000 meters, not 30000 degrees)

  • Play with padding distances and replot for a visually pleasing extent (i.e., don’t chop off California)

# STUDENT CODE HERE
# STUDENT CODE HERE
# STUDENT CODE HERE

img

Summary#

OK, we covered a lot of ground with this one. These concepts are fundemental to all aspects of geospatial data analysis, and we’ll see them throughout the rest of the quarter. So if some aspects are still a bit fuzzy, know that there will be more opportunities to revisit and discuss.

  • We covered basic operations, plotting, Geopandas GeoDataFrames and associated methods/attributes - these are great for most geospatial vector data and analysis.

    • Caveat: the standard GeoPandas tools are not ideally suited for workflows involving billions of points - these cases will require optimization (e.g., https://github.com/geopandas/dask-geopandas) and potentially a different suite of tools.

  • Hopefully you got a sense of the most common approaches to define projections (EPSG codes, proj strings, WKT), and now understand some of the tradeoffs between different projections for different objectives involving visualization or quantitative analysis.

    • For most applications, best to avoid using a geographic projection (lat/lon) for analysis (or visualization) due to scaling issues

    • In general, for local studies or smaller areas (<500x500 km), you can use the appropriate UTM zone, which has acceptable distortion in terms of distance, angles and areas

    • For larger areas, probably want to define a custom projection (remember https://projectionwizard.org/) or use a standard regional projection designed for your intended purpose

    • Be careful with Web Mercator - in the coming weeks we’ll be using public raster tile datasets prepared with this projection, so try to remember some of the issues highlighted here

  • When plotting and doing anlaysis, all of your input datasets must share the same coordinate system, which may require reprojecting one (or more) datasets to match. You can then plot them on the same axes. We will do more of this in coming weeks to combine raster and vector data.

Submission#

  • Save the completed notebook (make sure to fully run the notebook and check all cell output is visible)

  • Use the git add; git commit -m 'message'; git push workflow to push your work to the remote repository

    • ideally you’ve been using add / commit / push as you make progress on this notebook

  • Check the remote repository to check all of the files you want to submit have been pushed

  • Scroll through your jupyter notebook on your remote repository and make sure all output and plots are visible

  • When you have completed your last push, submit the url pointing to your Github repository to the corresponding Canvas assignment