Geographic Maps

In this reading, we'll learn how to use geopandas to manipulate and plot geographic data. geopandas is based on regular pandas, so we'll be using GeoDataFrames and GeoSeries, which are types that inherit from the regular ones we have been using.

Install

Before installing anything, make sure you are on version 19.0.0 or later of pip:

If not, pip can upgrade itself with this:

pip3 install --upgrade pip

You can then install geopandas and some other packages it uses like this:

pip3 install geopandas shapely descartes geopy

Finally, you'll need to install rtree with apt to enable some features we'll learn:

sudo apt install -y python3-rtree

Shapefiles

Shapefiles contain geographic infomation, such as coordinates of points, or regional boundaries. There are many places you can find shapefiles online, such as here for census data (https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html) and here for Madison data (https://data-cityofmadison.opendata.arcgis.com/search?tags=boundaries).

From the latter, we'll search for "Lakes and Rivers", then click "Download", then right-click "Shapefile" and "Copy Link Address".

wget https://opendata.arcgis.com/datasets/c46082b091a941f8b2ded1dd115a1a05_8.zip -O lakes.zip

That saves the shapefile as "lakes.zip". Let's peek inside using unzip -l:

Shapefiles are really a collection of files, including a .shp. Sometimes this collection of files is stored in a directory and sometimes in a zip.

geopandas.read_file knows how to read shapefiles. You can pass either a directory name, or a zip name; if the latter, prefix it with "zip://":

We can see that we get back a GeoDataFrame, which (based on __mro__) is a descendent of the regular pandas DataFrame (meaning we can everything we can with regular DataFrames, and then some):

GeoDataFrames contain a special GeoSeries column:

Which in turn contains various shape objects (mostly polygons) from the shapely module we installed earlier:

What is the index of the biggest shape?

What shape is that?

Lake Mendota!

We can plot them all at once:

In the map, the x-axis is longitude and the y-axis is latitude; these are degrees of an angle (out of 360). This is the system used for GPS (https://en.wikipedia.org/wiki/Geographic_coordinate_system), but there are also other coordinate reference systems (CRS) available.

We can view the current .crs being used as well as try another one (many different coordinate systems are assigned an EPSG code: https://en.wikipedia.org/wiki/EPSG_Geodetic_Parameter_Dataset):

Notice that for df.crs, we are using a coordinate system based on degrees:

- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)

Whereas for dfs2.crs, we are using one based on meters:

- E[east]: Easting (metre)
- N[north]: Northing (metre)

One advantage of using a system based on meters instead of angles is that sizes and lengths are meaningful.

Wikipedia says it is 39.4 sq km (https://en.wikipedia.org/wiki/Lake_Mendota), so the latter is about right.

The disadvantage of a meter-based coordinate system is that we're effectively distorting earth to be flat. This doesn't create too much error for a small region, but it doesn't work for the whole planet, so the meter-based CRS are only for specific regions.

There are a collection of epsg:326?? and epsg:327?? CRS for the northern and southern hemospheres, respectively. The ?? represents one of 60 zones in the UTM system (https://en.wikipedia.org/wiki/Universal_Transverse_Mercator_coordinate_system).

Madison, WI is zone 16 of the northern hemosphere (see map here: https://en.wikipedia.org/wiki/Universal_Transverse_Mercator_coordinate_system#/media/File:Utm-zones-USA.svg), which is why we use epsg:32616.

If we plot using this CRS, we get x and y axes in meters, with the origin ("0,0" point) somewhere down and to the left of where we plotted the map.

Plotting an Address

Let's plot a point for Bascom Hall on the map. Where is it located? Geocoding is the process of converting an address or name of a landmark to coordinates. geopandas uses the geopy package we installed earlier for geocoding. geopy can use different online services to do geocoding:

These services vary in terms of accuracy, cost, and rate limits. Most want us to specify a user_agent so that the sites know a bit about who is using them. Let's try looking up Bascom Hall with the default:

Oops, we want the Bascom Hall in Madison, WI, not the one in Florida.

Still not great. Let's use the "nominatim" provider (which uses OpenStreetMaps).

Great, let's plot that on top of our lakes.

Not good! The problem is that the two GeoDataFrames are using different coordinate systems.

Let's convert bascom to the one df2 is using.

What if we want a particular radius around a point, say 1km? Calling .buffer on a point (or a GeoSeries containing a point) creates a circle with a specified radius. Actually, there is no circle type in shapely, so you just get a very roundish polygon or line with many sides.

We can add .boundary to the end of the above to get just a line around the circumference.

Note that the order of (1) converting to a metric coordinate system and then (2) creating a buffer of a given radius is important. If we try it in the other order, geopandas will complain with good reason: 1 degree of latitude is generally not the same distance as 1 degree of longitude, so we'll get a weird oval.

Going back to the correct example, we can pass label="..." and call ax.legend() if we want a legend for our annotations.

Geopandas lets use create intersections and unions of GeoDataFrames with geopandas.overlay(????, ????, how='intersection') and geopandas.overlay(????, ????, how='union'). If we put our circle around Bascom Hall in a GeoDataFrame with the same CRS as our lakes, we can identify/quantify all water within a 3km radius.

Just asking for the area tells us the individual area of each piece:

We could sum those parts, or get the unary_union, which converts it all to a single shape a (MultiPolygon).

Conclusion

geopandas gives us DataFrames where each row is a shape. We can use these like regular DataFrames, but we can also directly plot geographic maps. If we want a world map, we may want to use latitude/longitude, but converting to a meter-based coordinate system makes it easier to compute distances, areas, and radiuses around a point. GeoPandas makes it easy to intersect and union various shapes.