Data acquisition
Summary
This MR provides most of the content for the planned 3.0 release. Most of the MR focuses on routines that acquire datasets over the internet. It also adds support for 4D broadcasting to the S17 and G14 models, and provides fixes for the recently broken pysheds dependency.
Data subpackage
The data subpackage is the focus of this MR and provides access to various datasets over the internet. The subpackage is structured around data providers, as each provider has its own API/data distribution approach. The key folders are:
- landfire: LANDFIRE products, most notably EVT
- noaa: Access to NOAA Atlas 14
- retainments: Debris retainment feature locations
- usgs: Various USGS datasets
  - usgs.tnm: USGS datasets from the National Map
    - usgs.tnm.nhd: National Hydrography Dataset, often referred to as HUCs
    - usgs.tnm.dem: Digital elevation models
  - usgs.statsgo: Reserved for the STATSGO dataset, pending the COG data release
(Note that the retainments folder is an exception to the general structure around data providers. This is because I expect retainment datasets to come from small local/regional providers, which would be cumbersome in the base namespace. That said, the contents of the folder are still structured around data providers.)
Most users will interact with these modules using a download and/or read function. The distinction is that download saves files to the local filesystem, whereas read loads a dataset directly into memory as a Raster object. These commands are in turn implemented by functions that call the associated data APIs. When possible, I've exposed these API functions, so advanced users can interact with the APIs directly if needed.
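The download/read split described above can be sketched roughly as follows. This is only an illustration, not the pfdf implementation: the names _fetch, download, and read and their signatures are hypothetical, the URL is a placeholder, and the fetch step is stubbed with canned bytes so that nothing touches the network. In pfdf, read returns a Raster object rather than raw bytes.

```python
import tempfile
from pathlib import Path


def _fetch(url: str) -> bytes:
    """Hypothetical stand-in for a data API call; returns canned bytes, no network."""
    return f"payload for {url}".encode()


def download(url: str, parent: Path, name: str) -> Path:
    """Save the fetched dataset to the local filesystem and return its path."""
    path = parent / name
    path.write_bytes(_fetch(url))
    return path


def read(url: str) -> bytes:
    """Load the fetched dataset directly into memory (raw bytes in this sketch)."""
    return _fetch(url)


# Placeholder URL, used only as a key for the stubbed fetch
url = "https://example.com/dataset"

with tempfile.TemporaryDirectory() as folder:
    saved = download(url, Path(folder), "dataset.bin")
    on_disk = saved.read_bytes()  # contents that were written to disk

in_memory = read(url)  # same contents, never written to disk
```

Both functions share the same underlying fetch step, which is why exposing the lower-level API functions separately costs little.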
Suggested review order
(This is just for the data subpackage. The rest of the MR is probably fine in any order.)
- Start with the _utils folder. Specifically, see the requests file to see how the package acquires data from the internet.
- retainments.la_county: Very basic module that just downloads a data folder
- noaa.atlas14: Uses a simple API to download a file given a small number of parameters
- landfire: A more complex API that warrants the use of specific API functions
  - Start with api, which shows the commands used to interact with the API
  - Then see _landfire, which shows how the API calls are used to acquire data
- Finally usgs.tnm: A complex API used to acquire multiple datasets
  - Start with api, which shows functions used to acquire info about TNM products
  - Then see nhd, which uses the API functions to download a data folder
  - Finally dem, which uses the API to locate (potentially many) DEM tiles and merge them together
Key changed files
(Files that were changed but not listed here are mostly just reorganization of the backend)
Pysheds fixes
- pfdf._utils.patches: Context manager classes that patch the broken pysheds code. Note that the long patch functions are not under review, as they're inherited from pysheds itself
- pfdf.watershed: Uses the context managers to patch calls to pysheds
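For reviewers unfamiliar with the pattern, here is a minimal sketch of a context manager that swaps in a fixed function for the duration of a call and restores the original afterwards. It is a generic illustration under assumed names (Patch, broken_lib, fixed_area are all hypothetical) and does not reproduce the classes in pfdf._utils.patches or any pysheds code.

```python
import types


class Patch:
    """Temporarily replace an attribute on a module or object."""

    def __init__(self, target, name, replacement):
        self.target = target
        self.name = name
        self.replacement = replacement

    def __enter__(self):
        # Remember the original so it can be restored on exit
        self.original = getattr(self.target, self.name)
        setattr(self.target, self.name, self.replacement)
        return self

    def __exit__(self, *exc_info):
        # Always restore the original, even if the body raised
        setattr(self.target, self.name, self.original)
        return False  # do not suppress exceptions


# Hypothetical stand-in for a third-party module with a broken function
broken_lib = types.SimpleNamespace(area=lambda w, h: w + h)  # bug: adds instead of multiplying


def fixed_area(w, h):
    """Corrected replacement for the broken function."""
    return w * h


with Patch(broken_lib, "area", fixed_area):
    patched_result = broken_lib.area(3, 4)  # uses the fixed function

restored_result = broken_lib.area(3, 4)  # original (broken) behavior is back
```

Scoping the patch to a with block keeps the fix local to the calls that need it, so the dependency is left untouched everywhere else.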
Model Broadcasting
- pfdf.models.s17: Reordered the dimensions of output arrays. Rewrote the docstrings to explain these dimensions.
- pfdf.models.g14: Output arrays can now be 4D. Rewrote the docstrings to explain these dimensions.
New Raster features
- raster._metadata.from_url: New function that determines raster metadata over an internet connection. Nearly identical to from_file
- raster._raster.from_url: New function to load a raster over an internet connection. Nearly identical to from_file
- raster._raster.fill: Added option to disable copying
- raster._raster.set_range: Added option to disable copying
Data Acquisition
- pfdf.data: This subpackage is new