
Data acquisition

King, Jonathan M requested to merge ghsc/users/jking/pfdf:download into main

Summary

This MR provides most of the content for the planned 3.0 release. Most of the changes add routines to acquire datasets over the internet. The MR also adds support for 4D broadcasting to the S17 and G14 models, and fixes compatibility with the pysheds dependency library, which recently broke.

Data subpackage

The data subpackage is the focus of this MR and provides access to various datasets over the internet. The subpackage is structured around data providers, as each provider has its own API/data distribution approach. The key folders are:

  • landfire: LANDFIRE products, most notably EVT
  • noaa: Access to NOAA Atlas 14
  • retainments: Debris retainment feature locations
  • usgs: Various USGS datasets
    • usgs.tnm: USGS datasets from the National Map
      • usgs.tnm.nhd: National Hydrography Dataset (hydrologic unit boundaries, often referred to as HUCs)
      • usgs.tnm.dem: Digital elevation models
    • usgs.statsgo: Reserved for the STATSGO dataset, pending the COG data release

(Note that the retainments folder is an exception to the general structure around data providers. This is because I expect retainment datasets to come from small local/regional providers, which would clutter the base namespace. That said, the contents of the folder are still structured around data providers.)

Most users will interact with these modules through a download and/or read function. The distinction is that download saves files to the local filesystem, whereas read loads a dataset directly into memory as a Raster object. These commands are in turn implemented by functions that call the associated data APIs. Where possible, I've exposed these API functions so that advanced users can interact with the APIs directly.
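To illustrate the layering described above, here is a minimal sketch of how a provider module might stack download and read on top of a lower-level API function. All names here (query, download, read, and the injected fetch callable) are hypothetical and are not pfdf's actual signatures. The sketch's read also returns raw bytes rather than a Raster object, so it runs without pfdf or a network connection.

```python
from pathlib import Path
from typing import Callable, Mapping

# Hypothetical transport: takes a URL and query parameters, returns the response body.
Fetch = Callable[[str, Mapping[str, str]], bytes]


def query(url: str, params: Mapping[str, str], fetch: Fetch) -> bytes:
    """Low-level API call (hypothetical). Advanced users could call this directly."""
    return fetch(url, params)


def download(url: str, params: Mapping[str, str], path: Path, fetch: Fetch) -> Path:
    """Save the dataset to the local filesystem and return the saved path."""
    data = query(url, params, fetch)
    path = Path(path)
    path.write_bytes(data)
    return path


def read(url: str, params: Mapping[str, str], fetch: Fetch) -> bytes:
    """Load the dataset directly into memory without touching the filesystem.

    A real implementation would parse these bytes into a Raster object.
    """
    return query(url, params, fetch)
```

Injecting the transport keeps download and read thin wrappers around one shared API call, which is what lets the API functions be exposed to advanced users without duplicating logic.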

Suggested review order

(This is just for the data subpackage - the rest of the MR is probably fine in any order).

  1. Start with the _utils folder, specifically the requests file, which shows how the package acquires data from the internet.
  2. retainments.la_county: A very basic module that just downloads a data folder.
  3. noaa.atlas14: Uses a simple API to download a file given a small number of parameters.
  4. landfire: A more complex API that warrants dedicated API functions.
       • Start with api, which shows the commands used to interact with the API.
       • Then see _landfire, which shows how the API calls are used to acquire data.
  5. Finally, usgs.tnm: A complex API used to acquire multiple datasets.
       • Start with api, which shows the functions used to acquire info about TNM products.
       • Then see nhd, which uses the API functions to download a data folder.
       • Finally, see dem, which uses the API to locate (potentially many) DEM tiles and merge them together.
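As a rough illustration of the kind of helper the requests file provides, here is a sketch that builds a query URL from a few parameters (as in the noaa.atlas14 case) and streams the response to disk. It uses only the standard library's urllib; the function names are hypothetical, and the actual module may use a different HTTP client, timeouts, and error handling.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlencode


def build_url(base: str, params: dict[str, object]) -> str:
    """Append URL-encoded query parameters to a base URL."""
    if not params:
        return base
    return f"{base}?{urlencode(params)}"


def download_file(url: str, path: Path, timeout: float = 30.0) -> Path:
    """Stream the resource at `url` into `path` and return the saved path.

    shutil.copyfileobj copies in chunks, so large files are never held in
    memory all at once. urlopen raises urllib.error.HTTPError for HTTP error
    statuses and URLError for connection failures.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=timeout) as response, path.open("wb") as file:
        shutil.copyfileobj(response, file)
    return path
```

For example, build_url("https://example.com/api", {"lat": 34.1, "lon": -118.2}) yields "https://example.com/api?lat=34.1&lon=-118.2", which could then be passed to download_file.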

Key changed files

(Files that were changed but are not listed here are mostly backend reorganization.)

Pysheds fixes

  • pfdf._utils.patches: Context manager classes that patch the broken pysheds code. Note that the long patch functions are not under review, as they're inherited from pysheds itself
  • pfdf.watershed: Uses the context managers to patch calls to pysheds
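The patching approach above follows a common pattern: temporarily swap in a fixed function, then restore the original on exit. Here is a generic sketch of that pattern; it does not reproduce pfdf's actual patch classes, and the names patched, Grid, and fixed are hypothetical. The standard library's unittest.mock.patch.object offers the same behavior.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def patched(target: Any, name: str, replacement: Any) -> Iterator[None]:
    """Temporarily replace `target.name` with `replacement`.

    The original attribute is restored on exit, even if the body raises,
    so the patch never leaks outside the `with` block.
    """
    original = getattr(target, name)
    setattr(target, name, replacement)
    try:
        yield
    finally:
        setattr(target, name, original)


class Grid:
    """Stand-in for a third-party class with a broken method."""

    def method(self) -> str:
        return "broken"


def fixed(self) -> str:
    """Replacement implementation used only inside the patch."""
    return "fixed"
```

Inside `with patched(Grid, "method", fixed):`, calls to Grid().method() use the fix; afterwards the original method is back in place, so the dependency is only altered for the calls that need it.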

Model Broadcasting

  • pfdf.models.s17: Reordered the dimensions of output arrays. Rewrote the docstrings to explain these dimensions.
  • pfdf.models.g14: Output arrays can now be 4D. Rewrote the docstrings to explain these dimensions.
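To show how 4D outputs can arise from broadcasting, here is a NumPy sketch that places each of four 1D inputs on its own axis. The axis meanings (for example segments, rainfall values, parameter sets, and confidence levels) are an assumption for illustration, and the formula is a placeholder rather than the S17 or G14 equations. The actual dimension order is documented in the model docstrings and may differ.

```python
import numpy as np


def evaluate_4d(a, b, c, d) -> np.ndarray:
    """Evaluate a toy model over every combination of four 1D inputs.

    Each input is reshaped so that it varies along its own axis; NumPy
    broadcasting then produces an output with shape
    (len(a), len(b), len(c), len(d)). The formula is a placeholder.
    """
    a = np.asarray(a, dtype=float).reshape(-1, 1, 1, 1)
    b = np.asarray(b, dtype=float).reshape(1, -1, 1, 1)
    c = np.asarray(c, dtype=float).reshape(1, 1, -1, 1)
    d = np.asarray(d, dtype=float).reshape(1, 1, 1, -1)
    return a * b + c * d
```

For example, evaluate_4d([1, 2, 3], [10, 20], [0.5], [1, 2, 3, 4]) has shape (3, 2, 1, 4), and element [2, 1, 0, 3] is 3 * 20 + 0.5 * 4 = 62.0.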

New Raster features

  • raster._metadata.from_url: New function to determine raster metadata over an internet connection. Nearly identical to from_file
  • raster._raster.from_url: New function to load a raster over an internet connection. Nearly identical to from_file
  • raster._raster.fill: Added option to disable copying
  • raster._raster.set_range: Added option to disable copying
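The option to disable copying trades safety for memory. Here is a minimal sketch of that trade-off using a hypothetical fill function on a plain NumPy array; it is not the Raster.fill signature, which operates on Raster objects and may take different arguments.

```python
import numpy as np


def fill(values: np.ndarray, nodata: float, fill_value: float, copy: bool = True) -> np.ndarray:
    """Replace NoData pixels with `fill_value` (hypothetical helper).

    With copy=True (the default), the input array is left untouched and a
    new array is returned. With copy=False, the input is modified in place,
    which avoids duplicating large rasters in memory.
    """
    if copy:
        values = values.copy()
    # NaN never compares equal to itself, so NaN NoData needs isnan.
    if np.isnan(nodata):
        mask = np.isnan(values)
    else:
        mask = values == nodata
    values[mask] = fill_value
    return values
```

Defaulting to copy=True keeps the caller's data safe; passing copy=False is an explicit opt-in for memory-constrained workflows where the original values are no longer needed.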

Data Acquisition

  • pfdf.data: This subpackage is new

Closes #194 #200 #209 #206 #213 #204 #199 #198

Edited by King, Jonathan M
