WUS_HSP Conversion to Zarr
Checklist for the workflow associated with this dataset conversion:
Dataset Name: Western US Hydroclimate Scenarios Project
https://cida.usgs.gov/thredds/catalog/demo/thredds/WUS_HSP/catalog.html
Six OPeNDAP endpoints:
Folder: Western US Hydroclimate Scenarios Project
- Daily Observationally-Based Historical Data
- Statistically Downscaled Projections 2040s
- Statistically Downscaled Projections 2080s
- Dynamically Downscaled Data, ECHAM5 1970-1999
- Dynamically Downscaled Projections, ECHAM5 2020s
- Dynamically Downscaled Projections, ECHAM5 2050s
Each is generated from a large NetCDF file that was created from an aggregated OPeNDAP endpoint. See https://code.usgs.gov/wma/nhgf/geo-data-portal/thredds-config/-/tree/master/mnt_thredds/WUS_HSP?ref_type=heads for records of how they were processed.

These large NetCDF files should most likely be converted directly to Zarr. Variables will need to be renamed, as is done in the NcML joins, e.g. https://code.usgs.gov/wma/nhgf/geo-data-portal/thredds-config/-/blob/master/mnt_thredds/WUS_HSP/SD_A1B_2040s.ncml?ref_type=heads
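A minimal sketch of that conversion step, assuming an xarray/dask workflow; the file name, variable mapping, and chunk size are placeholders, and the real rename mapping should be taken from the .ncml files in thredds-config:

```python
# Hypothetical sketch: open one of the large aggregated NetCDF files lazily,
# rename variables as the NcML join does, and write the result to Zarr.
import xarray as xr

ds = xr.open_dataset(
    "SD_A1B_2040s.nc",      # placeholder: one of the six aggregated NetCDF files
    chunks={"time": 365},    # open with dask so the file is never loaded whole
)

# Placeholder rename mapping; the actual mapping comes from the .ncml file.
ds = ds.rename({"pr": "precipitation", "tasmax": "max_air_temperature"})

ds.to_zarr("SD_A1B_2040s.zarr", mode="w", consolidated=True)
```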
- [ ] Identify source data location and access (check the dataset spreadsheet)
- [ ] Collect ownership information (who do we ask questions of if we have problems?)
- [ ] Create new workflow notebook from template; stash it in an appropriate spot in the ./workflows folder tree
- [ ] Identify landing spot on S3 (currently somewhere in https://s3.console.aws.amazon.com/s3/buckets/nhgf-development?prefix=workspace/&region=us-west-2)
- [ ] Calculate chunking, layout, compression, etc. (see the chunking/compression sketch below)
- [ ] Run notebook
- [ ] Read test (pattern to be determined by the dataset; see the read-test sketch below)
- [ ] Create STAC catalog entry (see the pystac sketch below)
  - [ ] Verify all metadata
  - [ ] Create entry
- [ ] Reportage
  - [ ] Add notebook and the Dask performance report to the repo
  - [ ] Calculate summary statistics on output (compression ratio, total size; see the sketch below)
  - [ ] Save STAC JSON snippet to repo
- [ ] Merge and close the issue.
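For the chunking/layout/compression item, a hedged sketch assuming the zarr v2 format and a numcodecs Blosc/zstd compressor; chunk sizes, variable names, and the assumption that all data variables are 3-D (time, lat, lon) are illustrative only:

```python
# Sketch: set an explicit chunk shape and compressor per variable via Zarr encoding.
import numcodecs
import xarray as xr

ds = xr.open_dataset("SD_A1B_2040s.nc", chunks={"time": 365, "lat": 128, "lon": 128})

compressor = numcodecs.Blosc(cname="zstd", clevel=5, shuffle=numcodecs.Blosc.SHUFFLE)
encoding = {
    var: {
        "chunks": (365, 128, 128),  # (time, lat, lon); tune so chunks land near ~10-100 MB
        "compressor": compressor,
    }
    for var in ds.data_vars        # assumes every data variable is (time, lat, lon)
}

ds.to_zarr("SD_A1B_2040s.zarr", mode="w", encoding=encoding, consolidated=True)
```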
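For the read test, one possible pattern is a time-series pull at a single grid cell plus a full spatial slice at one time step; the S3 key and the variable name (carried over from the placeholder rename above) are assumptions:

```python
# Sketch: open the Zarr store from the S3 landing spot and exercise both access patterns.
import fsspec
import xarray as xr

store = fsspec.get_mapper(
    "s3://nhgf-development/workspace/WUS_HSP/SD_A1B_2040s.zarr",  # placeholder key
    anon=False,
)
ds = xr.open_zarr(store, consolidated=True)

# One grid cell across all time steps (stresses chunking along time)
point = ds["precipitation"].isel(lat=100, lon=100).load()

# One full spatial field at a single time step (stresses chunking along space)
field = ds["precipitation"].isel(time=0).load()
print(point.shape, field.shape)
```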
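For the STAC entry, a sketch using pystac; the item id, geometry, bbox, datetime, and asset href are all placeholders to be replaced with the verified metadata:

```python
# Sketch: build a minimal STAC item for one Zarr store and save the JSON snippet.
from datetime import datetime, timezone
import json
import pystac

item = pystac.Item(
    id="wus-hsp-sd-a1b-2040s",                       # placeholder id
    geometry={
        "type": "Polygon",
        "coordinates": [[[-125, 31], [-103, 31], [-103, 49.5], [-125, 49.5], [-125, 31]]],
    },                                               # placeholder footprint
    bbox=[-125.0, 31.0, -103.0, 49.5],
    datetime=datetime(2040, 1, 1, tzinfo=timezone.utc),
    properties={},
)
item.add_asset(
    "zarr",
    pystac.Asset(
        href="s3://nhgf-development/workspace/WUS_HSP/SD_A1B_2040s.zarr",  # placeholder
        media_type="application/vnd+zarr",
        roles=["data"],
    ),
)

# STAC JSON snippet to save to the repo
with open("SD_A1B_2040s_stac.json", "w") as f:
    json.dump(item.to_dict(), f, indent=2)
```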
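For the summary-statistics item, a sketch of computing total stored size and a rough compression ratio, assuming the zarr-python v2 API; the store path is a placeholder:

```python
# Sketch: compare uncompressed vs. stored bytes across all arrays in the store.
import zarr

group = zarr.open_group("SD_A1B_2040s.zarr", mode="r")

nbytes = sum(arr.nbytes for _, arr in group.arrays())                 # uncompressed size
nbytes_stored = sum(arr.nbytes_stored for _, arr in group.arrays())   # on-disk size

print(f"total size: {nbytes_stored / 1e9:.2f} GB")
print(f"compression ratio: {nbytes / nbytes_stored:.1f}x")
```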