TTU_2019 Conversion to Zarr
Checklist for Workflow associated with dataset conversion:
Dataset Name: High-Resolution Precipitation Projections for the South Central U.S.
- https://cida.usgs.gov/thredds/catalog.html?dataset=cida.usgs.gov/TTU_2019/rcp45_gridded_annual_data
- https://cida.usgs.gov/thredds/catalog.html?dataset=cida.usgs.gov/TTU_2019/rcp85_gridded_annual_data
- https://cida.usgs.gov/thredds/catalog.html?dataset=cida.usgs.gov/TTU_2019/rcp45_time_slices
- https://cida.usgs.gov/thredds/catalog.html?dataset=cida.usgs.gov/TTU_2019/rcp85_time_slices
- https://cida.usgs.gov/thredds/catalog.html?dataset=cida.usgs.gov/TTU_2019/rcp45_station_data
- https://cida.usgs.gov/thredds/catalog.html?dataset=cida.usgs.gov/TTU_2019/rcp85_station_data
https://cida.usgs.gov/thredds/catalog/demo/thredds/TTU_2019/catalog.html
Three spatio-temporal axis layouts across the endpoints --
Time slices:
Float64 lat[lat = 198];
Float64 lon[lon = 337];
Float64 time[time = 5];
Gridded annual:
Float64 lat[lat = 198];
Float64 lon[lon = 337];
Float64 time[time = 151];
Stations:
String station_ids[stations = 8701];
Float64 time[time = 151];
Aggregations are semi-complex -- suggest subsetting the OPeNDAP endpoints directly (see the sketch below).
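As a starting point, a minimal sketch of opening and subsetting one endpoint lazily with xarray. The `dodsC` path is an assumption (copy the actual OPeNDAP URL from the THREDDS catalog pages above), and the dimension names come from the axis listing shown here.

```python
# Minimal sketch: open one OPeNDAP endpoint lazily and pull a small subset.
# The dodsC path below is an assumption -- take the real OPeNDAP URL from the
# THREDDS catalog page for the endpoint of interest.
import xarray as xr

url = "https://cida.usgs.gov/thredds/dodsC/cida.usgs.gov/TTU_2019/rcp45_gridded_annual_data"

# chunks={} makes the read lazy (dask-backed), so nothing transfers yet
ds = xr.open_dataset(url, chunks={})

# Pull a small window first to confirm access and inspect variables/attributes
subset = ds.isel(time=slice(0, 2), lat=slice(0, 50), lon=slice(0, 50)).load()
print(subset)
```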
- Identify Source Data location and access (check the dataset spreadsheet)
- Collect ownership information (Who do we ask questions of if we have problems?)
- Create new workflow notebook from template; stash it in an appropriate spot in the ./workflows folder tree
- Identify landing spot on S3 (currently somewhere in: https://s3.console.aws.amazon.com/s3/buckets/nhgf-development?prefix=workspace/&region=us-west-2)
- Calculate chunking, layout, compression, etc. (see the write sketch after this checklist)
- Run notebook
    - Read test (pattern to be determined by the dataset; see the read-test sketch after this checklist)
- Create STAC catalog entry (see the pystac sketch after this checklist)
    - Verify all metadata
    - Create entry
- Reportage
    - Add the notebook and the dask performance report to the repo
    - Calculate summary statistics on output (compression ratio, total size; see the read-test sketch after this checklist)
    - Save STAC JSON snippet to repo
- Merge and close the issue.
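Write sketch (chunking, compression, Zarr write, dask performance report), assuming the gridded-annual endpoint opened as in the subset sketch above. The chunk sizes, compressor, S3 prefix, and report filename are all placeholders to be tuned per dataset and per project conventions.

```python
# Sketch of chunking, compression, and the Zarr write to S3, with a dask
# performance report wrapped around the write. All paths and settings below
# are placeholders, not settled project conventions. Assumes zarr-python 2.x /
# Zarr v2 encoding keys; writing to s3:// requires s3fs.
import xarray as xr
import fsspec
import numcodecs
from dask.distributed import Client, performance_report

client = Client()  # local cluster; the real run may target a different cluster

url = "https://cida.usgs.gov/thredds/dodsC/cida.usgs.gov/TTU_2019/rcp45_gridded_annual_data"
ds = xr.open_dataset(url, chunks={})

# One 198 x 337 float64 grid is ~0.5 MB, so chunking only along time
# (25 steps per chunk) gives ~13 MB chunks per variable.
ds = ds.chunk({"time": 25, "lat": 198, "lon": 337})

compressor = numcodecs.Blosc(cname="zstd", clevel=5, shuffle=numcodecs.Blosc.SHUFFLE)
encoding = {v: {"compressor": compressor} for v in ds.data_vars}

# Hypothetical landing spot under the nhgf-development workspace prefix
store = fsspec.get_mapper("s3://nhgf-development/workspace/TTU_2019/rcp45_gridded_annual.zarr")

with performance_report(filename="ttu_2019_rcp45_gridded_annual_to_zarr.html"):
    ds.to_zarr(store, mode="w", encoding=encoding, consolidated=True)
```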
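Read-test and summary-statistics sketch (compression ratio, total size), reusing the hypothetical store path from the write sketch; the byte-count attributes assume the zarr-python 2.x API.

```python
# Read test plus summary statistics against the store written above.
# Store path is the same hypothetical one used in the write sketch;
# nbytes / nbytes_stored assume the zarr-python 2.x API.
import fsspec
import xarray as xr
import zarr

store = fsspec.get_mapper("s3://nhgf-development/workspace/TTU_2019/rcp45_gridded_annual.zarr")

# Read test: open via consolidated metadata and pull one time step
ds = xr.open_zarr(store, consolidated=True)
sample = ds.isel(time=0).load()
print(sample)

# Summary statistics: uncompressed vs. stored bytes and the compression ratio
group = zarr.open_consolidated(store, mode="r")
nbytes = sum(arr.nbytes for _, arr in group.arrays())
stored = sum(arr.nbytes_stored for _, arr in group.arrays())
print(f"uncompressed size: {nbytes / 1e9:.3f} GB")
print(f"stored size:       {stored / 1e9:.3f} GB")
print(f"compression ratio: {nbytes / stored:.1f}x")
```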
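STAC sketch with pystac for the catalog entry and the JSON snippet saved to the repo. The item id, geometry/bbox, temporal range, asset href, and output filename are all placeholders to be replaced with the verified metadata.

```python
# Sketch of building the STAC item and dumping the JSON snippet for the repo.
# Every identifier, the bbox/geometry, and the temporal range are placeholders;
# fill them from the verified dataset metadata before creating the real entry.
import json
import pystac

bbox = [-106.65, 25.85, -93.5, 40.6]  # placeholder: compute from the lon/lat coords
geometry = {                          # rectangle matching the placeholder bbox
    "type": "Polygon",
    "coordinates": [[
        [bbox[0], bbox[1]], [bbox[2], bbox[1]],
        [bbox[2], bbox[3]], [bbox[0], bbox[3]],
        [bbox[0], bbox[1]],
    ]],
}

item = pystac.Item(
    id="ttu_2019_rcp45_gridded_annual",  # hypothetical id
    geometry=geometry,
    bbox=bbox,
    datetime=None,
    properties={
        # placeholder temporal range; take from the time coordinate
        "start_datetime": "1950-01-01T00:00:00Z",
        "end_datetime": "2100-12-31T23:59:59Z",
    },
)

item.add_asset(
    "zarr",
    pystac.Asset(
        href="s3://nhgf-development/workspace/TTU_2019/rcp45_gridded_annual.zarr",
        media_type="application/vnd+zarr",
        roles=["data"],
    ),
)

# Save the STAC JSON snippet to the repo alongside the workflow notebook
with open("ttu_2019_rcp45_gridded_annual.stac.json", "w") as f:
    json.dump(item.to_dict(), f, indent=2)
```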