Puerto Rico Conversion to Zarr
Checklist for the workflow associated with this dataset conversion:
Dataset Name: Weather Research and Forecasting (WRF): Puerto Rico & US Virgin Islands Dynamical Downscaled Climate Change Projections
https://cida.usgs.gov/thredds/catalog/demo/thredds/PuertoRico/catalog.html
Nine OPeNDAP endpoints like: https://cida.usgs.gov/thredds/catalog.html?dataset=cida.usgs.gov/PuertoRico/pr_hourly_CESM_d03
Many variables in each aggregation:
```
Float32 XLAT[south_north = 110][west_east = 265];
Float32 XLONG[south_north = 110][west_east = 265];
Int32 Time[Time = 367920];
```
Suggest converting to Zarr directly from the OPeNDAP endpoints if at all possible.
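If the OPeNDAP route works, the conversion can be driven entirely from xarray. A minimal sketch follows; the chunk length, output path, and the exact form of the data URL (inferred from the catalog link above) are assumptions to verify against the THREDDS page:

```python
def convert_wrf_to_zarr(opendap_url, zarr_path, time_chunk=575):
    """Open one WRF aggregation over OPeNDAP and rewrite it as a
    chunked Zarr store. Chunk size and output path are placeholders,
    not project-agreed values."""
    # Imported inside the function so this module can be inspected
    # without xarray installed.
    import xarray as xr

    ds = xr.open_dataset(opendap_url)    # lazy: no data pulled yet
    ds = ds.chunk({"Time": time_chunk})  # rechunk along the record dimension
    ds.to_zarr(zarr_path, mode="w")      # streams chunk by chunk
    return ds

# Example (requires network access to the THREDDS server; the dodsC
# URL form is a guess based on the catalog link above):
# convert_wrf_to_zarr(
#     "https://cida.usgs.gov/thredds/dodsC/PuertoRico/pr_hourly_CESM_d03",
#     "pr_hourly_CESM_d03.zarr",
# )
```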
- Identify source data location and access (check the dataset spreadsheet)
- Collect ownership information (who do we ask questions of if we have problems?)
- Create new workflow notebook from template; stash it in an appropriate spot in the ./workflows folder tree
- Identify landing spot on S3 (currently somewhere in: https://s3.console.aws.amazon.com/s3/buckets/nhgf-development?prefix=workspace/&region=us-west-2)
- Calculate chunking, layout, compression, and other encoding parameters
- Run notebook
- Read test (pattern to be determined by the dataset)
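The chunking step can be sized with simple arithmetic against the grid dimensions listed above (110 x 265, float32, 367920 hourly steps). A sketch, where the ~64 MiB chunk target is an assumption rather than a project standard:

```python
# Size chunks as full-spatial slabs, chunked along the Time dimension.
def time_chunk_length(ny, nx, itemsize=4, target_bytes=64 * 2**20):
    """Number of time steps per chunk so one (steps, ny, nx) chunk is
    as close to target_bytes as possible without exceeding it."""
    bytes_per_step = ny * nx * itemsize
    return max(1, target_bytes // bytes_per_step)

steps = time_chunk_length(110, 265)            # steps per chunk
bytes_per_chunk = steps * 110 * 265 * 4        # just under 64 MiB
```

The same helper can be rerun per variable if some fields are stored at a different precision (change itemsize) or if the read pattern favors smaller chunks.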
- Create STAC catalog entry:
  - Verify all metadata
  - Create entry
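The STAC entry for the store reduces to a small JSON document. A hand-rolled skeleton is sketched below; every value (id, bbox, datetimes, asset href) is a placeholder to be replaced with the verified metadata, and a real entry would normally be built and validated with pystac:

```python
import json

# STAC Item skeleton for the converted Zarr store. All values below
# are placeholders pending metadata verification.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "wrf-pr-usvi-hourly-cesm-d03",        # placeholder id
    "geometry": None,                           # derive from XLAT/XLONG corners
    "bbox": [-68.0, 17.0, -64.0, 19.0],         # placeholder extent
    "properties": {
        "datetime": None,
        "start_datetime": "TBD",                # fill from the Time coordinate
        "end_datetime": "TBD",
    },
    "links": [],
    "assets": {
        "zarr": {
            "href": "s3://nhgf-development/workspace/<TBD>/pr_hourly_CESM_d03.zarr",
            "type": "application/vnd+zarr",
            "roles": ["data"],
        }
    },
}

snippet = json.dumps(item, indent=2)  # the JSON snippet saved to the repo
```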
- Reportage
  - Add the notebook and the Dask performance report to the repo
  - Calculate summary statistics on the output (compression ratio, total size)
  - Save the STAC JSON snippet to the repo
  - Merge and close the issue
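For the summary statistics, the compression ratio of a directory-backed Zarr store can be estimated by comparing its on-disk bytes against the raw in-memory footprint of the arrays. A stdlib-only sketch; the store path and raw size are inputs the conversion notebook already knows:

```python
import os

def zarr_store_stats(store_dir, uncompressed_bytes):
    """Total on-disk size of a directory-backed Zarr store and its
    compression ratio relative to the uncompressed array footprint."""
    stored = sum(
        os.path.getsize(os.path.join(root, name))
        for root, _dirs, files in os.walk(store_dir)
        for name in files
    )
    ratio = uncompressed_bytes / stored if stored else float("nan")
    return {"stored_bytes": stored, "compression_ratio": ratio}

# Example raw size: one float32 variable on the full grid above
# (367920 time steps x 110 x 265 cells x 4 bytes), roughly 40 GiB.
raw = 367920 * 110 * 265 * 4
```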