Puerto Rico Conversion to Zarr

Checklist for Workflow associated with dataset conversion:

Dataset Name: Weather Research and Forecasting (WRF): Puerto Rico & US Virgin Islands Dynamical Downscaled Climate Change Projections

https://cida.usgs.gov/thredds/catalog/demo/thredds/PuertoRico/catalog.html

Nine OPeNDAP endpoints like: https://cida.usgs.gov/thredds/catalog.html?dataset=cida.usgs.gov/PuertoRico/pr_hourly_CESM_d03

Many variables in each aggregation:

    Float32 XLAT[south_north = 110][west_east = 265];
    Float32 XLONG[south_north = 110][west_east = 265];
    Int32 Time[Time = 367920];

Suggest converting to Zarr directly from the OPeNDAP endpoints if at all possible.
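
As a rough illustration of what the dimensions above imply for chunking, the sketch below estimates a time-chunk length for one variable. It assumes the data variables are Float32 arrays dimensioned [Time][south_north][west_east] (only coordinate arrays are listed above), a target of about 100 MiB per uncompressed chunk, and time chunks aligned to whole days of hourly data. The function name and the target size are illustrative choices, not part of the workflow template.

```python
import math

# Dimension lengths taken from the aggregation listing above.
N_TIME, N_SOUTH_NORTH, N_WEST_EAST = 367920, 110, 265
FLOAT32_BYTES = 4


def time_chunk_length(ny, nx, itemsize, target_bytes, multiple_of=1):
    """Return the number of time steps per chunk that fits in target_bytes.

    Each chunk is assumed to span the full (ny, nx) grid. The result is
    rounded down to a multiple of `multiple_of` when at least one such
    multiple fits; it is never smaller than 1.
    """
    bytes_per_step = ny * nx * itemsize
    steps = max(1, target_bytes // bytes_per_step)
    if steps >= multiple_of:
        steps -= steps % multiple_of
    return steps


# Assumed target: roughly 100 MiB per uncompressed chunk.
TARGET_BYTES = 100 * 2**20

# Align chunks to whole days (24 hourly steps), an assumption about the data.
steps = time_chunk_length(
    N_SOUTH_NORTH, N_WEST_EAST, FLOAT32_BYTES, TARGET_BYTES, multiple_of=24
)
chunk_bytes = steps * N_SOUTH_NORTH * N_WEST_EAST * FLOAT32_BYTES
n_chunks = math.ceil(N_TIME / steps)

print(f"chunk shape: ({steps}, {N_SOUTH_NORTH}, {N_WEST_EAST})")
print(f"uncompressed chunk size: {chunk_bytes / 2**20:.1f} MiB")
print(f"chunks per variable: {n_chunks}")
```

Under these assumptions the result is 888 time steps per chunk (37 days), about 98.7 MiB uncompressed, and 415 chunks per variable, the last one partial. A different access pattern, such as long time series at a single grid cell, would favor smaller spatial chunks instead.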


  • Identify Source Data location and access (check the dataset spreadsheet)
  • Collect ownership information (Who do we ask questions of if we have problems?)
  • Create new workflow notebook from template; stash in the ./workflows folder tree in an appropriate spot.
    • Identify landing spot on S3 (currently somewhere in: https://s3.console.aws.amazon.com/s3/buckets/nhgf-development?prefix=workspace/&region=us-west-2)
    • Calculate chunking, layout, compression, etc.
    • Run notebook
    • Read test (pattern to be determined by the dataset)
  • Create STAC catalog entry
    • Verify all metadata
    • Create entry
  • Reporting
    • Add notebook and the Dask performance report to the repo
    • Calculate summary statistics on output (compression ratio, total size)
    • Save STAC JSON snippet to repo
  • Merge and close the issue.
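
The summary statistics step in the checklist above could be sketched as follows. This is a minimal example that assumes a local copy of the output store on disk (the real landing spot is on S3, which would need a different listing mechanism) and that the uncompressed size can be derived from array shape and item size. The variable name `T2` and the byte counts in the demonstration are hypothetical.

```python
import math
import os
import tempfile


def uncompressed_bytes(shape, itemsize):
    """Size in bytes of a dense array with the given shape and item size."""
    return math.prod(shape) * itemsize


def directory_size_bytes(root):
    """Total size in bytes of all files under root (for a local store)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


def compression_ratio(raw_bytes, stored_bytes):
    """Ratio of uncompressed size to stored size."""
    if stored_bytes <= 0:
        raise ValueError("stored_bytes must be positive")
    return raw_bytes / stored_bytes


# Demonstration on a throwaway directory standing in for a Zarr store.
# "T2" is a hypothetical variable name; the file stands in for one chunk.
with tempfile.TemporaryDirectory() as store:
    os.makedirs(os.path.join(store, "T2"))
    with open(os.path.join(store, "T2", "0.0.0"), "wb") as f:
        f.write(b"\x00" * 1000)
    stored = directory_size_bytes(store)
    ratio = compression_ratio(4000, stored)

print(f"total size: {stored} bytes, compression ratio: {ratio:.2f}")
```

For one Float32 variable on the full grid, `uncompressed_bytes((367920, 110, 265), 4)` gives the raw size to compare against the stored total, assuming that variable spans all three dimensions.
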
Edited Apr 05, 2024 by Blodgett, David L.