C-PrEP Conversion to Zarr
Checklist for the workflow associated with this dataset conversion:
Dataset Name: South Central Climate Projections Evaluation Project (C-PrEP)
- https://cida.usgs.gov/thredds/cprep_catalog.html
- https://cida.usgs.gov/thredds/catalog/demo/thredds/cprep/catalog.html
Large collection of individual NetCDF files that are not aggregated.
Fairly simple dimensionality, as the DDS for a representative file shows:
```
Dataset {
    Int32 i_offset;
    Int32 j_offset;
    Float64 lat[lat = 140];
    Float64 lat_bnds[lat = 140][bnds = 2];
    Float64 lon[lon = 190];
    Float64 lon_bnds[lon = 190][bnds = 2];
    Grid {
      ARRAY:
        Float32 pr[time = 34310][lat = 140][lon = 190];
      MAPS:
        Float64 time[time = 34310];
        Float64 lat[lat = 140];
        Float64 lon[lon = 190];
    } pr;
    Float64 time[time = 34310];
    Float64 time_bnds[time = 34310][bnds = 2];
} demo/thredds/cprep/pr_day_I35prp1-QDM-A28D01K00_rcp85_r1i1p1_I35Land_20060101-20991231.nc;
```
Recommend converting the raw NetCDF files to Zarr as a 1:1 NetCDF -> Zarr store process; no aggregation is needed. A minimal conversion sketch follows.
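A sketch of the per-file conversion, assuming xarray (with dask installed) and a local copy of one source file; the chunk sizes here are placeholders to be settled in the chunking step of the checklist below:

```python
# 1:1 NetCDF -> Zarr sketch for a single C-PrEP file. The source file
# name comes from the DDS above; chunk sizes are placeholder values.
import xarray as xr

src = "pr_day_I35prp1-QDM-A28D01K00_rcp85_r1i1p1_I35Land_20060101-20991231.nc"

ds = xr.open_dataset(src)

# Keep the full spatial extent in each chunk and split time into slabs.
ds = ds.chunk({"time": 1000, "lat": 140, "lon": 190})

# One Zarr store per source NetCDF file; no aggregation across files.
ds.to_zarr(src.replace(".nc", ".zarr"), mode="w", consolidated=True)
```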
- [ ] Identify Source Data location and access (check the dataset spreadsheet)
- [ ] Collect ownership information (who do we ask questions of if we have problems?)
- [ ] Create new workflow notebook from template; stash it in an appropriate spot in the ./workflows folder tree
- [ ] Identify landing spot on S3 (currently somewhere in: https://s3.console.aws.amazon.com/s3/buckets/nhgf-development?prefix=workspace/&region=us-west-2)
- [ ] Calculate chunking, layout, compression, etc. (see the chunk-sizing sketch after this checklist)
- [ ] Run notebook
- [ ] Read test (pattern to be determined by the dataset; see the read-test sketch below)
- [ ] Create STAC catalog entry (see the pystac sketch below)
  - [ ] Verify all metadata
  - [ ] Create entry
- [ ] Reportage
  - [ ] Add notebook and the Dask performance report to the repo
  - [ ] Calculate summary statistics on output (compression ratio, total size; see the statistics sketch below)
  - [ ] Save STAC JSON snippet to repo
- [ ] Merge and close the issue.
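For the chunk-sizing step, a back-of-the-envelope sketch against the `pr` variable from the DDS. The ~100 MB uncompressed target and the Blosc/zstd compressor are assumptions, not project requirements, and the encoding follows the zarr v2 convention xarray uses:

```python
# Hypothetical chunk sizing for pr: float32, shape (34310, 140, 190).
import numpy as np
from numcodecs import Blosc

shape = (34310, 140, 190)               # (time, lat, lon) from the DDS
itemsize = np.dtype("float32").itemsize

# Target ~100 MB uncompressed per chunk, full spatial extent per chunk.
spatial_bytes = shape[1] * shape[2] * itemsize     # 106,400 bytes
time_per_chunk = int(100e6 // spatial_bytes)       # 939 time steps
chunks = (min(time_per_chunk, shape[0]), shape[1], shape[2])

# Candidate compressor; pass both via xarray's Zarr encoding.
compressor = Blosc(cname="zstd", clevel=5, shuffle=Blosc.SHUFFLE)
encoding = {"pr": {"chunks": chunks, "compressor": compressor}}
```

The `encoding` dict plugs straight into `ds.to_zarr(..., encoding=encoding)` in the conversion sketch above.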
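A possible read test, assuming the store written by the conversion sketch: pull one grid-cell time series and one time slice and check shapes against the DDS. The `lat`/`lon` indices are arbitrary examples.

```python
# Hypothetical read test against a finished store.
import xarray as xr

store = "pr_day_I35prp1-QDM-A28D01K00_rcp85_r1i1p1_I35Land_20060101-20991231.zarr"
ds = xr.open_zarr(store, consolidated=True)

assert ds.pr.shape == (34310, 140, 190)   # matches the DDS
ts = ds.pr.isel(lat=70, lon=95).load()    # one cell, full time series
field = ds.pr.isel(time=0).load()         # one day, full grid
assert ts.size == 34310 and field.shape == (140, 190)
```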
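For the STAC entry, a pystac sketch; the item id, datetime, bbox, and S3 href are placeholders to be filled from the metadata verified in the step above:

```python
# Hypothetical STAC item for one converted store (pystac).
import json
from datetime import datetime, timezone

import pystac

item = pystac.Item(
    id="cprep-pr-day-I35prp1-QDM-A28D01K00-rcp85-r1i1p1",  # placeholder id
    geometry=None,
    bbox=None,            # fill from lat_bnds/lon_bnds when verifying metadata
    datetime=datetime(2006, 1, 1, tzinfo=timezone.utc),    # placeholder
    properties={},
)
item.add_asset(
    "zarr",
    pystac.Asset(
        href="s3://nhgf-development/workspace/",           # placeholder prefix
        media_type="application/vnd+zarr",
        roles=["data"],
    ),
)

# The JSON snippet that goes in the repo.
with open("cprep_pr_item.json", "w") as f:
    json.dump(item.to_dict(), f, indent=2)
```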
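And a sketch for the summary statistics, assuming the zarr-python v2 API: compare logical bytes against stored bytes across every array in the store.

```python
# Hypothetical compression summary for a finished store (zarr v2 API).
import zarr

g = zarr.open_group(
    "pr_day_I35prp1-QDM-A28D01K00_rcp85_r1i1p1_I35Land_20060101-20991231.zarr",
    mode="r",
)
logical = sum(arr.nbytes for _, arr in g.arrays())
stored = sum(arr.nbytes_stored for _, arr in g.arrays())
print(f"total size: {stored / 1e9:.2f} GB; "
      f"compression ratio: {logical / stored:.1f}x")
```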