cprep Conversion to Zarr

Checklist for Workflow associated with dataset conversion:

Dataset Name: South Central Climate Projections Evaluation Project (C-PrEP)

https://cida.usgs.gov/thredds/cprep_catalog.html
https://cida.usgs.gov/thredds/catalog/demo/thredds/cprep/catalog.html

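The source is a large collection of individual NetCDF files that are not aggregated.

Dimensionality is fairly simple. The example file below holds one gridded variable (pr) on a time x lat x lon grid of 34310 x 140 x 190.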

Dataset {
    Int32 i_offset;
    Int32 j_offset;
    Float64 lat[lat = 140];
    Float64 lat_bnds[lat = 140][bnds = 2];
    Float64 lon[lon = 190];
    Float64 lon_bnds[lon = 190][bnds = 2];
    Grid {
     ARRAY:
        Float32 pr[time = 34310][lat = 140][lon = 190];
     MAPS:
        Float64 time[time = 34310];
        Float64 lat[lat = 140];
        Float64 lon[lon = 190];
    } pr;
    Float64 time[time = 34310];
    Float64 time_bnds[time = 34310][bnds = 2];
} demo/thredds/cprep/pr_day_I35prp1-QDM-A28D01K00_rcp85_r1i1p1_I35Land_20060101-20991231.nc;

Recommendation: convert each raw NetCDF file to its own Zarr store (a 1:1 nc -> Zarr process). No aggregation needed.
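A minimal sketch of what the 1:1 conversion could look like, assuming xarray (with dask, zarr, and a NetCDF backend) is what the workflow template uses. The helper names `zarr_store_path` and `convert_one`, the output root, and the 365-step time chunk are hypothetical choices for illustration, not settings taken from the template. For the example file above, one time step of pr is 140 x 190 x 4 bytes (about 106 kB), so a 365-step chunk is roughly 39 MB uncompressed.

```python
from pathlib import Path


def zarr_store_path(nc_path, out_root):
    """Map one NetCDF file to its own Zarr store name (1:1, no aggregation)."""
    return Path(out_root) / (Path(nc_path).stem + ".zarr")


def convert_one(nc_path, out_root, time_chunk=365):
    """Convert a single NetCDF file into a single Zarr store.

    time_chunk is an assumed value; tune it for the target storage.
    """
    # Imported lazily so the path helper works without xarray installed.
    import xarray as xr

    store = zarr_store_path(nc_path, out_root)
    # chunks= makes the read lazy (dask-backed) so the file is streamed
    # chunk by chunk instead of being loaded into memory at once.
    with xr.open_dataset(nc_path, chunks={"time": time_chunk}) as ds:
        ds.to_zarr(str(store), mode="w", consolidated=True)
    return store
```

Looping `convert_one` over the file list gives one store per source file, which keeps each conversion independent and easy to re-run if a single file fails.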


  • Identify Source Data location and access (check the dataset spreadsheet)
  • Collect ownership information (Who do we ask questions of if we have problems?)
  • Create new workflow notebook from template; stash in the ./workflows folder tree in an appropriate spot.
  • Create STAC catalog entry
    • Verify all metadata
    • Create entry
  • Reporting
    • Add the notebook and the Dask performance report to the repo
    • Calculate summary statistics on output (compression ratio, total size)
    • Save STAC JSON snippet to repo
  • Merge and close the issue.
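The summary statistics in the reporting step could be computed along these lines. This is a sketch that assumes the Zarr store was written to a local directory (a store on object storage would need a different size lookup) and that "compression ratio" means source NetCDF size divided by Zarr store size. Both function names are hypothetical.

```python
import os


def dir_size_bytes(path):
    """Total size in bytes of all files under a directory (e.g. a local Zarr store)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def compression_ratio(source_bytes, store_bytes):
    """Source size divided by store size; values above 1 mean the store is smaller."""
    if store_bytes <= 0:
        raise ValueError("store_bytes must be positive")
    return source_bytes / store_bytes
```

For one file, `compression_ratio(os.path.getsize(nc_path), dir_size_bytes(store))` gives the ratio; summing `dir_size_bytes` over all stores gives the total output size.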
Edited by Nathan Pasley