hawaii_2018 Conversion to Zarr
Checklist for Workflow associated with dataset conversion:
Dataset Name: Very fine resolution dynamically downscaled climate data for Hawaii
Large collection of WRF output, e.g.:
Dataset {
Float32 HGT[south_north = 205][west_east = 180];
Float32 LANDMASK[south_north = 205][west_east = 180];
Float32 XLAT[south_north = 205][west_east = 180];
Float32 XLONG[south_north = 205][west_east = 180];
Int16 CFRACL[Time = 175296][south_north = 205][west_east = 180];
Int16 CFRACT[Time = 175296][south_north = 205][west_east = 180];
Float32 FGDP[Time = 175296][south_north = 205][west_east = 180];
Float32 GLW[Time = 175296][south_north = 205][west_east = 180];
Float32 GRDFLX[Time = 175296][south_north = 205][west_east = 180];
Float32 GSW[Time = 175296][south_north = 205][west_east = 180];
Float32 HFX[Time = 175296][south_north = 205][west_east = 180];
Int32 I_RAINNC[Time = 175296][south_north = 205][west_east = 180];
Float32 LAI[Time = 175296][south_north = 205][west_east = 180];
Float32 LH[Time = 175296][south_north = 205][west_east = 180];
Int16 LU_INDEX[Time = 175296][south_north = 205][west_east = 180];
Float32 LWP[Time = 175296][south_north = 205][west_east = 180];
Float32 PSFC[Time = 175296][south_north = 205][west_east = 180];
Float32 Q2[Time = 175296][south_north = 205][west_east = 180];
Float32 RAINNC[Time = 175296][south_north = 205][west_east = 180];
Float32 SNOW[Time = 175296][south_north = 205][west_east = 180];
Int16 SNOWC[Time = 175296][south_north = 205][west_east = 180];
Float32 SNOWH[Time = 175296][south_north = 205][west_east = 180];
Float32 T2[Time = 175296][south_north = 205][west_east = 180];
Float32 TSK[Time = 175296][south_north = 205][west_east = 180];
Int32 Time[Time = 175296];
Float32 U10[Time = 175296][south_north = 205][west_east = 180];
Float32 V10[Time = 175296][south_north = 205][west_east = 180];
} hawaii_hawaii_present;
May be too large to convert over OPeNDAP -- roughly 15 GB per year, split into two domains, for 20 years.
Recommend testing OPeNDAP access speed first and switching to direct file access only if absolutely necessary.
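As a rough cross-check on the size concern, the in-memory (uncompressed) footprint of the listing above can be worked out from its dimensions and types. This is a minimal sketch: the variable counts are tallied by hand from the listing, the small static 2-D fields and the 1-D Time coordinate are ignored, and `transfer_hours` is a hypothetical helper that takes whatever sustained rate an OPeNDAP speed test actually measures.

```python
# Dimension sizes transcribed from the DDS listing above.
N_TIME, N_SOUTH_NORTH, N_WEST_EAST = 175296, 205, 180

# Bytes per value for each type that appears in the listing.
BYTES_PER_VALUE = {"Int16": 2, "Int32": 4, "Float32": 4}

# Time-varying 3-D variables of each type, counted by hand from the listing.
VARS_3D = {"Int16": 4, "Int32": 1, "Float32": 17}

cells = N_SOUTH_NORTH * N_WEST_EAST
bytes_per_step = cells * sum(BYTES_PER_VALUE[t] * n for t, n in VARS_3D.items())
total_bytes = bytes_per_step * N_TIME


def transfer_hours(n_bytes, bytes_per_second):
    """Hours needed to move n_bytes at a sustained rate (hypothetical helper)."""
    return n_bytes / bytes_per_second / 3600


print(f"{bytes_per_step / 1e6:.2f} MB per time step")
print(f"{total_bytes / 1e9:.1f} GB uncompressed for this listing")
```

The uncompressed figure will normally be larger than the size of the source files on disk if those files are stored compressed, so it is an upper bound for planning rather than a measurement.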
Identify Source Data location and access (check the dataset spreadsheet) -
Collect ownership information (Who do we ask questions of if we have problems?) -
Create new workflow notebook from template; stash it in an appropriate spot in the ./workflows folder tree. -
Identify landing spot on S3 (currently somewhere in: https://s3.console.aws.amazon.com/s3/buckets/nhgf-development?prefix=workspace/&region=us-west-2) -
Calculate chunking, layout, compression, etc -
Run notebook -
Read test (pattern to be determined by the dataset)
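For the chunking step, one possible starting point is to keep the full spatial grid in each chunk and choose the number of time steps that fits a target chunk size. This is only a sketch: the 100 MiB target is an assumption, not something decided in this issue, and the right layout depends on the expected read pattern (time series at a point versus maps at one time).

```python
import math

# Grid and time dimensions transcribed from the dataset listing in this issue.
N_TIME, N_SOUTH_NORTH, N_WEST_EAST = 175296, 205, 180

# Assumption: aim for about 100 MiB per uncompressed chunk (not specified here).
TARGET_CHUNK_BYTES = 100 * 1024**2


def time_steps_per_chunk(target_bytes, ny, nx, itemsize):
    """Number of time steps that fit in one chunk spanning the full grid."""
    return max(1, target_bytes // (ny * nx * itemsize))


# Float32 variables use 4 bytes per value.
t_chunk = time_steps_per_chunk(TARGET_CHUNK_BYTES, N_SOUTH_NORTH, N_WEST_EAST, 4)
n_chunks = math.ceil(N_TIME / t_chunk)
chunk_mib = t_chunk * N_SOUTH_NORTH * N_WEST_EAST * 4 / 1024**2

print(f"Time steps per chunk: {t_chunk}")
print(f"Chunks along Time:    {n_chunks}")
print(f"Chunk size:           {chunk_mib:.1f} MiB (uncompressed)")
```

If point time series are the main use case, splitting the spatial dimensions and lengthening the time dimension of each chunk would be the natural variation to try.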
Create STAC catalog entry: -
Verify all metadata -
Create entry
Reportage -
Add the notebook and the Dask performance report to the repo -
Calculate summary statistics on output (compression ratio, total size) -
Save STAC JSON snippet to repo
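The summary statistics in the Reportage items could be computed along these lines. This is a sketch that assumes the Zarr store has been written (or copied) to a local directory; for a store on S3 the same sum would have to come from an object listing instead. Both function names are illustrative, not part of any existing workflow code.

```python
import os


def directory_size_bytes(path):
    """Sum the sizes of all files under a local directory tree."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def compression_ratio(uncompressed_bytes, stored_bytes):
    """Uncompressed size divided by stored size; larger means better compression."""
    if stored_bytes <= 0:
        raise ValueError("stored_bytes must be positive")
    return uncompressed_bytes / stored_bytes
```

Calling `directory_size_bytes` on the output store gives the total size, and passing that together with the uncompressed size of the dataset to `compression_ratio` gives the ratio to record in the issue.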
Merge and close the issue.