notaro_2018 Conversion to Zarr
Checklist for the workflow associated with this dataset conversion:
Dataset Name: Dynamical Downscaling for the Midwest and Great Lakes Basin
https://cida.usgs.gov/thredds/catalog/demo/thredds/notaro_2018/catalog.html
18 OPeNDAP endpoints, e.g. https://cida.usgs.gov/thredds/catalog.html?dataset=cida.usgs.gov/notaro_ACCESS_1980_1999
| domains |
| --- |
| ACCESS 1980-1999 |
| ACCESS 2040-2059 |
| ACCESS 2080-2099 |
| CNRM 1980-1999 |
| CNRM 2040-2059 |
| CNRM 2080-2099 |
| GFDL 1980-1999 |
| GFDL 2040-2059 |
| GFDL 2080-2099 |
| IPSL 1980-1999 |
| IPSL 2040-2059 |
| IPSL 2080-2099 |
| MIROC 1980-1999 |
| MIROC 2040-2059 |
| MIROC 2080-2099 |
| MRI 1980-1999 |
| MRI 2040-2059 |
| MRI 2080-2099 |
Grid dimensions (from the OPeNDAP DDS): `Float32 iy[iy = 86]`, `Float32 jx[jx = 111]`, `Int32 time[time = 175320]`.
Semi-complex join of many files. If direct OPeNDAP access works, it should be used; creating the aggregation from scratch is going to be non-trivial.
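The time-dimension join could be sketched with `xarray` as below. This is a minimal, illustrative example using small in-memory datasets in place of the real OPeNDAP endpoints; the variable name `tas` and the reduced spatial sizes are placeholders (real access would call `xr.open_dataset` on each THREDDS `dodsC` URL instead), while the dimension names `time`/`iy`/`jx` come from the DDS above.

```python
import numpy as np
import xarray as xr

def fake_period(start, n):
    """Stand-in for one 20-year OPeNDAP endpoint (placeholder variable/sizes)."""
    return xr.Dataset(
        {"tas": (("time", "iy", "jx"), np.zeros((n, 4, 5), dtype="float32"))},
        coords={"time": np.arange(start, start + n)},
    )

# One dataset per period, concatenated along time -- the same join the real
# aggregation needs across 1980-1999, 2040-2059, and 2080-2099.
parts = [fake_period(0, 10), fake_period(10, 10), fake_period(20, 10)]
combined = xr.concat(parts, dim="time")
```

For per-model aggregations, the same `xr.concat` call applies once the three period endpoints for that model are open.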
- [ ] Identify source data location and access (check the dataset spreadsheet)
- [ ] Collect ownership information (who do we ask questions of if we have problems?)
- [ ] Create new workflow notebook from template; stash it in an appropriate spot in the `./workflows` folder tree
- [ ] Identify landing spot on S3 (currently somewhere in: https://s3.console.aws.amazon.com/s3/buckets/nhgf-development?prefix=workspace/&region=us-west-2)
- [ ] Calculate chunking, layout, compression, etc.
- [ ] Run notebook
- [ ] Read test (pattern to be determined by the dataset)
- [ ] Create STAC catalog entry:
  - [ ] Verify all metadata
  - [ ] Create entry
- [ ] Reportage:
  - [ ] Add notebook and the Dask performance report to the repo
  - [ ] Calculate summary statistics on output (compression ratio, total size)
  - [ ] Save STAC JSON snippet to repo
- [ ] Merge and close the issue
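For the chunking step, a back-of-envelope calculation from the DDS dimensions is a reasonable starting point. The ~100 MB target and the choice to chunk along `time` only (keeping full spatial slabs) are assumptions, not a decision recorded in this issue.

```python
# Back-of-envelope chunk sizing for the notaro_2018 grid
# (time=175320, iy=86, jx=111; float32 = 4 bytes per value).
TIME, IY, JX = 175320, 86, 111
BYTES_PER_VALUE = 4            # float32
TARGET_BYTES = 100 * 2**20     # ~100 MB per chunk (assumed target)

# Chunk along time only; each chunk holds the full iy x jx spatial slab.
time_chunk = TARGET_BYTES // (IY * JX * BYTES_PER_VALUE)
n_chunks = -(-TIME // time_chunk)  # ceiling division

print(f"time chunk: {time_chunk} steps -> {n_chunks} chunks per variable")
```

With these numbers, one float32 variable on the full grid is about 6.7 GB uncompressed, so a time-only chunking in this range keeps chunk counts modest while staying near the target chunk size.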
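The read-test step could follow a round-trip pattern like the sketch below, with an in-memory Zarr array standing in for the real store on S3; the array contents and chunk shape here are placeholders.

```python
import numpy as np
import zarr

# Write a small float32 array to an in-memory Zarr store, then read a slab
# back and verify the values survive the round trip. The real test would
# open the store at the S3 landing spot instead.
data = np.arange(2 * 4 * 5, dtype="float32").reshape(2, 4, 5)
z = zarr.array(data, chunks=(1, 4, 5))

assert z.shape == (2, 4, 5)
assert np.array_equal(z[:], data)
```

The same opened array is a natural place to collect the report's summary statistics (total size, and compression ratio as uncompressed bytes over stored bytes).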
Edited by Snyder, Amelia Marie