ssebopeta Conversion to Zarr

Checklist for the workflow associated with dataset conversion:

Dataset Name: Operational Simplified Surface Energy Balance Conterminous U.S. Actual Evapotranspiration Data

The source is a collection of GeoTIFF files that were processed with GDAL and NCO into a NetCDF collection. The existing processing script at https://code.usgs.gov/wma/nhgf/geo-data-portal/thredds-config/-/blob/master/mnt_thredds/ssebopeta/nco.sh?ref_type=heads needs to be tailored for Zarr output.

https://cida.usgs.gov/thredds/catalog.html?dataset=cida.usgs.gov/ssebopeta/monthly

https://cida.usgs.gov/thredds/catalog.html?dataset=cida.usgs.gov/ssebopeta/yearly

https://cida.usgs.gov/thredds/catalog/demo/thredds/ssebopeta/catalog.html


  • Identify Source Data location and access (check the dataset spreadsheet)
  • Collect ownership information (Who do we ask if we run into problems?)
  • Create new workflow notebook from template; stash in the ./workflows folder tree in an appropriate spot.
    • Identify landing spot on S3 (currently somewhere in: https://s3.console.aws.amazon.com/s3/buckets/nhgf-development?prefix=workspace/&region=us-west-2)
    • Calculate chunking, layout, compression, etc.
    • Run notebook
    • Read test (pattern to be determined by the dataset)
  • Create STAC catalog entry:
    • Verify all metadata
    • Create entry
  • Reporting
    • Add the notebook and the Dask performance report to the repo
    • Calculate summary statistics on the output (compression ratio, total size)
    • Save the STAC JSON snippet to the repo
  • Merge and close the issue.
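For the summary-statistics step above, one way to compute the compression ratio is to compare the dataset's in-memory size (e.g. xarray's `ds.nbytes`) against the on-disk size of the Zarr store. The helper names below are illustrative, not project conventions:

```python
# Hypothetical helpers for the "summary statistics" checklist item.
import os

def store_size_bytes(store_path: str) -> int:
    """Total bytes of all files under a Zarr store directory."""
    total = 0
    for root, _dirs, files in os.walk(store_path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

def compression_ratio(uncompressed_bytes: int, store_path: str) -> float:
    """Ratio of uncompressed size (e.g. ds.nbytes) to the store's size on disk."""
    return uncompressed_bytes / store_size_bytes(store_path)
```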
Edited Oct 31, 2023 by Blodgett, David L.