Water Mission Area / nhgf / Geo-Data-Portal / GDP_Data_Processing / Issues / #29

BCCA Conversion to Zarr

Checklist for Workflow associated with dataset conversion:

Dataset Name:

Statistically downscaled GCM data using Bias Corrected Constructed Analogs Daily CMIP3 Climate Projections V2

Statistically downscaled GCM data using Bias Corrected Constructed Analogs Daily CMIP5 Climate Projections V2

https://cida.usgs.gov/thredds/catalog.html?dataset=cida.usgs.gov/cmip5_bcca/historical

https://cida.usgs.gov/thredds/catalog.html?dataset=cida.usgs.gov/cmip5_bcca/future

https://cida.usgs.gov/thredds/catalog/demo/thredds/bcca/cmip5/catalog.html

https://cida.usgs.gov/thredds/catalog/demo/thredds/bcca/cmip3/catalog.html

This conversion should be able to follow the LOCA pattern. The dataset is VERY large, and the NetCDF files have unique variable names.

Identify source data location and access (check the dataset spreadsheet)
Collect ownership information (whom do we ask if we run into problems?)
Create a new workflow notebook from the template; stash it in an appropriate spot in the ./workflows folder tree.
- Identify landing spot on S3 (currently somewhere in: https://s3.console.aws.amazon.com/s3/buckets/nhgf-development?prefix=workspace/&region=us-west-2)
- Calculate chunking, layout, compression, etc.
- Run notebook
- Read test (pattern to be determined by the dataset)
Create STAC catalog entry:
- Verify all metadata
- Create entry
Reportage
- Add notebook and the Dask performance report to the repo
- Calculate summary statistics on output (compression ratio, total size)
- Save STAC JSON snippet to repo
Merge and close the issue.
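
The "Calculate chunking" and "Calculate summary statistics" steps above come down to a little arithmetic, sketched below. This is only an illustration, not the template notebook's method: the array shape, chunk shape, dtype size, and stored byte count are all hypothetical placeholders, and the helper names are made up for this example. Real values would come from the source NetCDF files and from the size of the Zarr store on S3.

```python
from math import prod


def chunk_nbytes(chunk_shape, itemsize):
    """Uncompressed size in bytes of one chunk."""
    return prod(chunk_shape) * itemsize


def n_chunks(array_shape, chunk_shape):
    """Number of chunks needed to tile the array.

    Uses ceiling division so partial chunks at the edges are counted.
    """
    return prod(-(-dim // c) for dim, c in zip(array_shape, chunk_shape))


def compression_ratio(uncompressed_bytes, stored_bytes):
    """Ratio of uncompressed size to the size actually stored."""
    return uncompressed_bytes / stored_bytes


# Hypothetical values for illustration only (not taken from the BCCA data).
array_shape = (3650, 200, 400)   # (time, lat, lon)
chunk_shape = (365, 100, 100)    # candidate chunking
itemsize = 4                     # bytes per value, e.g. float32
stored_bytes = 292_000_000       # assumed total size of the written store

uncompressed_bytes = prod(array_shape) * itemsize

print(f"chunk size: {chunk_nbytes(chunk_shape, itemsize) / 1e6:.1f} MB")
print(f"chunks per variable: {n_chunks(array_shape, chunk_shape)}")
print(f"total uncompressed: {uncompressed_bytes / 1e9:.3f} GB")
print(f"compression ratio: {compression_ratio(uncompressed_bytes, stored_bytes):.2f}")
```

Trying a few candidate chunk shapes this way before running the notebook makes it easy to see whether a layout gives reasonably sized chunks and a manageable chunk count. The same ratio and total-size figures can then be reported under Reportage.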

Edited Oct 23, 2023 by Blodgett, David L.
