MACAv2 Conversion to Zarr
*If this one turns out to be hard to convert, we should archive it.*
Checklist for Workflow associated with dataset conversion:
Dataset Name: Multivariate Adaptive Constructed Analogs (MACA) CMIP5 Statistically Downscaled Data for Coterminous USA
- https://cida.usgs.gov/thredds/catalog.html?dataset=cida.usgs.gov/macav2metdata_daily_future
- https://cida.usgs.gov/thredds/catalog.html?dataset=cida.usgs.gov/macav2metdata_daily_historical
- https://cida.usgs.gov/thredds/catalog.html?dataset=cida.usgs.gov/macav2metdata_monthly_future
- https://cida.usgs.gov/thredds/catalog.html?dataset=cida.usgs.gov/macav2metdata_monthly_historical
https://cida.usgs.gov/thredds/catalog/demo/thredds/macav2/catalog.html
Follow the LOCA pattern -- a very large collection of downscaled output stored in netCDF files with unique variable names.
- Identify source data location and access (check the dataset spreadsheet):
  - MACAv2-METDATA datasets consist of historical (1950-2005) and future (2006-2099) data at both daily and monthly time steps.
  - Source data are located in: s3://nhgf-development/thredds/macav2/
  - Datasets represent the output from a list of models (e.g. BNU-ESM); see the list of models and variables attached in comments below.
- Collect ownership information (who do we ask questions of if we have problems?)
- Create a new workflow notebook from the template; stash it in the ./workflows folder tree in an appropriate spot.
- Identify landing spot on S3 (currently somewhere in: https://s3.console.aws.amazon.com/s3/buckets/nhgf-development?prefix=workspace/&region=us-west-2)
- Calculate chunking, layout, compression, etc.
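Chunk sizing can be estimated with back-of-envelope arithmetic before writing anything. In this sketch the grid dimensions are assumptions to verify against the actual files (the MACAv2-METDATA grid is approximately 585 × 1386 cells), and the candidate chunk shape is purely illustrative:

```python
import math

def chunk_size_mb(time_chunk, lat_chunk, lon_chunk, itemsize=4):
    """Uncompressed size of one float32 chunk in MiB."""
    return time_chunk * lat_chunk * lon_chunk * itemsize / 2**20

# One candidate layout: a year of days per chunk, spatial tiles.
# These values are hypothetical, chosen to land in the tens of MiB.
candidate = (365, 195, 231)  # (time, lat, lon)
size = chunk_size_mb(*candidate)
print(f"chunk shape {candidate} -> {size:.1f} MiB uncompressed")

# Chunks needed to cover the historical period (1950-2005, ~56 years).
n_time = math.ceil((56 * 365) / candidate[0])
n_lat = math.ceil(585 / candidate[1])
n_lon = math.ceil(1386 / candidate[2])
print(f"{n_time * n_lat * n_lon} chunks per variable")
```

Repeating this for a few candidate shapes makes it easy to balance chunk size against the total chunk count per variable.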
- Run notebook
- Read test (pattern to be determined by the dataset)
- Create STAC catalog entry:
  - Verify all metadata
  - Create entry
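A minimal sketch of what a Collection-level STAC entry might look like; the id, license, extents, and asset href below are placeholders, not verified metadata, and would be filled in from the metadata-verification step:

```python
import json

# Skeleton STAC Collection for the converted store. Every concrete
# value here (id, bbox, interval, href, license) is a placeholder.
collection = {
    "type": "Collection",
    "stac_version": "1.0.0",
    "id": "macav2-metdata-daily",
    "description": ("MACAv2-METDATA statistically downscaled CMIP5 data "
                    "for the coterminous USA, converted to Zarr."),
    "license": "CC0-1.0",  # placeholder -- confirm the actual terms
    "extent": {
        "spatial": {"bbox": [[-124.8, 25.1, -67.0, 49.4]]},  # approx CONUS
        "temporal": {"interval": [["1950-01-01T00:00:00Z",
                                   "2099-12-31T00:00:00Z"]]},
    },
    "links": [],
    "assets": {
        "zarr": {
            "href": "s3://example-bucket/path/to/macav2.zarr",  # placeholder
            "type": "application/vnd+zarr",
            "roles": ["data"],
        }
    },
}
print(json.dumps(collection, indent=2)[:120])
```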
- Reportage:
  - Add the notebook and the Dask performance report to the repo
  - Calculate summary statistics on output (compression ratio, total size)
  - Save STAC JSON snippet to repo
- Merge and close the issue.