alaska_et_2020 Conversion to Zarr

Checklist for Workflow associated with dataset conversion:

Dataset Name: Gridded 20km Daily Reference Evapotranspiration for the State of Alaska from 1979 to 2017

https://cida.usgs.gov/thredds/catalog.html?dataset=cida.usgs.gov/alaska_et_2020/gfdl_historical_simulation
https://cida.usgs.gov/thredds/catalog.html?dataset=cida.usgs.gov/alaska_et_2020/ccsm4_historical_simulation
https://cida.usgs.gov/thredds/catalog.html?dataset=cida.usgs.gov/alaska_et_2020/era-interim_reanalysis

https://cida.usgs.gov/thredds/catalog/demo/thredds/alaska_et_2020/catalog.html

3 instances of 1 variable @ 261 x 261 x 22325, natively stored in annual NetCDF files.

Suggest converting directly from OPeNDAP to avoid dealing with file diversity. It should be possible to request ~100 time steps at a time.
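
The batched requests suggested above could be planned along these lines. This is a minimal sketch, not the workflow's actual code: the helper name `time_batches` is hypothetical, the batch size of 100 is the rough figure from the note, and 22325 is the time-axis length quoted above. Each pair is a half-open index range along the time axis that could then be used to slice an OPeNDAP-backed variable.

```python
def time_batches(n_steps, batch_size=100):
    """Yield half-open (start, stop) index ranges that cover n_steps.

    Hypothetical helper: batch_size=100 is only the approximate
    request size suggested in the issue, not a tested limit.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_steps, batch_size):
        # The final batch is shortened so it never runs past the end.
        yield start, min(start + batch_size, n_steps)


# 22325 time steps, as quoted in the issue.
batches = list(time_batches(22325, 100))
print(len(batches), batches[0], batches[-1])
```

Half-open ranges map directly onto Python slicing, so a batch can be applied as `slice(start, stop)` without off-by-one adjustments.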

Identify Source Data location and access (check the dataset spreadsheet)
Collect ownership information (Who do we ask questions of if we have problems?)
Create new workflow notebook from template; stash in the ./workflows folder tree in an appropriate spot.
- Identify landing spot on S3 (currently somewhere in: https://s3.console.aws.amazon.com/s3/buckets/nhgf-development?prefix=workspace/&region=us-west-2)
- Calculate chunking, layout, compression, etc
- Run notebook
- Read test (pattern to be determined by the dataset)
Create STAC catalog entry
- Verify all metadata
- Create entry
Reportage
- Add notebook and the Dask performance report to the repo
- Calculate summary statistics on output (compression ratio, total size)
- Save STAC JSON snippet to repo
Merge and close the issue.
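
The chunking and summary-statistics steps in the checklist can be roughed out with simple arithmetic before running the notebook. The sketch below is illustrative only: the function names are hypothetical, the 4-byte item size assumes float32 data (the issue does not state the data type), the (time, y, x) axis order is assumed, and the candidate chunk shape is a placeholder rather than a recommendation.

```python
import math


def chunk_nbytes(chunk_shape, itemsize=4):
    """Uncompressed size of one chunk in bytes.

    itemsize=4 assumes float32; adjust for the real dtype.
    """
    return math.prod(chunk_shape) * itemsize


def n_chunks(array_shape, chunk_shape):
    """Number of chunks needed to tile the array (partial edge chunks count)."""
    return math.prod(math.ceil(n / c) for n, c in zip(array_shape, chunk_shape))


def compression_ratio(uncompressed_bytes, stored_bytes):
    """Ratio of uncompressed size to size on disk (higher = better compression)."""
    return uncompressed_bytes / stored_bytes


# Dimensions quoted in the issue; the (time, y, x) order is an assumption.
array_shape = (22325, 261, 261)
# Hypothetical candidate chunk: 100 time steps of the full grid.
chunk_shape = (100, 261, 261)

print("bytes per chunk:", chunk_nbytes(chunk_shape))
print("chunks per variable:", n_chunks(array_shape, chunk_shape))
print("uncompressed bytes per variable:", chunk_nbytes(array_shape))
```

Once the store is written, `compression_ratio` can be fed the uncompressed size from above and the total object size reported for the output location to get the figure asked for under Reportage.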

Edited Dec 27, 2023 by Andrew Laws
