"## In progress - figuring out how to label bnds dim of dataset\n",
"\n",
"This is a workflow to build [STAC collections](https://github.com/radiantearth/stac-spec/blob/master/collection-spec/collection-spec.md) from the zarr assets for the dataset named above. We use the [datacube extension](https://github.com/stac-extensions/datacube) to define the spatial and temporal dimensions of the zarr store, as well as the variables it contains.\n",
## In progress - figuring out how to label bnds dim of dataset
This is a workflow to build [STAC collections](https://github.com/radiantearth/stac-spec/blob/master/collection-spec/collection-spec.md) from the zarr assets for the dataset named above. We use the [datacube extension](https://github.com/stac-extensions/datacube) to define the spatial and temporal dimensions of the zarr store, as well as the variables it contains.
So that this workflow can scale to many datasets, a few simplifying suggestions and assumptions are made:
1. For USGS data, we can use the CC0-1.0 license. For all other data we can use the Unlicense. Ref: https://spdx.org/licenses/
2. I am assuming all coordinates are from the WGS84 datum if not specified.
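As a starting point, the collection skeleton can be created with pystac and the zarr store attached as an asset. The sketch below is illustrative only; the collection id, extents, asset key, media type, and zarr URL are placeholders to be replaced with the dataset's real values.

```python
# A minimal sketch, assuming pystac is installed; all ids, extents, and URLs
# below are placeholders, not this dataset's real values.
from datetime import datetime, timezone

import pystac

collection = pystac.Collection(
    id="example-zarr-dataset",  # placeholder id
    description="Example collection built from a zarr store",
    extent=pystac.Extent(
        spatial=pystac.SpatialExtent([[-180.0, -90.0, 180.0, 90.0]]),
        temporal=pystac.TemporalExtent(
            [[datetime(1950, 1, 1, tzinfo=timezone.utc),
              datetime(2100, 12, 31, tzinfo=timezone.utc)]]
        ),
    ),
    license="CC0-1.0",  # USGS data; otherwise the Unlicense (see list above)
)

# Attach the zarr store as an asset on the collection.
collection.add_asset(
    "zarr-s3",  # placeholder asset key
    pystac.Asset(
        href="s3://example-bucket/example.zarr",  # placeholder URL
        media_type="application/vnd+zarr",
        roles=["data"],
    ),
)

# Declare the datacube extension; cube:dimensions and cube:variables are
# filled in later in the workflow.
collection.stac_extensions.append(
    "https://stac-extensions.github.io/datacube/v2.2.0/schema.json"
)
```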
# When writing data to Zarr, Xarray sets the _ARRAY_DIMENSIONS attribute on all variables based on the variable dimensions. When reading a Zarr group, Xarray looks for this attribute on all arrays and uses it to reconstruct the named dimensions.
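For example, opening the store with xarray (the path below is a placeholder) restores the named dimensions from those attributes, and the dimension and variable names can be printed for the next step:

```python
# A minimal sketch; "example.zarr" is a placeholder for the dataset's zarr store.
import xarray as xr

# xarray uses the _ARRAY_DIMENSIONS attributes to restore named dimensions.
ds = xr.open_dataset("example.zarr", engine="zarr", chunks={})

print(dict(ds.sizes))      # dimension name -> length
print(list(ds.data_vars))  # data variables
print(list(ds.coords))     # coordinate variables
```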
#### User input needed - you will need to copy all of the dimensions printed below into the dict and fill in the appropriate attributes (type, axis, extent, etc.):
Please see [datacube spec](https://github.com/stac-extensions/datacube?tab=readme-ov-file#dimension-object) for details on required fields.
If you have a dimension like "bnds" that is used on variables like time_bnds, lon_bnds, lat_bnds to choose either the lower or upper bound, you can use and [additional dimension object](https://github.com/stac-extensions/datacube?tab=readme-ov-file#additional-dimension-object). We recommend making the type "count" as Microsoft Planetary Computer did [here](https://github.com/stac-extensions/datacube/blob/9e74fa706c9bdd971e01739cf18dcc53bdd3dd4f/examples/daymet-hi-annual.json#L76).
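As an illustrative sketch (the dimension names, extents, and reference systems below are placeholders, not this dataset's real values), the filled-in dict might look like this, with "bnds" modeled as a count-type additional dimension:

```python
# A sketch of the user-filled dimensions dict; replace the placeholder names
# and extents with the values printed above.
dims_dict = {
    "time": {
        "type": "temporal",
        "extent": ["1950-01-01T00:00:00Z", "2100-12-31T00:00:00Z"],
    },
    "lon": {
        "type": "spatial",
        "axis": "x",
        "extent": [-180.0, 180.0],
        "reference_system": 4326,  # assumption 2: WGS84 unless specified
    },
    "lat": {
        "type": "spatial",
        "axis": "y",
        "extent": [-90.0, 90.0],
        "reference_system": 4326,
    },
    # "bnds" only selects the lower/upper bound on *_bnds variables, so
    # follow the Planetary Computer convention and mark it as a count.
    "bnds": {"type": "count", "description": "lower/upper bound index"},
}
```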
# spec says that the keys of cube:dimensions and cube:variables should be unique together; a key like lat should not be both a dimension and a variable.
# we will drop all values in dims from vars
vars = [v for v in vars if v not in dims]
# Microsoft Planetary Computer includes coordinates and crs as variables here:
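A hedged sketch of how the cube:variables entries could then be assembled from the opened dataset `ds` and the filtered `vars` list above; the attribute names used for description and units are assumptions about the dataset's metadata, not guaranteed to be present.

```python
# A sketch: build datacube variable objects for each remaining variable.
# Assumes `ds` is the xarray Dataset opened earlier and `vars` was filtered above.
vars_dict = {}
for v in vars:
    da = ds[v]
    vars_dict[v] = {
        # coordinates (e.g. crs, *_bnds) are auxiliary; everything else is data
        "type": "auxiliary" if v in ds.coords else "data",
        "description": da.attrs.get("long_name", ""),
        "dimensions": list(da.dims),
        "unit": da.attrs.get("units", ""),
    }
```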
## Supplemental - Calendar Exploration based on zarr conversion workflow
Special handling of the calendars was set up in the [zarr conversion workflow](https://code.usgs.gov/wma/nhgf/geo-data-portal/gdp_data_processing/-/blob/main/workflows/opendap/CIDA/CPREP/cprep_conversion.ipynb?ref_type=heads) for this dataset, so this was further explored to see if it was necessary.
#### What do we get for min/max times if we do NOT decode the times (like was done in the zarr conversion workflow)? - noleap
The workflow is overwriting the calendar values so that times are marked at the 00:00 hour instead of the 12:00 hour. There doesn't seem to be different handling for Julian vs. noleap calendars. We will use the data's true time values and let xarray decode the times for us, rather than using this shifted timestamp.
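A minimal sketch of that comparison, assuming the placeholder store path and a time variable named `time`: open the store once with `decode_times=False` to see the raw offsets and calendar, and once with decoding enabled to get the true timestamps.

```python
# A sketch; "example.zarr" is a placeholder path and "time" an assumed variable name.
import xarray as xr

# Raw (undecoded) time values, as the zarr conversion workflow saw them.
raw = xr.open_dataset("example.zarr", engine="zarr", decode_times=False)
print(raw["time"].attrs.get("units"), raw["time"].attrs.get("calendar"))
print(raw["time"].values.min(), raw["time"].values.max())

# Let xarray decode the CF calendar (noleap times become cftime objects).
decoded = xr.open_dataset("example.zarr", engine="zarr", decode_times=True)
print(decoded["time"].values.min(), decoded["time"].values.max())
```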