Skip to content

rmcd code review

Code review

Daniel - The code looks good. I found problems with the netcdf_example.py but maybe it worked for you? Using the .yml file to create an conda env, and pip installing the local repo into that env resulted in a series of errors. I have added a suggested change to xstrm_local.yml and requirments.txt with some further suggestions below. Tests completed and the coverage looks complete.

Required docs missing per software-release-checklist

  • Disclaimer
  • Most USGS software I've seen uses the CC0 1.0 Universal Summary found here, however the existing license maybe fine?

Tests:

tests all completed and the coverage looks good.

netcdf_example.py:

  • In running this example, in addition to adding rioxarray I also had to add cftime, suggested by an error in reading the netcdf file on line 51. Suggest modifying the yml and requirements file for running this example.

    • include the following which should support most netcdf files:
      • netCDF4==1.5.8
      • xarray==2022.3.0
      • rasterio==1.2.10
      • rioxarray==0.11.1
  • gfv1_co_catchments.parquet is not included. At least point to a data repository other's could download a complete working dataset. to follow the examples. I think that will help someone new to the code.

  • I used shapefile (attached GagewsII_subset) containing gagesII basins I've been testing with and running against netcdf_example.py (note changes reflect my working version). In it's initial state opening with rioxarray the resulting aggregated values were wrong as the values which have units of kelvin are too large and do not convert to reasonable degrees fahrenheit.

    • I think I understand what might be causing the problem. It looks like rioxarray may not account for the add_offset and scale_factor attributes which are (I think) cf-conventions. For example when I read the netcdf file referenced in this example using xarray, the values are proper kelvin values, ie they convert to realistic F or C values. Adding mask_and_scale=True improves the answer but comparing the max of file opened with xarray and rioxarray results in 243.4 and 243 respectively so it looks like rioxarray is treating the values as integers?.

    • Finally, reading the netcdf with xnc = xr.open_dataset(nc) also tried xnc = xr.open_dataset(nc, decode_cf=True, decode_coords=True), both work so the decode keywords don't seem necessary. I'm pretty sure the key here is to have netCDF4 installed as xarray is using that as the netcdf4 as the default engine in the backend.

Other notes:

  • grid_to_poly.py

    • line 204 and 207: prefer bool to binary
    • In top-level documentation. I think of images as rasters, and model data as grids. That's semantics, but you might include something like summarization of rasters (geo-referenced images) or other grid based data such as netcdf's and keep that language consistent throughout the documentation. Just a suggestion.
    • If I am using an ndarray for grid_to_poly, and I leave out the grid_affine argument, it results in Value error specifying a need for affine transform for numpy arrays but you might want to catch this first.
  • point_to_poly.py

    • line 180 and 183 prefer bool to binary
  • GagaesII_subset.zip

Edited by Wieferich, Daniel Joseph