
Update runoff calculation function following discussion

Hinman, Elise D requested to merge update-runoff-wt-calc-eh into main

This merge request implements the results of the discussion in this MR: Update runoff weight mtx param alt (!55)

First, the MR changes the format of the runoff data input from a list to a dictionary. A dictionary is helpful (I think?) because the function can look up a particular basin's runoff by the name of the site (the key of its entry in the dictionary), rather than looping through the entire list to find the df with the runoff data for the site of interest. There is probably a better way to do this, but with my beginner skills, it was the easiest solution. You can see these edits to Jay's query_nwis_runoff_data function in the updated Jupyter notebook (it needs the data zip Margaux provided to me over the Surface Water Teams chat), which walks through how to calculate runoff for one or more hucs:

calculate_runoff_fun_v4_edh.ipynb
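
To illustrate the list vs. dictionary difference described above, here is a minimal sketch. The site numbers, the `site_no` and `runoff` column names, and the `find_in_list` helper are all made up for illustration; they are not taken from the notebook or from query_nwis_runoff_data.

```python
import pandas as pd

# Hypothetical runoff data for two sites (values are invented).
dates = pd.to_datetime(["2020-01-01", "2020-01-02"])
df_a = pd.DataFrame({"runoff": [0.1, 0.2]}, index=dates)
df_b = pd.DataFrame({"runoff": [0.3, 0.4]}, index=dates)

# List input: each df has to carry its own site id, and finding one
# site means scanning the list until a match turns up.
runoff_list = [df_a.assign(site_no="01234567"), df_b.assign(site_no="07654321")]

def find_in_list(dfs, site):
    """Hypothetical helper: return the first df whose site_no matches."""
    for df in dfs:
        if (df["site_no"] == site).all():
            return df
    return None

# Dictionary input: the site id is the key, so the lookup is direct.
runoff_dict = {"01234567": df_a, "07654321": df_b}

from_list = find_in_list(runoff_list, "07654321")
from_dict = runoff_dict["07654321"]
```

Both lookups return the same runoff values; the dictionary version just skips the loop and does not need a site column inside each df.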

Second, the MR reimagines the calculate_geometric_runoff function to follow the WaterWatch readme. As in Margaux's MR, the function begins by checking that the site_col has the correct string format and, if the proportion columns are given in percent rather than as proportions, multiplies them by a conversion factor.
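
As a rough sketch of what those two input checks could look like (this is not the code in the MR; the `prepare_intersections` name, the column names, and the `percentage` flag are assumptions for illustration):

```python
import pandas as pd

def prepare_intersections(df, site_col, prop_cols, percentage=False):
    """Hypothetical pre-processing step, not the MR's implementation."""
    out = df.copy()
    # Make sure site ids are strings so they can match dictionary keys.
    out[site_col] = out[site_col].astype(str)
    if percentage:
        # Convert percent (0-100) to proportion (0-1).
        out[prop_cols] = out[prop_cols] / 100
    return out

# Invented example table with proportions given in percent.
raw = pd.DataFrame({
    "da_site_no": ["01234567", "07654321"],
    "prop_geom_in_basin": [95.0, 20.0],
    "prop_basin_in_geom": [50.0, 100.0],
})
prepared = prepare_intersections(
    raw, "da_site_no", ["prop_geom_in_basin", "prop_basin_in_geom"], percentage=True
)
```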

It then filters the geom intersections df to only the intersections with hucs that have runoff data. This piece needs more thought and discussion, since it isn't covered in the readme. More below.
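
A minimal sketch of that filtering step, assuming the runoff dictionary is keyed by site id and that a site with no data shows up as an empty df (both are assumptions, as are the column names and values):

```python
import pandas as pd

# Invented intersections table: one row per basin that overlaps the geometry.
intersections = pd.DataFrame({
    "da_site_no": ["01", "02", "03"],
    "prop_geom_in_basin": [1.0, 0.4, 0.1],
    "prop_basin_in_geom": [0.2, 1.0, 1.0],
})

# Invented runoff dictionary: "02" was queried but returned nothing,
# and "03" is missing from the dictionary entirely.
runoff_dict = {
    "01": pd.DataFrame({"runoff": [0.5, 0.7]}),
    "02": pd.DataFrame({"runoff": []}),
}

# Keep only intersections whose site has at least one row of runoff data.
sites_with_data = [site for site, df in runoff_dict.items() if not df.empty]
filtered = intersections[intersections["da_site_no"].isin(sites_with_data)]

# Track what was dropped so the data gap is visible to the user.
dropped = sorted(set(intersections["da_site_no"]) - set(sites_with_data))
```

Reporting `dropped` is one possible way to surface the data gaps discussed further down, rather than silently discarding those basins.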

Next, the function searches for instances where the proportion of geom in basin AND proportion of basin in geom are over 0.9. If this condition is met, it uses the runoff values from the basin corresponding with the highest weight value (proportion in basin x proportion in geom).
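
The mutual-overlap check above could be sketched like this. The `pick_best_overlap` name, the column names, and the example values are hypothetical; only the 0.9 threshold and the weight definition (proportion in basin x proportion in geom) come from the description.

```python
import pandas as pd

def pick_best_overlap(df, threshold=0.9):
    """Hypothetical helper: return the site whose basin and geometry
    nearly coincide (both proportions over the threshold), choosing
    the highest weight if several qualify. Returns None otherwise."""
    both = df[
        (df["prop_geom_in_basin"] > threshold)
        & (df["prop_basin_in_geom"] > threshold)
    ]
    if both.empty:
        return None
    weight = both["prop_geom_in_basin"] * both["prop_basin_in_geom"]
    return both.loc[weight.idxmax(), "da_site_no"]

# Invented example: A and B both pass, B has the higher weight.
overlaps = pd.DataFrame({
    "da_site_no": ["A", "B", "C"],
    "prop_geom_in_basin": [0.95, 0.99, 1.00],
    "prop_basin_in_geom": [0.92, 0.97, 0.20],
})
best = pick_best_overlap(overlaps)
no_match = pick_best_overlap(overlaps[overlaps["da_site_no"] == "C"])
```

When the helper returns None, the function would fall through to the next step.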

If that closeness of overlap doesn't exist in the weights table, the function goes to the next step of finding instances where either the proportion of geom in basin is near 1 OR the proportion of basin in geom is near 1. I chose 'near 1' rather than exactly 1 because sometimes the intersection between two shapes isn't perfect (e.g., a basin may be contained within a geometry, but they may share a border such that a tiny sliver of the basin falls outside of the geometry purely because of shape-drawing error). These instances make up a filtered geom intersections dataframe that is used to calculate a weighted runoff value for each day with runoff data.
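
Here is a rough sketch of that fallback step under some assumptions: the `near_one` cutoff of 0.98 is a placeholder (the description does not give the value used), the runoff dictionary maps site ids to date-indexed Series, and the function and column names are made up. Days where a site has no data are left out of that day's average.

```python
import pandas as pd

def weighted_runoff(intersections, runoff_dict, near_one=0.98):
    """Hypothetical sketch: daily weighted-average runoff from basins that
    (nearly) contain the geometry or are (nearly) contained within it."""
    near = intersections[
        (intersections["prop_geom_in_basin"] > near_one)
        | (intersections["prop_basin_in_geom"] > near_one)
    ]
    # Weight = proportion of geom in basin x proportion of basin in geom.
    weights = pd.Series(
        (near["prop_geom_in_basin"] * near["prop_basin_in_geom"]).to_numpy(),
        index=near["da_site_no"].to_numpy(),
    )
    # Wide table: one row per day, one column per site (NaN where no data).
    runoff = pd.DataFrame({site: runoff_dict[site] for site in weights.index})
    weighted_sum = runoff.mul(weights, axis=1).sum(axis=1)
    # Only count the weights of sites that actually have data on each day.
    weight_sum = runoff.notna().astype(float).mul(weights, axis=1).sum(axis=1)
    return weighted_sum / weight_sum

# Invented example: A contains the geometry, B sits inside it, C does neither.
days = pd.to_datetime(["2020-01-01", "2020-01-02"])
table = pd.DataFrame({
    "da_site_no": ["A", "B", "C"],
    "prop_geom_in_basin": [1.0, 0.25, 0.5],
    "prop_basin_in_geom": [0.5, 1.0, 0.5],
})
data = {
    "A": pd.Series([2.0, 6.0], index=days),
    "B": pd.Series([8.0, float("nan")], index=days),
    "C": pd.Series([100.0, 100.0], index=days),
}
daily = weighted_runoff(table, data)
```

On the first day both A and B contribute, (0.5 x 2 + 0.25 x 8) / 0.75; on the second day only A has data, so the result is simply A's runoff.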

Something that was challenging to accommodate was the dearth of data for some basins over the test time periods used. In other words, there might be sites whose drainage area intersects the geometry of interest, but they may not have any data with which to estimate runoff. In some cases during my testing, area-based runoff could not be calculated because there wasn't any data for the basins that fit the conditional statement in the readme. I approached it the best I could, but this probably requires some more thought.

Caveats and further issues:

Currently, the function will use whatever it can find to estimate runoff using a weighted average for a specified geometry, and I think this is incorrect for situations where a huc contains/is contained within only ONE basin with data. For example, if there's only one basin with runoff data available that is 1000x bigger than the geometry and contains the geometry, and no basins with data within the geometry, the geometry's runoff will be equivalent to the runoff calculated for that giant basin:

(tiny weight x giant basin's runoff)/tiny weight = giant basin's runoff

Similarly, if there's only one basin available that is 1000x smaller than the geometry and is contained within the geometry, and there are no basins with data that contain the geometry, the geometry's runoff will be equivalent to the runoff calculated for the tiny basin.

How do we want this to work? For the first example, I would think we would get rid of the "weighted average" part, and just make the estimated runoff weight x runoff, but I'm not sure what to do about the second example.

[UPDATE]: I think this caveat/issue has been solved by the new approach to calculating a weighted average. I checked the numpy.average function, and if there is only one runoff value, the weighting factor cancels out and the function returns that value unchanged...because there is no average to be made.
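
A quick check of how numpy.average behaves with one value versus two (the runoff values and weights here are invented):

```python
import numpy as np

# One basin only: the weight cancels, (w * x) / w, so the result is
# the basin's own runoff no matter how tiny the weight is.
single = np.average([3.2], weights=[0.001])

# Two basins: the weights now matter.
# (0.75 * 2.0 + 0.25 * 6.0) / (0.75 + 0.25)
pair = np.average([2.0, 6.0], weights=[0.75, 0.25])
```

So with a single basin, the estimate is that basin's runoff regardless of how small its overlap with the geometry is.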

Are there thresholds/rules we want to implement? E.g. we only estimate runoff when we have data from sites within AND containing the geometry? I'm unsure whether this sort of filtering will be done prior to calculating runoff. Margaux and I discussed this yesterday but it wasn't clear to me what degree of guard rails need to be on this function vs how much of it is on the user to figure out.

This MR needs some strong validation testing and some tests added.

Edited by Hinman, Elise D
