Revised format of percentile thresholds and improvements to percentile calculations (!62) · Merge requests · Water / Computational Tools / Surface Water Work / hyswap

Scott Hamshaw requested to merge sdh-percentile-fixes into main Jan 19, 2024

The key change in this MR is that the format of the percentile threshold data has changed to better align with proper statistical methods. It now more closely matches the NWIS Stats Service and WaterWatch data formats. This MR should resolve some of the issues raised during initial validation of the percentile calculations. Addresses issues #69 (closed) and #72 (closed).

Highlights of changes:

Percentile column names are no longer numeric (5, 10, 50, etc.). They are now formatted as strings: 'p05', 'p10', 'p50', etc.
Percentiles of 0 and 100 are no longer calculated, as they cannot be obtained with unbiased percentile methods such as the default 'weibull' method. Instead, a min and a max value are calculated and returned. The min and max calculations can optionally be disabled in the percentile functions.
Metadata including the count, start_yr, end_yr, and mean can now be calculated for each month-day.
When a percentile is calculated, we now also check whether the number of available data points supports that calculation by checking the min and max percentile ranks attainable from the data (using the exceedance probability functions). This check is on by default and can optionally be disabled.
Some clean-up of the method names in calculate_exceedance_probability_from_values to align with the method names in numpy.percentile.
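
To illustrate the reasoning behind the changes above, here is a minimal sketch of the idea, not the hyswap implementation. The helper names `weibull_rank_bounds` and `percentiles_with_support_check` are hypothetical and exist only for this example. It assumes Weibull plotting positions of rank / (n + 1), so the attainable percentile ranks from n values run from 100 / (n + 1) to 100 * n / (n + 1), and it assumes NumPy 1.22 or later for the `method="weibull"` option of `numpy.percentile`.

```python
import numpy as np


def weibull_rank_bounds(n):
    """Smallest and largest percentile rank (0-100) attainable from n values
    under Weibull plotting positions, rank / (n + 1)."""
    return 100.0 / (n + 1), 100.0 * n / (n + 1)


def percentiles_with_support_check(values, percentiles=(5, 10, 25, 50, 75, 90, 95)):
    """Hypothetical helper: compute 'pXX' percentiles plus min and max,
    returning NaN for any percentile the sample size cannot support."""
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]  # ignore missing observations
    lo, hi = weibull_rank_bounds(values.size)

    # min and max stand in for the 0th and 100th percentiles
    out = {"min": float(values.min()), "max": float(values.max())}
    for p in percentiles:
        key = f"p{p:02d}"  # zero-padded string names: 5 -> 'p05'
        if lo <= p <= hi:
            out[key] = float(np.percentile(values, p, method="weibull"))
        else:
            out[key] = float("nan")  # not enough data for this percentile
    return out


# Ten observations: attainable ranks run from about 9.1 to 90.9
result = percentiles_with_support_check(np.arange(1.0, 11.0))
print(result)
```

With only ten values, 'p05' and 'p95' fall outside the attainable rank range and come back as NaN, while 'p10' through 'p90' are computed and 'min' and 'max' are always returned.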
