Revised format of percentile thresholds and improvements to percentile calculations

Scott Hamshaw requested to merge sdh-percentile-fixes into main

The key change in this MR is the format of the percentile threshold data, which now aligns better with proper statistical methods and more closely matches the NWIS Stats Service and WaterWatch data formats. This MR should resolve some of the issues raised during initial validation of the percentile calculations. Addresses issues #69 (closed) and #72 (closed).

Highlights of changes:

  • Percentile column names are no longer numeric (5, 10, 50, etc.). They are now formatted as 'p05', 'p10', 'p50', etc.
  • Percentiles of 0 and 100 are no longer calculated, because they cannot be obtained with unbiased percentile methods such as the default 'weibull' method. Instead, min and max values are calculated and returned. The min and max calculations can optionally be disabled in the percentile functions.
  • Metadata, including the count, start_yr, end_yr, and mean, can now be calculated for each month-day.
  • When a percentile is calculated, we now also check whether the number of available data points supports that calculation, by comparing the requested percentile against the min and max percentile ranks available from the data (using the exceedance probability functions). This check is on by default and can optionally be disabled.
  • Some clean-up of the methods in calculate_exceedance_probability_from_values to align with the method names in numpy.percentile.
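
To illustrate the ideas in the list above, here is a minimal, standalone sketch of how the new column naming, the min/max replacement for the 0 and 100 percentiles, and the data-support check could fit together. It is not the code from this MR: the function names (`weibull_rank_limits`, `weibull_percentile`, `percentile_thresholds`), the parameter names (`include_min_max`, `check_support`), and the choice to return NaN for unsupported percentiles are all assumptions made for illustration. It assumes the Weibull plotting position, rank / (n + 1), under which n values can only support percentile ranks between 100/(n + 1) and 100·n/(n + 1).

```python
import math


def weibull_rank_limits(n):
    """Lowest and highest percentile ranks (0-100) attainable from n values
    under the Weibull plotting position, rank / (n + 1)."""
    return 100.0 * 1 / (n + 1), 100.0 * n / (n + 1)


def weibull_percentile(sorted_values, p):
    """Linearly interpolate the p-th percentile using Weibull positions."""
    n = len(sorted_values)
    pos = p / 100.0 * (n + 1) - 1      # zero-based fractional index
    pos = min(max(pos, 0.0), n - 1.0)  # clamp to the available data
    lo = math.floor(pos)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


def percentile_thresholds(values, percentiles=(5, 10, 25, 50, 75, 90, 95),
                          include_min_max=True, check_support=True):
    """Hypothetical helper: return thresholds keyed 'min', 'p05', ..., 'max'."""
    data = sorted(values)
    n = len(data)
    low, high = weibull_rank_limits(n)
    out = {}
    if include_min_max:
        out["min"] = data[0]  # reported in place of a 0th percentile
    for p in percentiles:
        key = f"p{p:02d}"     # 5 -> 'p05', 50 -> 'p50'
        if check_support and not (low <= p <= high):
            # Assumption: flag unsupported percentiles as NaN
            out[key] = math.nan
        else:
            out[key] = weibull_percentile(data, p)
    if include_min_max:
        out["max"] = data[-1]  # reported in place of a 100th percentile
    return out


thresholds = percentile_thresholds(range(1, 10))
```

With the nine values used here, the attainable percentile ranks run from 10 to 90, so 'p05' and 'p95' are flagged as unsupported while 'p10' through 'p90' are computed. Passing `check_support=False` or `include_min_max=False` mirrors the optional switches described in the list.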