Percentile calculation with few data points
Currently, percentile calculations use `np.percentile` or `np.nanpercentile`, which return the min or max value when the requested percentile exceeds the percent rank that can be calculated from the data. For example, with 20 data points, the 95th percentile might be 1,000 cfs. We don't have enough data to determine the 99th percentile from the data, but `np.percentile` would still return 1,000 cfs as the 99th percentile. The image from the numpy documentation illustrates this behavior: the areas highlighted in yellow all have the same value. The highlighted area shrinks as more data points become available. This behavior differs from what is done in NWIS Stats and WaterWatch (see https://code.usgs.gov/water/computational-tools/surface-water-work/validation-testing/-/merge_requests/1).
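A small sketch of the behavior described above. It assumes the Weibull plotting position (`method="weibull"`, available in numpy 1.22 and later), which this issue does not name explicitly, and the flow values are made up for illustration:

```python
import numpy as np

# 20 hypothetical daily flows in cfs: 50, 100, ..., 1000
flows = np.arange(1, 21, dtype=float) * 50.0

# Assumption: Weibull plotting position, rank / (n + 1)
p95, p99 = np.percentile(flows, [95, 99], method="weibull")

# Largest percent rank the data can support under this assumption
max_rank = 100.0 * flows.size / (flows.size + 1)

print(f"max supported percentile: {max_rank:.1f}")
print(f"95th percentile: {p95}")
print(f"99th percentile: {p99} (clamped to the max of the data)")
```

Under that assumption, 20 points support percentiles up to about 95.2, so the 99th percentile request is clamped to the largest observation rather than flagged as undeterminable.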
Possible solution: add an argument that sets percentile values to NA when they fall beyond the maximum percentile rank that can be calculated from the data.
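One way the proposed argument might look, as a rough sketch rather than a final design. The function name `masked_percentile` and the argument `mask_out_of_range` are hypothetical, the input is assumed to be 1-D, and the supported range is assumed to follow the Weibull plotting position, from 100/(n+1) to 100n/(n+1):

```python
import numpy as np

def masked_percentile(values, q, mask_out_of_range=True):
    """Percentiles of 1-D data, with unsupported percentiles set to NaN.

    Hypothetical helper: assumes Weibull plotting positions, so only
    percentiles between 100/(n+1) and 100*n/(n+1) are considered supported.
    """
    values = np.asarray(values, dtype=float)
    q = np.atleast_1d(np.asarray(q, dtype=float))

    # Count only valid (non-NaN) observations
    n = np.count_nonzero(~np.isnan(values))
    if n == 0:
        return np.full(q.shape, np.nan)

    result = np.nanpercentile(values, q, method="weibull")

    if mask_out_of_range:
        lowest = 100.0 / (n + 1)
        highest = 100.0 * n / (n + 1)
        # Replace clamped min/max values with NaN
        result = np.where((q < lowest) | (q > highest), np.nan, result)
    return result

# Usage with 20 made-up flows in cfs
flows = np.arange(1, 21, dtype=float) * 50.0
out = masked_percentile(flows, [1, 50, 95, 99])
print(out)
```

With 20 values, the 1st and 99th percentiles come back as NaN while the 50th and 95th are returned as usual. Setting `mask_out_of_range=False` keeps the current numpy behavior.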