
Switch from day of year to month-day percentile calculations

Hinman, Elise D requested to merge doy-handling into main

This MR changes how calculate_variable_percentile_thresholds_by_day filters data: it now matches on the month-day instead of the day of the year. I found it easiest to add a new function to utils.py called filter_data_by_month_day. The function takes an MM-DD input, isolates the month and day, and builds a df of the dates that match them. If trailing and leading values are specified, it determines the date range for each year that makes up the moving window. It then loops through the date ranges and pulls the matching data from the df for each one, whether the year is a leap year or not. It returns a series of the data that match the month-day filter, which is then used to calculate percentiles for that month-day.
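To make the filtering idea concrete for reviewers, here is a minimal sketch of month-day filtering with a per-year window. It is not the code in this MR: the function name, the `days_before` and `days_after` parameters (my stand-ins for the trailing and leading values), and the choice to skip 02-29 in non-leap years are all assumptions for illustration.

```python
import pandas as pd


def filter_by_month_day_sketch(series, month_day, days_before=0, days_after=0):
    """Hypothetical helper, not the MR's filter_data_by_month_day.

    Returns the values of a date-indexed series that fall inside a window
    around the given MM-DD in every year.
    """
    month, day = (int(part) for part in month_day.split("-"))
    years = series.index.year
    pieces = []
    # Include one extra year on each side so that windows crossing a
    # year boundary (e.g. around 01-01 or 12-31) are not missed.
    for year in range(years.min() - 1, years.max() + 2):
        try:
            center = pd.Timestamp(year=year, month=month, day=day)
        except ValueError:
            # Assumption: 02-29 is simply skipped in non-leap years.
            continue
        start = center - pd.Timedelta(days=days_before)
        end = center + pd.Timedelta(days=days_after)
        piece = series[(series.index >= start) & (series.index <= end)]
        if len(piece) > 0:
            pieces.append(piece)
    if not pieces:
        return series.iloc[0:0]  # empty series with the same dtype
    return pd.concat(pieces)


# Three years of made-up daily data, including the 2020 leap year.
dates = pd.date_range("2019-01-01", "2021-12-31", freq="D")
demo = pd.Series(range(len(dates)), index=dates)

march_first = filter_by_month_day_sketch(demo, "03-01")
leap_day = filter_by_month_day_sketch(demo, "02-29")
windowed = filter_by_month_day_sketch(demo, "03-01", days_before=1)
```

With a one-day window before 03-01, the sketch picks up 02-28 in non-leap years and 02-29 in the leap year, which is the behavior that day-of-year filtering cannot give consistently.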

I also adjusted calculate_variable_percentile_thresholds_by_day to always construct a 366-row percentile df. It uses a leap year to build the initial empty df, which is then filled with the month-day percentiles calculated from the filtered data. Days with no data or insufficient data are filled with NaNs. This will break many of the unit tests, since the function previously used the min and max day to determine the size of the percentiles df. I still need to go back and fix these, but first I wanted to get a pulse on whether this seems like a good approach.
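The fixed 366-row layout can be sketched roughly as follows. Again, this is an illustration under assumptions rather than the actual change: the function name, the `min_count` cutoff for "insufficient data", the percentile levels, and the use of 2020 as the template leap year are all hypothetical, and the sketch groups by exact month-day without any moving window.

```python
import numpy as np
import pandas as pd


def percentiles_by_month_day_sketch(series, percentiles=(5, 50, 95), min_count=3):
    """Hypothetical sketch: a 366-row percentile table indexed by MM-DD."""
    # Any leap year yields all 366 month-day labels; 2020 is arbitrary.
    month_days = pd.date_range("2020-01-01", "2020-12-31").strftime("%m-%d")
    out = pd.DataFrame(np.nan, index=month_days, columns=list(percentiles))

    labels = series.index.strftime("%m-%d")
    for month_day, values in series.groupby(labels):
        values = values.dropna()
        # Rows with too few observations keep their NaN fill.
        if len(values) >= min_count:
            out.loc[month_day] = np.percentile(values, percentiles)
    return out


# Three years of made-up daily data: 02-29 occurs only once (2020).
dates = pd.date_range("2019-01-01", "2021-12-31", freq="D")
demo = pd.Series(np.arange(len(dates), dtype=float), index=dates)

table = percentiles_by_month_day_sketch(demo)
```

Because the table always has one row per month-day of a leap year, its shape no longer depends on which days happen to be present in the input, and sparse days such as 02-29 show up as NaN rows instead of being dropped.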

We will also need to test this new output with the plotting functions to see if I broke anything else.

Potentially addresses: #61 (closed), #4 (closed). Fits into the construct posed in #6 (closed).

Edited by Hinman, Elise D
