Zemmels review - fix inconsistencies in argument names and definitions
This issue tracks investigating, verifying, and fixing the inconsistencies raised by Joe.
Copied text from Joe's review (#96, closed): I'm somewhat confused about the differences in arguments referring to the names of columns in a dataframe. For example, both the plots.plot_duration_hydrograph and percentiles.calculate_variable_percentile_thresholds_by_day functions have a date_column_name argument, but the definition of this argument differs between these two functions.
Great observation. We have standardized all functions to use the more verbose 'date_column_name' input.
Further, the plots.plot_duration_hydrograph function has a data_col argument that seems to have the same usage as the data_column_name argument in percentiles.calculate_variable_percentile_thresholds_by_day.
Another good catch. We have standardized all functions to use the more verbose 'data_column_name' input.
I recommend standardizing the names and description of arguments that are reused in multiple functions. I think this will help users make connections easier between the package's functions.
We went through and checked to ensure inputs were not only consistent across functions, but had the exact same input description to improve usability.
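A check like this could also be partly automated. The sketch below is only an illustration, not what was done for this review: plot_example and percentile_example are hypothetical stand-ins, not the package's real signatures. It uses the standard-library inspect module to list which functions accept each parameter name, so a name used by only one function (such as data_col) stands out as a possible mismatch.

```python
import inspect
from collections import defaultdict


# Hypothetical stand-ins for package functions; these are NOT the real signatures.
def plot_example(df, data_col, date_column_name=None):
    pass


def percentile_example(df, data_column_name, date_column_name=None):
    pass


def parameters_by_name(funcs):
    """Map each parameter name to the sorted names of the functions that accept it."""
    usage = defaultdict(list)
    for func in funcs:
        for name in inspect.signature(func).parameters:
            usage[name].append(func.__name__)
    return {name: sorted(users) for name, users in usage.items()}


usage = parameters_by_name([plot_example, percentile_example])

# Parameter names accepted by only one function are candidates for a naming mismatch.
singletons = sorted(name for name, users in usage.items() if len(users) == 1)
print(singletons)
```

This only compares names. Confirming that the descriptions match would still need a pass over the docstrings.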
Another example of different arguments used in similar ways: could using df as in rolling_average vs. data as in filter_approved_data be standardized?
Noted. We have used the term 'df' where the function requires a dataframe, and have opted to retain the input 'data' when the input could take multiple forms (array, series, df, etc.). See 'calculate_fixed_percentile_thresholds' for an example of this type of input.
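The 'df' versus 'data' convention can be shown with a short sketch. Both functions below are hypothetical and written only for this illustration. They are not the package's implementations, and their names and signatures are assumptions. The first requires a pandas DataFrame, so its argument is named df. The second accepts any array-like input, so its argument is named data.

```python
import numpy as np
import pandas as pd


def rolling_mean_example(df, data_column_name, window):
    """Hypothetical: the input must be a DataFrame, so the argument is named `df`."""
    if not isinstance(df, pd.DataFrame):
        raise TypeError("df must be a pandas DataFrame")
    return df[data_column_name].rolling(window).mean()


def fixed_percentiles_example(data, percentiles=(25, 50, 75)):
    """Hypothetical: accepts a list, array, or Series, so the argument is named `data`."""
    # Convert any array-like input to a flat float array before computing percentiles.
    values = np.asarray(data, dtype=float).ravel()
    return np.nanpercentile(values, percentiles)


flows = pd.DataFrame({"flow": [1.0, 2.0, 3.0, 4.0, 5.0]})

smoothed = rolling_mean_example(flows, "flow", window=2)
from_list = fixed_percentiles_example([1.0, 2.0, 3.0, 4.0, 5.0])
from_series = fixed_percentiles_example(flows["flow"])
```

Here the list and the Series give the same percentiles, while passing a list to rolling_mean_example raises a TypeError.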