diff --git a/docs/algorithms/SqDist.md b/docs/algorithms/SqDist.md index c3e0935f6659c82211faf6b42aa79696d6aeb6f2..3356b5404594721dd5a38cce761871bd9632a061 100644 --- a/docs/algorithms/SqDist.md +++ b/docs/algorithms/SqDist.md @@ -22,9 +22,16 @@ variety of time scales associated with distinct physical phenomena. These are: SV is fairly easily separated from higher frequency variations using low-order polynomials to *detrend* the data. `SQ` and `DIST` have similar time scales, and are therefore more difficult to separate. Fourier series can be fit to data to -estimate `SQ`, which works well in non-real time situations. This approach -suffers in real time situations for both practical and theoretical reasons that -we won't discuss here. +estimate `SQ`, which works reasonably well in non-real time situations. + +However, Fourier decomposition suffers in real time processing because there is +a large computational burden associated with the long time series required to +properly reproduce seasonal and yearly variations, plus their associated +harmonics. Even if the computational burden were not an issue, such long time +series become a numerical problem when extreme artificial spikes, and even +moderate baseline shifts, are not corrected for, since the algorithm must +accommodate these non-periodic artifacts in the data fit for the entire duration +of the long time series (e.g., a year or more). ## Exponential Smoothing @@ -47,37 +54,47 @@ intervals) of the observations contributing to the current average. A weight of observations become the average. Exponential smoothing can be used to estimate a running mean, a linear trend, -even a periodic sequence of discrete "seasonal corrections" (there's more, but -we focus on these three here). We define separate forgetting factors for each: +even a periodic sequence of `m` discrete "seasonal corrections" (there's more, +but we focus on these three here). We define separate forgetting factors for +each: - `alpha` - sensitivity of running average to new observations - `beta` - sensitivity of linear trend to new observations - `gamma` - sensitivity of seasonal correction to new observations - -Now, suppose we have a time series of 1-minute resolution geomagnetic data, and -want to remove secular variations with a time scale longer than 30 days. 30 days -is 43200 minutes, so we specify `alpha=1/43200`. If we want to allow the slope -to vary with a similar time scale, we specify `beta=1/43200`. However, if we -want seasonal corrections to vary with a 30 day time scale, it is necessary to -account for the fact that they are only updated once per cycle. If that cycle is -1 day, or 1440 minutes, that means `gamma=1/43200*1440`. +- `m` - length of cycle for which seasonal corrections are estimated + +Now, suppose we have a typical time series of 1-minute resolution geomagnetic +data, and want to remove secular variations with a time scale longer than 30 +days (SV is traditionally fit to the monthly averages of "quiet days", leading +to ~30 day time scale). 30 days is 43200 minutes, so we specify `alpha=1/43200`. +If we want to allow the slope to vary with a similar time scale, we specify +`beta=1/43200`. However, if we want seasonal corrections to vary with a 30 day +time scale, it is necessary to account for the fact that they are only updated +once per cycle. If that cycle is 1 day, or `m=1440` minutes, that means +`gamma=1/43200*1440`. So, `alpha`, `beta`, and `gamma`, combined with observations, provide a running average of geomagnetic time series (`SV+SQ`). This is then subtracted from the actual observations to produce `DIST`. ## Why Exponential Smoothing? -In addition to real time data considerations, this approach is significantly -less computationally expensive than traditional Fourier techniques. No Fourier -transform of months-to-years-long data series is required, and memory -requirements are comparably reduced, since a description of the state of the -system at any given moment is only `2+m` (`m` is the number of data points -in an `SQ` cycle, plus `SV` and instantaneous slope of linear trend). - -Finally, exponential smoothing is generally more robust to common issues with -real time data series; it easily extrapolates `SV` and `SQ` across gaps in the -data; it provides a running estimate of the variance of `DIST`, which can be -used to set a threshold for spike detection; and it adjusts `SV` to accommodate -permanent DC offsets at rate specified by the user. +In addition to addressing the computational and operational shortcomings of +traditional Fourier decomposition methods in real time processing mentioned +above, our exponential smoothing with seasonal corrections has other advantages: +`SV` and `SQ` are not constrained to arbitrary functional forms, and so more +truly reflect actual observation; `SV` and `SQ` are easily extrapolated across +gaps in data; and a running estimate of the standard deviation of `DIST` is +provided (which can be used as a performance metric, to set noise thresholds, +etc.). Finally, the simplicity of exponential smoothing makes it more intuitive, +and thus easier to adapt to evolving operational requirements. + +There are disadvantages too, most notable perhaps being that `SQ` becomes quite +noisy if the general level of magnetic disturbance is large, as is the case at +high latitude observatories. The user can certainly apply ad hoc corrections to +`SQ` to smooth it. We are currently considering one such correction that still +retains all the best characteristic of exponential smoothing. A future version +of this algorithm may contain a configuration parameter that allows the user to +specify a local `SQ` smoothing window, in addition to the once-per-cycle +exponential smoothing currently allowed. ## References