ProcessingAlgorithm and interval metadata support
Tagging @ahobbs @nshavers @swilbur @bgeels.
This adds a processing algorithm that calculates indices for metadata and then applies the necessary operation to the indexed values. It is currently only set up to work for spikes and artificial_disturbances, but I have put the basic support and structure in place to also work with the adjusted algorithm and offset metadata.
- The idea is that after the indices are calculated, the indexed values for spikes and artificial_disturbances are replaced with NaNs because they are bad data (see the sketch after this list).
- When the time comes, a method will need to be added that applies the appropriate operation to the indexed values for the adjusted algorithm and offset metadata.
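As a minimal sketch of the index-then-replace step, assuming obspy traces as used elsewhere in geomag-algorithms; `flag_spikes` and its arguments are illustrative placeholders, not the actual ProcessingAlgorithm interface:

```python
import numpy as np


def flag_spikes(trace, spike_times):
    """Replace spike samples with NaN (hypothetical helper).

    trace: obspy.Trace with stats.starttime and stats.sampling_rate set.
    spike_times: iterable of UTCDateTime spike timestamps from metadata.
    """
    start = trace.stats.starttime
    rate = trace.stats.sampling_rate
    data = trace.data.astype(float)
    for spike_time in spike_times:
        # index = round((spike_time - start_time) * sampling_rate)
        index = int(round((spike_time - start) * rate))
        if 0 <= index < len(data):
            data[index] = np.nan  # bad data -> NaN
    trace.data = data
    return trace
```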
Some current issues and thoughts for the future:
- I noticed that if I create new metadata in production and then immediately try to create it again, there is a large and inconsistent delay before the database reflects the change: the query does not find the newly created metadata, so a duplicate record is created. This leads me to believe some sort of cache is interfering with the queries.
- Unfortunately, this code is a bit messy right now because of the number of conditionals required by how our data is structured. I am hoping that some of Ali's future changes will allow me to clean this up.
- This should work with both second and minute intervals, though I had to add support elsewhere in the codebase to accomplish this.
- Given the above, it would be nice to edit Controller.py to use FilterAlgorithm.py to generate despiked minute data while storing spikes only at second resolution (or a similar solution); a rough sketch follows this list. This also fits with the expectation that we will stop storing minute data one day, and it would clean things up.
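A rough sketch of that flow, assuming FilterAlgorithm's `input_sample_period`/`output_sample_period` keyword arguments and the base `process(stream)` method behave as I read them in the current code; `seconds_to_minutes` is a hypothetical helper, not existing Controller.py code:

```python
from geomagio.algorithm import FilterAlgorithm


def seconds_to_minutes(despiked_seconds):
    """Derive minute data from despiked second data.

    despiked_seconds: obspy Stream of second data whose spike samples
    have already been replaced with NaN (e.g., via flag_spikes above),
    so spikes only ever need to be stored at second resolution.
    """
    filt = FilterAlgorithm(input_sample_period=1.0, output_sample_period=60.0)
    return filt.process(despiked_seconds)
```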
Some additional thoughts on data structures/algorithms for future consideration:
- I currently have the spike metadata store only the spike timestamps. I considered making these masked arrays spanning the entire day, so that I could use a direct array comparison between the spike timestamps and the trace timestamps.
- I also considered using a binary search via np.searchsorted, given that the two arrays currently differ in size.
- I ended up taking advantage of the fact that these are uniformly sampled timestamps, so I just compute the indices directly: `index = round((spike_time - start_time) * sampling_rate)`. A small comparison of the two approaches follows below.
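A self-contained comparison of the two approaches on synthetic epoch-second timestamps (all values here are illustrative, not real data):

```python
import numpy as np

sampling_rate = 1.0  # 1 Hz second data
start_time = 0.0     # trace start, in epoch seconds
trace_times = start_time + np.arange(86400) / sampling_rate  # one day
spike_times = np.array([10.0, 3600.0, 86000.0])  # spike timestamps

# Option 1: binary search over the trace timestamps, O(log n) per spike
searched = np.searchsorted(trace_times, spike_times)

# Option 2: direct computation from the uniform sampling, O(1) per spike
computed = np.round((spike_times - start_time) * sampling_rate).astype(int)

assert np.array_equal(searched, computed)
```

Because the sampling is uniform, the direct computation yields the same indices without the per-spike lookup that np.searchsorted needs, which is the reasoning behind the last bullet.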