Metadata based instrument calibrations (restoring changes from MR #306 with improvements) (!311)

Merged Geels, Brendan Ryan requested to merge ghsc/users/bgeels/geomag-algorithms:metadata-based-instrument-calibrations into master 11 months ago


This MR restores the changes from MR #306, which were later reverted, and adds a file-based 'caching' feature so that database server issues don't disrupt real-time data processing.

Here is the description from MR #306 ("Replace geomagio/Metadata.py"):

The goal of this MR is to move away from using instrument calibrations that are stored in an in-repo .py file and instead use the new instrument metadata (see !290 (merged)) from the database to generate the calibrations. The changes in this MR are closely related to or depend on the changes in !290 (merged) from @awernle and vice-versa.

Overview of changes:

Move get_instrument function from the temporary file geomagio/Metadata.py to geomagio/metadata/instrument/InstrumentCalibrations.py.
Rename get_instrument function to get_instrument_calibrations to avoid confusion with instrument metadata.
Change functionality of get_instrument_calibrations so that it pulls instrument metadata via get_metadata then processes it into calibration data using InstrumentCalibrations.
InstrumentCalibrations is a new class that compiles a list of instrument calibrations from a list of instrument metadata objects. The output elements use the same format that Metadata/get_instrument previously provided, so changes to MiniSeedFactory and tests are minimal.

Potential things to consider/discuss:

get_instrument_calibrations does not currently have a failover method for providing calibrations when it has no connection to the database. If we lose that connection, we just get errors and an empty list.
get_instrument_calibrations appears to be called by MiniSeedFactory every time we filter data, and for every channel (@erigler, @swilbur, does this seem correct?). We may also want to have MiniSeedFactory send the name of the channel it needs calibrations for, so that get_instrument_calibrations can handle the request efficiently.
We will have to watch out for any processing issues that may be caused by delays from network latency or database performance issues.
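As a rough illustration of the channel-name idea above, here is a minimal sketch of filtering a calibration list by channel. The helper name `filter_calibrations` and the `"channel"` key are hypothetical; this MR does not specify the calibration element layout, so treat both as assumptions.

```python
from typing import Dict, List, Optional


def filter_calibrations(
    calibrations: List[Dict], channel: Optional[str] = None
) -> List[Dict]:
    """Return only the entries for `channel` (hypothetical "channel" key).

    When no channel is given, a copy of the full list is returned, which
    mirrors the current behavior of returning every calibration.
    """
    if channel is None:
        return list(calibrations)
    return [c for c in calibrations if c.get("channel") == channel]
```

With something like this, a caller that only needs one channel could skip scanning unrelated entries, though whether the filter belongs in get_instrument_calibrations or in MiniSeedFactory is still open.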

NOTE: The default URL currently points at the staging database for testing. This should be updated in a future MR to point at production!

During testing, our version of Cryptography started getting flagged by Audit, so the poetry lock file had to be updated.

The file-based caching functions were added to Util.py to facilitate possible use in other processes. These save/read a single instance of an object, or list of objects, in JSON format under the user's .cache directory. File locks are used to prevent race conditions. InstrumentCalibrations.py saves a separate cache file for each observatory. It resorts to reading calibrations from the cache file if there are any errors when trying to pull or convert instrument metadata, and it outputs warnings when doing so.
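The caching and fallback behavior described above can be sketched roughly as follows. This is not the code in Util.py or InstrumentCalibrations.py: the function names, the cache directory, and the use of `fcntl.flock` (Unix only) for locking are all assumptions made for illustration, and `fetch` stands in for whatever pulls and converts instrument metadata.

```python
import fcntl
import json
import warnings
from pathlib import Path
from typing import Any, Callable, List, Optional

# Hypothetical location; the MR only says "under the user's .cache directory".
CACHE_DIR = Path.home() / ".cache" / "geomag-example"


def write_cache(name: str, data: Any, cache_dir: Path = CACHE_DIR) -> Path:
    """Save JSON-serializable data to <cache_dir>/<name>.json under an exclusive lock."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{name}.json"
    # Append mode avoids truncating the file before the lock is held.
    with open(path, "a+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            f.seek(0)
            f.truncate()
            json.dump(data, f)
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
    return path


def read_cache(name: str, cache_dir: Path = CACHE_DIR) -> Optional[Any]:
    """Read cached data under a shared lock; return None if no cache file exists."""
    path = cache_dir / f"{name}.json"
    if not path.exists():
        return None
    with open(path, "r") as f:
        fcntl.flock(f, fcntl.LOCK_SH)
        try:
            return json.load(f)
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


def get_calibrations_with_fallback(
    observatory: str,
    fetch: Callable[[str], List[dict]],
    cache_dir: Path = CACHE_DIR,
) -> List[dict]:
    """Fetch calibrations and refresh the per-observatory cache file.

    On any fetch error, warn and fall back to the cached copy
    (or an empty list when nothing has been cached yet).
    """
    try:
        calibrations = fetch(observatory)
    except Exception as e:
        warnings.warn(
            f"Unable to get calibrations for {observatory} ({e}); reading cache"
        )
        return read_cache(observatory, cache_dir) or []
    write_cache(observatory, calibrations, cache_dir)
    return calibrations
```

Keeping one cache file per observatory means a failed request for one station cannot overwrite or invalidate another station's last known calibrations.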

TO-DO: Add another test to audit that checks cache functionality.

Edited 11 months ago by Geels, Brendan Ryan