Combining identical SNCL traces with overlapping times results in gaps
If data gets "overwritten" in the Edge database, there will be multiple miniseed (MS) blocks holding similar data, since Edge, by design, never deletes previously written data.
The obspy NEIC client, used by geomag-algorithms' MiniSeedFactory, does not attempt to filter or sort these blocks. It simply pulls every MS block that holds any data falling within the requested time interval and places it into an obspy Stream object as a unique Trace. Finally, it uses obspy's powerful Stream.merge(-1) method to combine these similar Traces.
If there are time overlaps in these Traces, Stream.merge(-1) will leave the offending Traces alone, and return a multi-Trace Stream. Therefore, following the example of previously written I/O factories in geomag-algorithms, the MiniSeedFactory calls Stream.merge() to force the multiple Traces to be combined.
That said, whoever wrote the first geomag-algorithms I/O factory that used this method did not fully understand or appreciate what the default and implicit "0" option did when Stream.merge() was invoked. Any time overlaps are turned into "gaps"! This wreaks havoc with the logic used by geomag-algorithms' update mechanism, and makes it nearly impossible to overwrite data and retrieve these same data.
This was rarely, if ever, an issue until the MiniSeedFactory came along, because the older EdgeFactory, which uses the obspy earthworm client, could rely on Edge to never return duplicate copies of data. Rather, it always returned the most recently written data, effectively disallowing access to "overwritten" data in Edge.
One possible solution is to invoke Stream.merge(1), which applies obspy's logic for overwriting data. This logic is described in detail in the obspy online docs, which is a good thing because the actual code is difficult to decipher.
HOWEVER, this logic fails us if we want to overwrite an interval that falls within (is "contained by", using obspy's language) a previously written interval. There won't be a gap, but the expectation that data written at a later time will take precedence upon retrieval is violated, and only the larger "containing" interval data will be returned.
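The difference between the two merge methods for a "contained" overwrite can be sketched without obspy. What follows is a toy model in plain Python, not obspy code: `merge_contained` and its arguments are hypothetical names, `None` stands in for a masked (gap) sample, and the branches simply restate the outcomes the obspy docs describe for a contained trace.

```python
def merge_contained(containing, contained, offset, method=0):
    """Toy model (not obspy) of merging a trace contained by another.

    containing, contained: lists of samples at the same sampling rate
    offset: index in `containing` where `contained` starts
    method: 0 or 1, mirroring the Stream.merge() method argument
    Returns a new list; None marks a gap (a masked sample in obspy).
    """
    merged = list(containing)
    stop = offset + len(contained)
    if merged[offset:stop] == list(contained):
        # identical data: nothing to resolve, no gap is created
        return merged
    if method == 0:
        # overlapping samples are discarded and become a gap
        merged[offset:stop] = [None] * len(contained)
    elif method == 1:
        # the containing trace is kept as-is; contained data is dropped
        pass
    else:
        raise ValueError("only methods 0 and 1 are modeled")
    return merged


original = [1] * 10      # first write: ten samples
overwrite = [2, 2, 2]    # later write: three samples inside the first

print(merge_contained(original, overwrite, 4, method=0))
print(merge_contained(original, overwrite, 4, method=1))
```

In this model, method 0 returns the three overwritten samples as a gap, and method 1 returns them holding the original values. Neither returns the later-written data.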
After much experimentation with Edge and the NEIC client, I convinced myself that Edge returns MS blocks in the order they were originally written. We can use this to our advantage by introducing a fairly small amount of new code to the MiniSeedFactory, replacing the original call to data.merge() ("data" being the Stream object returned from the NEIC client):
    from obspy import Stream

    # initialize empty temporary stream
    st_tmp = Stream()
    # loop over returned traces in the order they were returned
    for tr in data:
        # add next tr to temporary stream
        st_tmp += tr
        # replace time overlaps with gaps
        st_tmp.merge(0)
        # split on gaps, then add next trace to stream, and finally,
        # replace gaps with most recently written data in tr
        st_tmp = (st_tmp.split() + tr).merge(0)
    # point `data` to the newly merged stream and continue processing
    data = st_tmp
This should always give precedence to the latest miniseed block written to Edge, regardless of its time interval. The approach is very Edge-specific, since it relies on blocks being returned in the order they were written, and may not be appropriate for data retrieved from other sources or via different protocols. It should be OK for geomag-algorithms' MiniSeedFactory.
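To check that a merged Stream really honors write order, it may help to compare it against an independent reference. The sketch below is one way to build such a reference in plain Python. It is not part of geomag-algorithms or obspy: `last_write_wins` is a hypothetical helper, blocks are simplified to (start index, samples) pairs on a common sample grid, and `None` marks samples that no block covers.

```python
def last_write_wins(blocks):
    """Reference model: later-written blocks overwrite earlier ones.

    blocks: iterable of (start_index, samples) pairs, in write order
    Returns (first_index, samples), with None where nothing was written.
    """
    written = {}
    for start, samples in blocks:
        for offset, value in enumerate(samples):
            # a later block simply replaces whatever was stored before
            written[start + offset] = value
    if not written:
        return None, []
    first, last = min(written), max(written)
    return first, [written.get(i) for i in range(first, last + 1)]


blocks = [
    (0, [1] * 10),       # original write
    (4, [2, 2, 2]),      # overwrite contained by the original
    (8, [3, 3, 3, 3]),   # overwrite partially overlapping the end
]
print(last_write_wins(blocks))
```

Here the contained overwrite replaces samples 4 through 6 and the partial overlap replaces samples 8 and 9 before extending the series, which is the behavior the loop above is meant to reproduce for Traces returned by the NEIC client.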