Miniseed factory's endtime is not inclusive
To Do:
- pad the request sent to the underlying clients/factories that retrieve data series by subtracting a half-interval from starttime and adding a half-interval to endtime, then use ObsPy class member functions to trim to the originally requested interval. If this can be generalized to work with all factories, that would be ideal, but at least do this for MiniSeedFactory and EdgeFactory.
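The pad-then-trim idea can be sketched with the standard library alone. Everything named here is hypothetical: `fake_exclusive_fetch` is only a stand-in for a client that treats endtime as exclusive (the behavior reported below for the neic client), and `get_inclusive` is not an existing geomag-algorithms function. In the real factories the trimming step would presumably be ObsPy's `Stream.trim(starttime, endtime)` rather than a list comprehension.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def fake_exclusive_fetch(starttime, endtime, delta):
    """Hypothetical client: returns samples with starttime <= t < endtime.

    Samples sit on a grid of whole multiples of delta since EPOCH.
    """
    # ceiling division: index of the first grid point at or after starttime
    n = -((EPOCH - starttime) // delta)
    t = EPOCH + n * delta
    samples = []
    while t < endtime:
        samples.append((t, float(n)))  # the value is just a placeholder
        t += delta
        n += 1
    return samples


def get_inclusive(fetch, starttime, endtime, delta):
    """Pad the request by half an interval, then trim to the requested window."""
    half = delta / 2
    padded = fetch(starttime - half, endtime + half, delta)
    return [(t, v) for (t, v) in padded if starttime <= t <= endtime]


start = datetime(2022, 10, 5, 0, 15)
end = datetime(2022, 10, 5, 0, 20)
minute = timedelta(minutes=1)

direct = fake_exclusive_fetch(start, end, minute)
inclusive = get_inclusive(fake_exclusive_fetch, start, end, minute)
print(len(direct), len(inclusive))
```

Padding by half an interval, rather than a full one, means the padded window can never reach an extra sample on either side, so the trim step only has to guard against clients that round differently.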
Details:
If you use MiniSeedFactory.get_timeseries() to pull 1-minute data from starttime to endtime, you will get the correct length trace(s) returned (i.e., (endtime - starttime)/delta + 1), but the last data point always seems to be a NaN. The MiniSeedFactory class is a wrapper around ObsPy's neic (miniseed) client, which in turn simply opens a socket to a specified Edge host to request miniseed data using -s, -b, and -d parameters, much like the Java-based CWBQuery tool. If 500 microseconds (half a millisecond) is added to endtime, the expected top-of-the-minute value is returned correctly. In fact, if you request 1-second data and the last requested second falls on a top-of-the-minute, the same thing happens: the final sample is missing unless 500 microseconds is added to endtime.
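As a quick check of the symptom, the expected sample count and the number of trailing NaNs can be computed as below. This is a minimal sketch: `expected_npts` and `trailing_nan_count` are hypothetical helper names, not part of geomag-algorithms or ObsPy, and the values are copied from the 1-minute MiniSeedFactory output shown in this issue.

```python
import math
from datetime import datetime


def expected_npts(starttime, endtime, delta_seconds):
    """Inclusive sample count: (endtime - starttime)/delta + 1."""
    span = (endtime - starttime).total_seconds()
    return int(round(span / delta_seconds)) + 1


def trailing_nan_count(values):
    """Count consecutive NaNs at the end of a sequence of floats."""
    count = 0
    for value in reversed(values):
        if not math.isnan(value):
            break
        count += 1
    return count


# values reported for the 1-minute MiniSeedFactory request in this issue
values = [20768.2265625, 20768.23828125, 20768.26953125,
          20768.45117188, 20768.4765625, float("nan")]

npts = expected_npts(
    datetime(2022, 10, 5, 0, 15), datetime(2022, 10, 5, 0, 20), 60
)
missing = trailing_nan_count(values)
print(npts, len(values), missing)
```

The trace length matches the expected count, so a length check alone would not catch the problem; the trailing NaN has to be tested for explicitly.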
Here's an example using 1-minute data (assumes Edge running locally):
from geomagio.edge import MiniSeedFactory
from obspy.core import UTCDateTime
miniseed_factory = MiniSeedFactory(
    host='localhost',
    port=2061,
    interval='minute',
    type='variation',
)
miniseed_stream = miniseed_factory.get_timeseries(
    starttime=UTCDateTime(2022, 10, 5, 0, 15),
    endtime=UTCDateTime(2022, 10, 5, 0, 20),
    observatory='BOU',
    channels=['U', 'V', 'W'],
)
print(miniseed_stream[0].data)
[20768.2265625, 20768.23828125, 20768.26953125, 20768.45117188, 20768.4765625, nan]
In contrast, if you use the EdgeFactory to pull data from starttime to endtime, you will get the correct length trace(s) returned (i.e., (endtime - starttime)/delta + 1), and the last data point is as expected. The EdgeFactory is a geomag-algorithms wrapper to ObsPy's earthworm (waveserver) client. Here's an example:
from geomagio.edge import EdgeFactory
from obspy.core import UTCDateTime
edge_factory = EdgeFactory(
    host='localhost',
    port=2060,
    interval='minute',
    type='variation',
)
edge_stream = edge_factory.get_timeseries(
    starttime=UTCDateTime(2022, 10, 5, 0, 15),
    endtime=UTCDateTime(2022, 10, 5, 0, 20),
    observatory='BOU',
    channels=['H', 'E', 'Z'],
)
print(edge_stream[0].data)
[20768.226, 20768.238, 20768.269, 20768.451, 20768.476, 20768.445]
Using Python's debugger, I was able to trace down a few additional potentially helpful facts:
- when requesting 1-minute data using the MiniSeedFactory (neic client) as above, five 512-byte miniseed blocks are returned; in other words, it seems that there is one miniseed block per 1-minute data point.
- when requesting 1-minute data using the MiniSeedFactory (neic client) as above, but with endtime + 500 microseconds, six 512-byte miniseed blocks are returned.
- when requesting 1-second data using the MiniSeedFactory (neic client), and an endtime that falls on the top-of-the-minute, the final miniseed block returned does NOT include the top-of-the-minute; however...
- when requesting 1-second data using the MiniSeedFactory (neic client), and 500 microseconds is added to an endtime that falls on the top-of-the-minute, the final miniseed block returned DOES include the top-of-the-minute.
This is a somewhat pressing problem because the Geomag program has at least one customer that regularly pulls 1-minute realtime data, and low latency is a priority. The top-of-the-minute gaps in the 1-second data are not all that bad, but delaying 1-minute data by a full minute can be problematic. There is also reason to believe that our back-filling Python routines are not working quite as expected due to these top-of-the-minute gaps.