Changes

Hotovec-Ellis, Alicia Jean · 8ddc4f80
--- a/Outputs.md
+++ b/Outputs.md
+REDPy automatically generates many output files at the end of each run of `redpy-catfill`, `redpy-backfill`, and `redpy-force-plot`. These outputs are designed to help you navigate the catalog and more easily notice patterns within and across families. Some scripts may also allow you to manually generate additional outputs.
+
 [[_TOC_]]

 ## File Structure

-Below is a chart of the file names and locations of the outputs generated by REDPy assuming the default output folder structure (i.e., within `REDPy/runs/`). Many files are generated by default as outputs of running `redpy-catfill`, `redpy-backfill`, or `redpy-force-plot`. Some are generated when other scripts are run, and these are denoted with the name of the script in square brackets. Yet others may be generated as default outputs, but only when certain settings are enabled and/or conditions met, and those are denoted in parentheses.
+Below is a chart of the file names and locations of the outputs generated by REDPy assuming the default output folder structure (i.e., within `REDPy/runs/`). Some are generated when other scripts are run, and these are denoted with the name of the script in `[square brackets]`. Yet others may be generated as default outputs, but only when certain settings are enabled and/or conditions met, and those are denoted in `(parentheses)`.

 ```
 REDPy/
@@ -12,7 +14,7 @@ REDPy/
 │   │   ├── families/
 │   │   │   ├── *.html
 │   │   │   ├── *.png
-│   │   │   ├── *.pdf     [redpy-create-pdf-family]
+│   │   │   ├── fam*.pdf  [redpy-create-pdf-family]
 │   │   │   ├── fam*.png
 │   │   │   ├── map*.png  (if checkcomcat=True and local matches found)
 │   │   │   ├── ...
@@ -22,13 +24,15 @@ REDPy/
 │   │   │   ├── YYYYmmddHHMMSS.SSSSSS-kurt.png  [redpy-plot-junk]
 │   │   │   ├── ...
 │   │   ├── reports/                  [redpy-create-report]
+│   │   │   ├── *-cmatrix.npy         [redpy-create-report -m]
+│   │   │   ├── *-evtimes.npy         [redpy-create-report -m]
 │   │   │   ├── *-report.html         [redpy-create-report]
 │   │   │   ├── *-report-bokeh.html   [redpy-create-report]
 │   │   │   ├── *-report.png          [redpy-create-report]
 │   │   │   ├── *-reportcmat.png      [redpy-create-report]
 │   │   │   ├── *-reportwaves.png     [redpy-create-report]
 │   │   │   ├── ...
-│   │   ├── catalog.txt
+│   │   ├── catalog.txt               (catalog.csv if verbosecatalog=True)
 │   │   ├── catalog_cores.txt
 │   │   ├── catalog_junk.txt          [redpy-plot-junk]
 │   │   ├── catalog_orphans.txt
@@ -50,54 +54,122 @@ REDPy/
 ```

 ## Interactive Timelines
-`overview.html`, `overview_meta.html`, and `overview_recent.html` are interactive plots with a shared time axis (i.e., panning or zooming in one window will update the rest). `overview_recent.html` shares the same format as `overview.html` but for only the last `recplot` days, and shows all families active within that period in the occurrence timeline (instead of clusters with at least `minplot` members). Meanwhile, `overview_meta.html` condenses all plots into tabs so many runs can appear on `meta.html` at once. This plot shows the last `mrecplot` days with at least `mminplot` members.
+The `overview.html`, `overview_meta.html`, and `overview_recent.html` files are interactive plots with a shared time axis (i.e., panning or zooming in one panel will update the rest) that can be opened in a web browser. `overview_recent.html` shares the same format as `overview.html` but for only the last `recplot` days, and shows all families active within that period in the occurrence timeline (instead of families with at least `minplot` members). Meanwhile, `overview_meta.html` condenses all plots into tabs so many runs can appear on a `meta.html` page at once. This plot shows the last `mrecplot` days with at least `mminplot` members. The ending time of the plots is set to the time of the latest trigger by default, but can also be set to the time the timeline was rendered by setting `bokehendtime=now`.
+
+Below is an annotated screenshot of the `overview_recent.html` output of the suggested default run:
+<img src="https://code.usgs.gov/ahotovec-ellis/REDPy/-/raw/main/img/output-overview.png" alt="Annotated overview_recent.html" />
+
+1. **Title** (here, 'REDPy Catalog') is set in `title`. It is also used for the `<title>` of the page.

-**Navigation bar** at top right has options to pan, zoom, tap, reset, and save. Title (here, 'REDPy Catalog') is set in `title`.
+2. The **navigation bar** at top right has options to pan, zoom, tap, reset, and save.

-The **Repeaters vs. Orphans** plot is a histogram with number of both types of events within bins defined by `binhr` and `binday`; the total number of triggers in that hour will be the sum of these two.
+By default, these timelines have four panels, the layout of which can be customized in `plotformat`:

-The **Frequency Index** plot has a point for every repeating event related to its frequency content. Tectonic-type events usually have FI>0 and 'long period' earthquakes have FI<0 with the default settings. This plot is useful for quickly identifying the character of repeating seismicity.
+3. The **Repeaters vs. Orphans** plot shows the counts of these two classifications within temporal bins defined by `binday`, `binhr`, and `mbinhr`. The total number of triggers in each bin is the sum of these two lines, which can be displayed instead by setting `timeline_vs=triggers` (this also changes the plot title to **Repeaters vs. Triggers**).

-The **Occurrence Timeline** has horizontal lines corresponding to individual clusters/families, with endpoints at the times of the first and last events in that cluster. Colored bars correspond to hours with activity within that cluster, colored by the number of events within that hour (see color scale at top left). The number to the right of the bars corresponds to the number of total members within the cluster. Hovering the mouse over a cluster will display a preview waveform (core event at `plotsta` station) and the cluster's ID number. Clicking here will open a [more detailed page about the cluster](#cluster-pages).
+4. The **Frequency Index** (FI) plot has a point for every repeating event, positioned by a quantity related to its frequency content: a ratio of energy in an upper and a lower frequency band (see [Buurman and West (2006)](https://pubs.usgs.gov/pp/1769/chapters/p1769_chapter02.pdf) for more details on FI). Tectonic-type events usually have FI>0 and 'long period' earthquakes have FI<0 with the default settings, but note that the value is dependent on your choice of frequency bands (`filomin`, etc.) and many other factors. This plot is useful for quickly identifying the character of repeating seismicity and how the frequency content evolves with time. The dots are colored by the number of channels available (i.e., not in a data gap) during the correlation window. Hover over a dot for additional event information (e.g., event time, FI, family), or click it to open the corresponding family page.

-The **Cluster Longevity** plot orders the clusters in the occurrence timeline by the length of time they are active, and can be useful for identifying times when many clusters die or are created. If the starting time of a cluster is before the date of the start of `overview_recent.html`, an arrow will indicate that the cluster extends off the plot.
+5. The **Occurrence Timeline** has horizontal lines corresponding to families, with endpoints at the times of the first and last events in that family. This line shows through when no members are active and will have an arrow if the family extends off the bounds of the plot. Colored bars correspond to time bins with activity within that family, colored by either the number of events within that bin (see color scale at top left) or the average FI (with color scale span controlled by `fispanlow` and `fispanhigh`). This bin width is controlled by `dybin`, `hrbin`, and `mhrbin`. By default, color by rate and FI are tabbed. The number to the right of the bars corresponds to the number of total members within the family. Hovering the mouse over a family will display a preview waveform (core event at `printsta` station) and the family's ID number. Clicking here will open a [more detailed page about the family](#family-pages).

-Below is the `overview_recent.html` output of the suggested default run:
-<img src="https://raw.githubusercontent.com/ahotovec/REDPy/master/img/bokeh.png" width=900 alt="Default overview.html" />
+6. The **Family Longevity** plot orders the families in the occurrence timeline by the length of time they are active, and can be useful for identifying times when many families die or are created. If the starting time of a family is before the date of the start of `overview_recent.html`, an arrow will indicate that the family extends off the plot.
+
+The `overview_meta.html` version of these plots is roughly the same (varies by settings), but with all plots contained in tabs.
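
The frequency index plotted in these timelines can be illustrated with a minimal sketch. It assumes FI is the base-10 logarithm of the ratio of mean spectral amplitude in an upper band to that in a lower band, which is one common reading of the definition cited above; the function name `frequency_index`, its parameters, and the band limits are hypothetical and chosen for illustration only, and this is not REDPy's own implementation or its default settings.

```python
import numpy as np


def frequency_index(waveform, sampling_rate, lower_band=(1.0, 2.5), upper_band=(5.0, 10.0)):
    """Return log10(mean upper-band amplitude / mean lower-band amplitude).

    The band limits (in Hz) are illustrative assumptions, not REDPy defaults.
    """
    spectrum = np.abs(np.fft.rfft(waveform))
    freqs = np.fft.rfftfreq(len(waveform), d=1.0 / sampling_rate)
    lower = spectrum[(freqs >= lower_band[0]) & (freqs <= lower_band[1])]
    upper = spectrum[(freqs >= upper_band[0]) & (freqs <= upper_band[1])]
    return float(np.log10(np.mean(upper) / np.mean(lower)))


# Synthetic 10 s signals sampled at 100 Hz: one dominated by 2 Hz energy,
# the other dominated by 8 Hz energy.
t = np.arange(0, 10, 0.01)
low_freq_event = np.sin(2 * np.pi * 2.0 * t) + 0.1 * np.sin(2 * np.pi * 8.0 * t)
high_freq_event = 0.1 * np.sin(2 * np.pi * 2.0 * t) + np.sin(2 * np.pi * 8.0 * t)

fi_low = frequency_index(low_freq_event, 100.0)
fi_high = frequency_index(high_freq_event, 100.0)
print(f"low-frequency event FI: {fi_low:.2f}")
print(f"high-frequency event FI: {fi_high:.2f}")
```

Under these assumptions, the signal dominated by low frequencies yields a negative FI and the one dominated by high frequencies yields a positive FI, which matches the sign convention described for the plot.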

 ## Family Pages
-Each cluster has its own detailed page with statistics and plots. At the very top are links to the previous and next clusters (ordered by starting time) for navigation between subsequent clusters (though you may always change the number just before `.html` manually, if you wish). The preview waveform is the same as the [occurrence plot](#bokeh-plots) on `overview.html`, and some quick statistics are listed below it.
+Each family has its own detailed page with statistics and plots that can be opened in a web browser. Nominally, these are meant to be opened from a timeline page, but can be accessed directly within the `families/` folder, and are simply named with the family number (e.g., `0.html`).
+
+Below is an annotated screenshot of the `families/0.html` output of a lightly modified default run:
+<img src="https://code.usgs.gov/ahotovec-ellis/REDPy/-/raw/main/img/output-family.png" alt="Annotated family page for default run Family 0" />
+
+1. **Navigation** links to the previous and next families (ordered by starting time). You can skip to a specific family by changing the number in the address bar.
+
+2. The **preview waveform** is the same as in the [occurrence timeline](#interactive-timelines) on `overview.html`. This image is `families/0.png`.
+
+3. Some **statistics** related to the number and timing of members in the family.
+
+Below this is a multi-paneled image `families/fam0.png`:

-Top left plot shows the core (black) and stacked (red) waveforms at each station/channel used, and at right is the sum of the Fourier amplitude spectra over all stations. Below those are three timelines. First, the amplitude (on the same preview station only) of each event with time. Next, the time between successive members of the cluster in hours (note the logarithmic scale). Last is the cross-correlation coefficient relative to the best correlated event. Open circles at the bottom mean that no value is stored for that pair (either not computed or below `cmin`). This is intended to help visualize how the waveforms are changing with time. *Note that the coefficient plotted here is what is stored in the table, which is the maximum across all stations used.*
+4. The core (black) and stacked (red) **waveforms at each station/channel** used. The harder it is to see the core, the better it likely represents the family as a whole.

-Below is the page for Cluster 1 from the suggested default run:
-<img src="https://raw.githubusercontent.com/ahotovec/REDPy/master/img/cluster.png" width=900 alt="Default Cluster 1" />
+5. The normalized sum of the Fourier **amplitude spectra** (post-filtering) over all channels for both the single core event and all events together. Intended to quickly summarize the strongest frequencies in the signal across all observations.
+
+6. **Timeline of amplitude** (on the same preview station `printsta` only) of each event with time.
+
+7. **Timeline of inter-event time** (time between successive members of the family) in hours. Note the logarithmic scale on the y-axis.
+
+8. **Timeline of cross-correlation coefficient** relative to the best correlated event (i.e., the event that has the maximum sum across rows in the stored correlation matrix). Open circles at the bottom mean that no value is stored for that pair (either not computed or below `cmin`). This is intended to help visualize how the waveforms are changing with time. A black symbol will denote which event is the current core event, which may be different from the best correlated. *Note that the coefficient plotted here is what is stored in the table, which is across all stations used.*
+
+If `checkcomcat=False` (the default), this will be the end of the page. If it is `True`, and if a match to at least one member in the family was found, more of the page will be rendered:
+
+9. A **map of matched local events** will have the locations of local matched events as red dots. Locations given in `stalats` and `stalons` are plotted as black triangles. The average depth of the family (usually relative to sea level) is given at the top of the map along with the number of matches found. This image is `families/map0.png` and is only rendered if a local match is found.
+
+10. A **list of matched events** is shown for both local (black) and more distant (red; regional and teleseismic distances) matches. All matches are listed, including possible conflicting matches, along with the best matching phase arrival. This list scrolls to conserve space. Totals are listed at the bottom. If no matches are found, the list will be empty.

 ## Text Catalogs
-`catalog.txt`: Dates of all repeaters in the catalog and their associated cluster number. If `verbosecatalog` is `True`, then frequency index, amplitude, time since previous event in hours, and correlation coefficient with respect to the best correlated event are also included.

-`cores.txt`: Dates of core events and their cluster number. Could be used to create templates from core events.
+Several text-based catalogs are written to the output directory:
+
+`catalog.txt`: Dates of all repeaters in the catalog and their associated family number. These are ordered by ascending family number and aligned event time within that family.

-`dailycounts.txt`: Tabulated daily 'histogram' of occurrence of each cluster.
+`catalog.csv`: If `verbosecatalog=True`, then frequency index, amplitudes for each channel in `[square brackets]`, time since previous event in hours, and correlation coefficient with respect to the best correlated event are also included, which allows external reproduction of the timelines in the family images. Note that `catalog.txt` will not be written!

-`junk.txt`: If [plotJunk.py](./Scripts-and-Helper-Functions#plotjunkpy) was run, dates of all triggers and associated type code.
+`catalog_cores.txt`: Dates of current core events and their family number. Could be used to create templates from core events.

-`orphancatalog.txt`: Dates of current orphans.
+`catalog_orphans.txt`: Dates of current orphans available to be adopted.

-`swarm.csv`: This file can be read by [Swarm](https://volcanoes.usgs.gov/software/swarm/download.shtml) v2.8.5+ using the tagging feature to annotate the interactive helicorders. It marks each repeating event with a label that has the `groupName` and the cluster it belongs to (so for the default run, family 1 would be labeled as 'default1'). The station listed is the one referenced by `printsta` in the configuration, and can be changed using global find/replace in a text editor to change which station or channel the tags will appear on. Finally, colors can be chosen for clusters of interest by adding lines to the `EventClassifications.config` file in the Swarm folder. For example, adding the line:
+`catalog_triggers.txt`: Dates of all triggers that made it past the junk filter. Includes deleted events (and subsequent matches to deleted families), expired and current orphans, and all repeaters. The time is the original trigger time, which may be slightly different from the event time listed in `catalog.txt`.
+
+`swarm.csv` and `swarm_triggers.csv`: These files can be read by [Swarm](https://volcanoes.usgs.gov/software/swarm/download.shtml) v2.8.5+ using the "tagging" feature to annotate the interactive helicorders. Each file marks every repeating event with a label that has the `groupname` and the family it belongs to (so for the default run, Family 1 would be labeled as 'default1'). The station listed is the one referenced by `printsta` in the configuration, and can be changed using global find/replace in a text editor to change which station or channel the tags should appear on. Colors can be chosen for families or types of interest by adding lines to the `EventClassifications.config` file in the Swarm folder. For example, adding the line:

 `default1, #ffff00`

-changes the appearance of members of the `default1` family to be yellow to stand out against the default red-orange of the rest of the catalog.
+changes the appearance of members of the `default1` family to be yellow to stand out against the default red-orange of the rest of the catalog. The `swarm_triggers.csv` file contains all events in `swarm.csv` but at the times that they originally triggered (i.e., before they were aligned).
+
+## Manually Generated Outputs
+
+### Reports
+
+Some families (especially large ones) may have interesting features that a user might want to investigate in more detail. A "report" can be generated with `redpy-create-report` for a given family (or list of families) that has more information than the standard family page.
+
+Below is an annotated screenshot of the `reports/0-report.html` output for a Family 0 report with no flags (`redpy-create-report 0`) after the default run completes:
+<img src="https://code.usgs.gov/ahotovec-ellis/REDPy/-/raw/main/img/output-reports.png" alt="Annotated report page for Family 0" />
+
+1. Instead of a navigation bar, the **time** that the report was rendered is listed. For runs that update on a schedule, this can help remind the user that the report may be out of date.
+
+2. The same **waveform preview** for the core event as the family page. This file (`reports/0-report.png`) is a copy of `families/0.png` from the time the report was rendered.
+
+3. The same **statistics** as the family page from the time the report was rendered.
+
+4. Instead of showing only the core and stack of the waveforms, images of **all waveforms** on all channels are shown (`reports/0-reportwaves.png`). Time is on the x-axis, and each row of pixels on the y-axis corresponds to a member of the family. Color is by amplitude, with white at 0, red for negative amplitudes, and blue for positive amplitudes. Each waveform is normalized, with some cropping of amplitudes toward the end of the waveform. This view takes up a significant amount of space (note that I've cropped out most of the next 6 panels!) but can help assess which channels have the most signal or see subtle changes in the waveforms with time.
+
+5. The **timeline of amplitude** plot is now interactive (pan, zoom), and amplitudes for all channels are now shown instead of just the single `printsta` channel. As the title suggests, click on the names in the legend to toggle hiding or showing each channel to isolate which you want to see.
+
+6. The **timeline of inter-event time** is now interactive, but otherwise matches the family page.
+
+7. The **timeline of cross-correlation coefficient** is also interactive, but now is explicitly relative to the core event. Values below `cmin` have been filled in, unless the `-s` flag has been used to "skip" recalculating the full dense cross-correlation matrix.
+
+These interactive timelines can be accessed by themselves in `reports/0-report-bokeh.html`.
+
+8. The **stored cross-correlation matrix** has rows and columns for each member, with darker colors corresponding to more similar waveforms and lighter colors to less similar. The color switches from yellow to white at `cmin`, which for the stored matrix corresponds to places where either the correlation value is below `cmin` for that pair, _or_ it was never calculated in the first place.
+
+9. The **full cross-correlation matrix** contains values for _all_ pairs, including those that were missing from the stored matrix. Values below `cmin` are still cropped at white, but values exist. For large families (>1000 members or so) this matrix can take a very long time to create. If you choose to skip re-calculating the full matrix with `-s`, this part of the image will not be included in `reports/0-reportcmat.png`.
+
+If you use the `-o` flag, the waveform plots in 4 and correlation matrices in 8 and/or 9 will be ordered with OPTICS, which is used to pick the core event. This ordering is based on similarity from the correlation matrix, such that similar events will be close to each other in the order, allowing visualization of possible sub-families within the family. Note the tight groups of dark colors surrounded by lighter colors in the matrix on the right:
+
+<img src="https://code.usgs.gov/ahotovec-ellis/REDPy/-/raw/main/img/output-reports-ordered.png" alt="Comparison of ordering by time vs. ordering by OPTICS" />
+
+If you use the `-m` flag, two .npy (numpy format) files will be output: `reports/0-cmatrix.npy` with the values of the cross-correlation matrix (full by default, or only the stored matrix if `-s`) and `reports/0-evtimes.npy` with the corresponding event times. If using `-o`, these two files will reflect the ordering with OPTICS.
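
As a hedged sketch of how the `-m` outputs might be used, the example below loads the two files with `numpy.load` and finds the best correlated event as the row with the maximum sum, following the definition given for the family page. It assumes the matrix is a square 2-D array with one row per entry in the event times file; the storage type of the event times is not stated here, so `allow_pickle=True` is a guess that should only be used with files you trust. The helper names are hypothetical, and synthetic stand-in files are written to a temporary folder so the sketch runs without REDPy.

```python
import os
import tempfile

import numpy as np


def load_report_matrix(report_dir, family):
    """Load the correlation matrix and event times saved for one family."""
    cmatrix = np.load(os.path.join(report_dir, f"{family}-cmatrix.npy"))
    # allow_pickle=True is an assumption in case times are stored as objects.
    evtimes = np.load(os.path.join(report_dir, f"{family}-evtimes.npy"), allow_pickle=True)
    return cmatrix, evtimes


def best_correlated_index(cmatrix):
    """Return the index of the row with the maximum sum across the matrix."""
    return int(np.argmax(cmatrix.sum(axis=1)))


# Synthetic stand-ins for reports/0-cmatrix.npy and reports/0-evtimes.npy.
with tempfile.TemporaryDirectory() as report_dir:
    demo_cmatrix = np.array([[1.0, 0.8, 0.7],
                             [0.8, 1.0, 0.9],
                             [0.7, 0.9, 1.0]])
    demo_evtimes = np.array(["2020-01-01T00:00:00",
                             "2020-01-01T06:00:00",
                             "2020-01-02T00:00:00"])
    np.save(os.path.join(report_dir, "0-cmatrix.npy"), demo_cmatrix)
    np.save(os.path.join(report_dir, "0-evtimes.npy"), demo_evtimes)

    cmatrix, evtimes = load_report_matrix(report_dir, 0)
    best = best_correlated_index(cmatrix)
    print("best correlated event:", evtimes[best])
```

With real outputs, `report_dir` would point at the `reports/` folder of the run, and the event times would index the rows and columns of the matrix in the same order (by time, or by OPTICS if `-o` was used).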
+
+### Associated Locations

-## Reports
+If `checkcomcat=True`, three files will be saved that contain external catalog locations for the three distance and magnitude thresholds defined in your [configuration](Inputs-and-Settings#configuration-file): `external_local.txt`, `external_regional.txt`, and `external_teleseismic.txt`. These are updated with each run to span the same time as the first and last triggers. For long-lived runs, these files can grow to be relatively large, but save a significant amount of time querying the original catalog over the web. You can replace these files with custom catalogs with the same format. However, be careful if the files do not completely bracket the triggers of your run, as the code may attempt to append what it thinks is missing. If you want, you can also use these files as inputs into `redpy-compare-catalog`, which outputs by default to `matches.csv` in your current directory.

-go here
+Furthermore, if you run `redpy-write-family-locations`, `family_locations.csv` will be written to the output directory for that run with a summary of the median of locations contained within each [family page](Outputs#family-pages).

-## Associated Locations
+### Higher Quality Outputs

-go here
+Editable, publication quality (or at least nearer to it than a flat screenshot) .pdf versions of the timeline and family images may be created with `redpy-create-pdf-timeline` and `redpy-create-pdf-family`. The timeline will be created in the main output directory as `overview.pdf`, and there are several flags that control its appearance. The family image is created in `families/fam*.pdf` where `*` is the family number you specified. It will look almost identical to `families/fam*.png`, and you can control the time span (e.g., you want multiple families to share the same time axes for comparison, or you want to zoom in on a time of interest in a long-lived family).

-## Higher Quality Outputs
+### Contents of Junk Table

-go here
\ No newline at end of file
+The contents of the "junk" table may be output for troubleshooting purposes (i.e., to check that triggers included here have been excluded from consideration correctly). A text catalog `catalog_junk.txt` is created in the main outputs directory with the trigger times and "junk type" for each entry. The type indicates which thresholds were exceeded: `freq` means that the frequency index filter for teleseisms flagged the trigger for having too many channels (more than `teleok`) with FI below the threshold (`telefi`); `kurt` means that the kurtosis (sharpness/spikiness) of the waveform or its frequency spectrum exceeded its threshold; and `both` means that the trigger failed both the frequency and kurtosis tests. A new folder `junk/` is created that has images of waveforms with filenames containing the time of the trigger (`YYYYmmddHHMMSS.SSSSSS`, corresponding to year, month, day, hour, minute, and decimal second) and the junk type.
\ No newline at end of file
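
The filenames in `junk/` can be turned back into trigger times and junk types with a short sketch like the one below. It assumes every image follows the `YYYYmmddHHMMSS.SSSSSS-kurt.png` pattern shown in the file structure chart, with `freq` or `both` taking the place of `kurt`; the helper name `parse_junk_filename` is hypothetical and not part of REDPy.

```python
from datetime import datetime
from pathlib import Path


def parse_junk_filename(filename):
    """Split a junk image filename into its trigger time and junk type."""
    stem = Path(filename).stem  # drops the trailing ".png"
    timestamp, junk_type = stem.rsplit("-", 1)
    trigger_time = datetime.strptime(timestamp, "%Y%m%d%H%M%S.%f")
    return trigger_time, junk_type


# Hypothetical filename following the assumed pattern.
trigger_time, junk_type = parse_junk_filename("20200101123456.789000-kurt.png")
print(trigger_time.isoformat(), junk_type)
```

Splitting on the last hyphen keeps the decimal seconds attached to the timestamp, so the same helper works regardless of which junk type appears in the name.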