Squirrel API usage

A complete example: calculate hourly RMS values

In this tutorial we will write a user-friendly command line interface (CLI) program to calculate hourly root-mean-square (RMS) values for continuous seismic waveform recordings. Many methods in seismology process waveform recordings in similar ways and so this example can serve as a design pattern for tools like daily spectrograms, PSD plots, drum plots, earthquake detectors, automatic pickers, and so on. We will show different styles of writing such a script/app, each useful in slightly different situations. If you use the provided building blocks and follow the conventions laid out below, your app will immediately feel familiar to other Squirrel users. And it will ship with a bunch of nice features, like support for extra large datasets. And it will be pleasure to maintain.

Specs for the tutorial app:

  • Calculate hourly RMS values.

  • Get data from local file collection or from online FDSN web services.

  • Keep and reuse downloaded raw data (don’t download same data twice).

  • Process a given time span.

  • Station/channel selection.

  • Efficiently hop through the data in chunks of one hour

  • Always start chunks at full hour.

  • Bandpass filter data before calculating RMS.

  • Extinguish filter artifacts by use of overlapping time windows.

Variant 1: quick and dirty script

You feel like hard-coding today? Sometimes we do not need any sophisticated CLI or configuration files.

Here’s a commented solution to the RMS calculation task. Study carefully - there is a lot of functionality packed into this short script.

import numpy as np
from pyrocko import progress
from pyrocko.util import setup_logging, time_to_str, str_to_time_fillup as stt
from pyrocko.squirrel import Squirrel

# Enable logging and progress bars:
setup_logging('squirrel_rms1.py', 'info')
progress.set_default_viewer('terminal')   # or 'log' to just log progress


def rms(data):
    return np.sqrt(np.sum(data**2))


# Time span (Hunga Tonga explosion was on 2022-01-15):
tmin = stt('2022-01-14')
tmax = stt('2022-01-17')

# Filter range:
fmin = 0.01
fmax = 0.05

# All data access will happen through a single Squirrel instance:
sq = Squirrel()

# Uncomment to add local data directories. This will only index the waveform
# headers and not load the actual waveform data at this point. If the files are
# already known and unmodified, it will use cached information.
# sq.add('data/2022')

# Add online data source. Enable access to BGR's FDSN web service and restrict
# to GR network and LH channels only:
sq.add_fdsn('bgr', query_args=dict(network='GR', channel='LH?'))

# Ensure meta-data from online sources is up to date for the selected time
# span:
sq.update(tmin=tmin, tmax=tmax)

# Allow waveform download for station BFO. This does not download anything
# yet, it just enables downloads later, when needed. Omit `tmin`, `tmax`,
# and `codes` to enable download of everything selected in `add_fdsn(...)`.
sq.update_waveform_promises(tmin=tmin, tmax=tmax, codes='*.BFO.*.*')

# Iterate window-wise, with some overlap over the data:
for batch in sq.chopper_waveforms(
        tmin=tmin,
        tmax=tmax,
        codes='*.*.*.LHZ',
        tinc=3600.,              # One hour time windows (payload).
        tpad=1.0/fmin,           # Add padding to absorb filter artifacts.
        want_incomplete=False,   # Skip incomplete windows.
        snap_window=True):       # Start all windows at full hours.

    for tr in batch.traces:

        # Filtering will introduce some artifacts in the padding interval.
        tr.highpass(4, fmin)
        tr.lowpass(4, fmax)

        # Cut off the contaminated padding. Trim to the payload interval.
        tr.chop(batch.tmin, batch.tmax)

        # Print channel codes, timestamp and RMS value of the hour.
        print(tr.str_codes, time_to_str(batch.tmin), rms(tr.ydata))

Note

In the code above, we have shown three different ways of restricting the channels to be processed. Each of these narrows down the set of channels available but they have very different effects:

  • The query_args argument of sq.add_fdsn() restricts the meta-data known to Squirrel by limiting what is fetched from the FDSN station web service. Using query_args has the benefit of reducing the size of the meta-data to transmit and parse. However changing it later will result in new queries to be made. If unchanged, the locally cached meta-data will be used when the program is rerun.

  • By calling sq.update_waveform_promises() we give Squirrel permission to download certain waveforms by creating so-called waveform promises for matching channels and time intervals. If we call the method without specifying the codes argument, waveforms from all stations/channels are marked as downloadable. Squirrel will not download anything if we do not call sq.update_waveform_promises().

  • The codes argument of sq.chopper_waveforms() selects from the waveforms we have locally and from the waveform promises set up earlier. Downloading will happen just as needed and in chunks of reasonable size.

Before we actually run the script we will first create a local Squirrel environment, so that all the downloaded files as well as the database are stored in the current directory under .squirrel/. This will make it easier to clean up when we are done (rm -rf .squirrel/). If we omit this step, the user’s global Squirrel environment (~/.pyrocko/cache/squirrel/) is used.

Create local environment (optional):

$ squirrel init

And now let’s run our script:

$ python squirrel_rms1.py
[...]
squirrel_rms1.py:psq.client.fdsn - INFO - FDSN "bgr" metadata: querying...
squirrel_rms1.py:psq.client.fdsn - INFO - FDSN "bgr" metadata: new (expires: never)
[...]
squirrel_rms1.py:psq.base        - INFO - Waveform orders standing for download: 1 (1)
squirrel_rms1.py:psq.client.fdsn - INFO - FDSN "bgr" waveforms: downloading, 1 order: GR.BFO..LHZ
squirrel_rms1.py:psq.client.fdsn - INFO - FDSN "bgr" waveforms: 1 download successful
[...]
GR.BFO..LHZ. 2022-01-14 00:00:00.000 1663.1710971934713
GR.BFO..LHZ. 2022-01-14 01:00:00.000 1773.5581525847992
GR.BFO..LHZ. 2022-01-14 02:00:00.000 1688.5986175096787
[...]
squirrel_rms1.py:psq.base        - INFO - Waveform orders standing for download: 1 (1)
squirrel_rms1.py:psq.client.fdsn - INFO - FDSN "bgr" waveforms: downloading, 1 order: GR.BFO..LHZ
squirrel_rms1.py:psq.client.fdsn - INFO - FDSN "bgr" waveforms: 1 download successful
GR.BFO..LHZ. 2022-01-14 22:00:00.000 1570.7909549562307
GR.BFO..LHZ. 2022-01-14 23:00:00.000 1595.3630840478215
GR.BFO..LHZ. 2022-01-15 00:00:00.000 1445.7303611595091
[...]

Excellent! It is downloading waveform data and calculating RMS values.

The lines with the RMS values are printed to stdout, while log messages go to stderr. Like this, we could for example redirect only the RMS results to a file but still see the log messages in the terminal:

$ python squirrel_rms1.py > rms-GR.BFO..LHZ.txt

Running the script a second time is way faster, because nothing has to be downloaded.

Not very flexible though with all the hard-coded settings in the script. Read on to see how we can configure data access from the command line.

Variant 2: command line app (simple)

Instead of hard-coding the data sources in the script, we could set them with command line arguments. The pyrocko.squirrel.tool module offers functionality to set up our program so that it accepts the same options and arguments like for example squirrel scan. Here’s the complete program after changing to this to use from_command():

squirrel_rms2.py - Notable differences to Variant 1 highlighted.
#!/usr/bin/env python3
import numpy as np
from pyrocko.util import time_to_str, str_to_time_fillup as stt
from pyrocko import squirrel


def rms(data):
    return np.sqrt(np.sum(data**2))


tmin = stt('2022-01-14')
tmax = stt('2022-01-17')

fmin = 0.01
fmax = 0.05

# Setup Squirrel instance with waveforms from files given on command line.
# Supports all common options offered by Squirrel tool commands like `squirrel
# scan`:  --help, --loglevel, --progress, --include, --exclude, --optimistic,
# --format, --kind, --persistent, --update, --dataset

parser = squirrel.SquirrelArgumentParser(
    description='Report hourly RMS values.')

parser.add_squirrel_selection_arguments()
args = parser.parse_args()
sq = args.make_squirrel()

sq.update(tmin=tmin, tmax=tmax)
sq.update_waveform_promises(tmin=tmin, tmax=tmax, codes='*.BFO.*.*')

for batch in sq.chopper_waveforms(
        tmin=tmin,
        tmax=tmax,
        codes='*.*.*.LHZ',
        tinc=3600.,
        tpad=1.0/fmin,
        want_incomplete=False,
        snap_window=True):

    for tr in batch.traces:
        tr.highpass(4, fmin)
        tr.lowpass(4, fmax)
        tr.chop(batch.tmin, batch.tmax)
        print(tr.str_codes, time_to_str(batch.tmin), rms(tr.ydata))

To get RMS values of some local data, we could run

$ python squirrel_rms2.py data/2022

It is also self-documenting:

$ python squirrel_rms2.py --help
usage: examples/squirrel_rms2.py [--help] [--loglevel LEVEL] [--progress DEST]
                             [--include REGEX] [--exclude REGEX]
                             [--optimistic] [--format FORMAT]
                             [--kind KINDS] [--persistent NAME] [--update]
                             [--dataset FILE]
                             [paths [paths ...]]

Report hourly RMS values.

Positional arguments:
  paths                 Files and directories with waveforms, metadata and
                        events.

Optional arguments:
  --help, -h            Show this help message and exit.
  --loglevel LEVEL      Set logger level. Choices: critical, error, warning,
                        info, debug. Default: info.
  --progress DEST       Set how progress status is reported. Choices: terminal,
                        log, off. Default: terminal.

[...]
--dataset FILE, -d FILE
                    Add directories, files, remote sources from dataset
                    description file. This option can be repeated to add
                    multiple datasets. Run `squirrel template` to obtain
                    examples of dataset description files.

So, to use a remote data source we can create a dataset description file and pass this to --dataset. Examples of such dataset description files are provided by the squirrel template command. By chance there already is an example for accessing all LH channels from BGR FDSN web service! We can save the example dataset description file with

$ squirrel template bgr-gr-lh.dataset -w
squirrel:psq.cli.template - INFO - File written: bgr-gr-lh.dataset.yaml

The dataset description is a nicely commented YAML file and we could modify it to our liking.

bgr-gr-lh.dataset.yaml
--- !squirrel.Dataset

# All file paths given below are treated relative to the location of this
# configuration file. Here we may give a common prefix. For example, if the
# configuration file is in the sub-directory 'PROJECT/config/', set it to '..'
# so that all paths are relative to 'PROJECT/'.
path_prefix: '.'

# Data sources to be added (LocalData, FDSNSource, CatalogSource, ...)
sources:
- !squirrel.FDSNSource

  # URL or alias of FDSN site.
  site: bgr

  # FDSN query arguments to make metadata queries.
  # See http://www.fdsn.org/webservices/fdsnws-station-1.1.pdf
  # Time span arguments should not be added here, because they are handled
  # automatically by Squirrel.
  query_args:
    network: 'GR'
    channel: 'LH?'

To calculate RMS values for the configured dataset, we can now run

$ python squirrel_rms2.py --dataset bgr-gr-lh.dataset.yaml
[...]
GR.BFO..LHZ. 2022-01-14 00:00:00.000 1663.1710971934713
GR.BFO..LHZ. 2022-01-14 01:00:00.000 1773.5581525847992
GR.BFO..LHZ. 2022-01-14 02:00:00.000 1688.5986175096787
[...]

This is a bit more flexible because we can now easily exchange the data used from the command line. But there is still room for improvements. Read on to see how we can add our own options to the program’s command line interface.

Variant 3a: command line app (tight integration, single-command)

#!/usr/bin/env python3
import numpy as np
from pyrocko.util import time_to_str as tts
from pyrocko import squirrel


def rms(data):
    return np.sqrt(np.sum(data**2))


class ReportRMSTool(squirrel.SquirrelCommand):

    def setup(self, parser):
        self.add_selection_arguments(parser)

        # Add '--codes', '--tmin' and '--tmax', but not '--time'.
        self.add_query_arguments(parser, without=['time'])

        parser.add_argument(
            '--fmin',
            dest='fmin',
            metavar='FLOAT',
            type=float,
            help='Corner of highpass [Hz].')

        parser.add_argument(
            '--fmax',
            dest='fmax',
            metavar='FLOAT',
            type=float,
            help='Corner of lowpass [Hz].')

    def call(self, parser, args):
        sq = self.squirrel_from_selection_arguments(args)

        fmin = args.fmin
        fmax = args.fmax

        query_args = self.squirrel_query_from_arguments(args)
        sq.update(**query_args)
        sq.update_waveform_promises(**query_args)

        for batch in sq.chopper_waveforms(
                tinc=3600.,
                tpad=1.0/fmin if fmin is not None else 0.0,
                want_incomplete=False,
                snap_window=True,
                **query_args):

            for tr in batch.traces:

                if fmin is not None:
                    tr.highpass(4, fmin)

                if fmax is not None:
                    tr.lowpass(4, fmax)

                tr.chop(batch.tmin, batch.tmax)
                print(tr.str_codes, tts(batch.tmin), rms(tr.ydata))


squirrel.from_command(
    ReportRMSTool(),
    description='Report hourly RMS values.')

Variant 3b: command line app (tight integration, multi-command)

squirrel_rms3b.py - Differences to Variant 3a highlighted.
#!/usr/bin/env python3
import numpy as np
from pyrocko.util import time_to_str as tts
from pyrocko import squirrel


def rms(data):
    return np.sqrt(np.sum(data**2))


class ReportRMSTool(squirrel.SquirrelCommand):

    def setup_subcommand(self, subparsers):
        return self.add_parser(
            subparsers, 'rms', help='Report hourly RMS values.')

    def setup(self, parser):
        self.add_selection_arguments(parser)

        # Add '--codes', '--tmin' and '--tmax', but not '--time'.
        self.add_query_arguments(parser, without=['time'])

        parser.add_argument(
            '--fmin',
            dest='fmin',
            metavar='FLOAT',
            type=float,
            help='Corner of highpass [Hz].')

        parser.add_argument(
            '--fmax',
            dest='fmax',
            metavar='FLOAT',
            type=float,
            help='Corner of lowpass [Hz].')

    def call(self, parser, args):
        sq = self.squirrel_from_selection_arguments(args)

        fmin = args.fmin
        fmax = args.fmax

        query_args = self.squirrel_query_from_arguments(args)
        sq.update(**query_args)
        sq.update_waveform_promises(**query_args)

        for batch in sq.chopper_waveforms(
                tinc=3600.,
                tpad=1.0/fmin if fmin is not None else 0.0,
                want_incomplete=False,
                snap_window=True,
                **query_args):

            for tr in batch.traces:

                if fmin is not None:
                    tr.highpass(4, fmin)

                if fmax is not None:
                    tr.lowpass(4, fmax)

                tr.chop(batch.tmin, batch.tmax)
                print(tr.str_codes, tts(batch.tmin), rms(tr.ydata))


class PlotRMSTool(squirrel.SquirrelCommand):

    def setup_subcommand(self, subparsers):
        return self.add_parser(
            subparsers, 'plot', help='Plot RMS traces.')

    def call(self, parser, args):
        self.fail('Not implemented yet!')


squirrel.from_command(
    subcommands=[ReportRMSTool(), PlotRMSTool()],
    description='My favourite RMS tools.')

Variant 4: command line app (loose integration)

import argparse
import numpy as np
from pyrocko.util import time_to_str as tts
from pyrocko.squirrel.tool import common as squirrel_tool_common


def rms(data):
    return np.sqrt(np.sum(data**2))


parser = argparse.ArgumentParser()
squirrel_tool_common.add_selection_arguments(parser)

parser.add_argument(
    '--fmin',
    dest='fmin',
    metavar='FLOAT',
    type=float,
    help='Corner of highpass [Hz].')

parser.add_argument(
    '--fmax',
    dest='fmax',
    metavar='FLOAT',
    type=float,
    help='Corner of lowpass [Hz].')

args = parser.parse_args()

sq = squirrel_tool_common.squirrel_from_selection_arguments(args)

fmin = args.fmin
fmax = args.fmax

for batch in sq.chopper_waveforms(
        tinc=3600.,
        tpad=1.0/fmin if fmin is not None else 0.0,
        want_incomplete=False,
        snap_window=True):

    for tr in batch.traces:

        if fmin is not None:
            tr.highpass(4, fmin)

        if fmax is not None:
            tr.lowpass(4, fmax)

        tr.chop(batch.tmin, batch.tmax)
        print(tr.str_codes, tts(tr.tmin), rms(tr.ydata))