Squirrel tutorial

TODO: In this tutorial we will download data for the … Event…

Downloading data

Squirrel offers transparent download of seismic waveforms and station metadata from FDSN web services. With an appropriate dataset configuration this can happen just

We first create a local Squirrel environment, so that all the downloaded files as well as the database are stored in the current directory under .squirrel/. This will make it easier to clean up when we are done (rm -rf .squirrel/). If we omit this step, the user’s global Squirrel environment (~/.pyrocko/cache/squirrel/) is used.

Create local environment (optional):

TODO: picture

$ squirrel init

To use a remote data source, we create a dataset description file and pass it to the --dataset option of the various squirrel subcommands. Examples of such dataset description files are provided by the squirrel template command. Conveniently, there is already an example for accessing all LH channels from BGR’s FDSN web service. We can save the example dataset description file with

$ squirrel template bgr-gr-lh.dataset -w
squirrel:psq.cli.template - INFO - File written: bgr-gr-lh.dataset.yaml

The dataset description is a nicely commented YAML file and we could modify it to our liking.

bgr-gr-lh.dataset.yaml
--- !squirrel.Dataset

# All file paths given below are treated relative to the location of this
# configuration file. Here we may give a common prefix. For example, if the
# configuration file is in the sub-directory 'PROJECT/config/', set it to '..'
# so that all paths are relative to 'PROJECT/'.
path_prefix: '.'

# Data sources to be added (LocalData, FDSNSource, CatalogSource, ...)
sources:
- !squirrel.FDSNSource

  # URL or alias of FDSN site.
  site: bgr

  # Uncomment to let metadata expire in 10 days:
  #expires: 10d

  # Waveforms can be optionally shared with other FDSN client configurations,
  # so that data is not downloaded multiple times. The downside may be that in
  # some cases more data than expected is available (if data was previously
  # downloaded for a different application).
  #shared_waveforms: true

  # FDSN query arguments to make metadata queries.
  # See http://www.fdsn.org/webservices/fdsnws-station-1.1.pdf
  # Time span arguments should not be added here, because they are handled
  # automatically by Squirrel.
  query_args:
    network: 'GR'
    channel: 'LH?'

Expert users can get a non-commented version of the file by adding --format brief to the squirrel template command.

Now we tell Squirrel to update the metadata for the time interval of interest. This is done with the squirrel update command. Channel information intersecting with the given time interval will be downloaded.

TODO: picture

$ squirrel update --dataset bgr-gr-lh.dataset.yaml --tmin 2021-07-28 --tmax 2021-08-01
[...]
squirrel update:psq.client.fdsn           - INFO     - FDSN "bgr" metadata: querying...
squirrel update:psq.client.fdsn           - INFO     - FDSN "bgr" metadata: new (expires: never)
[...]
squirrel update:psq.cli.update            - INFO     - Squirrel stats:
  Number of files:               2
  Total size of known files:     87 kB
  Number of index nuts:          160
  Available content kinds:
    channel: 120 1991-09-01 00:00:00.000 - <none>
    station: 40  <none>                  - <none>
  Available codes:
    GR.AHRW..LHE GR.AHRW..LHN GR.AHRW..LHZ GR.AHRW.*    GR.ASSE..LHE GR.ASSE..LHN
    GR.ASSE..LHZ GR.ASSE.*    GR.BFO..LHE  GR.BFO..LHN
    [140 more]
    GR.UBR..LHZ  GR.UBR.*     GR.WET..LHE  GR.WET..LHN  GR.WET..LHZ  GR.WET.*
    GR.ZARR..LHE GR.ZARR..LHN GR.ZARR..LHZ GR.ZARR.*
  Sources:
    client:fdsn:b3ad21f2a866c178889cfdf4f493eba588a59543
  Operators:                     <none>

After fetching the metadata from the FDSN web service, a brief overview of the contents currently known to Squirrel is printed.
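The selection rule mentioned above (channel information "intersecting" the requested time interval) can be illustrated with a small, standalone sketch. This is not Squirrel's code: the function name `intersects` is made up for illustration, the closed epoch's end date is a hypothetical value, and how Squirrel treats exact boundary equality is an assumption here. An open-ended epoch, shown as `<none>` in the stats output, is represented by `None`.

```python
from datetime import datetime
from typing import Optional


def intersects(
    epoch_start: Optional[datetime],
    epoch_end: Optional[datetime],
    tmin: datetime,
    tmax: datetime,
) -> bool:
    """Return True if the epoch overlaps the half-open span [tmin, tmax).

    None means "unbounded" on that side (assumed convention).
    """
    starts_before_span_ends = epoch_start is None or epoch_start < tmax
    ends_after_span_starts = epoch_end is None or epoch_end > tmin
    return starts_before_span_ends and ends_after_span_starts


# Requested interval from the update command.
tmin = datetime(2021, 7, 28)
tmax = datetime(2021, 8, 1)

# A channel epoch that started in 1991 and is still open.
open_epoch = intersects(datetime(1991, 9, 1), None, tmin, tmax)

# A hypothetical epoch that was closed long before the requested interval.
closed_epoch = intersects(datetime(1991, 9, 1), datetime(2005, 1, 1), tmin, tmax)

print(open_epoch, closed_epoch)
```

The still-open epoch is selected, while the hypothetical closed one is not.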

If we run the update command a second time, Squirrel informs us that cached metadata has been used:

$ squirrel update --dataset bgr-gr-lh.dataset.yaml --tmin 2021-07-28 --tmax 2021-08-01
[...]
squirrel update:psq.client.fdsn           - INFO     - FDSN "bgr" metadata: using cached (expires: never)
[...]

It is possible to set an expiration date for the metadata in the dataset configuration.

If we later need the instrument response information of the seismic stations of the data selection, we can add the --responses option to the update command:

TODO: picture

$ squirrel update --responses --dataset bgr-gr-lh.dataset.yaml --tmin 2021-07-28 --tmax 2021-08-01
[...]
  Available content kinds:
    channel:  120 1991-09-01 00:00:00.000 - <none>
    response: 150 1991-01-01 00:00:00.000 - <none>
    station:  40  <none>                  - <none>
[...]

Now we also have response information, which describes how the seismometers convert physical ground motion into measurement records.

Next we must give permission to Squirrel to download data given certain constraints. Squirrel will only download waveform data when it has a so-called promise for a given time span and channel. These promises must be explicitly created with the --promises option of squirrel update. We are only interested in vertical component seismograms at this point, so we restrict promise creation to channels ending in ‘Z’.

TODO: picture (local channels with Z are marked blue)

$ squirrel update --promises --dataset bgr-gr-lh.dataset.yaml --tmin 2021-07-28 --tmax 2021-08-01 --codes '*.*.*.??Z'
[...]
  Available content kinds:
    channel:          120 1991-09-01 00:00:00.000 - <none>
    station:          40  <none>                  - <none>
    waveform_promise: 40  2021-07-28 00:00:00.000 - 2021-08-01 00:00:00.000
[...]
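To get a feeling for what a codes pattern such as '*.*.*.??Z' selects, the following sketch applies shell-style glob matching from Python's standard library to a few of the channel codes listed in the stats output. It only illustrates the glob semantics; that Squirrel's own matching behaves identically in every corner case is an assumption.

```python
from fnmatch import fnmatchcase

# A few NET.STA.LOC.CHA codes taken from the stats output above.
# The location code is empty for these channels, hence the double dot.
codes = [
    'GR.BFO..LHE', 'GR.BFO..LHN', 'GR.BFO..LHZ',
    'GR.WET..LHE', 'GR.WET..LHN', 'GR.WET..LHZ',
]

# Any network, station and location; a three-character channel ending in Z.
pattern = '*.*.*.??Z'

selected = [code for code in codes if fnmatchcase(code, pattern)]
print(selected)
```

Only the two vertical components remain. This matches the counts in the output, where 120 channels lead to 40 waveform promises.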

To actually download the waveforms, we can now use the squirrel summon command.

TODO: picture (waveforms are added).

$ squirrel summon --dataset bgr-gr-lh.dataset.yaml --tmin 2021-07-28 --tmax 2021-08-01

Finally we can have a look at the data.

$ squirrel snuffler --dataset bgr-gr-lh.dataset.yaml
output of squirrel_tutorial1.png

TODO: The M8.2 Alaska earthquake is at TIME …

output of squirrel_tutorial2.png

Waveforms are always downloaded in blocks of reasonable size, therefore the downloaded time frame may be slightly larger than the requested time span.
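Why block-wise downloading widens the time frame can be seen in a small sketch that snaps a requested span outwards to block boundaries. The block length of 6 hours, the request times and the helper name `snap_to_blocks` are hypothetical, chosen for illustration only; the block sizes Squirrel actually uses are not stated here.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def snap_to_blocks(tmin, tmax, block):
    """Widen [tmin, tmax] to the enclosing block boundaries.

    Blocks are counted from the Unix epoch (an assumption of this sketch).
    """
    first = (tmin - EPOCH) // block       # floor division: index of first block
    last = -((EPOCH - tmax) // block)     # ceiling division: end of last block
    return EPOCH + first * block, EPOCH + last * block


BLOCK = timedelta(hours=6)  # hypothetical block length

requested_tmin = datetime(2021, 7, 28, 1, 30, tzinfo=timezone.utc)
requested_tmax = datetime(2021, 7, 28, 13, 10, tzinfo=timezone.utc)

start, end = snap_to_blocks(requested_tmin, requested_tmax, BLOCK)
print(start.isoformat(), end.isoformat())
```

Under these assumptions a request for 01:30 to 13:10 is served by three whole blocks, covering 00:00 to 18:00.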

Dataset conversion

So far the data has been downloaded into a special cache directory maintained by Squirrel. Using the data from there is convenient if we later want to add more waveforms. However, sometimes we want to create our own waveform archive in a portable form.

TODO: picture

To copy the data downloaded in the previous section into a handy directory structure, we can use the squirrel jackseis command. With its --out-sds-path option, a standard SDS data directory with day files in MSEED format is created.

$ squirrel jackseis --dataset bgr-gr-lh.dataset.yaml --out-sds-path data/sds
$ tree data/   # Use `ls`, if `tree` is not installed.
data/
└── sds
    └── 2021
        └── GR
            ├── BFO
            │   └── LHZ.D
            │       ├── GR.BFO..LHZ.D.2021.208
            │       ├── GR.BFO..LHZ.D.2021.209
            │       ├── GR.BFO..LHZ.D.2021.210
            │       ├── GR.BFO..LHZ.D.2021.211
            │       ├── GR.BFO..LHZ.D.2021.212
            │       └── GR.BFO..LHZ.D.2021.213
            ├── ...
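The file names in the listing above follow the SDS pattern YEAR/NET/STA/CHA.TYPE/NET.STA.LOC.CHA.TYPE.YEAR.DAY, where DAY is the three-digit day of the year. The sketch below rebuilds such a path from its parts, which helps when translating the day numbers back into calendar dates. The helper `sds_path` is hypothetical and not part of Squirrel.

```python
from datetime import date
from pathlib import PurePosixPath


def sds_path(root, net, sta, loc, cha, day, data_type='D'):
    """Build the SDS day-file path for one channel and one calendar day."""
    year = f'{day.year:04d}'
    day_of_year = f'{day.timetuple().tm_yday:03d}'
    filename = f'{net}.{sta}.{loc}.{cha}.{data_type}.{year}.{day_of_year}'
    return PurePosixPath(root, year, net, sta, f'{cha}.{data_type}', filename)


# First and last day file shown for GR.BFO..LHZ in the listing.
first = sds_path('data/sds', 'GR', 'BFO', '', 'LHZ', date(2021, 7, 27))
last = sds_path('data/sds', 'GR', 'BFO', '', 'LHZ', date(2021, 8, 1))

print(first)
print(last)
```

Day 208 corresponds to 2021-07-27 and day 213 to 2021-08-01, so the archive starts one day before the requested --tmin. This is probably a consequence of waveforms being downloaded in whole blocks.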

We will use this dataset as a “local dataset” in the following sections.

TODO: add metadata export

TODO: picture

$ squirrel jackseis --dataset bgr-gr-lh.dataset.yaml --out-meta-path meta/stations.xml

Local datasets

To inspect some local data holdings, we can use the Snuffler application. Add files and directories to /

$ squirrel snuffler --add data/sds meta/stations.xml
$ squirrel template local.dataset

file listing

explain path_prefix

$ squirrel snuffler --dataset config/local.dataset.yaml

Dataset inspection

$ squirrel scan --dataset config/local.dataset.yaml

$ squirrel coverage --dataset config/local.dataset.yaml

$ squirrel codes --dataset config/local.dataset.yaml

$ squirrel nuts --dataset config/local.dataset.yaml --codes '*.BFO.*.*'

$ squirrel files --dataset config/local.dataset.yaml --codes '*.BFO.*.*'

Earthquake catalogs