Squirrel Tool - dataset inspection and management
The squirrel command line tool is a front-end to the Squirrel data access infrastructure. It offers functionality to:
- inspect various aspects of a data collection.
- pre-scan / index file collections.
- download data from online sources (FDSN web services, earthquake catalogs).
- manage separate (isolated, local) environments for different projects.
- manage persistent selections to speed up access to very large datasets.
Command reference
- squirrel
- squirrel codes - Get summary of available data codes.
- squirrel coverage - Report time spans covered.
- squirrel database - Database inspection and maintenance.
- squirrel database env - Show current Squirrel environment.
- squirrel database stats - Show information about cached meta-data.
- squirrel database files - Show paths of files for which cached meta-data is available.
- squirrel database nuts - Dump index entry summaries.
- squirrel database cleanup - Remove leftover volatile data entries.
- squirrel database remove - Remove cached meta-data of files matching given patterns.
- squirrel files - Lookup files providing given content selection.
- squirrel init - Create local environment.
- squirrel jackseis - Convert waveform archive data.
- squirrel jackseis-classic - Squirrel's adaption of classic Jackseis.
- squirrel nuts - Search indexed contents.
- squirrel operators - Print available operator mappings.
- squirrel persistent - Manage persistent selections.
- squirrel scan - Scan and index files and directories.
- squirrel snuffler - Experimental Squirrel-powered Snuffler.
- squirrel summon - Fill local cache.
- squirrel template - Print configuration snippets.
- squirrel update - Update remote sources inventories.
Help
The squirrel tool and its subcommands are self-documenting with the --help option. Run squirrel without any options to get the list of available subcommands. Run squirrel SUBCOMMAND --help to get details about a specific subcommand.
Common options
Options shared between subcommands are grouped into three categories:
- General options include --loglevel to select the program’s verbosity and --progress to control how progress status is indicated. These are provided by all of Squirrel’s subcommands.
- Data collection options control which files and other data sources should be aggregated to form a dataset. The --add option adds files and directories. Further options are available to include/exclude files by regular expression patterns, to restrict use to selected content kinds only (waveform, station, channel, response, event), to create persistent data selections and more. Finally, the --dataset option is provided to configure the dataset conveniently in a YAML file rather than repeatedly with the many command line options. Using --dataset also makes it possible to add online data sources.
- Data query options are used to restrict processing/presentation to a subset of a data collection. They have no influence on the data collection itself, only on what is shown. It is possible to query by time interval (--tmin, --tmax, --time), channel/station code pattern (--codes), and content kinds (--kinds).
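To make the grouping concrete, here is a minimal Python sketch that assembles a squirrel command line from these option groups. The helper `build_squirrel_command` and its parameters are hypothetical names made up for this illustration; only the option names themselves are taken from the text, and no particular values for --loglevel are assumed.

```python
import shlex


def build_squirrel_command(subcommand, loglevel=None, dataset=None,
                           tmin=None, tmax=None, codes=None, kinds=None):
    """Assemble an argument list for a squirrel subcommand (illustration only)."""
    argv = ['squirrel', subcommand]

    # General options: how the program reports what it is doing.
    if loglevel is not None:
        argv += ['--loglevel', loglevel]

    # Data collection options: what the dataset is made of.
    if dataset is not None:
        argv += ['--dataset', dataset]

    # Data query options: which part of the dataset is considered.
    for flag, value in (('--tmin', tmin), ('--tmax', tmax),
                        ('--codes', codes), ('--kinds', kinds)):
        if value is not None:
            argv += [flag, value]

    return argv


argv = build_squirrel_command(
    'update',
    dataset='bgr-gr-lh.dataset.yaml',
    tmin='2021-07-28',
    tmax='2021-08-01',
    codes='*.*.*.??Z')

# shlex.join quotes the code pattern so that the shell does not expand it.
print(shlex.join(argv))
```

The resulting list can be handed to `subprocess.run` unchanged. Passing a list rather than a single string avoids quoting problems with patterns containing `*` and `?`.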
Tutorial
Downloading data
We first create a local Squirrel environment, so that all the downloaded files as well as the database are stored in the current directory under .squirrel/. This will make it easier to clean up when we are done (rm -rf .squirrel/). If we omit this step, the user’s global Squirrel environment (~/.pyrocko/cache/squirrel/) is used.
Create local environment (optional):
$ squirrel init
To use a remote data source we can create a dataset description file and pass this to the --dataset option of the various squirrel subcommands.
Examples of such dataset description files are provided by the squirrel template command. By chance there already is an example for accessing all LH channels from BGR’s FDSN web service! We can save the example dataset description file with:
$ squirrel template bgr-gr-lh.dataset -w
squirrel:psq.cli.template - INFO - File written: bgr-gr-lh.dataset.yaml
The dataset description is a nicely commented YAML file and we could modify it to our liking.
--- !squirrel.Dataset
# All file paths given below are treated relative to the location of this
# configuration file. Here we may give a common prefix. For example, if the
# configuration file is in the sub-directory 'PROJECT/config/', set it to '..'
# so that all paths are relative to 'PROJECT/'.
path_prefix: '.'
# Data sources to be added (LocalData, FDSNSource, CatalogSource, ...)
sources:
- !squirrel.FDSNSource
  # URL or alias of FDSN site.
  site: bgr
  # Uncomment to let metadata expire in 10 days:
  #expires: 10d
  # Waveforms can be optionally shared with other FDSN client configurations,
  # so that data is not downloaded multiple times. The downside may be that in
  # some cases more data than expected is available (if data was previously
  # downloaded for a different application).
  #shared_waveforms: true
  # FDSN query arguments to make metadata queries.
  # See http://www.fdsn.org/webservices/fdsnws-station-1.1.pdf
  # Time span arguments should not be added here, because they are handled
  # automatically by Squirrel.
  query_args:
    network: 'GR'
    channel: 'LH?'
Expert users can get a non-commented version of the file by adding --format brief to the squirrel template command.
Now we tell squirrel to update the meta-information for the time interval of interest.
$ squirrel update --dataset bgr-gr-lh.dataset.yaml --tmin 2021-07-28 --tmax 2021-08-01
[...]
squirrel update:psq.client.fdsn - INFO - FDSN "bgr" metadata: querying...
squirrel update:psq.client.fdsn - INFO - FDSN "bgr" metadata: new (expires: never)
[...]
squirrel update:psq.cli.update - INFO - Squirrel stats:
  Number of files: 2
  Total size of known files: 87 kB
  Number of index nuts: 160
  Available content kinds:
    channel: 120 1991-09-01 00:00:00.000 - <none>
    station: 40 <none> - <none>
  Available codes:
    GR.AHRW..LHE GR.AHRW..LHN GR.AHRW..LHZ GR.AHRW.* GR.ASSE..LHE GR.ASSE..LHN
    GR.ASSE..LHZ GR.ASSE.* GR.BFO..LHE GR.BFO..LHN
    [140 more]
    GR.UBR..LHZ GR.UBR.* GR.WET..LHE GR.WET..LHN GR.WET..LHZ GR.WET.*
    GR.ZARR..LHE GR.ZARR..LHN GR.ZARR..LHZ GR.ZARR.*
  Sources:
    client:fdsn:b3ad21f2a866c178889cfdf4f493eba588a59543
  Operators: <none>
After fetching the meta information from the FDSN web service, a brief overview of the contents currently known to Squirrel is printed.
If we run the update command a second time, Squirrel informs us that cached metadata has been used:
$ squirrel update --dataset bgr-gr-lh.dataset.yaml --tmin 2021-07-28 --tmax 2021-08-01
[...]
squirrel update:psq.client.fdsn - INFO - FDSN "bgr" metadata: using cached (expires: never)
[...]
It is possible to set an expiration date for the metadata in the dataset configuration.
Next we must give permission to Squirrel to download data given certain constraints. Squirrel will only download waveform data when it has a so-called promise for a given time span and channel. These promises must be explicitly created with the --promises option of squirrel update. We are only interested in vertical component seismograms at this point, so we restrict promise creation to channels ending in ‘Z’.
$ squirrel update --promises --dataset bgr-gr-lh.dataset.yaml --tmin 2021-07-28 --tmax 2021-08-01 --codes '*.*.*.??Z'
[...]
  Available content kinds:
    channel: 120 1991-09-01 00:00:00.000 - <none>
    station: 40 <none> - <none>
    waveform_promise: 40 2021-07-28 00:00:00.000 - 2021-08-01 00:00:00.000
[...]
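The effect of the code pattern can be illustrated with shell-style matching from Python's standard library. This is only a sketch of the idea, assuming glob semantics for `*` and `?`; it is not Squirrel's own matching code, and the short list of codes is copied by hand from the stats output shown earlier.

```python
from fnmatch import fnmatchcase

# A few channel codes in NETWORK.STATION.LOCATION.CHANNEL form
# (the location code is empty for these stations).
codes = [
    'GR.BFO..LHE', 'GR.BFO..LHN', 'GR.BFO..LHZ',
    'GR.WET..LHE', 'GR.WET..LHN', 'GR.WET..LHZ',
]

# Any network, station and location; three-letter channel ending in 'Z'.
pattern = '*.*.*.??Z'

selected = [code for code in codes if fnmatchcase(code, pattern)]
print(selected)
```

Only the vertical components are selected, one out of three channels per station. This fits the 40 waveform promises reported for 120 channels.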
To actually download the waveforms, we can now use the squirrel summon command.
$ squirrel summon --dataset bgr-gr-lh.dataset.yaml --tmin 2021-07-28 --tmax 2021-08-01
Finally we can have a look at the data.
$ squirrel snuffler --dataset bgr-gr-lh.dataset.yaml
TODO: screenshot snuffler (save as png), no controls, full time window
Waveforms are always downloaded in blocks of reasonable sizes, therefore the downloaded time frame may be slightly larger than the requested time span.
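Why block-wise downloading widens the time span can be seen in a small sketch. The function `covering_blocks` and the one-hour block length are hypothetical and chosen only for illustration; the block size that Squirrel actually uses is not stated here.

```python
import math
from datetime import datetime, timezone


def covering_blocks(tmin, tmax, block_length):
    """Expand [tmin, tmax] to block boundaries (all values in seconds)."""
    tmin_block = math.floor(tmin / block_length) * block_length
    tmax_block = math.ceil(tmax / block_length) * block_length
    return tmin_block, tmax_block


def to_utc(t):
    """Format an epoch timestamp as a UTC date-time string."""
    return datetime.fromtimestamp(t, timezone.utc).strftime('%Y-%m-%d %H:%M')


# Requested span: 06:30 to 07:10 UTC on 2021-07-28.
tmin = datetime(2021, 7, 28, 6, 30, tzinfo=timezone.utc).timestamp()
tmax = datetime(2021, 7, 28, 7, 10, tzinfo=timezone.utc).timestamp()

# Hypothetical block length of one hour.
block_tmin, block_tmax = covering_blocks(tmin, tmax, 3600.0)

print('requested: ', to_utc(tmin), '-', to_utc(tmax))
print('downloaded:', to_utc(block_tmin), '-', to_utc(block_tmax))
```

With these assumed values the request is widened to the full hours 06:00 to 08:00. A request that already starts and ends on block boundaries is left unchanged.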
TODO: screenshot snuffler (save as png), waveforms
M8.2 Alaska earthquake.
Dataset conversion
So far the data has been downloaded into a special cache directory maintained by Squirrel. Using the data from there is useful if we will later add more waveforms, but sometimes we are interested in creating our own waveform archive in a portable form.
To copy the data downloaded in the previous section into a handy directory structure, we can use the squirrel jackseis command. With its --out-sds-path option, a standard SDS data directory with day-files in MSEED format is created.
$ squirrel jackseis --dataset bgr-gr-lh.dataset.yaml --out-sds-path data/sds
$ tree data/
data/
└── sds
    └── 2021
        └── GR
            ├── BFO
            │   └── LHZ.D
            │       ├── GR.BFO..LHZ.D.2021.208
            │       ├── GR.BFO..LHZ.D.2021.209
            │       ├── GR.BFO..LHZ.D.2021.210
            │       ├── GR.BFO..LHZ.D.2021.211
            │       ├── GR.BFO..LHZ.D.2021.212
            │       └── GR.BFO..LHZ.D.2021.213
            ├── ...
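The file names in this listing follow the SDS convention of one file per channel and day, with the day given as day of year. The following Python sketch reproduces the layout seen above. The function `sds_path` is a hypothetical helper for illustration and is not part of Squirrel.

```python
from datetime import date


def sds_path(net, sta, loc, cha, day, data_type='D'):
    """Build the SDS-relative path of the day-file for one channel."""
    year = day.year
    doy = day.timetuple().tm_yday  # day of year, 1-based
    return (f'{year}/{net}/{sta}/{cha}.{data_type}/'
            f'{net}.{sta}.{loc}.{cha}.{data_type}.{year}.{doy:03d}')


# Day-file holding the vertical component of GR.BFO on 2021-07-28.
path = sds_path('GR', 'BFO', '', 'LHZ', date(2021, 7, 28))
print(path)
```

For 2021-07-28 this yields day 209, so the files numbered 208 and 213 in the listing belong to 2021-07-27 and 2021-08-01.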
We will use this dataset as a “local dataset” in the following sections.
Local datasets
Dataset inspection
P