pyrocko.squirrel.base

Squirrel main classes.

Classes

Batch(tmin, tmax, i, n, igroup, ngroups, traces)

Batch of waveforms from window-wise data extraction.

Squirrel([env, database, cache_path, persistent])

Prompt, lazy, indexing, caching, dynamic seismological dataset access.

SquirrelStats(**kwargs)

Container to hold statistics about contents available from a Squirrel.

class Batch(tmin, tmax, i, n, igroup, ngroups, traces)[source]

Bases: object

Batch of waveforms from window-wise data extraction.

Encapsulates state and results yielded for each window in window-wise waveform extraction with the Squirrel.chopper_waveforms() method.

Attributes:

tmin

Start of this time window.

tmax

End of this time window.

i

Index of this time window in sequence.

n

Total number of time windows in sequence.

igroup

Index of this time window’s sequence group.

ngroups

Total number of sequence groups.

traces

Extracted waveforms for this time window.

class Squirrel(env=None, database=None, cache_path=None, persistent=None)[source]

Bases: Selection

Prompt, lazy, indexing, caching, dynamic seismological dataset access.

Parameters:
  • env (Environment or str) – Squirrel environment instance or directory path to use as starting point for its detection. By default, the current directory is used as starting point. When searching for a usable environment, the directory '.squirrel' or 'squirrel' in the current (or starting point) directory is used if it exists; otherwise the parent directories are searched upwards for such a directory. If no such directory is found, the user’s global Squirrel environment '$HOME/.pyrocko/squirrel' is used.

  • database (Database or str) – Database instance or path to database. By default the database found in the detected Squirrel environment is used.

  • cache_path (str) – Directory path to use for data caching. By default, the 'cache' directory in the detected Squirrel environment is used.

  • persistent (str) – If given a name, create a persistent selection.
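The environment lookup described for the env parameter can be pictured with the following minimal sketch. It is not Squirrel's actual code: the function name find_squirrel_env and its home argument are hypothetical, and checking '.squirrel' before 'squirrel' within one directory is an assumption based on the order in which the two names are listed.

```python
import os
import tempfile


def find_squirrel_env(start='.', home=None):
    """Return the first Squirrel environment directory found (sketch)."""
    home = home or os.path.expanduser('~')
    current = os.path.abspath(start)
    while True:
        # Assumption: '.squirrel' takes precedence over 'squirrel'.
        for name in ('.squirrel', 'squirrel'):
            candidate = os.path.join(current, name)
            if os.path.isdir(candidate):
                return candidate
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    # Fallback: the user's global environment.
    return os.path.join(home, '.pyrocko', 'squirrel')


# Demo on a throw-away tree: <root>/a/.squirrel and <root>/a/b/c
root = os.path.realpath(tempfile.mkdtemp())
env_dir = os.path.join(root, 'a', '.squirrel')
start_dir = os.path.join(root, 'a', 'b', 'c')
os.makedirs(env_dir)
os.makedirs(start_dir)
found = find_squirrel_env(start_dir)
```

Starting two levels below the directory holding '.squirrel', the search walks upwards and stops at the first hit.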

This is the central class of the Squirrel framework. It provides a unified interface to query and access seismic waveforms, station meta-data and event information from local file collections and remote data sources. For prompt responses, an elaborate database setup is used under the hood. To speed up the assembly of ad-hoc data selections, files are indexed on first use and the extracted meta-data is remembered in the database for subsequent accesses. Bulk data is lazily loaded from disk and remote sources, only when requested. Once loaded, data is cached in memory to expedite typical access patterns. Files and data sources can be dynamically added to and removed from the Squirrel selection at runtime.

Queries are restricted to the contents of the files currently added to the Squirrel selection (usually a subset of the file meta-information collection in the database). This list of files is referred to here as the “selection”. By default, temporary tables are created in the attached database to hold the names of the files in the selection as well as various indices and counters. These tables are only visible inside the application which created them and are deleted when the database connection is closed or the application exits. To create a selection which is not deleted at exit, supply a name to the persistent argument of the Squirrel constructor. Persistent selections are shared among applications using the same database.
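The difference between temporary and persistent selection tables can be illustrated with plain SQLite. This is only a sketch of the visibility rules, assuming an SQLite-style database; the table names selection_tmp and selection_named are hypothetical and do not reflect Squirrel's real schema.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), 'demo.sqlite')


def table_names(connection):
    """List tables visible to a connection, including temporary ones."""
    sql = ("SELECT name FROM sqlite_master WHERE type = 'table' "
           "UNION "
           "SELECT name FROM sqlite_temp_master WHERE type = 'table'")
    return {row[0] for row in connection.execute(sql)}


# First "application": one temporary and one named (persistent) table.
app1 = sqlite3.connect(db_path)
app1.execute('CREATE TEMP TABLE selection_tmp (path TEXT)')
app1.execute('CREATE TABLE selection_named (path TEXT)')
app1.commit()

# Second "application" attached to the same database file.
app2 = sqlite3.connect(db_path)
visible_to_app1 = table_names(app1)
visible_to_app2 = table_names(app2)
app1.close()
app2.close()

# After all connections are closed, only the named table is left.
app3 = sqlite3.connect(db_path)
visible_after_restart = table_names(app3)
app3.close()
```

The temporary table is seen only by the connection that created it and disappears when that connection closes, while the named table is shared and survives.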

Method summary

Some of the methods are implemented in Squirrel’s base class Selection.

add(paths[, kinds, format, include, ...])

Add files to the selection.

add_source(source[, check])

Add remote resource.

add_fdsn(*args, **kwargs)

Add FDSN site for transparent remote data access.

add_catalog(*args, **kwargs)

Add online catalog for transparent event data access.

add_dataset(ds[, check])

Read dataset description from file and add its contents.

add_virtual(nuts[, virtual_paths])

Add content which is not backed by files.

update([constraint])

Update or partially update channel and event inventories.

update_waveform_promises([constraint])

Permit downloading of remote waveforms.

advance_accessor([accessor_id, cache_id])

Notify memory caches about consumer moving to a new data batch.

clear_accessor(accessor_id[, cache_id])

Notify memory caches about a consumer having finished.

reload()

Check for modifications and reindex modified files.

iter_paths([raw])

Iterate over all file paths currently belonging to the selection.

iter_nuts([kind, tmin, tmax, codes, ...])

Iterate over content entities matching given constraints.

iter_kinds([codes])

Iterate over content types available in selection.

iter_deltats([kind])

Iterate over sampling intervals available in selection.

iter_codes([kind])

Iterate over content identifier code sequences available in selection.

get_paths([raw])

Get all file paths currently belonging to the selection.

get_nuts(*args, **kwargs)

Get content entities matching given constraints.

get_kinds([codes])

Get content types available in selection.

get_deltats([kind])

Get sampling intervals available in selection.

get_codes([kind])

Get identifier code sequences available in selection.

get_counts([kind])

Get number of occurrences of any (kind, codes) combination.

get_time_span([kinds, tight, dummy_limits])

Get time interval over all content in selection.

get_deltat_span(kind)

Get min and max sampling interval of all content of given kind.

get_nfiles()

Get number of files in selection.

get_nnuts()

Get number of nuts in selection.

get_total_size()

Get aggregated file size available in selection.

get_stats()

Get statistics on contents available through this selection.

get_content(nut[, cache_id, accessor_id, ...])

Get and possibly load full content for a given index entry from file.

get_stations([obj, tmin, tmax, time, codes, ...])

Get stations matching given constraints.

get_channels([obj, tmin, tmax, time, codes, ...])

Get channels matching given constraints.

get_responses([obj, tmin, tmax, time, ...])

Get instrument responses matching given constraints.

get_events([obj, tmin, tmax, time, codes])

Get events matching given constraints.

get_waveform_nuts([obj, tmin, tmax, time, ...])

Get waveform content entities matching given constraints.

get_waveforms([obj, tmin, tmax, time, ...])

Get waveforms matching given constraints.

chopper_waveforms([obj, tmin, tmax, time, ...])

Iterate window-wise over waveform archive.

get_coverage(kind[, tmin, tmax, codes, limit])

Get coverage information.

pile

Emulates the older pyrocko.pile.Pile interface.

snuffle(**kwargs)

Look at dataset in Snuffler.

glob_codes(kind, codes)

Find codes matching given patterns.

get_database()

Get the database to which this selection belongs.

print_tables([table_names, stream])

Dump raw database tables in textual form (for debugging purposes).

add(paths, kinds=None, format='detect', include=None, exclude=None, check=True)[source]

Add files to the selection.

Parameters:
  • paths (list of str) – Iterator yielding paths to files or directories to be added to the selection. Recurses into directories. If given a str, it is treated as a single path to be added.

  • kinds (list of str) – Content types to be made available through the Squirrel selection. By default, all known content types are accepted.

  • format (str) – File format identifier or 'detect' to enable auto-detection (available: 'datacube', 'mseed', 'pyrocko_events', 'pyrocko_stations', 'sac', 'spickle', 'stationxml', 'tdms_idas', 'virtual', 'yaml').

  • include – If not None, files are only included if their paths match the given regular expression pattern.

  • exclude – If not None, files are only included if their paths do not match the given regular expression pattern.

  • check (bool) – If True, all file modification times are checked to see if cached information has to be updated (slow). If False, only previously unknown files are indexed and cached information is used for known files, regardless of file state (fast, corresponds to Squirrel’s --optimistic mode). File deletions will go undetected in the latter case.

Complexity:

O(log N)
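As an illustration of how the include and exclude patterns of add() combine, here is a small stand-alone sketch. The helper select_paths is hypothetical, and the use of re.search (matching anywhere in the path) is an assumption; the library may anchor patterns differently.

```python
import re


def select_paths(paths, include=None, exclude=None):
    """Keep paths matching `include` and not matching `exclude`."""
    selected = []
    for path in paths:
        # Assumption: a pattern may match anywhere in the path.
        if include is not None and not re.search(include, path):
            continue
        if exclude is not None and re.search(exclude, path):
            continue
        selected.append(path)
    return selected


paths = [
    'data/2020/STA1.mseed',
    'data/2020/STA1.log',
    'data/2021/STA2.mseed',
]
only_mseed = select_paths(paths, include=r'\.mseed$')
no_2020 = select_paths(paths, include=r'\.mseed$', exclude=r'/2020/')
```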

reload()[source]

Check for modifications and reindex modified files.

Based on file modification times.

add_virtual(nuts, virtual_paths=None)[source]

Add content which is not backed by files.

Parameters:
  • nuts (iterator yielding Nut objects) – Content pieces to be added.

  • virtual_paths (list of str) – List of virtual paths to prevent creating a temporary list of the nuts while aggregating the file paths for the selection.

Stores to the main database and the selection.

add_volatile_waveforms(traces)[source]

Add in-memory waveforms which will be removed when the app closes.

add_source(source, check=True)[source]

Add remote resource.

Parameters:

source (subclass of Source) – Remote data access client instance.

add_fdsn(*args, **kwargs)[source]

Add FDSN site for transparent remote data access.

Arguments are passed to FDSNSource.

add_catalog(*args, **kwargs)[source]

Add online catalog for transparent event data access.

Arguments are passed to CatalogSource.

add_dataset(ds, check=True)[source]

Read dataset description from file and add its contents.

Parameters:
  • ds (str or Dataset) – Path to dataset description file, dataset description object or name of a built-in dataset. See dataset.

  • check (bool) – If True, all file modification times are checked to see if cached information has to be updated (slow). If False, only previously unknown files are indexed and cached information is used for known files, regardless of file state (fast, corresponds to Squirrel’s --optimistic mode). File deletions will go undetected in the latter case.

iter_nuts(kind=None, tmin=None, tmax=None, codes=None, codes_exclude=None, sample_rate_min=None, sample_rate_max=None, naiv=False, kind_codes_ids=None, path=None, limit=None)[source]

Iterate over content entities matching given constraints.

Parameters:
  • kind (str, list of str) – Content kind (or kinds) to extract.

  • tmin (get_time_float()) – Start time of query interval.

  • tmax (get_time_float()) – End time of query interval.

  • codes (list of Codes objects appropriate for the queried content type, or anything which can be converted to such objects.) – List of code patterns to query.

  • naiv (bool) – Bypass time span lookup through indices (slow, for testing).

  • kind_codes_ids (list of int) – Kind-codes IDs of contents to be retrieved (internal use).

Yields:

Nut objects representing the intersecting content.

Complexity:

O(log N) for the time selection part due to heavy use of database indices.

Query time span is treated as a half-open interval [tmin, tmax). However, if tmin equals tmax, the edge logic is changed to closed-interval so that content intersecting with the time instant t = tmin = tmax is returned (otherwise nothing would be returned as [t, t) never matches anything).

Time spans of content entities to be matched are also treated as half-open intervals, e.g. content span [0, 1) is matched by query span [0, 1) but not by [-1, 0) or [1, 2). Here too, the logic is changed to closed-interval when the content time span is an empty interval, i.e. when it indicates a time instant. E.g. time instant 0 is matched by [0, 1) but not by [-1, 0) or [1, 2).
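The matching rules above can be written down as a small predicate. This is a sketch of the documented semantics, not the database query Squirrel actually runs; the function name spans_match is hypothetical, and the case where both spans are time instants is not covered by the description, so treating it as an equality test is an assumption.

```python
def spans_match(q_tmin, q_tmax, c_tmin, c_tmax):
    """True if query span [q_tmin, q_tmax) matches content span
    [c_tmin, c_tmax), with empty spans treated as time instants."""
    query_is_instant = q_tmin == q_tmax
    content_is_instant = c_tmin == c_tmax
    if query_is_instant and content_is_instant:
        # Assumption: two instants match only if they coincide.
        return q_tmin == c_tmin
    if query_is_instant:
        return c_tmin <= q_tmin < c_tmax
    if content_is_instant:
        return q_tmin <= c_tmin < q_tmax
    # Regular case: two half-open intervals overlap.
    return c_tmin < q_tmax and q_tmin < c_tmax


# Content span [0, 1) against three query spans.
span_results = [spans_match(q0, q1, 0, 1)
                for (q0, q1) in [(0, 1), (-1, 0), (1, 2)]]

# Content time instant 0 against the same query spans.
instant_results = [spans_match(q0, q1, 0, 0)
                   for (q0, q1) in [(0, 1), (-1, 0), (1, 2)]]
```

Both result lists reproduce the examples given in the text: only the query span [0, 1) matches.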

get_nuts(*args, **kwargs)[source]

Get content entities matching given constraints.

Like iter_nuts() but returns results as a list.

get_time_span(kinds=None, tight=True, dummy_limits=True)[source]

Get time interval over all content in selection.

Parameters:

kinds – If not None, restrict query to given content kinds.

Complexity:

O(1), independent of the number of nuts.

Returns:

(tmin, tmax), combined time interval of queried content kinds.

has(kinds)[source]

Check availability of given content kinds.

Parameters:

kinds – Content kinds to query.

Returns:

True if any of the queried content kinds is available in the selection.

get_deltat_span(kind)[source]

Get min and max sampling interval of all content of given kind.

Parameters:

kind (str) – Content kind

Returns:

(deltat_min, deltat_max)

iter_kinds(codes=None)[source]

Iterate over content types available in selection.

Parameters:

codes (Codes) – If given, get kinds only for the selected codes identifier. Currently, only a single identifier may be given here and no pattern matching is done.

Yields:

Available content kinds as str.

Complexity:

O(1), independent of number of nuts.

iter_deltats(kind=None)[source]

Iterate over sampling intervals available in selection.

Parameters:

kind (str) – If given, get sampling intervals only for a given content type.

Yields:

float values.

Complexity:

O(1), independent of number of nuts.

iter_codes(kind=None)[source]

Iterate over content identifier code sequences available in selection.

Parameters:

kind (str) – If given, get codes only for a given content type.

Yields:

tuple of str

Complexity:

O(1), independent of number of nuts.

get_kinds(codes=None)[source]

Get content types available in selection.

Parameters:

codes (Codes) – If given, get kinds only for the selected codes identifier. Currently, only a single identifier may be given here and no pattern matching is done.

Returns:

Sorted list of available content types.

Return type:

list of str

Complexity:

O(1), independent of number of nuts.

get_deltats(kind=None)[source]

Get sampling intervals available in selection.

Parameters:

kind (str) – If given, get sampling intervals only for selected content type.

Complexity:

O(1), independent of number of nuts.

Returns:

Sorted list of available sampling intervals.

get_codes(kind=None)[source]

Get identifier code sequences available in selection.

Parameters:

kind (str) – If given, get codes only for selected content type.

Complexity:

O(1), independent of number of nuts.

Returns:

Sorted list of available codes as tuples of strings.

get_counts(kind=None)[source]

Get number of occurrences of any (kind, codes) combination.

Parameters:

kind (str) – If given, get counts only for selected content type.

Complexity:

O(1), independent of number of nuts.

Returns:

dict with counts[kind][codes] or counts[codes] if kind is not None
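The two shapes of the return value can be pictured with a toy example. The index entries below are made up, codes are represented as plain tuples rather than Codes objects, and count_entries is a hypothetical stand-in that counts by brute force instead of reading the result from the database.

```python
from collections import defaultdict

# Hypothetical index entries as (kind, codes) pairs.
entries = [
    ('waveform', ('GE', 'STA1', '', 'BHZ')),
    ('waveform', ('GE', 'STA1', '', 'BHZ')),
    ('waveform', ('GE', 'STA2', '', 'BHZ')),
    ('channel', ('GE', 'STA1', '', 'BHZ')),
]


def count_entries(entries, kind=None):
    """Count occurrences of each (kind, codes) combination."""
    nested = defaultdict(lambda: defaultdict(int))
    for entry_kind, codes in entries:
        nested[entry_kind][codes] += 1
    if kind is not None:
        return dict(nested[kind])  # flat: counts[codes]
    # nested: counts[kind][codes]
    return {k: dict(v) for k, v in nested.items()}


all_counts = count_entries(entries)
waveform_counts = count_entries(entries, kind='waveform')
```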

glob_codes(kind, codes)[source]

Find codes matching given patterns.

Parameters:
  • kind (str) – Content kind to be queried.

  • codes (list of Codes objects appropriate for the queried content type, or anything which can be converted to such objects.) – List of code patterns to query.

Returns:

List of matches of the form [kind_codes_id, codes, deltat].

update(constraint=None, **kwargs)[source]

Update or partially update channel and event inventories.

Parameters:
  • constraint (Constraint) – Selection of times or areas to be brought up to date.

  • **kwargs – Shortcut for setting constraint=Constraint(**kwargs).

This function triggers all attached remote sources to check for updates in the meta-data. The sources will only submit queries when their expiration date has passed, or if the selection spans into previously unseen times or areas.

update_waveform_promises(constraint=None, **kwargs)[source]

Permit downloading of remote waveforms.

Parameters:
  • constraint (Constraint) – Remote waveforms compatible with the given constraint are enabled for download.

  • **kwargs – Shortcut for setting constraint=Constraint(**kwargs).

Calling this method permits Squirrel to download waveforms from remote sources when processing subsequent waveform requests. This works by inserting so-called waveform promises into the database. It will look into the available channels for each remote source and create a promise for each channel compatible with the given constraint. If a promise then matches a waveform request, Squirrel tries to download the waveform. If the download is successful, the downloaded waveform is added to the Squirrel and the promise is deleted. If the download fails, the promise is kept if the cause of the failure appears to be temporary, e.g. a network failure. If the cause of the failure seems to be permanent, however, the promise is deleted so that no further attempts are made to download a waveform which might not be available from that server at all. To force re-scheduling after a permanent failure, call update_waveform_promises() again.
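The promise life cycle described above boils down to three outcomes per download attempt. The following sketch models only that bookkeeping; the exception classes, resolve_promises and fake_download are hypothetical stand-ins, and promises are represented by plain strings.

```python
class TemporaryFailure(Exception):
    """Stand-in for a failure that may go away, e.g. a network outage."""


class PermanentFailure(Exception):
    """Stand-in for a failure that will not go away, e.g. no such data."""


def resolve_promises(promises, download):
    """Try each promise once; return (waveforms, remaining_promises)."""
    waveforms = []
    remaining = []
    for promise in promises:
        try:
            waveforms.append(download(promise))  # success: promise consumed
        except TemporaryFailure:
            remaining.append(promise)            # kept for a later retry
        except PermanentFailure:
            pass                                 # dropped, no more attempts
    return waveforms, remaining


def fake_download(promise):
    """Pretend server: STA2 is unreachable, STA3 has no data."""
    if promise == 'STA2':
        raise TemporaryFailure('network down')
    if promise == 'STA3':
        raise PermanentFailure('no data on server')
    return 'waveform:' + promise


waveforms, remaining = resolve_promises(
    ['STA1', 'STA2', 'STA3'], fake_download)
```

After one pass, only the promise with the temporary failure is left for a later attempt.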

remove_waveform_promises(from_database='selection')[source]

Remove waveform promises from live selection or global database.

Calling this function removes all waveform promises provided by the attached sources.

Parameters:

from_database – Remove from live selection 'selection' or global database 'global'.

get_nfiles()[source]

Get number of files in selection.

get_nnuts()[source]

Get number of nuts in selection.

get_total_size()[source]

Get aggregated file size available in selection.

get_stats()[source]

Get statistics on contents available through this selection.

check(obj=None, tmin=None, tmax=None, time=None, codes=None, ignore=[])[source]

Check for common data/metadata problems.

Parameters:
  • obj (any object with attributes tmin, tmax and codes) – Object providing tmin, tmax and codes to be used to constrain the query. Direct arguments override those from obj.

  • tmin (get_time_float()) – Start time of query interval.

  • tmax (get_time_float()) – End time of query interval.

  • time (get_time_float()) – Time instant to query. Equivalent to setting tmin and tmax to the same value.

  • codes (list of Codes objects appropriate for the queried content type, or anything which can be converted to such objects.) – Pattern of content codes to query.

  • ignore (list of str (SquirrelCheckProblemType)) – Problem types to be ignored.

Returns:

SquirrelCheck object containing the results of the check.

See do_check().

get_content(nut, cache_id='default', accessor_id='default', show_progress=False, model='squirrel')[source]

Get and possibly load full content for a given index entry from file.

Loads the actual content objects (channel, station, waveform, …) from file. For efficiency, sibling content (all stuff in the same file segment) will also be loaded as a side effect. The loaded contents are cached in the Squirrel object.

async get_contents_async(nuts: list[pyrocko.squirrel.model.Nut], cache_id='default', accessor_id='default', show_progress=False, model='squirrel')[source]

Get and possibly load full content for the given index entries from file.

Loads the actual content objects (channel, station, waveform, …) from file. For efficiency, sibling content (all stuff in the same file segment) will also be loaded as a side effect. The loaded contents are cached in the Squirrel object.

advance_accessor(accessor_id='default', cache_id=None)[source]

Notify memory caches about consumer moving to a new data batch.

Parameters:
  • accessor_id (str) – Name of accessing consumer to be advanced.

  • cache_id (str) – Name of cache for which the accessor should be advanced. By default the named accessor is advanced in all registered caches. By default, two caches named 'default' and 'waveform' are available.

See ContentCache for details on how Squirrel’s memory caching works and can be tuned. Default behaviour is to release data when it has not been used in the latest data window/batch. If the accessor is never advanced, data is cached indefinitely, which is often desired, e.g. for station meta-data. Methods for consecutive data traversal, like chopper_waveforms(), automatically advance and clear their accessor.

clear_accessor(accessor_id, cache_id=None)[source]

Notify memory caches about a consumer having finished.

Parameters:
  • accessor_id (str) – Name of accessor to be cleared.

  • cache_id (str) – Name of cache for which the accessor should be cleared. By default the named accessor is cleared from all registered caches. By default, two caches named 'default' and 'waveform' are available.

Calling this method clears all references to cache entries held by the named accessor. Cache entries are then freed if not referenced by any other accessor.
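The interplay of advance_accessor() and clear_accessor() can be modelled with a toy cache. This is a simplified illustration of the release rule described above and is not the ContentCache implementation; the class ToyCache, its get() method and its attributes are hypothetical.

```python
class ToyCache:
    """Toy model of accessor-based cache release."""

    def __init__(self):
        self.entries = {}  # key -> {accessor_id: tick of last use}
        self.ticks = {}    # accessor_id -> current tick

    def get(self, key, accessor_id='default'):
        """Record that `accessor_id` used `key` in its current batch."""
        tick = self.ticks.setdefault(accessor_id, 0)
        self.entries.setdefault(key, {})[accessor_id] = tick

    def advance_accessor(self, accessor_id='default'):
        """Drop references not refreshed since the previous advance."""
        tick = self.ticks.setdefault(accessor_id, 0)
        for key in list(self.entries):
            users = self.entries[key]
            if users.get(accessor_id, tick) < tick:
                del users[accessor_id]
                if not users:
                    del self.entries[key]
        self.ticks[accessor_id] = tick + 1

    def clear_accessor(self, accessor_id):
        """Drop all references held by `accessor_id`."""
        for key in list(self.entries):
            users = self.entries[key]
            users.pop(accessor_id, None)
            if not users:
                del self.entries[key]
        self.ticks.pop(accessor_id, None)


cache = ToyCache()
cache.get('seg1', 'a')
cache.get('seg1', 'b')
cache.advance_accessor('a')
kept_after_first_advance = 'seg1' in cache.entries
cache.advance_accessor('a')
kept_alive_by_b = 'seg1' in cache.entries
cache.clear_accessor('b')
freed_after_clear = 'seg1' not in cache.entries
```

In this model an entry survives one advance after its last use and is released on the second, unless another accessor still references it.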

get_stations(obj=None, tmin=None, tmax=None, time=None, codes=None, model='squirrel', on_error='raise')[source]

Get stations matching given constraints.

Parameters:
  • obj (any object with attributes tmin, tmax and codes) – Object providing tmin, tmax and codes to be used to constrain the query. Direct arguments override those from obj.

  • tmin (get_time_float()) – Start time of query interval.

  • tmax (get_time_float()) – End time of query interval.

  • time (get_time_float()) – Time instant to query. Equivalent to setting tmin and tmax to the same value.

  • codes (list of Codes objects appropriate for the queried content type, or anything which can be converted to such objects.) – Pattern of content codes to query.

  • model (str) – Select object model for returned values: 'squirrel' to get Squirrel station objects or 'pyrocko' to get Pyrocko station objects with channel information attached.

Returns:

List of pyrocko.squirrel.Station objects by default or list of pyrocko.model.Station objects if model='pyrocko' is requested.

See iter_nuts() for details on time span matching.

get_channels(obj=None, tmin=None, tmax=None, time=None, codes=None, model='squirrel')[source]

Get channels matching given constraints.

Parameters:
  • obj (any object with attributes tmin, tmax and codes) – Object providing tmin, tmax and codes to be used to constrain the query. Direct arguments override those from obj.

  • tmin (get_time_float()) – Start time of query interval.

  • tmax (get_time_float()) – End time of query interval.

  • time (get_time_float()) – Time instant to query. Equivalent to setting tmin and tmax to the same value.

  • codes (list of Codes objects appropriate for the queried content type, or anything which can be converted to such objects.) – Pattern of content codes to query.

Returns:

List of Channel objects.

See iter_nuts() for details on time span matching.

get_sensors(obj=None, tmin=None, tmax=None, time=None, codes=None)[source]

Get sensors matching given constraints.

Parameters:
  • obj (any object with attributes tmin, tmax and codes) – Object providing tmin, tmax and codes to be used to constrain the query. Direct arguments override those from obj.

  • tmin (get_time_float()) – Start time of query interval.

  • tmax (get_time_float()) – End time of query interval.

  • time (get_time_float()) – Time instant to query. Equivalent to setting tmin and tmax to the same value.

  • codes (list of Codes objects appropriate for the queried content type, or anything which can be converted to such objects.) – Pattern of content codes to query.

Returns:

List of Sensor objects.

See iter_nuts() for details on time span matching.

get_responses(obj=None, tmin=None, tmax=None, time=None, codes=None, model='squirrel')[source]

Get instrument responses matching given constraints.

Parameters:
  • obj (any object with attributes tmin, tmax and codes) – Object providing tmin, tmax and codes to be used to constrain the query. Direct arguments override those from obj.

  • tmin (get_time_float()) – Start time of query interval.

  • tmax (get_time_float()) – End time of query interval.

  • time (get_time_float()) – Time instant to query. Equivalent to setting tmin and tmax to the same value.

  • codes (list of Codes objects appropriate for the queried content type, or anything which can be converted to such objects.) – Pattern of content codes to query.

  • model (str) – Select data model for returned objects. Choices: 'squirrel', 'stationxml', 'stationxml+'. See return value description.

Returns:

List of Response if model == 'squirrel' or list of FDSNStationXML if model == 'stationxml' or list of (Response, FDSNStationXML) if model == 'stationxml+'.

See iter_nuts() for details on time span matching.

get_response(obj=None, tmin=None, tmax=None, time=None, codes=None, model='squirrel', on_duplicate='raise')[source]

Get instrument response matching given constraints.

Parameters:
  • obj (any object with attributes tmin, tmax and codes) – Object providing tmin, tmax and codes to be used to constrain the query. Direct arguments override those from obj.

  • tmin (get_time_float()) – Start time of query interval.

  • tmax (get_time_float()) – End time of query interval.

  • time (get_time_float()) – Time instant to query. Equivalent to setting tmin and tmax to the same value.

  • codes (list of Codes objects appropriate for the queried content type, or anything which can be converted to such objects.) – Pattern of content codes to query.

  • model (str) – Select data model for returned object. Choices: 'squirrel', 'stationxml', 'stationxml+'. See return value description.

  • on_duplicate (str) – Determines how duplicates/multiple matching responses are handled. Choices: 'raise' - raise Duplicate, 'warn' - emit a warning and return first match, 'ignore' - silently return first match.

Returns:

Response if model == 'squirrel' or FDSNStationXML if model == 'stationxml' or (Response, FDSNStationXML) if model == 'stationxml+'.

Same as get_responses() but returning exactly one response. Raises NotAvailable if none is available. Duplicates are handled according to the on_duplicate argument.
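The handling of zero, one or several matches can be sketched as follows. The exception classes are local stand-ins named after the ones mentioned above, pick_one is a hypothetical helper, and using warnings.warn for the 'warn' case is an assumption about how the warning is emitted.

```python
import warnings


class NotAvailable(Exception):
    """Local stand-in for the NotAvailable error."""


class Duplicate(Exception):
    """Local stand-in for the Duplicate error."""


def pick_one(matches, on_duplicate='raise'):
    """Return exactly one item from `matches`, following on_duplicate."""
    if not matches:
        raise NotAvailable('No matching response available.')
    if len(matches) > 1:
        if on_duplicate == 'raise':
            raise Duplicate('Multiple matching responses found.')
        elif on_duplicate == 'warn':
            # Assumption: the warning mechanism is illustrative only.
            warnings.warn('Multiple matching responses, using the first.')
        elif on_duplicate != 'ignore':
            raise ValueError('Invalid on_duplicate: %r' % on_duplicate)
    return matches[0]


first = pick_one(['resp1', 'resp2'], on_duplicate='ignore')
single = pick_one(['resp1'])
```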

See iter_nuts() for details on time span matching.

get_events(obj=None, tmin=None, tmax=None, time=None, codes=None)[source]

Get events matching given constraints.

Parameters:
  • obj (any object with attributes tmin, tmax and codes) – Object providing tmin, tmax and codes to be used to constrain the query. Direct arguments override those from obj.

  • tmin (get_time_float()) – Start time of query interval.

  • tmax (get_time_float()) – End time of query interval.

  • time (get_time_float()) – Time instant to query. Equivalent to setting tmin and tmax to the same value.

  • codes (list of Codes objects appropriate for the queried content type, or anything which can be converted to such objects.) – Pattern of content codes to query.

Returns:

List of Event objects.

See iter_nuts() for details on time span matching.

get_waveform_nuts(obj=None, tmin=None, tmax=None, time=None, codes=None, codes_exclude=None, sample_rate_min=None, sample_rate_max=None, order_only=False)[source]

Get waveform content entities matching given constraints.

Parameters:
  • obj (any object with attributes tmin, tmax and codes) – Object providing tmin, tmax and codes to be used to constrain the query. Direct arguments override those from obj.

  • tmin (get_time_float()) – Start time of query interval.

  • tmax (get_time_float()) – End time of query interval.

  • time (get_time_float()) – Time instant to query. Equivalent to setting tmin and tmax to the same value.

  • codes (list of Codes objects appropriate for the queried content type, or anything which can be converted to such objects.) – Pattern of content codes to query.

Like get_nuts() with kind='waveform' but additionally resolves matching waveform promises (downloads waveforms from remote sources).

See iter_nuts() for details on time span matching.

have_waveforms(obj=None, tmin=None, tmax=None, time=None, codes=None)[source]

Check if any waveforms or waveform promises are available for given constraints.

Parameters:
  • obj (any object with attributes tmin, tmax and codes) – Object providing tmin, tmax and codes to be used to constrain the query. Direct arguments override those from obj.

  • tmin (get_time_float()) – Start time of query interval.

  • tmax (get_time_float()) – End time of query interval.

  • time (get_time_float()) – Time instant to query. Equivalent to setting tmin and tmax to the same value.

  • codes (list of Codes objects appropriate for the queried content type, or anything which can be converted to such objects.) – Pattern of content codes to query.

get_waveforms(obj=None, tmin=None, tmax=None, time=None, codes=None, codes_exclude=None, sample_rate_min=None, sample_rate_max=None, uncut=False, want_incomplete=True, degap=True, maxgap=5, maxlap=None, snap=None, include_last=False, load_data=True, accessor_id='default', operator_params=None, order_only=False, channel_priorities=None)[source]

Get waveforms matching given constraints.

Parameters:
  • obj (any object with attributes tmin, tmax and codes) – Object providing tmin, tmax and codes to be used to constrain the query. Direct arguments override those from obj.

  • tmin (get_time_float()) – Start time of query interval.

  • tmax (get_time_float()) – End time of query interval.

  • time (get_time_float()) – Time instant to query. Equivalent to setting tmin and tmax to the same value.

  • codes (list of Codes objects appropriate for the queried content type, or anything which can be converted to such objects.) – Pattern of content codes to query.

  • sample_rate_min (float) – Consider only waveforms with a sampling rate equal to or greater than the given value [Hz].

  • sample_rate_max (float) – Consider only waveforms with a sampling rate equal to or less than the given value [Hz].

  • uncut (bool) – Set to True to disable cutting traces to [tmin, tmax] and to disable degapping/deoverlapping. Returns untouched traces as they are read from file segment. File segments are always read in their entirety.

  • want_incomplete (bool) – If True, gappy/incomplete traces are included in the result.

  • degap (bool) – If True, connect traces and remove gaps and overlaps.

  • maxgap (int) – Maximum gap size in samples which is filled with interpolated samples when degap is True.

  • maxlap (int) – Maximum overlap size in samples which is removed when degap is True.

  • snap (tuple of 2 callables) – Rounding functions used when computing sample index from time instance, for trace start and trace end, respectively. By default, (round, round) is used.

  • include_last (bool) – If True, add one more sample to the returned traces (the sample which would be the first sample of a query with tmin set to the current value of tmax).

  • load_data (bool) – If True, waveform data samples are read from files (or cache). If False, meta-information-only traces are returned (dummy traces with no data samples).

  • accessor_id (str) – Name of consumer on whose behalf data is accessed. Used in cache management (see cache). Used as a key to distinguish different points of extraction for the decision of when to release cached waveform data. Should be used when data is alternately extracted from more than one region / selection.

  • channel_priorities (list of str) – List of band/instrument code combinations to try. For example, giving ['HH', 'BH'] would first try to get HH? channels and then fallback to BH? if these are not available. The first matching waveforms are returned. Use in combination with sample_rate_min and sample_rate_max to constrain the sample rate.
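The fallback behaviour of channel_priorities can be illustrated on bare channel codes. This sketch only mimics the selection rule; select_by_priority is a hypothetical helper, and interpreting each entry as a two-character prefix followed by exactly one component character (as in HH?) is taken from the example given for the parameter.

```python
def select_by_priority(channels, channel_priorities):
    """Return the channels of the first band/instrument combination
    for which anything is available."""
    for prefix in channel_priorities:
        matching = [
            cha for cha in channels
            # 'HH' is treated like the pattern 'HH?'.
            if cha.startswith(prefix) and len(cha) == len(prefix) + 1
        ]
        if matching:
            return matching
    return []


both_available = select_by_priority(
    ['BHZ', 'BHN', 'BHE', 'HHZ'], ['HH', 'BH'])
only_bh_available = select_by_priority(
    ['BHZ', 'BHN', 'BHE'], ['HH', 'BH'])
```

With both bands present the HH channel wins; without it the selection falls back to BH.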

See iter_nuts() for details on time span matching.

Loaded data is kept in memory (at least) until clear_accessor() has been called or advance_accessor() has been called two consecutive times without data being accessed between the two calls (by this accessor). Data may still be further kept in the memory cache if held alive by consumers with a different accessor_id.
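The effect of the snap and include_last parameters of get_waveforms() on the extracted sample range can be sketched with simple index arithmetic. The helper sample_range is hypothetical and ignores clipping at the end of the trace; it is meant only to show how the rounding functions and the extra sample enter the computation.

```python
import math


def sample_range(trace_tmin, deltat, tmin, tmax,
                 snap=(round, round), include_last=False):
    """Sample index range [ibeg, iend) for cutting a trace to
    [tmin, tmax], using separate rounding for start and end."""
    ibeg = int(snap[0]((tmin - trace_tmin) / deltat))
    iend = int(snap[1]((tmax - trace_tmin) / deltat))
    if include_last:
        iend += 1  # also take the sample located at tmax
    return max(0, ibeg), iend


# Trace starting at t = 0 s with 0.5 s sampling interval.
aligned = sample_range(0.0, 0.5, 1.0, 3.0)
aligned_with_last = sample_range(0.0, 0.5, 1.0, 3.0, include_last=True)

# Query times falling between samples.
rounded = sample_range(0.0, 0.5, 1.3, 2.7)
widened = sample_range(0.0, 0.5, 1.3, 2.7, snap=(math.floor, math.ceil))
```

Using (math.floor, math.ceil) widens the range to enclose the requested window, while the default (round, round) picks the nearest samples.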

async get_waveforms_async(obj=None, tmin=None, tmax=None, time=None, codes=None, codes_exclude=None, sample_rate_min=None, sample_rate_max=None, uncut=False, want_incomplete=True, degap=True, maxgap=5, maxlap=None, snap=None, include_last=False, load_data=True, accessor_id='default', operator_params=None, order_only=False, channel_priorities=None)[source]

Get waveforms matching given constraints.

Parameters:
  • obj (any object with attributes tmin, tmax and codes) – Object providing tmin, tmax and codes to be used to constrain the query. Direct arguments override those from obj.

  • tmin (get_time_float()) – Start time of query interval.

  • tmax (get_time_float()) – End time of query interval.

  • time (get_time_float()) – Time instant to query. Equivalent to setting tmin and tmax to the same value.

  • codes (list of Codes objects appropriate for the queried content type, or anything which can be converted to such objects.) – Pattern of content codes to query.

  • sample_rate_min (float) – Consider only waveforms with a sampling rate equal to or greater than the given value [Hz].

  • sample_rate_max (float) – Consider only waveforms with a sampling rate equal to or less than the given value [Hz].

  • uncut (bool) – Set to True to disable cutting traces to [tmin, tmax] and to disable degapping/deoverlapping. Traces are then returned untouched, as read from the file segments. File segments are always read in their entirety.

  • want_incomplete (bool) – If True, gappy/incomplete traces are included in the result.

  • degap (bool) – If True, connect traces and remove gaps and overlaps.

  • maxgap (int) – Maximum gap size in samples which is filled with interpolated samples when degap is True.

  • maxlap (int) – Maximum overlap size in samples which is removed when degap is True.

  • snap (tuple of 2 callables) – Rounding functions used when computing sample index from time instant, for trace start and trace end, respectively. By default, (round, round) is used.

  • include_last (bool) – If True, add one more sample to the returned traces (the sample which would be the first sample of a query with tmin set to the current value of tmax).

  • load_data (bool) – If True, waveform data samples are read from files (or cache). If False, meta-information-only traces are returned (dummy traces with no data samples).

  • accessor_id (str) – Name of consumer on whose behalf data is accessed. Used in cache management (see cache). Used as a key to distinguish different points of extraction for the decision of when to release cached waveform data. Should be used when data is alternately extracted from more than one region / selection.

  • channel_priorities (list of str) – List of band/instrument code combinations to try. For example, giving ['HH', 'BH'] would first try to get HH? channels and then fall back to BH? if these are not available. The first matching waveforms are returned. Use in combination with sample_rate_min and sample_rate_max to constrain the sample rate.

See iter_nuts() for details on time span matching.

Loaded data is kept in memory (at least) until clear_accessor() has been called or advance_accessor() has been called two consecutive times without data being accessed between the two calls (by this accessor). Data may still be further kept in the memory cache if held alive by consumers with a different accessor_id.
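How snap and include_last interact when a time interval is mapped to sample indices may be easier to see in code. The following is a minimal sketch under stated assumptions, not the library's cutting routine: sample_index_range is a hypothetical function, the trace is described only by its start time and sampling interval, and the returned index range is taken to be half-open.

```python
import math


def sample_index_range(trace_tmin, deltat, tmin, tmax,
                       snap=(round, round), include_last=False):
    """Map [tmin, tmax] to a half-open sample index range (istart, iend).

    Hypothetical helper for illustration. snap[0] rounds the start index,
    snap[1] rounds the end index.
    """
    istart = snap[0]((tmin - trace_tmin) / deltat)
    iend = snap[1]((tmax - trace_tmin) / deltat)
    if include_last:
        # Add the sample that a follow-up query starting at tmax would
        # return as its first sample.
        iend += 1

    return istart, iend


# Trace starting at t = 0 s, sampled every 0.5 s, queried for 1.3 s to 3.1 s.
print(sample_index_range(0.0, 0.5, 1.3, 3.1))
print(sample_index_range(0.0, 0.5, 1.3, 3.1, snap=(math.floor, math.ceil)))
print(sample_index_range(0.0, 0.5, 1.3, 3.1, include_last=True))
```

With (round, round) the interval snaps to the nearest samples, while (math.floor, math.ceil) widens it so that the requested time span is fully enclosed.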

chopper_waveforms(obj=None, tmin=None, tmax=None, time=None, codes=None, codes_exclude=None, sample_rate_min=None, sample_rate_max=None, tinc=None, tpad=0.0, want_incomplete=True, snap_window=False, degap=True, maxgap=5, maxlap=None, snap=None, include_last=False, load_data=True, accessor_id=None, clear_accessor=True, operator_params=None, grouping=None, channel_priorities=None)[source]

Iterate window-wise over waveform archive.

Parameters:
  • obj (any object with attributes tmin, tmax and codes) – Object providing tmin, tmax and codes to be used to constrain the query. Direct arguments override those from obj.

  • tmin (get_time_float()) – Start time of query interval.

  • tmax (get_time_float()) – End time of query interval.

  • time (get_time_float()) – Time instant to query. Equivalent to setting tmin and tmax to the same value.

  • codes (list of Codes objects appropriate for the queried content type, or anything which can be converted to such objects.) – Pattern of content codes to query.

  • tinc (get_time_float()) – Time increment (window shift time) (default uses tmax-tmin).

  • tpad (get_time_float()) – Padding time appended on either side of the data window (window overlap is 2*tpad).

  • want_incomplete (bool) – If True, gappy/incomplete traces are included in the result.

  • snap_window (bool) – If True, start time windows at multiples of tinc with respect to system time zero.

  • degap (bool) – If True, connect traces and remove gaps and overlaps.

  • maxgap (int) – Maximum gap size in samples which is filled with interpolated samples when degap is True.

  • maxlap (int) – Maximum overlap size in samples which is removed when degap is True.

  • snap (tuple of 2 callables) – Rounding functions used when computing sample index from time instant, for trace start and trace end, respectively. By default, (round, round) is used.

  • include_last (bool) – If True, add one more sample to the returned traces (the sample which would be the first sample of a query with tmin set to the current value of tmax).

  • load_data (bool) – If True, waveform data samples are read from files (or cache). If False, meta-information-only traces are returned (dummy traces with no data samples).

  • accessor_id (str) – Name of consumer on whose behalf data is accessed. Used in cache management (see cache). Used as a key to distinguish different points of extraction for the decision of when to release cached waveform data. Should be used when data is alternately extracted from more than one region / selection.

  • clear_accessor (bool) – If True (default), clear_accessor() is called when the chopper finishes. Set to False to keep loaded waveforms in memory when the generator returns.

  • grouping (Grouping) – By default, traversal over the data is over time and all matching traces of a time window are yielded. Using this option, it is possible to traverse the data first by group (e.g. station or network) and second by time. This can reduce the number of traces in each batch and thus reduce the memory footprint of the process.

Yields:

For each extracted time window or waveform group a Batch object is yielded.

See iter_nuts() for details on time span matching.
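The relationship between tinc, tpad and snap_window can be sketched without any waveform data. The code below is an illustration under assumptions, not the chopper's actual window logic: iter_windows and Window are hypothetical names, snapping is assumed to round the query start down and the query end up to multiples of tinc, and the last window is allowed to extend past tmax.

```python
import math
from collections import namedtuple

# Hypothetical stand-in carrying the window-related fields of a Batch.
Window = namedtuple('Window', ['i', 'n', 'tmin', 'tmax'])


def iter_windows(tmin, tmax, tinc=None, tpad=0.0, snap_window=False):
    """Yield padded time windows covering [tmin, tmax] (illustrative)."""
    if tinc is None:
        # Default: a single window spanning the whole query interval.
        tinc = tmax - tmin
    if tinc <= 0:
        raise ValueError('tinc must be positive')

    if snap_window:
        # Assumption: align window edges to multiples of tinc counted
        # from system time zero.
        tmin = math.floor(tmin / tinc) * tinc
        tmax = math.ceil(tmax / tinc) * tinc

    n = max(1, math.ceil((tmax - tmin) / tinc))
    for i in range(n):
        wmin = tmin + i * tinc
        # Padding is added on both sides, so neighbouring windows
        # overlap by 2 * tpad.
        yield Window(i, n, wmin - tpad, wmin + tinc + tpad)


for window in iter_windows(0.0, 3600.0, tinc=1200.0, tpad=60.0):
    print(window)
```

In this example the hour is split into three windows of 1200 s, each extended by 60 s on either side, so consecutive windows share 120 s of data.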

async chopper_waveforms_async(obj=None, tmin=None, tmax=None, time=None, codes=None, codes_exclude=None, sample_rate_min=None, sample_rate_max=None, tinc=None, tpad=0.0, want_incomplete=True, snap_window=False, degap=True, maxgap=5, maxlap=None, snap=None, include_last=False, load_data=True, accessor_id=None, clear_accessor=True, operator_params=None, grouping=None, channel_priorities=None)[source]

Iterate window-wise over waveform archive.

Parameters:
  • obj (any object with attributes tmin, tmax and codes) – Object providing tmin, tmax and codes to be used to constrain the query. Direct arguments override those from obj.

  • tmin (get_time_float()) – Start time of query interval.

  • tmax (get_time_float()) – End time of query interval.

  • time (get_time_float()) – Time instant to query. Equivalent to setting tmin and tmax to the same value.

  • codes (list of Codes objects appropriate for the queried content type, or anything which can be converted to such objects.) – Pattern of content codes to query.

  • tinc (get_time_float()) – Time increment (window shift time) (default uses tmax-tmin).

  • tpad (get_time_float()) – Padding time appended on either side of the data window (window overlap is 2*tpad).

  • want_incomplete (bool) – If True, gappy/incomplete traces are included in the result.

  • snap_window (bool) – If True, start time windows at multiples of tinc with respect to system time zero.

  • degap (bool) – If True, connect traces and remove gaps and overlaps.

  • maxgap (int) – Maximum gap size in samples which is filled with interpolated samples when degap is True.

  • maxlap (int) – Maximum overlap size in samples which is removed when degap is True.

  • snap (tuple of 2 callables) – Rounding functions used when computing sample index from time instant, for trace start and trace end, respectively. By default, (round, round) is used.

  • include_last (bool) – If True, add one more sample to the returned traces (the sample which would be the first sample of a query with tmin set to the current value of tmax).

  • load_data (bool) – If True, waveform data samples are read from files (or cache). If False, meta-information-only traces are returned (dummy traces with no data samples).

  • accessor_id (str) – Name of consumer on whose behalf data is accessed. Used in cache management (see cache). Used as a key to distinguish different points of extraction for the decision of when to release cached waveform data. Should be used when data is alternately extracted from more than one region / selection.

  • clear_accessor (bool) – If True (default), clear_accessor() is called when the chopper finishes. Set to False to keep loaded waveforms in memory when the generator returns.

  • grouping (Grouping) – By default, traversal over the data is over time and all matching traces of a time window are yielded. Using this option, it is possible to traverse the data first by group (e.g. station or network) and second by time. This can reduce the number of traces in each batch and thus reduce the memory footprint of the process.

Yields:

For each extracted time window or waveform group a Batch object is yielded.

See iter_nuts() for details on time span matching.
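The traversal order implied by grouping (first by group, then by time) and the meaning of the igroup, ngroups, i and n counters can be sketched as follows. This is only an illustration: iter_grouped is a hypothetical function, codes are plain tuples of (network, station, location, channel), and a key function stands in for a Grouping object.

```python
from collections import defaultdict


def iter_grouped(codes_list, windows, key):
    """Yield one dict per (group, window), groups first, then time.

    Hypothetical helper for illustration. 'key' maps a codes tuple to
    its group, e.g. (network, station) for grouping by station.
    """
    groups = defaultdict(list)
    for codes in codes_list:
        groups[key(codes)].append(codes)

    ngroups = len(groups)
    n = len(windows)
    for igroup, (_, members) in enumerate(sorted(groups.items())):
        for i, (wmin, wmax) in enumerate(windows):
            # Each batch only concerns the channels of a single group,
            # which keeps the number of traces per batch small.
            yield dict(
                igroup=igroup, ngroups=ngroups, i=i, n=n,
                tmin=wmin, tmax=wmax, codes=members)


codes_list = [
    ('GE', 'STU', '', 'BHZ'),
    ('GE', 'STU', '', 'BHN'),
    ('GE', 'APE', '', 'BHZ'),
]
windows = [(0.0, 10.0), (10.0, 20.0)]

for batch in iter_grouped(codes_list, windows, key=lambda c: c[:2]):
    print(batch['igroup'], batch['i'], batch['codes'])
```

All windows of one station are visited before the next station is started, so at any time only the traces of a single group need to be held.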

property pile

Emulates the older pyrocko.pile.Pile interface.

This property exposes a pyrocko.squirrel.pile.Pile object, which emulates most of the older pyrocko.pile.Pile methods but uses the fluffy power of the Squirrel under the hood.

This interface can be used as a drop-in replacement for piles which are used in existing scripts and programs for efficient waveform data access. The Squirrel-based pile scales better for large datasets. Newer scripts should use Squirrel’s native methods to avoid the emulation overhead.

snuffle(**kwargs)[source]

Look at dataset in Snuffler.

get_coverage(kind, tmin=None, tmax=None, codes=None, limit=None)[source]

Get coverage information.

Get information about strips of gapless data coverage.

Parameters:
  • kind (str) – Content kind to be queried.

  • tmin (get_time_float()) – Start time of query interval.

  • tmax (get_time_float()) – End time of query interval.

  • codes (list of Codes objects appropriate for the queried content type, or anything which can be converted to such objects.) – If given, restrict query to given content codes patterns.

  • limit (int) – Limit query to return only up to a given maximum number of entries per matching time series (without setting this option, very gappy data could cause the query to execute for a very long time).

Returns:

Information about time spans covered by the requested time series data.

Return type:

list of Coverage
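The idea of strips of gapless coverage, and why limit can matter for very gappy data, can be illustrated with plain time intervals. The sketch below is not how Coverage objects are computed or structured: gapless_strips is a hypothetical function that merges touching or overlapping (tmin, tmax) segments and stops once limit strips have been collected.

```python
def gapless_strips(segments, limit=None):
    """Merge (tmin, tmax) segments into gapless strips (illustrative).

    Hypothetical helper. Segments that touch or overlap are joined.
    At most 'limit' strips are returned if limit is given.
    """
    strips = []
    for tmin, tmax in sorted(segments):
        if strips and tmin <= strips[-1][1]:
            # No gap to the previous strip: extend it.
            strips[-1][1] = max(strips[-1][1], tmax)
        else:
            if limit is not None and len(strips) >= limit:
                # Stop early rather than walking through every gap.
                break
            strips.append([tmin, tmax])

    return [tuple(strip) for strip in strips]


segments = [(0.0, 10.0), (10.0, 20.0), (25.0, 30.0), (40.0, 50.0)]
print(gapless_strips(segments))
print(gapless_strips(segments, limit=2))
```

The first two segments join into one strip, and with limit=2 the scan ends before the third strip is reported.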

get_stationxml(obj=None, tmin=None, tmax=None, time=None, codes=None, level='response', on_error='raise')[source]

Get station/channel/response metadata in StationXML representation.

Returns:

FDSNStationXML object.

print_tables(table_names=None, stream=None)[source]

Dump raw database tables in textual form (for debugging purposes).

Parameters:
  • table_names (list of str) – Names of tables to be dumped or None to dump all.

  • stream – Open file or None to dump to standard output.

class SquirrelStats(**kwargs)[source]

Bases: Object

Container to hold statistics about contents available from a Squirrel.

See also Squirrel.get_stats().

nfiles

int

Number of files in selection.

nnuts

int

Number of index nuts in selection.

codes

list of tuple of str objects, default: []

Available code sequences in selection, e.g. (agency, network, station, location) for station nuts.

kinds

list of str objects, default: []

Available content types in selection.

total_size

int

Aggregated file size of files in selection.

counts

dict of dict of int objects, default: {}

Breakdown of how many nuts of any content type and code sequence are available in selection, counts[kind][codes].

time_spans

dict of tuple of pyrocko.util.get_time_float (pyrocko.guts.Timestamp) objects, default: {}

Time spans by content type.

sources

list of str objects, default: []

Descriptions of attached sources.

operators

list of str objects, default: []

Descriptions of attached operators.
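The nested layout of counts, indexed as counts[kind][codes], can be made concrete with a short sketch. It does not reproduce how Squirrel gathers its statistics: count_nuts is a hypothetical function and each nut is reduced to a (kind, codes) pair for the purpose of the example.

```python
from collections import defaultdict


def count_nuts(nuts):
    """Build a counts[kind][codes] breakdown from (kind, codes) pairs.

    Hypothetical helper for illustration only.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for kind, codes in nuts:
        counts[kind][codes] += 1

    # Convert to plain dicts for a stable, printable result.
    return {kind: dict(by_codes) for kind, by_codes in counts.items()}


nuts = [
    ('waveform', ('GE', 'STU', '', 'BHZ')),
    ('waveform', ('GE', 'STU', '', 'BHZ')),
    ('waveform', ('GE', 'APE', '', 'BHZ')),
    ('station', ('GE', 'STU', '')),
]
counts = count_nuts(nuts)
print(counts['waveform'][('GE', 'STU', '', 'BHZ')])
```

The outer key selects the content kind and the inner key the code sequence, matching the indexing order given for counts.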