squirrel

Prompt seismological data access with a fluffy tail.

Usage

from pyrocko import squirrel as psq

# Paths to files or directories to add (placeholder, adjust to your data).
files = ['data/']

sq = psq.Squirrel()
sq.add(files)

Concepts

  • squirrel
  • nut
  • database

Reference

class Selection(database=None, persistent=None)[source]

Bases: object

Database backed file selection.

Parameters:
  • database – Database object or path to database or None for user’s default database
  • persistent (str) – if given a name, create a persistent selection

By default, a temporary table in the database is created to hold the names of the files in the selection. This table is only visible inside the application which created it. If a name is given to persistent, a named selection is created, which is visible also in other applications using the same database. Paths of files can be added to the selection using the add() method.
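
The difference between a temporary and a persistent selection can be illustrated with plain SQLite. The following is a minimal sketch, assuming an SQLite backend (suggested by the ':memory:' default of Database); the table names selection_tmp and selection_named are hypothetical and do not reflect Squirrel's actual schema. Two connections stand in for two applications.

```python
import os
import sqlite3
import tempfile


def visible_tables(connection):
    """Return names of regular (non-temporary) tables seen by a connection."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")
    return sorted(row[0] for row in rows)


def demo_selection_visibility():
    with tempfile.TemporaryDirectory() as tmpdir:
        db_path = os.path.join(tmpdir, 'demo.sqlite')

        # First "application": one temporary and one named table
        # (hypothetical names, for illustration only).
        app1 = sqlite3.connect(db_path)
        app1.execute('CREATE TEMP TABLE selection_tmp (path TEXT)')
        app1.execute('CREATE TABLE selection_named (path TEXT)')
        app1.execute("INSERT INTO selection_tmp VALUES ('a.mseed')")
        app1.execute("INSERT INTO selection_named VALUES ('a.mseed')")
        app1.commit()

        # Second "application": opens the same database file.
        app2 = sqlite3.connect(db_path)
        seen_by_app2 = visible_tables(app2)

        # The temporary table remains usable from the first connection.
        n_tmp = app1.execute(
            'SELECT COUNT(*) FROM selection_tmp').fetchone()[0]

        app1.close()
        app2.close()
        return seen_by_app2, n_tmp
```

Calling demo_selection_visibility() shows that the second connection only sees the named table, while the temporary table stays private to the connection that created it.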

get_database()[source]

Get the database to which this selection belongs.

Returns: Database object

add(file_paths)[source]

Add files to the selection.

Parameters:
  • file_paths (iterator yielding str objects) – Paths to files to be added to the selection.

remove(file_paths)[source]

Remove files from the selection.

Parameters:
  • file_paths (list of str) – Paths to files to be removed from the selection.

set_file_states_known()[source]

Set file states to “known” (2).

undig_grouped(skip_unchanged=False)[source]

Get content inventory of all files in selection.

Parameters:
  • skip_unchanged – if True, only the inventory of modified files is yielded (flag_modified() must be called beforehand).

This generator yields tuples (path, nuts) where path is the path to the file and nuts is a list of pyrocko.squirrel.Nut objects representing the contents of the file.

flag_modified(check=True)[source]

Mark files which have been modified.

Parameters:
  • check – if True, query modification times of known files on disk. If False, only flag unknown files.

Assumes file state is 0 for newly added files, 1 for files added again to the selection (forces check), or 2 for all others (no checking is done for those).

Sets file state to 0 for unknown or modified files, 2 for known and not modified files.

class Squirrel(database=None, persistent=None)[source]

Bases: pyrocko.squirrel.base.Selection

Prompt, lazy, indexing, caching, dynamic seismological dataset access.

Parameters:
  • database – Database object or path to database or None for user’s default database
  • persistent (str) – if given a name, create a persistent selection

By default, temporary tables are created in the attached database to hold the names of the files in the selection as well as various indices and counters. These tables are only visible inside the application which created them. If a name is given to persistent, a named selection is created, which is also visible in other applications using the same database. Paths of files can be added to the selection using the add() method.

add(file_paths, kinds=None, format='detect', check=True)[source]

Add files to the selection.

Parameters:
  • file_paths – iterator yielding paths to files or directories to be added to the selection. Recurses into directories given. If given a str, it is treated as a single path to be added.
  • kinds (list of str) – if given, allowed content types to be made available through the squirrel selection.
  • format (str) – file format identifier or 'detect' for auto-detection

Complexity: O(log N)

add_virtual(nuts, virtual_file_paths=None)[source]

Add content which is not backed by files.

Stores to the main database and the selection.

If virtual_file_paths are given, this prevents creating a temp list of the nuts while aggregating the file paths for the selection.

add_source(source)[source]

Add remote resource.

add_fdsn(*args, **kwargs)[source]

Add FDSN site for transparent remote data access.

add_catalog(*args, **kwargs)[source]

Add online catalog for transparent event data access.

get_nuts(kind, tmin=None, tmax=None, codes=None)[source]

Iterate over content intersecting the half-open interval [tmin, tmax[.

Parameters:
  • kind – str, content kind to extract
  • tmin – timestamp, start time of interval
  • tmax – timestamp, end time of interval
  • codes – tuple of str, pattern of content codes to be matched

Complexity: O(log N)

Yields pyrocko.squirrel.Nut objects representing the intersecting content.
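
The half-open interval test used for such queries can be written down in a few lines. This is a sketch of the general technique only, assuming that each piece of content also covers a half-open span; how Squirrel treats zero-length content or omitted bounds is not covered here, and the function name intersects is hypothetical.

```python
def intersects(content_tmin, content_tmax, tmin, tmax):
    """Return True if [content_tmin, content_tmax[ overlaps [tmin, tmax[."""
    return content_tmin < tmax and tmin < content_tmax


# Three adjacent, non-overlapping content spans (times in seconds).
spans = {
    'a': (0.0, 10.0),
    'b': (10.0, 20.0),
    'c': (20.0, 30.0),
}

# Query window [10, 20[: only span 'b' intersects, because the end of 'a'
# and the end of the window are both excluded.
selected = sorted(
    name for name, (t0, t1) in spans.items()
    if intersects(t0, t1, 10.0, 20.0))
```

With half-open intervals, adjacent spans never both match a boundary time, which avoids double counting when a time range is processed window by window.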

get_time_span()[source]

Get time interval over all content in selection.

Complexity: O(1), independent of number of nuts

Returns: (tmin, tmax)

iter_kinds(codes=None)[source]

Iterate over content types available in selection.

Parameters:
  • codes – if given, get kinds only for selected codes identifier

Complexity: O(1), independent of number of nuts

iter_codes(kind=None)[source]

Iterate over content identifier code sequences available in selection.

Parameters:
  • kind (str) – if given, get codes only for a given content type

Complexity: O(1), independent of number of nuts

iter_counts(kind=None)[source]

Iterate over number of occurrences of any (kind, codes) combination.

Parameters:
  • kind – if given, get counts only for selected content type

Yields tuples ((kind, codes), count)

Complexity: O(1), independent of number of nuts

get_kinds(codes=None)[source]

Get content types available in selection.

Parameters:
  • codes – if given, get kinds only for selected codes identifier

Complexity: O(1), independent of number of nuts

Returns: sorted list of available content types

get_codes(kind=None)[source]

Get identifier code sequences available in selection.

Parameters:
  • kind – if given, get codes only for selected content type

Complexity: O(1), independent of number of nuts

Returns: sorted list of available codes

get_counts(kind=None)[source]

Get number of occurrences of any (kind, codes) combination.

Parameters:
  • kind – if given, get counts only for selected content type

Complexity: O(1), independent of number of nuts

Returns: dict with counts[kind][codes], or counts[codes] if kind is not None
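
The shape of the returned dictionary can be illustrated without a database. The following sketch mimics the documented layout only; count_entries and the example code tuples are hypothetical and are not part of the Squirrel API.

```python
def count_entries(entries, kind=None):
    """Count occurrences of (kind, codes) pairs.

    Returns counts[kind][codes], or counts[codes] if kind is not None.
    """
    counts = {}
    for entry_kind, codes in entries:
        per_kind = counts.setdefault(entry_kind, {})
        per_kind[codes] = per_kind.get(codes, 0) + 1

    if kind is not None:
        return counts.get(kind, {})
    return counts


# Hypothetical index entries: (kind, codes) pairs.
waveform_codes = ('GE', 'STA1', '', 'BHZ')
station_codes = ('GE', 'STA1', '')
entries = [
    ('waveform', waveform_codes),
    ('waveform', waveform_codes),
    ('station', station_codes),
]

all_counts = count_entries(entries)
waveform_counts = count_entries(entries, kind='waveform')
```

Here all_counts['waveform'][waveform_codes] is 2, and waveform_counts drops the outer level so that it is indexed by codes directly.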
update(constraint=None, **kwargs)[source]

Update inventory of remote content for a given selection.

This function triggers all attached remote sources to check for updates in the metadata. The sources will only submit queries when their expiration date has passed, or if the selection spans into previously unseen times or areas.

get_nfiles()[source]

Get number of files in selection.

get_nnuts()[source]

Get number of nuts in selection.

get_total_size()[source]

Get aggregated file size available in selection.

get_stats()[source]

Get statistics on contents available through this selection.

get_content(nut)[source]

Get and possibly load full content for a given index entry from file.

Loads the actual content objects (channel, station, waveform, …) from file. For efficiency, sibling content (everything in the same file segment) is also loaded as a side effect. The loaded contents are cached in the squirrel object.

class SquirrelStats(**kwargs)[source]

Bases: pyrocko.guts.Object

Container to hold statistics about contents available through a squirrel.

nfiles

int

number of files in selection

nnuts

int

number of index nuts in selection

codes

list of tuple of str objects, default: []

available code sequences in selection, e.g. (agency, network, station, location) for station nuts.

kinds

list of str objects, default: []

available content types in selection

total_size

int

aggregated file size of files in selection

counts

dict of dict of int objects, default: {}

breakdown of how many nuts of any content type and code sequence are available in selection, counts[kind][codes]

tmin

builtins.float (pyrocko.guts.Timestamp), optional

earliest start time of all nuts in selection

tmax

builtins.float (pyrocko.guts.Timestamp), optional

latest end time of all nuts in selection

class Database(database_path=':memory:', log_statements=False)[source]

Bases: object

Shared meta-information database used by squirrel.

dig(nuts)[source]

Store or update content meta-information.

Given nuts are assumed to represent an up-to-date and complete inventory of a set of files. Any old information about these files is first pruned from the database (via database triggers). If such content is part of a live selection, it is also removed there. Then the new content meta-information is inserted into the main database. The content is not automatically inserted into the live selections again. It is the responsibility of the selection object to perform this step.

remove(path)[source]

Prune content meta-information about a given file.

All content pieces belonging to file path are removed from the main database and any attached live selections (via database triggers).
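
Pruning via database triggers can be sketched with SQLite as follows. This is an illustration of the general mechanism under stated assumptions, not Squirrel's schema: the tables files, nuts and selection_nuts and the trigger name are hypothetical, and everything lives in a single in-memory database for simplicity.

```python
import sqlite3

SCHEMA = '''
CREATE TABLE files (file_id INTEGER PRIMARY KEY, path TEXT UNIQUE);
CREATE TABLE nuts (file_id INTEGER, kind TEXT);
CREATE TABLE selection_nuts (file_id INTEGER, kind TEXT);

-- Deleting a file row prunes its index entries from both tables.
CREATE TRIGGER prune_on_file_delete BEFORE DELETE ON files
BEGIN
    DELETE FROM nuts WHERE file_id = old.file_id;
    DELETE FROM selection_nuts WHERE file_id = old.file_id;
END;
'''


def demo_prune():
    con = sqlite3.connect(':memory:')
    con.executescript(SCHEMA)
    con.execute("INSERT INTO files VALUES (1, 'a.mseed')")
    con.execute("INSERT INTO files VALUES (2, 'b.mseed')")
    for table in ('nuts', 'selection_nuts'):
        con.execute("INSERT INTO %s VALUES (1, 'waveform')" % table)
        con.execute("INSERT INTO %s VALUES (2, 'waveform')" % table)

    # Removing the file fires the trigger; no explicit cleanup is needed.
    con.execute("DELETE FROM files WHERE path = 'a.mseed'")

    remaining = [
        con.execute('SELECT file_id FROM %s' % table).fetchall()
        for table in ('nuts', 'selection_nuts')]
    con.close()
    return remaining
```

After the delete, only entries of the second file remain in both tables, so the caller never has to clean up dependent rows by hand.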

reset(path)[source]

Prune information associated with a given file, but keep the file path.

This method is called when reading a file failed. File attributes, format, size and modification time are set to NULL. File content meta-information is removed from the database and any attached live selections (via database triggers).

class DatabaseStats(**kwargs)[source]

Bases: pyrocko.guts.Object

Container to hold statistics about contents cached in meta-information db.

nfiles

int

nnuts

int

codes

list of list of str objects, default: []

kinds

list of str objects, default: []

total_size

int

counts

dict of dict of int objects, default: {}

class Content(**kwargs)[source]

Bases: pyrocko.guts.Object

Base class for content types in the Squirrel framework.

class Waveform(**kwargs)[source]

Bases: pyrocko.squirrel.model.Content

A continuous seismic waveform snippet.

agency

str, default: ''

Agency code (2-5)

network

str, default: ''

Deployment/network code (1-8)

station

str, default: ''

Station code (1-5)

location

str, default: ''

Location code (0-2)

channel

str, default: ''

Channel code (3)

extra

str, default: ''

Extra/custom code

tmin

builtins.float (pyrocko.guts.Timestamp)

tmax

builtins.float (pyrocko.guts.Timestamp)

deltat

float, optional

data

numpy.ndarray (pyrocko.guts_array.Array)

numpy array with data samples

class WaveformPromise(**kwargs)[source]

Bases: pyrocko.squirrel.model.Content

Information about a waveform potentially available at a remote site.

agency

str, default: ''

Agency code (2-5)

network

str, default: ''

Deployment/network code (1-8)

station

str, default: ''

Station code (1-5)

location

str, default: ''

Location code (0-2)

channel

str, default: ''

Channel code (3)

extra

str, default: ''

Extra/custom code

tmin

builtins.float (pyrocko.guts.Timestamp)

tmax

builtins.float (pyrocko.guts.Timestamp)

deltat

float, optional

source_hash

str

class Station(**kwargs)[source]

Bases: pyrocko.squirrel.model.Content

A seismic station.

agency

str, default: ''

Agency code (2-5)

network

str, default: ''

Deployment/network code (1-8)

station

str, default: ''

Station code (1-5)

location

str, optional, default: ''

Location code (0-2)

tmin

builtins.float (pyrocko.guts.Timestamp), optional

tmax

builtins.float (pyrocko.guts.Timestamp), optional

lat

float

lon

float

elevation

float, optional

depth

float, optional

description

str, optional

class Channel(**kwargs)[source]

Bases: pyrocko.squirrel.model.Content

A channel of a seismic station.

agency

str, default: ''

Agency code (2-5)

network

str, default: ''

Deployment/network code (1-8)

station

str, default: ''

Station code (1-5)

location

str, default: ''

Location code (0-2)

channel

str, default: ''

Channel code (3)

tmin

builtins.float (pyrocko.guts.Timestamp), optional

tmax

builtins.float (pyrocko.guts.Timestamp), optional

lat

float

lon

float

elevation

float, optional

depth

float, optional

dip

float, optional

azimuth

float, optional

deltat

float, optional

class Nut(file_path=None, file_format=None, file_mtime=None, file_size=None, file_segment=None, file_element=None, kind_id=0, codes='', tmin_seconds=None, tmin_offset=0.0, tmax_seconds=None, tmax_offset=0.0, deltat=None, content=None, tmin=None, tmax=None, values_nocheck=None)[source]

Bases: pyrocko.guts.Object

Index entry referencing an elementary piece of content.

So-called nuts are used in Pyrocko’s Squirrel framework to hold common meta-information about individual pieces of waveforms, stations, channels, etc., together with information on where they were found or generated.

file_path

str, optional

file_format

str, optional

file_mtime

builtins.float (pyrocko.guts.Timestamp), optional

file_size

int, optional

file_segment

int, optional

file_element

int, optional

kind_id

int

codes

str

tmin_seconds

builtins.float (pyrocko.guts.Timestamp)

tmin_offset

float, optional, default: 0.0

tmax_seconds

builtins.float (pyrocko.guts.Timestamp)

tmax_offset

float, optional, default: 0.0

deltat

float, optional

content

Content, optional

iload(paths, segment=None, format='detect', database=None, check=True, commit=True, skip_unchanged=False, content=['waveform', 'station', 'channel', 'response', 'event'])[source]

Iteratively load content or index/reindex meta-information from files.

Parameters:
  • paths – iterator yielding file names to load from or pyrocko.squirrel.Selection object
  • segment (str) – file-specific segment identifier (can only be used when loading from a single file)
  • format (str) – file format identifier or 'detect' for autodetection
  • database (pyrocko.squirrel.Database) – database to use for meta-information caching
  • check (bool) – if True, investigate modification times and file sizes of known files to detect modified files (pessimistic mode), or False to deactivate checks (optimistic mode)
  • commit (bool) – flag, whether to commit updated information to the meta-information database
  • skip_unchanged (bool) – if True, only yield index nuts for new / modified files
  • content – list of strings, selection of content types to load

This generator yields pyrocko.squirrel.Nut objects for individual pieces of information found when reading the given files. Such a nut may represent a waveform, a station, a channel, an event or other data type. The nut itself only contains the meta-information. The actual content information is attached to the nut if requested. All nut meta-information is stored in the squirrel meta-information database. If possible, this function avoids accessing the actual disk files and provides the requested information straight from the database. Modified files are recognized and reindexed as needed.
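
The pessimistic and optimistic modes described for the check parameter can be sketched with a small cache of modification times and file sizes. This is a simplified illustration, assuming a plain dict in place of the meta-information database; needs_reindex and remember are hypothetical helpers, not Squirrel functions.

```python
import os
import tempfile


def needs_reindex(path, cache, check=True):
    """Decide whether a file must be (re)indexed.

    cache maps path -> (mtime, size) as recorded at the last indexing run.
    """
    if path not in cache:
        return True   # unknown file: must be indexed
    if not check:
        return False  # optimistic mode: trust the cached information
    st = os.stat(path)
    return (st.st_mtime, st.st_size) != cache[path]


def remember(path, cache):
    """Record the current modification time and size of a file."""
    st = os.stat(path)
    cache[path] = (st.st_mtime, st.st_size)


def demo():
    cache = {}
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, 'example.dat')
        with open(path, 'wb') as f:
            f.write(b'abc')

        first = needs_reindex(path, cache)
        remember(path, cache)
        unchanged = needs_reindex(path, cache)

        # Appending changes the file size, so the change is detected even
        # if the modification time has a coarse resolution.
        with open(path, 'ab') as f:
            f.write(b'def')

        pessimistic = needs_reindex(path, cache, check=True)
        optimistic = needs_reindex(path, cache, check=False)
    return first, unchanged, pessimistic, optimistic
```

In this sketch, optimistic mode skips the os.stat() call for known files, which saves time on large collections at the price of missing files that changed on disk.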

detect_format(path)[source]

Determine file type from first 512 bytes.

Parameters:
  • path (str) – path of file

get_backend(fmt)[source]

Get squirrel io backend module for a given file format.

Parameters:
  • fmt (str) – format identifier

exception FormatDetectionFailed(path)[source]

Bases: pyrocko.io.io_common.FileLoadError

Exception raised when file format detection fails.

exception UnknownFormat(format)[source]

Bases: Exception

Exception raised when user requests an unknown file format.

class Constraint(**kwargs)[source]

Bases: pyrocko.guts.Object

Undocumented.

tmin

builtins.float (pyrocko.guts.Timestamp), optional

tmax

builtins.float (pyrocko.guts.Timestamp), optional

contains(constraint)[source]

Check if the constraint completely includes a more restrictive one.

expand(constraint)[source]

Widen constraint to include another given constraint.
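
Since the class itself is undocumented, the following sketch shows one plausible reading of contains() and expand() for a pure time-window constraint. It assumes that a tmin or tmax of None means "unbounded"; the class TimeConstraint is hypothetical and is not the library's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TimeConstraint:
    # None is assumed to mean "unbounded" on that side.
    tmin: Optional[float] = None
    tmax: Optional[float] = None

    def contains(self, other):
        """Check if this window completely includes another one."""
        lower_ok = self.tmin is None or (
            other.tmin is not None and self.tmin <= other.tmin)
        upper_ok = self.tmax is None or (
            other.tmax is not None and other.tmax <= self.tmax)
        return lower_ok and upper_ok

    def expand(self, other):
        """Widen this window in place so that it includes another one."""
        if self.tmin is not None:
            if other.tmin is None:
                self.tmin = None
            else:
                self.tmin = min(self.tmin, other.tmin)
        if self.tmax is not None:
            if other.tmax is None:
                self.tmax = None
            else:
                self.tmax = max(self.tmax, other.tmax)


wide = TimeConstraint(0.0, 10.0)
narrow = TimeConstraint(2.0, 5.0)
wide_contains_narrow = wide.contains(narrow)
narrow_contains_wide = narrow.contains(wide)
```

A check of this kind lets a remote source skip a query when the requested window is already contained in a window that was queried before, and expand() records the newly covered range.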