squirrel
Prompt seismological data access with a fluffy tail.
Usage
from pyrocko import squirrel as psq
sq = psq.Squirrel()
sq.add(files)
Concepts
- squirrel
- nut
- database
Reference
-
class
Selection
(database, persistent=None)[source]¶ Bases:
object
Database backed file selection.
Parameters: By default, a temporary table in the database is created to hold the names of the files in the selection. This table is only visible inside the application which created it. If a name is given to persistent, a named selection is created, which is also visible in other applications using the same database. Paths of files can be added to the selection using the add() method.
-
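The difference between a temporary and a named selection can be illustrated with plain SQLite. This is a minimal sketch, assuming the database behaves like an SQLite database; the table names selection_tmp and selection_named are hypothetical and are not claimed to be the names Squirrel uses internally.

```python
import os
import sqlite3
import tempfile


def table_names(connection):
    """Return the names of all tables visible to this connection."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "UNION SELECT name FROM sqlite_temp_master WHERE type = 'table'")
    return sorted(row[0] for row in rows)


with tempfile.TemporaryDirectory() as tmpdir:
    db_path = os.path.join(tmpdir, 'demo.sqlite')

    # Two connections stand in for two applications sharing one database.
    app_a = sqlite3.connect(db_path)
    app_b = sqlite3.connect(db_path)

    # Temporary table: private to the connection that created it.
    app_a.execute('CREATE TEMP TABLE selection_tmp (path TEXT)')
    # Named table: stored in the database file, visible to both.
    app_a.execute('CREATE TABLE selection_named (path TEXT)')
    app_a.commit()

    seen_by_a = table_names(app_a)
    seen_by_b = table_names(app_b)

    app_a.close()
    app_b.close()

print(seen_by_a)
print(seen_by_b)
```

The second connection only sees the named table, which mirrors the visibility rule described for persistent selections.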
silent_touch
(file_path)[source]¶ Update modification time of file without initiating reindexing.
Useful to prolong validity period of data with expiration date.
-
add
(file_paths, kind_mask=255, format='detect')[source]¶ Add files to the selection.
Parameters: file_paths (iterator yielding str objects) – Paths to files to be added to the selection.
-
remove
(file_paths)[source]¶ Remove files from the selection.
Parameters: file_paths (list of str) – Paths to files to be removed from the selection.
-
undig_grouped
(skip_unchanged=False)[source]¶ Get content inventory of all files in selection.
Parameters: skip_unchanged – if True, only the inventory of modified files is yielded (_flag_modified() must be called beforehand).
This generator yields tuples ((format, path), nuts), where path is the path to the file, format is the format assignation or 'detect', and nuts is a list of pyrocko.squirrel.Nut objects representing the contents of the file.
-
-
class
Squirrel
(env=None, database=None, cache_path=None, persistent=None)[source]¶ Bases:
pyrocko.squirrel.base.Selection
Prompt, lazy, indexing, caching, dynamic seismological dataset access.
Parameters: By default, temporary tables are created in the attached database to hold the names of the files in the selection as well as various indices and counters. These tables are only visible inside the application which created them. If a name is given to persistent, a named selection is created, which is also visible in other applications using the same database. Paths of files can be added to the selection using the add() method.
-
add
(file_paths, kinds=None, format='detect', check=True)[source]¶ Add files to the selection.
Parameters:
- file_paths – iterator yielding paths to files or directories to be added to the selection. Recurses into directories given. If given a str, it is treated as a single path to be added.
- kinds (list of str) – if given, allowed content types to be made available through the squirrel selection.
- format (str) – file format identifier or 'detect' for auto-detection
Complexity: O(log N)
-
add_virtual
(nuts, virtual_file_paths=None)[source]¶ Add content which is not backed by files.
Stores to the main database and the selection.
If
virtual_file_paths
are given, this prevents creating a temp list of the nuts while aggregating the file paths for the selection.
-
iter_nuts
(kind=None, tmin=None, tmax=None, codes=None, naiv=False, kind_codes_ids=None)[source]¶ Iterate content matching given constraints.
Parameters:
- kind – str, content kind to extract, or sequence of such
- tmin – timestamp, start time of interval
- tmax – timestamp, end time of interval
- codes – tuple of str, pattern of content codes to be matched
Complexity: O(log N) for the time selection part due to heavy use of database indexes.
Yields pyrocko.squirrel.Nut objects representing the intersecting content.
The query time span is treated as a half-open interval [tmin, tmax). However, if tmin equals tmax, the edge logic is modified to closed-interval so that content intersecting with the time instant t = tmin = tmax is returned (otherwise nothing would be returned, as [t, t) never matches anything).
Content time spans are also treated as half-open intervals, e.g. content span [0, 1) is matched by query span [0, 1) but not by [-1, 0) or [1, 2). Here, too, the logic is modified to closed-interval when the content time span is an empty interval, i.e. when it indicates a time instant. E.g. time instant 0 is matched by [0, 1) but not by [-1, 0) or [1, 2).
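The matching rules above can be written as a small predicate. This is a sketch of one reading of the description, not the library's database query; the function name spans_intersect is hypothetical. Two points are assumptions: an instant query is matched against half-open content spans, and two instants match only when they are equal.

```python
def spans_intersect(qmin, qmax, cmin, cmax):
    """Decide whether query span [qmin, qmax) matches content [cmin, cmax).

    Empty spans (min == max) are treated as time instants.
    """
    query_is_instant = qmin == qmax
    content_is_instant = cmin == cmax

    if query_is_instant and content_is_instant:
        # Assumption: two instants match only if they coincide.
        return qmin == cmin
    if query_is_instant:
        # Assumption: the instant must fall inside the half-open content span.
        return cmin <= qmin < cmax
    if content_is_instant:
        # The instant must fall inside the half-open query span.
        return qmin <= cmin < qmax
    # Regular case: two half-open intervals overlap.
    return cmin < qmax and qmin < cmax


queries = [(0, 1), (-1, 0), (1, 2)]

# Content span [0, 1) against the three query spans from the text.
print([spans_intersect(q0, q1, 0, 1) for q0, q1 in queries])

# Content time instant 0 against the same query spans.
print([spans_intersect(q0, q1, 0, 0) for q0, q1 in queries])
```

Both lines print a match for [0, 1) only, in agreement with the examples given in the description.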
-
get_time_span
()[source]¶ Get time interval over all content in selection.
Complexity: O(1), independent of number of nuts
Returns: (tmin, tmax)
-
get_deltat_span
(kind)[source]¶ Get min and max sampling interval of all waveform contents.
Returns: (deltat_min, deltat_max)
-
iter_kinds
(codes=None)[source]¶ Iterate over content types available in selection.
Parameters: codes – if given, get kinds only for selected codes identifier
Complexity: O(1), independent of number of nuts
-
iter_deltats
(kind=None)[source]¶ Iterate over sampling intervals available in selection.
Parameters: kind (str) – if given, get sampling intervals only for a given content type
Complexity: O(1), independent of number of nuts
-
iter_codes
(kind=None)[source]¶ Iterate over content identifier code sequences available in selection.
Parameters: kind (str) – if given, get codes only for a given content type
Complexity: O(1), independent of number of nuts
-
iter_counts
(kind=None)[source]¶ Iterate over number of occurrences of any (kind, codes) combination.
Parameters: kind – if given, get counts only for selected content type
Yields tuples ((kind, codes), count)
Complexity: O(1), independent of number of nuts
-
get_kinds
(codes=None)[source]¶ Get content types available in selection.
Parameters: codes – if given, get kinds only for selected codes identifier
Complexity: O(1), independent of number of nuts
Returns: sorted list of available content types
-
get_deltats
(kind=None)[source]¶ Get sampling intervals available in selection.
Parameters: kind – if given, get sampling intervals only for selected content type
Complexity: O(1), independent of number of nuts
Returns: sorted list of available sampling intervals
-
get_codes
(kind=None)[source]¶ Get identifier code sequences available in selection.
Parameters: kind – if given, get codes only for selected content type
Complexity: O(1), independent of number of nuts
Returns: sorted list of available codes as tuples of strings
-
get_counts
(kind=None)[source]¶ Get number of occurrences of any (kind, codes) combination.
Parameters: kind – if given, get counts only for selected content type
Complexity: O(1), independent of number of nuts
Returns: dict with counts[kind][codes], or counts[codes] if kind is not None
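The shape of the returned dictionary can be illustrated without a database. This is a minimal sketch in plain Python, assuming the entries are available as (kind, codes) pairs; the helper count_nuts and the station names are hypothetical and only mimic the documented return structure.

```python
from collections import defaultdict


def count_nuts(entries, kind=None):
    """Count occurrences of each (kind, codes) combination.

    Returns counts[kind][codes], or counts[codes] if kind is given.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for entry_kind, codes in entries:
        counts[entry_kind][codes] += 1

    # Convert to plain dicts so that missing keys raise KeyError.
    nested = {k: dict(v) for k, v in counts.items()}
    if kind is not None:
        return nested.get(kind, {})
    return nested


# Hypothetical index entries: (kind, codes) with codes as tuples of str.
entries = [
    ('waveform', ('', 'XX', 'STA1', '', 'BHZ')),
    ('waveform', ('', 'XX', 'STA1', '', 'BHZ')),
    ('waveform', ('', 'XX', 'STA2', '', 'BHZ')),
    ('station', ('', 'XX', 'STA1', '')),
]

all_counts = count_nuts(entries)
waveform_counts = count_nuts(entries, kind='waveform')

print(all_counts['station'])
print(waveform_counts[('', 'XX', 'STA1', '', 'BHZ')])
```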
-
update
(constraint=None, **kwargs)[source]¶ Update inventory of remote content for a given selection.
This function triggers all attached remote sources to check for updates in the metadata. The sources will only submit queries when their expiration date has passed, or if the selection spans into previously unseen times or areas.
-
get_content
(nut, cache='default', accessor='default')[source]¶ Get and possibly load full content for a given index entry from file.
Loads the actual content objects (channel, station, waveform, …) from file. For efficiency, sibling content (everything in the same file segment) is also loaded as a side effect. The loaded contents are cached in the squirrel object.
-
-
class
SquirrelStats
(**kwargs)[source]¶ Bases:
pyrocko.guts.Object
Container to hold statistics about contents available through a squirrel.
-
♦
nfiles
¶ int
number of files in selection
-
♦
nnuts
¶ int
number of index nuts in selection
-
♦
codes
¶ list of tuple of str objects, default: []
available code sequences in selection, e.g. (agency, network, station, location) for station nuts.
-
♦
kinds
¶ list of str objects, default: []
available content types in selection
-
♦
total_size
¶ int
aggregated file size of files in selection
-
♦
counts
¶ dict of dict of int objects, default: {}
breakdown of how many nuts of any content type and code sequence are available in selection, counts[kind][codes]
-
♦
tmin
¶ builtins.float (pyrocko.guts.Timestamp), optional
earliest start time of all nuts in selection
-
♦
tmax
¶ builtins.float (pyrocko.guts.Timestamp), optional
latest end time of all nuts in selection
-
♦
-
class
Database
(database_path=':memory:', log_statements=False)[source]¶ Bases:
object
Shared meta-information database used by squirrel.
-
dig
(nuts)[source]¶ Store or update content meta-information.
Given
nuts
are assumed to represent an up-to-date and complete inventory of a set of files. Any old information about these files is first pruned from the database (via database triggers). If such content is part of a live selection, it is also removed there. Then the new content meta-information is inserted into the main database. The content is not automatically inserted into the live selections again. It is in the responsibility of the selection object to perform this step.
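The prune-then-insert behaviour can be sketched with an SQLite trigger. This is an illustration under assumptions: the two-table schema, the trigger name and the helper dig below are hypothetical and much simpler than whatever the real database uses.

```python
import sqlite3

# Hypothetical, simplified schema: one row per file, one row per index entry.
SCHEMA = '''
    CREATE TABLE files (file_id INTEGER PRIMARY KEY, path TEXT UNIQUE);
    CREATE TABLE nuts (file_id INTEGER, kind TEXT, codes TEXT);

    -- Deleting a file row prunes all index entries that refer to it.
    CREATE TRIGGER prune_nuts_on_file_delete
    BEFORE DELETE ON files
    FOR EACH ROW BEGIN
        DELETE FROM nuts WHERE file_id = old.file_id;
    END;
'''


def dig(con, path, nuts):
    """Replace everything known about ``path`` with the given entries."""
    with con:  # commit on success, roll back on error
        # Old entries disappear through the trigger, not through this code.
        con.execute('DELETE FROM files WHERE path = ?', (path,))
        cursor = con.execute('INSERT INTO files (path) VALUES (?)', (path,))
        file_id = cursor.lastrowid
        con.executemany(
            'INSERT INTO nuts (file_id, kind, codes) VALUES (?, ?, ?)',
            [(file_id, kind, codes) for kind, codes in nuts])


con = sqlite3.connect(':memory:')
con.executescript(SCHEMA)

# First inventory of the file, then an updated and complete inventory.
dig(con, 'data/a.mseed', [('waveform', 'STA1'), ('waveform', 'STA2')])
dig(con, 'data/a.mseed', [('waveform', 'STA1')])

remaining = con.execute('SELECT kind, codes FROM nuts').fetchall()
print(remaining)
```

Only the entry from the second call survives, because the given entries are treated as the complete inventory of the file.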
-
remove
(path)[source]¶ Prune content meta-information about a given file.
All content pieces belonging to file
path
are removed from the main database and any attached live selections (via database triggers).
-
reset
(path)[source]¶ Prune information associated with a given file, but keep the file path.
This method is called when reading a file failed. File attributes, format, size and modification time are set to NULL. File content meta-information is removed from the database and any attached live selections (via database triggers).
-
-
class
DatabaseStats
(**kwargs)[source]¶ Bases:
pyrocko.guts.Object
Container to hold statistics about contents cached in meta-information db.
-
♦
nfiles
¶ int
number of files in database
-
♦
nnuts
¶ int
number of index nuts in database
-
♦
codes
¶ list of tuple of str objects, default: []
available code sequences in database, e.g. (agency, network, station, location) for station nuts.
-
♦
kinds
¶ list of str objects, default: []
available content types in database
-
♦
total_size
¶ int
aggregated file size of files referenced in database
-
♦
counts
¶ dict of dict of int objects, default: {}
breakdown of how many nuts of any content type and code sequence are available in database, counts[kind][codes]
-
♦
-
class
Content
(**kwargs)[source]¶ Bases:
pyrocko.guts.Object
Base class for content types in the Squirrel framework.
-
class
Waveform
(**kwargs)[source]¶ Bases:
pyrocko.squirrel.model.Content
A continuous seismic waveform snippet.
-
♦
agency
¶ str
, default:''
Agency code (2-5)
-
♦
network
¶ str
, default:''
Deployment/network code (1-8)
-
♦
station
¶ str
, default:''
Station code (1-5)
-
♦
location
¶ str
, default:''
Location code (0-2)
-
♦
channel
¶ str
, default:''
Channel code (3)
-
♦
extra
¶ str
, default:''
Extra/custom code
-
♦
tmin
¶ builtins.float
(pyrocko.guts.Timestamp
)
-
♦
tmax
¶ builtins.float
(pyrocko.guts.Timestamp
)
-
♦
deltat
¶ float
, optional
-
♦
data
¶ numpy.ndarray (pyrocko.guts_array.Array), optional
numpy array with data samples
-
♦
-
class
WaveformPromise
(**kwargs)[source]¶ Bases:
pyrocko.squirrel.model.Content
Information about a waveform potentially available at a remote site.
-
♦
agency
¶ str
, default:''
Agency code (2-5)
-
♦
network
¶ str
, default:''
Deployment/network code (1-8)
-
♦
station
¶ str
, default:''
Station code (1-5)
-
♦
location
¶ str
, default:''
Location code (0-2)
-
♦
channel
¶ str
, default:''
Channel code (3)
-
♦
extra
¶ str
, default:''
Extra/custom code
-
♦
tmin
¶ builtins.float
(pyrocko.guts.Timestamp
)
-
♦
tmax
¶ builtins.float
(pyrocko.guts.Timestamp
)
-
♦
deltat
¶ float
, optional
-
♦
source_hash
¶ str
-
♦
-
class
Station
(**kwargs)[source]¶ Bases:
pyrocko.squirrel.model.Content
A seismic station.
-
♦
agency
¶ str
, default:''
Agency code (2-5)
-
♦
network
¶ str
, default:''
Deployment/network code (1-8)
-
♦
station
¶ str
, default:''
Station code (1-5)
-
♦
location
¶ str
, optional, default:''
Location code (0-2)
-
♦
tmin
¶ builtins.float
(pyrocko.guts.Timestamp
), optional
-
♦
tmax
¶ builtins.float
(pyrocko.guts.Timestamp
), optional
-
♦
lat
¶ float
-
♦
lon
¶ float
-
♦
elevation
¶ float
, optional
-
♦
depth
¶ float
, optional
-
♦
description
¶ str
, optional
-
♦
-
class
Channel
(**kwargs)[source]¶ Bases:
pyrocko.squirrel.model.Content
A channel of a seismic station.
-
♦
agency
¶ str
, default:''
Agency code (2-5)
-
♦
network
¶ str
, default:''
Deployment/network code (1-8)
-
♦
station
¶ str
, default:''
Station code (1-5)
-
♦
location
¶ str
, default:''
Location code (0-2)
-
♦
channel
¶ str
, default:''
Channel code (3)
-
♦
tmin
¶ builtins.float
(pyrocko.guts.Timestamp
), optional
-
♦
tmax
¶ builtins.float
(pyrocko.guts.Timestamp
), optional
-
♦
lat
¶ float
-
♦
lon
¶ float
-
♦
elevation
¶ float
, optional
-
♦
depth
¶ float
, optional
-
♦
dip
¶ float
, optional
-
♦
azimuth
¶ float
, optional
-
♦
deltat
¶ float
, optional
-
♦
-
class
Nut
(file_path=None, file_format=None, file_mtime=None, file_size=None, file_segment=None, file_element=None, kind_id=0, codes='', tmin_seconds=None, tmin_offset=0, tmax_seconds=None, tmax_offset=0, deltat=None, content=None, tmin=None, tmax=None, values_nocheck=None)[source]¶ Bases:
pyrocko.guts.Object
Index entry referencing an elementary piece of content.
So-called nuts are used in Pyrocko’s Squirrel framework to hold common meta-information about individual pieces of waveforms, stations, channels, etc. together with the information where it was found or generated.
-
♦
file_path
¶ str
, optional
-
♦
file_format
¶ str
, optional
-
♦
file_mtime
¶ builtins.float
(pyrocko.guts.Timestamp
), optional
-
♦
file_size
¶ int
, optional
-
♦
file_segment
¶ int
, optional
-
♦
file_element
¶ int
, optional
-
♦
kind_id
¶ int
-
♦
codes
¶ str
-
♦
tmin_seconds
¶ builtins.float
(pyrocko.guts.Timestamp
)
-
♦
tmin_offset
¶ int
, optional, default:0
-
♦
tmax_seconds
¶ builtins.float
(pyrocko.guts.Timestamp
)
-
♦
tmax_offset
¶ int
, optional, default:0
-
♦
deltat
¶ float
, default:0.0
-
♦
-
iload
(paths, segment=None, format='detect', database=None, check=True, commit=True, skip_unchanged=False, content=['waveform', 'station', 'channel', 'response', 'event'])[source]¶ Iteratively load content or index/reindex meta-information from files.
Parameters:
- paths – iterator yielding file names to load from, or a pyrocko.squirrel.Selection object
- segment (str) – file-specific segment identifier (can only be used when loading from a single file)
- format (str) – file format identifier or 'detect' for autodetection. When loading from a selection, the per-file format assignation is taken from the hint in the selection and this flag is ignored.
- database (pyrocko.squirrel.Database) – database to use for meta-information caching
- check (bool) – if True, investigate modification time and file sizes of known files to detect modified files (pessimistic mode), or False to deactivate checks (optimistic mode)
- commit (bool) – flag, whether to commit updated information to the meta-information database
- skip_unchanged (bool) – if True, only yield index nuts for new / modified files
- content – list of strings, selection of content types to load
This generator yields pyrocko.squirrel.Nut objects for individual pieces of information found when reading the given files. Such a nut may represent a waveform, a station, a channel, an event or other data type. The nut itself only contains the meta-information. The actual content information is attached to the nut if requested. All nut meta-information is stored in the squirrel meta-information database. If possible, this function avoids accessing the actual disk files and provides the requested information straight from the database. Modified files are recognized and reindexed as needed.
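The difference between the pessimistic and optimistic modes of the check flag can be sketched as follows. This is an illustration only: the helper needs_reindex and the dictionary of known (mtime, size) pairs are hypothetical stand-ins for the information kept in the meta-information database.

```python
import os
import tempfile


def needs_reindex(path, known, check=True):
    """Decide whether a file has to be read again.

    ``known`` maps paths to the (mtime, size) pair seen at indexing time.
    """
    if path not in known:
        return True          # never indexed before
    if not check:
        return False         # optimistic mode: trust the cached index
    status = os.stat(path)
    return (status.st_mtime, status.st_size) != known[path]


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'example.dat')
    with open(path, 'wb') as f:
        f.write(b'abc')

    # Remember what the file looked like when it was indexed.
    status = os.stat(path)
    known = {path: (status.st_mtime, status.st_size)}
    unchanged = needs_reindex(path, known)

    with open(path, 'ab') as f:
        f.write(b'def')      # file size changes from 3 to 6 bytes

    pessimistic = needs_reindex(path, known, check=True)
    optimistic = needs_reindex(path, known, check=False)

print(unchanged, pessimistic, optimistic)
```

In this sketch the optimistic mode saves one stat call per known file, at the price of missing the modification.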
-
detect_format
(path)[source]¶ Determine file type from first 512 bytes.
Parameters: path (str) – path of file
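Detection from a file header can be sketched as below. The signatures and format names are hypothetical and chosen for illustration; they are not the checks the library performs. ValueError stands in for a dedicated exception such as FormatDetectionFailed.

```python
import os
import tempfile


def looks_like_xml(head):
    return head.lstrip().startswith(b'<?xml')


def looks_like_text(head):
    printable = set(range(32, 127)) | {9, 10, 13}
    return len(head) > 0 and all(byte in printable for byte in head)


# Hypothetical detectors, tried in order; the first match wins.
DETECTORS = [
    ('xml', looks_like_xml),
    ('text', looks_like_text),
]


def detect_format_sketch(path, nbytes=512):
    """Guess a format name from the first ``nbytes`` bytes of a file."""
    with open(path, 'rb') as f:
        head = f.read(nbytes)
    for name, matches in DETECTORS:
        if matches(head):
            return name
    raise ValueError('format detection failed: %s' % path)


with tempfile.TemporaryDirectory() as tmpdir:
    samples = {
        'meta.xml': b'<?xml version="1.0"?><root/>',
        'notes.txt': b'plain ASCII notes\n',
        'blob.bin': b'\x00\x01\x02\x03',
    }
    detected = {}
    for name, payload in samples.items():
        path = os.path.join(tmpdir, name)
        with open(path, 'wb') as f:
            f.write(payload)
        try:
            detected[name] = detect_format_sketch(path)
        except ValueError:
            detected[name] = None  # no detector matched

print(detected)
```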
-
get_backend
(fmt)[source]¶ Get squirrel io backend module for a given file format.
Parameters: fmt (str) – format identifier
-
exception
FormatDetectionFailed
(path)[source]¶ Bases:
pyrocko.io.io_common.FileLoadError
Exception raised when file format detection fails.
-
exception
UnknownFormat
(format)[source]¶ Bases:
Exception
Exception raised when user requests an unknown file format.
-
class
Source
(**kwargs)[source]¶ Bases:
pyrocko.guts.Object
Undocumented.
-
update_channel_inventory
(squirrel, constraint)[source]¶ Let local inventory be up-to-date with remote for a given constraint.
-