squirrel.base
¶
-
class
Selection
(database, persistent=None)[source]¶ Bases:
object
Database backed file selection (base class for
Squirrel
). A selection in this context represents the list of files to be made available to the application. Instead of using
Selection
directly, user applications should usually use its subclass
Squirrel
, which adds content indices to the selection and provides high-level data querying. By default, a temporary table is created in the database to hold the names of the files in the selection. This table is only visible inside the application which created it. If a name is given to
persistent
, a named selection is created, which is also visible in other applications using the same database. Besides the filename references, desired content kind masks and file format indications are stored in the selection’s database table to make the user’s choices regarding these options persistent on a per-file basis. Bookkeeping on whether files are unknown, known, or subject to forced modification checks is handled in the selection’s file-state table.
Paths of files can be added to the selection using the
add()
method and removed with
remove()
.
undig_grouped()
can be used to iterate over all content known to the selection.
-
add
(paths, kind_mask=255, format='detect')[source]¶ Add files to the selection.
Parameters: paths (iterator yielding str
objects) – Paths to files to be added to the selection.
-
remove
(paths)[source]¶ Remove files from the selection.
Parameters: paths (list
of str
) – Paths to files to be removed from the selection.
-
iter_paths
()[source]¶ Iterate over all file paths currently belonging to the selection.
Yields: File paths.
-
get_paths
()[source]¶ Get all file paths currently belonging to the selection.
Returns: List of file paths.
-
undig_grouped
(skip_unchanged=False)[source]¶ Get inventory of cached content for all files in the selection.
Parameters: skip_unchanged (bool) – If True
, only the inventory of modified files is yielded (
flag_modified()
must be called beforehand).
This generator yields tuples
((format, path), nuts)
where
path
is the path to the file,
format
is the format assignment or
'detect'
, and
nuts
is a list of
Nut
objects representing the contents of the file.
-
flag_modified
(check=True)[source]¶ Mark files which have been modified.
Parameters: check (bool) – If True
, query modification times of known files on disk. If
False
, only flag unknown files.
Assumes file state is 0 for newly added files, 1 for files added again to the selection (forces check), or 2 for all others (no checking is done for those).
Sets file state to 0 for unknown or modified files, 2 for known and not modified files.
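The state transitions described above can be modelled in a few lines of plain Python. This is only a sketch of one reading of the text, not Squirrel's implementation: the function name, the dictionaries `known_mtimes` and `disk_mtimes`, and the choice that state 1 forces a check even when `check=False` are all assumptions made for illustration.

```python
# Hypothetical model of the file-state bookkeeping (not the Squirrel code).
# State 0: newly added, 1: added again (forces check), 2: all others.


def flag_modified_sketch(states, known_mtimes, disk_mtimes, check=True):
    """Return new file states: 0 = unknown or modified, 2 = known, unmodified.

    states:       dict path -> current state (0, 1 or 2)
    known_mtimes: dict path -> modification time stored in the database
    disk_mtimes:  dict path -> modification time currently found on disk
    """
    result = {}
    for path, state in states.items():
        if state == 2:
            # No checking is done for these files.
            result[path] = 2
        elif path not in known_mtimes:
            # File is unknown to the database: flag it.
            result[path] = 0
        else:
            # Assumption: state 1 forces the check regardless of `check`.
            must_check = check or state == 1
            modified = must_check and disk_mtimes.get(path) != known_mtimes[path]
            result[path] = 0 if modified else 2
    return result


states = {'a.mseed': 0, 'b.mseed': 0, 'c.mseed': 1, 'd.mseed': 2}
known_mtimes = {'b.mseed': 10.0, 'c.mseed': 10.0, 'd.mseed': 10.0}
disk_mtimes = {'a.mseed': 5.0, 'b.mseed': 10.0, 'c.mseed': 11.0, 'd.mseed': 99.0}

new_states = flag_modified_sketch(states, known_mtimes, disk_mtimes)
```

In this model, `a.mseed` stays flagged because it is unknown, `c.mseed` is flagged because its modification time changed, and `d.mseed` is never inspected.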
-
-
class
SquirrelStats
(**kwargs)[source]¶ Bases:
pyrocko.guts.Object
Container to hold statistics about contents available from a Squirrel.
See also
Squirrel.get_stats()
.
-
♦
nfiles
¶ int
Number of files in selection.
-
♦
nnuts
¶ int
Number of index nuts in selection.
-
♦
codes
¶ list
of tuple
of str
objects, default: []
Available code sequences in selection, e.g. (agency, network, station, location) for station nuts.
-
♦
kinds
¶ list
of str
objects, default: []
Available content types in selection.
-
♦
total_size
¶ int
Aggregated file size of files in selection.
-
♦
counts
¶ dict
of dict
of int
objects, default: {}
Breakdown of how many nuts of any content type and code sequence are available in selection,
counts[kind][codes]
.
-
♦
tmin
¶ builtins.float
(pyrocko.guts.Timestamp
), optional
Earliest start time of all nuts in selection.
-
♦
tmax
¶ builtins.float
(pyrocko.guts.Timestamp
), optional
Latest end time of all nuts in selection.
-
-
class
Squirrel
(env=None, database=None, cache_path=None, persistent=None)[source]¶ Bases:
pyrocko.squirrel.base.Selection
Prompt, lazy, indexing, caching, dynamic seismological dataset access.
Parameters:
- env (
SquirrelEnvironment
or str
) – Squirrel environment instance or directory path to use as starting point for its detection. By default, the current directory is used as starting point. When searching for a usable environment, the directory
'.squirrel'
or
'squirrel'
in the current (or starting point) directory is used if it exists; otherwise the parent directories are searched upwards for such a directory. If no such directory is found, the user’s global Squirrel environment
'$HOME/.pyrocko/squirrel'
is used.
- database (
Database
or str
) – Database instance or path to database. By default, the database found in the detected Squirrel environment is used.
- cache_path (
str
) – Directory path to use for data caching. By default, the
'cache'
directory in the detected Squirrel environment is used.
- persistent (
str
) – If given a name, create a persistent selection.
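The environment lookup described for `env` amounts to an upward directory search with a global fallback. The following is a minimal sketch of that search order, written with `pathlib` only; `find_squirrel_dir` is a hypothetical helper, not part of Pyrocko, and the order in which `'.squirrel'` and `'squirrel'` are tried within one directory is an assumption.

```python
import tempfile
from pathlib import Path


def find_squirrel_dir(start, home):
    """Hypothetical helper mimicking the documented search order.

    Looks for a '.squirrel' or 'squirrel' directory in `start`, then in each
    parent directory, and falls back to '<home>/.pyrocko/squirrel'.
    """
    start = Path(start).resolve()
    for directory in [start, *start.parents]:
        for name in ('.squirrel', 'squirrel'):
            candidate = directory / name
            if candidate.is_dir():
                return candidate
    return Path(home) / '.pyrocko' / 'squirrel'


# Demonstration in a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp).resolve() / 'project'
    (project / '.squirrel').mkdir(parents=True)
    nested = project / 'data' / '2020'
    nested.mkdir(parents=True)

    found = find_squirrel_dir(nested, home=tmp)
    found_is_project_env = (found == project / '.squirrel')
```

Starting two levels below the project directory, the search walks upwards and picks up the project's `.squirrel` directory before the global fallback is considered.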
This is the central class of the Squirrel framework. It provides a unified interface to query and access seismic waveforms, station meta-data and event information from local file collections and remote data sources. For prompt responses, an elaborate database setup is used under the hood. To speed up the assembly of ad-hoc data selections, files are indexed on first use and the extracted meta-data is remembered in the database for subsequent accesses. Bulk data is lazily loaded from disk and remote sources, only when requested. Once loaded, data is cached in memory to expedite typical access patterns. Files and data sources can be dynamically added to and removed from the Squirrel selection at runtime.
Queries are restricted to the contents of the files currently added to the Squirrel selection (usually a subset of the information collected in the attached global file meta-information database). This list of files is referred to here as the “selection”. By default, temporary tables are created in the attached database to hold the names of the files in the selection as well as various indices and counters. These tables are only visible inside the application which created them and are deleted when the database connection is closed or the application exits. To create a selection which is not deleted at exit, supply a name to the
persistent
argument of the Squirrel constructor. Persistent selections are shared among applications using the same database.
Squirrel.add(paths[, kinds, format, check, …]) – Add files to the selection.
Squirrel.add_virtual(nuts[, virtual_paths]) – Add content which is not backed by files.
Squirrel.add_source(source) – Add remote resource.
Squirrel.add_fdsn(*args, **kwargs) – Add FDSN site for transparent remote data access.
Squirrel.add_catalog(*args, **kwargs) – Add online catalog for transparent event data access.
Squirrel.add_dataset(path[, check, …]) – Read dataset description from file and add its contents.
Squirrel.update([constraint]) – Update or partially update channel and event inventories.
Squirrel.update_waveform_promises([constraint]) – Permit downloading of remote waveforms.
Squirrel.advance_accessor(accessor[, cache]) – Notify memory caches about consumer moving to a new data window.
Squirrel.clear_accessor(accessor[, cache]) – Notify memory caches about a consumer having finished.
Squirrel.reload() – Check for modifications and reindex modified files.
Squirrel.iter_nuts([kind, tmin, tmax, …]) – Iterate content entities matching given constraints.
Squirrel.iter_kinds([codes]) – Iterate over content types available in selection.
Squirrel.iter_deltats([kind]) – Iterate over sampling intervals available in selection.
Squirrel.iter_codes([kind]) – Iterate over content identifier code sequences available in selection.
Squirrel.iter_counts([kind]) – Iterate over number of occurrences of any (kind, codes) combination.
Squirrel.get_nuts(*args, **kwargs) – Get content entities matching given constraints.
Squirrel.get_time_span([kinds]) – Get time interval over all content in selection.
Squirrel.get_deltat_span(kind) – Get min and max sampling interval of all content of given kind.
Squirrel.get_deltats([kind]) – Get sampling intervals available in selection.
Squirrel.get_kinds([codes]) – Get content types available in selection.
Squirrel.get_codes([kind]) – Get identifier code sequences available in selection.
Squirrel.get_counts([kind]) – Get number of occurrences of any (kind, codes) combination.
Squirrel.get_nfiles() – Get number of files in selection.
Squirrel.get_nnuts() – Get number of nuts in selection.
Squirrel.get_total_size() – Get aggregated file size available in selection.
Squirrel.get_stats() – Get statistics on contents available through this selection.
Squirrel.get_coverage(kind[, tmin, tmax, …]) – Get coverage information.
Squirrel.get_content(nut[, cache, accessor]) – Get and possibly load full content for a given index entry from file.
Squirrel.get_stations([obj, tmin, tmax, …]) – Get stations matching given constraints.
Squirrel.get_channels([obj, tmin, tmax, …]) – Get channels matching given constraints.
Squirrel.get_responses([obj, tmin, tmax, …])
Squirrel.get_events([obj, tmin, tmax, time, …])
Squirrel.get_waveform_nuts([obj, tmin, …])
Squirrel.get_waveforms_primitive([obj, …])
Squirrel.get_waveforms([obj, tmin, tmax, …])
Squirrel.chopper_waveforms([obj, tmin, …]) – Iterate window-wise over waveform data.
Squirrel.pile
Squirrel.snuffle()
Squirrel.glob_codes(kind, codes_list) – Find codes matching given patterns.
Squirrel.print_tables([table_names, stream]) – Dump raw database tables in textual form (for debugging purposes).
-
add
(paths, kinds=None, format='detect', check=True, progress_viewer='terminal')[source]¶ Add files to the selection.
Parameters:
- paths (
list
of str
) – Iterator yielding paths to files or directories to be added to the selection. Recurses into directories. If given a
str
, it is treated as a single path to be added.
- kinds (
list
of str
) – Content types to be made available through the Squirrel selection. By default, all known content types are accepted.
- format (
str
) – File format identifier or
'detect'
to enable auto-detection.
Complexity: O(log N)
-
reload
()[source]¶ Check for modifications and reindex modified files.
Based on file modification times.
-
add_virtual
(nuts, virtual_paths=None)[source]¶ Add content which is not backed by files.
Stores to the main database and the selection.
-
add_source
(source)[source]¶ Add remote resource.
Parameters: source (subclass of Source
) – Remote data access client instance.
-
add_fdsn
(*args, **kwargs)[source]¶ Add FDSN site for transparent remote data access.
Arguments are passed to
FDSNSource
.
-
add_catalog
(*args, **kwargs)[source]¶ Add online catalog for transparent event data access.
Arguments are passed to
CatalogSource
.
-
add_dataset
(path, check=True, progress_viewer='terminal')[source]¶ Read dataset description from file and add its contents.
-
iter_nuts
(kind=None, tmin=None, tmax=None, codes=None, naiv=False, kind_codes_ids=None)[source]¶ Iterate content entities matching given constraints.
Parameters:
- kind (
str
,
list
of str
) – Content kind (or kinds) to extract.
- tmin (timestamp) – Start time of query interval.
- tmax (timestamp) – End time of query interval.
- codes (
tuple
of str
) – Pattern of content codes to query.
- naiv (
bool
) – Bypass time span lookup through indices (slow, for testing).
- kind_codes_ids (
list
of str
) – Kind-codes IDs of contents to be retrieved (internal use).
Yields:
Nut
objects representing the intersecting content.
Complexity: O(log N) for the time selection part due to heavy use of database indices.
Query time span is treated as a half-open interval
[tmin, tmax)
. However, if
tmin
equals
tmax
, the edge logic is modified to closed-interval so that content intersecting with the time instant
t = tmin = tmax
is returned (otherwise nothing would be returned, as
[t, t)
never matches anything).
Time spans of content entities to be matched are also treated as half-open intervals, e.g. content span
[0, 1)
is matched by query span
[0, 1)
but not by
[-1, 0)
or
[1, 2)
. Here too, the logic is modified to closed-interval when the content time span is an empty interval, i.e. to indicate a time instant. E.g. time instant 0 is matched by
[0, 1)
but not by
[-1, 0)
or
[1, 2)
.
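The matching rules above can be written down as a small predicate. This is a sketch of the documented semantics only, not the SQL that Squirrel actually runs; `spans_intersect` is a hypothetical name, and the handling of a query instant that coincides with a content instant (matched only when the two are equal) is an assumption.

```python
def spans_intersect(qmin, qmax, cmin, cmax):
    """Return True if query span [qmin, qmax) matches content span [cmin, cmax).

    Empty spans (min == max) are treated as time instants, as described in
    the documentation of iter_nuts().
    """
    query_is_instant = qmin == qmax
    content_is_instant = cmin == cmax

    if query_is_instant and content_is_instant:
        # Assumption: two instants match only if they coincide.
        return qmin == cmin
    if query_is_instant:
        # Instant t matches content [cmin, cmax) if cmin <= t < cmax.
        return cmin <= qmin < cmax
    if content_is_instant:
        # Content instant t matches query [qmin, qmax) if qmin <= t < qmax.
        return qmin <= cmin < qmax
    # Regular case: two half-open intervals overlap.
    return cmin < qmax and qmin < cmax


# Examples taken from the text above.
content_span_results = [
    spans_intersect(0, 1, 0, 1),    # query [0, 1) vs content [0, 1)
    spans_intersect(-1, 0, 0, 1),   # query [-1, 0) vs content [0, 1)
    spans_intersect(1, 2, 0, 1),    # query [1, 2) vs content [0, 1)
]
content_instant_results = [
    spans_intersect(0, 1, 0, 0),    # query [0, 1) vs instant 0
    spans_intersect(-1, 0, 0, 0),   # query [-1, 0) vs instant 0
    spans_intersect(1, 2, 0, 0),    # query [1, 2) vs instant 0
]
```

Both result lists reproduce the pattern stated in the text: only the first query of each group matches.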
-
get_nuts
(*args, **kwargs)[source]¶ Get content entities matching given constraints.
Like
iter_nuts()
but returns results as a list.
-
get_time_span
(kinds=None)[source]¶ Get time interval over all content in selection.
Complexity: O(1), independent of the number of nuts. Returns: (tmin, tmax)
-
get_deltat_span
(kind)[source]¶ Get min and max sampling interval of all content of given kind.
Parameters: kind (str) – Content kind Returns: (deltat_min, deltat_max)
-
iter_kinds
(codes=None)[source]¶ Iterate over content types available in selection.
Parameters: codes (tuple
of str
) – If given, get kinds only for selected codes identifier.
Yields: Available content kinds as str
.
Complexity: O(1), independent of number of nuts.
-
iter_deltats
(kind=None)[source]¶ Iterate over sampling intervals available in selection.
Parameters: kind (str) – If given, get sampling intervals only for a given content type. Yields: float
values.
Complexity: O(1), independent of number of nuts.
-
iter_codes
(kind=None)[source]¶ Iterate over content identifier code sequences available in selection.
Parameters: kind (str) – If given, get codes only for a given content type. Yields: tuple
of str
Complexity: O(1), independent of number of nuts.
-
iter_counts
(kind=None)[source]¶ Iterate over number of occurrences of any (kind, codes) combination.
Parameters: kind (str) – If given, get counts only for selected content type. Yields: Tuples of the form ((kind, codes), count)
.
Complexity: O(1), independent of number of nuts.
-
get_kinds
(codes=None)[source]¶ Get content types available in selection.
Parameters: codes (tuple
of str
) – If given, get kinds only for selected codes identifier.
Returns: Sorted list of available content types. Complexity: O(1), independent of number of nuts.
-
get_deltats
(kind=None)[source]¶ Get sampling intervals available in selection.
Parameters: kind (str) – If given, get sampling intervals only for selected content type. Complexity: O(1), independent of number of nuts. Returns: sorted list of available sampling intervals
-
get_codes
(kind=None)[source]¶ Get identifier code sequences available in selection.
Parameters: kind (str) – If given, get codes only for selected content type. Complexity: O(1), independent of number of nuts. Returns: sorted list of available codes as tuples of strings
-
get_counts
(kind=None)[source]¶ Get number of occurrences of any (kind, codes) combination.
Parameters: kind (str) – If given, get counts only for selected content type. Complexity: O(1), independent of number of nuts. Returns: dict
with
counts[kind][codes]
or
counts[codes]
if kind is not
None
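The relation between the tuples yielded by `iter_counts()` and the dictionary returned by `get_counts()` can be illustrated with plain data. The sketch below groups `((kind, codes), count)` tuples into the two documented layouts; `group_counts` and the sample values are hypothetical and stand in for real query results.

```python
def group_counts(pairs, kind=None):
    """Group ((kind, codes), count) tuples as get_counts() is documented to.

    Returns counts[kind][codes] when `kind` is None, else counts[codes]
    restricted to the given kind.
    """
    counts = {}
    for (entry_kind, codes), count in pairs:
        if kind is None:
            counts.setdefault(entry_kind, {})[codes] = count
        elif entry_kind == kind:
            counts[codes] = count
    return counts


# Made-up sample in the form yielded by iter_counts().
pairs = [
    (('waveform', ('', 'GE', 'STA1', '', 'BHZ')), 24),
    (('waveform', ('', 'GE', 'STA2', '', 'BHZ')), 12),
    (('station', ('', 'GE', 'STA1', '')), 1),
]

all_counts = group_counts(pairs)
waveform_counts = group_counts(pairs, kind='waveform')
```

With `kind=None` the result is nested by kind first; with a kind given, the outer level is dropped and the codes tuples become the keys.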
-
glob_codes
(kind, codes_list)[source]¶ Find codes matching given patterns.
Returns: List of matches of the form
[kind_codes_id, codes, deltat]
.
-
update
(constraint=None, **kwargs)[source]¶ Update or partially update channel and event inventories.
Parameters:
- constraint (
Constraint
) – Selection of times or areas to be brought up to date.
- **kwargs – Shortcut for setting
constraint=Constraint(**kwargs)
.
This function triggers all attached remote sources to check for updates in the meta-data. The sources will only submit queries when their expiration date has passed, or if the selection spans into previously unseen times or areas.
-
update_waveform_promises
(constraint=None, **kwargs)[source]¶ Permit downloading of remote waveforms.
Parameters:
- constraint (
Constraint
) – Remote waveforms compatible with the given constraint are enabled for download.
- **kwargs – Shortcut for setting
constraint=Constraint(**kwargs)
.
Calling this method permits Squirrel to download waveforms from remote sources when processing subsequent waveform requests. This works by inserting so-called waveform promises into the database. It will look into the available channels for each remote source and create a promise for each channel compatible with the given constraint. If the promise then matches in a waveform request, Squirrel tries to download the waveform. If the download is successful, the downloaded waveform is added to the Squirrel and the promise is deleted. If the download fails, the promise is kept if the reason for the failure appears to be temporary, e.g. a network failure. If, however, the cause of failure seems to be permanent, the promise is deleted so that no further attempts are made to download a waveform which might not be available from that server at all. To force re-scheduling after a permanent failure, call
update_waveform_promises()
again.
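The promise life cycle described above can be summarised as a small state model. The sketch below is an illustration in plain Python, not Squirrel's code: the exception classes, `resolve_promises` and the `download` callback are all hypothetical names, and promises are represented by simple strings.

```python
class TemporaryDownloadError(Exception):
    """Hypothetical: failure that may go away, e.g. a network problem."""


class PermanentDownloadError(Exception):
    """Hypothetical: failure suggesting the data is not available at all."""


def resolve_promises(promises, download):
    """Try to fulfil each promise once; return (waveforms, remaining promises).

    - success:           waveform is kept, promise is dropped
    - temporary failure: promise is kept for a later attempt
    - permanent failure: promise is dropped, no further attempts
    """
    waveforms = []
    remaining = []
    for promise in promises:
        try:
            waveforms.append(download(promise))
        except TemporaryDownloadError:
            remaining.append(promise)
        except PermanentDownloadError:
            pass
    return waveforms, remaining


def fake_download(promise):
    # Stand-in for a remote request; outcomes are hard-coded per channel.
    if promise == 'STA2.BHZ':
        raise TemporaryDownloadError('connection reset')
    if promise == 'STA3.BHZ':
        raise PermanentDownloadError('no data available')
    return 'waveform for ' + promise


waveforms, remaining = resolve_promises(
    ['STA1.BHZ', 'STA2.BHZ', 'STA3.BHZ'], fake_download)
```

Only the promise that failed temporarily survives the pass; re-creating the permanently failed one would require inserting promises again, which corresponds to calling `update_waveform_promises()` once more.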
-
get_content
(nut, cache='default', accessor='default')[source]¶ Get and possibly load full content for a given index entry from file.
Loads the actual content objects (channel, station, waveform, …) from file. For efficiency, sibling content (everything in the same file segment) is also loaded as a side effect. The loaded contents are cached in the Squirrel object.
-
advance_accessor
(accessor, cache=None)[source]¶ Notify memory caches about consumer moving to a new data window.
See
pyrocko.squirrel.cache.ContentCache
for details on how Squirrel’s memory caching works and can be tuned. Default behaviour is to release data when it has not been used in the latest data window. If the accessor is never advanced, data is cached indefinitely, which is often desired, e.g. for station meta-data. Methods for consecutive data traversal, like
Squirrel.chopper_waveforms
, automatically advance and clear their accessor.
-
clear_accessor
(accessor, cache=None)[source]¶ Notify memory caches about a consumer having finished.
Calling this method clears all references to cache entries held by the named accessor. Cache entries are then freed if not referenced by any other accessor.
-
get_stations
(obj=None, tmin=None, tmax=None, time=None, codes=None, model='squirrel')[source]¶ Get stations matching given constraints.
Parameters:
- obj (any object with attributes
tmin
,
tmax
and
codes
) – Object providing
tmin
,
tmax
and
codes
to be used to constrain the query. Direct arguments override those from
obj
.
- tmin (timestamp) – Start time of query interval.
- tmax (timestamp) – End time of query interval.
- time (timestamp) – Time instant to query. Equivalent to setting
tmin
and
tmax
to the same value.
- codes (
tuple
of str
) – Pattern of content codes to query.
- model (str) – Select object model for returned values:
'squirrel'
to get Squirrel station objects or
'pyrocko'
to get Pyrocko station objects with channel information attached.
Returns: List of
pyrocko.squirrel.Station
objects by default or list of
pyrocko.model.Station
objects if
model='pyrocko'
is requested.
See
Squirrel.iter_nuts()
for details on time span matching.
-
get_channels
(obj=None, tmin=None, tmax=None, time=None, codes=None)[source]¶ Get channels matching given constraints.
Parameters:
- obj (any object with attributes
tmin
,
tmax
and
codes
) – Object providing
tmin
,
tmax
and
codes
to be used to constrain the query. Direct arguments override those from
obj
.
- tmin (timestamp) – Start time of query interval.
- tmax (timestamp) – End time of query interval.
- time (timestamp) – Time instant to query. Equivalent to setting
tmin
and
tmax
to the same value.
- codes (
tuple
of str
) – Pattern of content codes to query.
Returns: List of
Channel
objects.
See
Squirrel.iter_nuts()
for details on time span matching.
-
chopper_waveforms
(obj=None, tmin=None, tmax=None, time=None, codes=None, tinc=None, tpad=0.0, want_incomplete=True, degap=True, maxgap=5, maxlap=None, snap=(<built-in function round>, <built-in function round>), include_last=False, load_data=True, accessor_id=None, keep_current_files_open=False, **kwargs)[source]¶ Iterate window-wise over waveform data.
Parameters:
- tmin – start time (default uses start time of available data)
- tmax – end time (default uses end time of available data)
- tinc – time increment (window shift time) (default uses
tmax-tmin
)
- tpad – padding time appended on either side of the data windows (window overlap is
2*tpad
)
- trace_selector – filter callback taking
pyrocko.trace.Trace
objects
- want_incomplete – if set to
False
, gappy/incomplete traces are discarded from the results
- degap – whether to try to connect traces and to remove gaps and overlaps
- maxgap – maximum gap size in samples which is filled with interpolated samples when
degap
is
True
- maxlap – maximum overlap size in samples which is removed when
degap
is
True
- keep_current_files_open – whether to keep cached trace data in memory after the iterator has ended
- accessor_id – used as a key to identify different points of extraction for the decision of when to release cached trace data (should be used when data is alternately extracted from more than one region / selection)
- snap – replaces Python’s
round()
function which is used to determine indices where to start and end the trace data array
- include_last – whether to include the very last sample
- load_data – whether to load the waveform data. If set to
False
, traces with no data samples, but with correct meta-information are returned
Returns: iterator yielding a list of
pyrocko.trace.Trace
objects for every extracted time window
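How `tinc` and `tpad` interact is easiest to see from the window boundaries themselves. The sketch below only computes padded window spans under the documented parameter meanings; `iter_windows` is a hypothetical helper, and details such as whether the real method truncates the last window at `tmax` are not covered by the text and are not modelled here.

```python
import math


def iter_windows(tmin, tmax, tinc=None, tpad=0.0):
    """Yield (wmin, wmax) spans of consecutive, padded processing windows.

    Windows are shifted by `tinc` and extended by `tpad` on either side,
    so that neighbouring windows overlap by 2 * tpad.
    """
    if tinc is None:
        tinc = tmax - tmin  # documented default: a single window
    if tinc <= 0:
        raise ValueError('tinc must be positive')

    nwindows = max(1, math.ceil((tmax - tmin) / tinc))
    for i in range(nwindows):
        # Multiply instead of accumulating to limit floating point drift.
        wmin = tmin + i * tinc
        yield (wmin - tpad, wmin + tinc + tpad)


windows = list(iter_windows(0.0, 30.0, tinc=10.0, tpad=2.0))
single_window = list(iter_windows(0.0, 30.0))
```

For the 30 s span with `tinc=10` and `tpad=2`, three windows result, and each neighbouring pair shares 4 s of data, i.e. `2*tpad`.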
-
get_coverage
(kind, tmin=None, tmax=None, codes_list=None, limit=None)[source]¶ Get coverage information.
Get information about strips of gapless data coverage.
Parameters:
- kind (str) – Content kind to be queried.
- tmin (timestamp) – Start time of query interval.
- tmax (timestamp) – End time of query interval.
- codes_list (
list
of tuple
of str
) – List of code patterns to query. If not given or empty, an empty list is returned.
- limit (int) – Limit query to return only up to a given maximum number of entries per matching channel (without setting this option, very gappy data could cause the query to execute for a very long time).
Returns: list of entries of the form
(pattern, codes, deltat, tmin, tmax, data)
where
pattern
is the request codes pattern which yielded this entry,
codes
are the matching channel codes,
tmin
and
tmax
are the global min and max times for which data for this channel is available, regardless of any time restrictions in the query.
data
is a list with (up to
limit
) checkpoints of the form
(time, count)
where a
count
of zero indicates a data gap, a value of 1 normal data coverage and higher values indicate duplicate/redundant data.
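As an illustration of how the `data` checkpoints might be consumed, the sketch below extracts gap intervals from a list of `(time, count)` pairs. It assumes that each checkpoint's `count` holds from its `time` until the next checkpoint, which is an interpretation not spelled out in the text; the helper name and sample values are made up.

```python
def gaps_from_checkpoints(checkpoints):
    """Return (tmin, tmax) spans where coverage count is zero.

    Assumption: a checkpoint's count is valid until the next checkpoint.
    The final checkpoint only closes the preceding span.
    """
    gaps = []
    for (time, count), (next_time, _) in zip(checkpoints, checkpoints[1:]):
        if count == 0:
            gaps.append((time, next_time))
    return gaps


# Made-up example: normal coverage, a gap, then some duplicate data.
checkpoints = [(0.0, 1), (10.0, 0), (15.0, 1), (20.0, 2), (25.0, 1), (30.0, 0)]
gaps = gaps_from_checkpoints(checkpoints)
```

Spans with a count above 1 could be collected in the same way to locate redundant data.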
-
class
DatabaseStats
(**kwargs)[source]¶ Bases:
pyrocko.guts.Object
Container to hold statistics about contents cached in meta-information db.
-
♦
nfiles
¶ int
number of files in database
-
♦
nnuts
¶ int
number of index nuts in database
-
♦
codes
¶ list
of tuple
of str
objects, default: []
available code sequences in database, e.g. (agency, network, station, location) for station nuts.
-
♦
kinds
¶ list
of str
objects, default: []
available content types in database
-
♦
total_size
¶ int
aggregated file size of files referenced in database
-
♦
counts
¶ dict
of dict
of int
objects, default: {}
breakdown of how many nuts of any content type and code sequence are available in database,
counts[kind][codes]
-
-
class
Database
(database_path=':memory:', log_statements=False)[source]¶ Bases:
object
Shared meta-information database used by Squirrel.
-
dig
(nuts)[source]¶ Store or update content meta-information.
Given
nuts
are assumed to represent an up-to-date and complete inventory of a set of files. Any old information about these files is first pruned from the database (via database triggers). If such content is part of a live selection, it is also removed there. Then the new content meta-information is inserted into the main database. The content is not automatically inserted into the live selections again. It is the responsibility of the selection object to perform this step.
-
remove
(path)[source]¶ Prune content meta-information about a given file.
All content pieces belonging to file
path
are removed from the main database and any attached live selections (via database triggers).
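Pruning via database triggers can be demonstrated with the standard `sqlite3` module. The schema below is invented for illustration (table names `files` and `nuts`, the trigger name and all columns are assumptions) and is far simpler than whatever Squirrel uses internally; it only shows the general mechanism of a delete cascading through a trigger.

```python
import sqlite3

con = sqlite3.connect(':memory:')

# Hypothetical two-table schema: one row per file, one row per content piece.
con.executescript('''
    CREATE TABLE files (
        file_id INTEGER PRIMARY KEY,
        path TEXT UNIQUE
    );
    CREATE TABLE nuts (
        nut_id INTEGER PRIMARY KEY,
        file_id INTEGER,
        kind TEXT
    );
    -- When a file row is deleted, remove its content pieces as well.
    CREATE TRIGGER delete_nuts_on_file_delete
    BEFORE DELETE ON files FOR EACH ROW
    BEGIN
        DELETE FROM nuts WHERE file_id = old.file_id;
    END;
''')

con.executemany(
    'INSERT INTO files (file_id, path) VALUES (?, ?)',
    [(1, 'a.mseed'), (2, 'b.mseed')])
con.executemany(
    'INSERT INTO nuts (file_id, kind) VALUES (?, ?)',
    [(1, 'waveform'), (1, 'waveform'), (2, 'station')])

# Removing the file entry is enough; the trigger prunes the dependent rows.
con.execute('DELETE FROM files WHERE path = ?', ('a.mseed',))

remaining = con.execute('SELECT file_id, kind FROM nuts').fetchall()
```

The calling code never touches the `nuts` table directly, which is the appeal of the trigger approach: dependent tables, such as those of attached selections, stay consistent without extra bookkeeping in the application.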
-
reset
(path)[source]¶ Prune information associated with a given file, but keep the file path.
This method is called when reading a file failed. File attributes, format, size and modification time are set to NULL. File content meta-information is removed from the database and any attached live selections (via database triggers).
-