pypeit.pypeitdata module
PypeIt uses the astropy.utils.data caching system to limit the size of its package distribution on PyPI by enabling on-demand downloading of reference files needed for specific data-reduction steps. This module provides the class used to access the data files in the code base.
The cache module implements the low-level functions used to interface with the PypeIt cache. To get the location of your pypeit cache (by default ~/.pypeit/cache), you can run:
import astropy.config.paths
print(astropy.config.paths.get_cache_dir('pypeit'))
Every time PypeIt is imported, a new instance of PypeItDataPaths is created, and this instance is used to set paths to PypeIt data files. “Data files” in this context essentially refers to anything in the pypeit/data directory tree; see the attributes of PypeItDataPaths for the list of directories that can be directly accessed.
Some of the files in these directories are included in the package distribution, but most are not. Regardless, the PypeItDataPaths object should be used to define the relevant file paths. For example, to access a given NIST line list, one would run:
from pypeit import dataPaths
thar = dataPaths.nist.get_file_path('ThAr_vacuum.ascii')
All of the attributes of PypeItDataPaths are PypeItDataPath objects, such that the code above is effectively equivalent to:
from pypeit.pypeitdata import PypeItDataPath
thar = PypeItDataPath('arc_lines/NIST').get_file_path('ThAr_vacuum.ascii')
Although PypeItDataPath objects can be treated similarly to Path objects, you should always use the get_file_path() function to access the relevant file path. Behind the scenes, this function looks for the requested file in your package distribution and/or downloads the file to your cache before returning the appropriate path.
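The lookup order described above can be sketched without PypeIt installed. This is only an illustration under stated assumptions, not PypeIt's implementation: `resolve_data_file` and the `fetch` callback are hypothetical names, and the real method hands downloads to the astropy cache rather than to a user-supplied function.

```python
from pathlib import Path
import tempfile


def resolve_data_file(pkg_dir, name, fetch):
    """Hypothetical helper: prefer the on-disk copy, else fetch into a cache."""
    local = Path(pkg_dir) / name
    if local.is_file():
        # The file ships with the package distribution; no download is needed.
        return local
    # Otherwise delegate to a downloader that returns the cached file path.
    return Path(fetch(name))


with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "pkg"
    cache = Path(tmp) / "cache"
    pkg.mkdir()
    cache.mkdir()
    (pkg / "local.dat").write_text("shipped with the package")

    def fake_fetch(name):
        # Stand-in for a real download: writes a placeholder into the cache.
        target = cache / name
        target.write_text("downloaded")
        return target

    found_local = resolve_data_file(pkg, "local.dat", fake_fetch)
    found_remote = resolve_data_file(pkg, "remote.dat", fake_fetch)
    local_parent = found_local.parent.name
    remote_parent = found_remote.parent.name
```

The caller receives a usable path in both cases and does not need to know whether the file was shipped with the package or fetched on demand.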
Data directories that MUST exist as part of the package distribution are:
pypeit/data/spectrographs/keck_deimos/gain_ronoise
- class pypeit.pypeitdata.PypeItDataPath(subdirs, remote_host=None)[source]
Bases: object
Convenience class that enables a general interface between a pypeit data directory and the rest of the code, regardless of whether or not the directory is expected to leverage the cache system for package-installed (as opposed to GitHub/source-installed) versions.
- Parameters:
subdirs (str, Path) – The subdirectories within the main pypeit/data directory that contain the data.
remote_host (str, optional) – The remote host for the data. By definition, all files in this data path must have the same host. Currently must be None, 'github', or 's3_cloud'. If None, all the files in this path are expected to be local to any pypeit installation.
- __truediv__(p)[source]
Instantiate a new path object that points to a subdirectory using the (true) division operator, /.
This operation should only be used for contents that exist in the user's distribution. I.e., any file that is distributed using the cache should not use this; use get_file_path() instead.
- Parameters:
p (str, Path) – A sub-directory or file within the path directory. If this is a sub-directory, a new PypeItDataPath object is returned. If a file, a Path object is returned.
- Returns:
Path to contents of path, where the output type is based on the type of p.
- Return type:
- Raises:
PypeItPathError – Raised if the requested contents do not exist.
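The type-dependent return of the division operator can be illustrated with a small stand-in class. `DataDir` is a hypothetical name used only for this sketch, and a built-in `FileNotFoundError` replaces the documented PypeItPathError so that the example runs without PypeIt.

```python
from pathlib import Path
import tempfile


class DataDir:
    """Hypothetical stand-in for a data-path wrapper; not PypeIt's class."""

    def __init__(self, path):
        self.path = Path(path)

    def __truediv__(self, p):
        target = self.path / p
        if target.is_dir():
            # Subdirectory: stay inside the wrapper type.
            return DataDir(target)
        if target.is_file():
            # File: hand back a plain Path.
            return target
        # The documented behavior raises PypeItPathError; a built-in is used here.
        raise FileNotFoundError(f"{target} does not exist")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "sub" / "file.txt").write_text("x")

    d = DataDir(root)
    sub = d / "sub"
    f = sub / "file.txt"
    sub_is_wrapper = isinstance(sub, DataDir)
    file_is_path = isinstance(f, Path)
    try:
        d / "missing"
        raised = False
    except FileNotFoundError:
        raised = True
```

Because the operator checks the disk, it cannot describe a file that has yet to be downloaded, which is why cached files go through get_file_path() instead.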
- static _get_file_path_return(f, return_format, format=None)[source]
Convenience function that formats the return of get_file_path(), depending on whether or not the “format” of the file is requested.
- Parameters:
f (Path) – The file path to return.
return_format (bool) – If True, parse and return the file suffix (e.g., ‘fits’). If False, only the file path is returned.
format (str) – If return_format is True, override (i.e. do not parse) the format from the file name, and use this string instead. Ignored if None or return_format is False.
- Returns:
The file path and, if requested, the file format. See the arguments list.
- Return type:
Path, tuple
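A minimal sketch of the return convention described above follows. `file_path_return` is a hypothetical function, and taking the suffix with the leading dot removed is an assumption about how the format is parsed; the handling of compressed names such as `.fits.gz` is not covered here.

```python
from pathlib import Path


def file_path_return(f, return_format, format=None):
    """Hypothetical sketch of the documented return convention."""
    if not return_format:
        # Only the path is wanted; any format override is ignored.
        return f
    # Use the override if given; otherwise take the suffix without its dot.
    fmt = format if format is not None else Path(f).suffix.lstrip(".")
    return f, fmt


p = Path("ThAr_vacuum.ascii")
only_path = file_path_return(p, False)
with_format = file_path_return(p, True)
overridden = file_path_return(p, True, format="txt")
```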
- static check_isdir(path: Path) Path [source]
Check that the hardwired directory exists.
- Parameters:
path (Path) – The path to check. This must be a directory (not a file).
- Returns:
The input path is returned if it is valid.
- Return type:
- Raises:
PypeItPathError – Raised if the path does not exist or is not a directory.
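The check itself is small enough to sketch with pathlib alone. This version is an assumption about the logic, with a built-in `NotADirectoryError` standing in for PypeItPathError.

```python
from pathlib import Path
import tempfile


def check_isdir(path: Path) -> Path:
    """Return the path if it is an existing directory, else raise."""
    if not path.is_dir():
        # Path.is_dir() is False both for missing paths and for regular files.
        # PypeIt raises PypeItPathError; a built-in exception stands in here.
        raise NotADirectoryError(f"{path} does not exist or is not a directory")
    return path


with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    f = d / "a_file"
    f.write_text("x")
    same = check_isdir(d) == d
    try:
        check_isdir(f)
        file_rejected = False
    except NotADirectoryError:
        file_rejected = True
```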
- get_file_path(data_file, force_update=False, to_pkg=None, return_format=False, return_none=False, quiet=False)[source]
Return the path to a file.
The file must either exist locally or be downloadable from the host. To define a path to a file that does not meet these criteria, use self.path / data_file.
If data_file is a valid path to a file or is a file within path, the full path is returned. Otherwise, it is assumed that the file is accessible remotely in the GitHub repository and can be downloaded using fetch_remote_file(). Note, data_file must be a file, not a subdirectory within path.
Throughout the code base, this is the main function that should be used to obtain paths to files within path. I.e., for any data in the pypeit/data directory, developers should be using p = path.get_file_path(file) instead of p = path / file to define the path to a data file.
- Parameters:
force_update (bool, optional) – If the file is in the cache, force astropy.utils.data.download_file to update the cache by downloading the latest version.
to_pkg (str, optional) – If the file is in the cache, this argument affects how the cached file is connected to the package installation. If 'symlink', a symbolic link is created in the package directory tree that points to the cached file. If 'move', the cached file is moved (not copied) from the cache into the package directory tree. If anything else (including None), no operation is performed; no warning is issued if the value of to_pkg is not one of these three options (None, 'symlink', or 'move'). This argument is ignored if data_file is a valid path or a file within path.
return_format (bool, optional) – If True, the returned object is a tuple that includes the file path and its format (e.g., 'fits'). If False, only the file path is returned.
return_none (bool, optional) – If True, return None if the file does not exist. If False, an error is raised if the file does not exist.
quiet (bool, optional) – Suppress messages.
- Returns:
The file path and, if requested, the file format; see return_format.
- Return type:
Path, tuple
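The three to_pkg behaviors can be sketched with the standard library. `connect_to_pkg` is a hypothetical helper written for this illustration, not a PypeIt function; it mirrors only the documented semantics (link, move, or silently do nothing).

```python
from pathlib import Path
import shutil
import tempfile


def connect_to_pkg(cached, pkg_file, to_pkg=None):
    """Hypothetical helper mirroring the documented to_pkg options."""
    cached, pkg_file = Path(cached), Path(pkg_file)
    if to_pkg == "symlink":
        # The package path becomes a link; the data stays in the cache.
        pkg_file.symlink_to(cached)
        return pkg_file
    if to_pkg == "move":
        # The file leaves the cache entirely (moved, not copied).
        shutil.move(str(cached), str(pkg_file))
        return pkg_file
    # Any other value (including None) is silently a no-op.
    return cached


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "cache"
    pkg = Path(tmp) / "pkg"
    cache.mkdir()
    pkg.mkdir()
    cached = cache / "a.dat"
    cached.write_text("data")

    # An unrecognized value leaves everything where it was, without warning.
    untouched = connect_to_pkg(cached, pkg / "a.dat", to_pkg="copy")
    still_cached = cached.is_file() and not (pkg / "a.dat").exists()

    moved = connect_to_pkg(cached, pkg / "a.dat", to_pkg="move")
    moved_ok = moved.is_file() and not cached.exists()
```

The 'symlink' branch is left unexercised in the demonstration because creating symbolic links can require elevated privileges on some platforms.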
- glob(pattern)[source]
Search for all contents of path that match the provided search string; see Path.glob.
Important
This method only works for files that are on-disk in the correct directory. It does not return any files that are in the cache or that have not yet been downloaded into the cache.
- Parameters:
pattern (str) – Search string with wild cards.
- Returns:
Generator object that provides contents matching the search string.
- Return type:
generator
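The limitation noted above follows directly from how Path.glob works: it walks one directory tree and knows nothing about files stored elsewhere. The directory names in this sketch are made up for illustration.

```python
from pathlib import Path
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: one "package" directory and a separate "cache".
    pkg_dir = Path(tmp) / "pkg" / "arc_lines"
    cache_dir = Path(tmp) / "cache"
    pkg_dir.mkdir(parents=True)
    cache_dir.mkdir()
    (pkg_dir / "on_disk.ascii").write_text("x")
    (pkg_dir / "README").write_text("x")
    (cache_dir / "cached_only.ascii").write_text("x")

    # Path.glob returns a generator; it only sees what is under pkg_dir.
    matches = sorted(p.name for p in pkg_dir.glob("*.ascii"))
```

Here `matches` contains only the file stored under the package directory; the file that exists only in the cache directory is not found.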
- class pypeit.pypeitdata.PypeItDataPaths[source]
Bases: object
List of hardwired data path objects, primarily for developers.
The top-level directory for all attributes is pypeit/data. All of these directories should, at minimum, include a README file that is version-controlled and hosted by GitHub. I.e., the code assumes these paths exist, and maintaining a version-controlled README ensures that is true, even if the directory is empty otherwise.
- defined_paths = {'arc_plot': {'host': None, 'path': 'arc_lines/plots'}, 'arclines': {'host': None, 'path': 'arc_lines'}, 'extinction': {'host': None, 'path': 'extinction'}, 'filters': {'host': None, 'path': 'filters'}, 'linelist': {'host': None, 'path': 'arc_lines/lists'}, 'nist': {'host': 'github', 'path': 'arc_lines/NIST'}, 'pixelflat': {'host': 'github', 'path': 'pixelflats'}, 'reid_arxiv': {'host': 'github', 'path': 'arc_lines/reid_arxiv'}, 'sensfunc': {'host': 'github', 'path': 'sensfuncs'}, 'skisim': {'host': 'github', 'path': 'skisim'}, 'sky_spec': {'host': None, 'path': 'sky_spec'}, 'spectrographs': {'host': None, 'path': 'spectrographs'}, 'standards': {'host': 'github', 'path': 'standards'}, 'static_calibs': {'host': None, 'path': 'static_calibs'}, 'tel_model': {'host': None, 'path': 'telluric/models'}, 'telgrid': {'host': 's3_cloud', 'path': 'telluric/atm_grids'}, 'tests': {'host': 'github', 'path': 'tests'}}
Dictionary providing the metadata for all the paths defined by the class.
- classmethod github_paths()[source]
Return the subset of paths hosted on GitHub.
- Returns:
A dictionary with the same format as defined_paths, but including only those paths hosted on GitHub.
- Return type:
- classmethod remote_paths()[source]
Return the subset of paths with data hosted remotely.
- Returns:
A dictionary with the same format as defined_paths, but including only those paths with data hosted remotely.
- Return type:
- classmethod s3_paths()[source]
Return the subset of paths hosted on AWS S3.
- Returns:
A dictionary with the same format as defined_paths, but including only those paths hosted on AWS S3.
- Return type:
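The three class methods above amount to filtering defined_paths on the 'host' value. The sketch below uses a few entries copied from the defined_paths listing; `paths_by_host` is a hypothetical helper, and treating "remote" as any of the two non-None hosts is an assumption based on the allowed remote_host values.

```python
# A few entries copied from the defined_paths listing above.
defined_paths = {
    "arclines": {"host": None, "path": "arc_lines"},
    "nist": {"host": "github", "path": "arc_lines/NIST"},
    "standards": {"host": "github", "path": "standards"},
    "telgrid": {"host": "s3_cloud", "path": "telluric/atm_grids"},
}


def paths_by_host(paths, hosts):
    """Hypothetical helper: keep entries whose host is in ``hosts``."""
    return {k: v for k, v in paths.items() if v["host"] in hosts}


github = paths_by_host(defined_paths, {"github"})
s3 = paths_by_host(defined_paths, {"s3_cloud"})
# Assumption: "remote" means any host other than None.
remote = paths_by_host(defined_paths, {"github", "s3_cloud"})
```

Each result keeps the same key and value structure as the input, so it can be used anywhere the full dictionary is expected.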