pypeit.pkg.pypeitdata module
PypeIt uses the astropy.utils.data caching system to limit the size of
its package distribution in PyPI by enabling on-demand downloading of reference
files needed for specific data-reduction steps. This module provides the class
used to access the data files in the code base.
The cache module implements the low-level function used to
interface with the PypeIt cache. To get the location of your pypeit cache (by
default ~/.pypeit/cache) you can run:
import astropy.config.paths
print(astropy.config.paths.get_cache_dir('pypeit'))
Every time PypeIt is imported, a new instance of
PypeItDataPaths is created, and this instance is
used to set paths to PypeIt data files. “Data files” in this context
essentially refers to anything in the pypeit/data directory tree; see the
attributes of PypeItDataPaths for the list of
directories that can be directly accessed.
Some of the files in these directories are included in the package distribution, but
most are not. Regardless, the PypeItDataPaths
object should be used to define the relevant file paths. For example, to access
a given NIST line list, one would run:
from pypeit import dataPaths
thar = dataPaths.nist.get_file_path('ThAr_vacuum.ascii')
All of the attributes of PypeItDataPaths are
PypeItDataPath objects, such that the code above
is effectively equivalent to:
from pypeit.pkg.pypeitdata import PypeItDataPath
thar = PypeItDataPath('arc_lines/NIST').get_file_path('ThAr_vacuum.ascii')
Although PypeItDataPath objects can be treated
similarly to Path objects, you should always use the
get_file_path() function to access
the relevant file path. Behind the scenes, this function looks for the
requested file in your package distribution and/or downloads the file to your
cache before returning the appropriate path.
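The lookup order described above (packaged copy first, cache second, download only when needed) can be illustrated with a minimal, self-contained sketch. This is not PypeIt's implementation: `resolve_data_file`, `fake_fetch`, and the directory layout are hypothetical names used only for illustration, and a plain directory stands in for the astropy cache.

```python
from pathlib import Path
import tempfile


def resolve_data_file(package_dir, cache_dir, name, fetch):
    """Return a path to ``name``, preferring the packaged copy.

    ``fetch`` is a hypothetical callable ``fetch(name, destination)`` that
    stands in for a real download. It is only called when the file is in
    neither the package directory nor the cache directory.
    """
    packaged = Path(package_dir) / name
    if packaged.is_file():
        return packaged
    cached = Path(cache_dir) / name
    if not cached.is_file():
        cached.parent.mkdir(parents=True, exist_ok=True)
        fetch(name, cached)
    return cached


def fake_fetch(name, destination):
    # Stand-in for a network download: write placeholder content.
    Path(destination).write_text(f"contents of {name}\n")


with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "pkg"
    cache = Path(tmp) / "cache"
    pkg.mkdir()
    (pkg / "local.dat").write_text("packaged\n")

    local_hit = resolve_data_file(pkg, cache, "local.dat", fake_fetch)
    remote_hit = resolve_data_file(pkg, cache, "remote.dat", fake_fetch)

    # Record the outcome before the temporary directory is removed.
    local_is_packaged = local_hit.parent == pkg
    remote_is_cached = remote_hit.parent == cache and remote_hit.is_file()
```

The caller receives a usable path in both cases and never needs to know which branch was taken, which is the reason for routing all access through a single function.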
Data directories that MUST exist as part of the package distribution are:
pypeit/data/spectrographs/keck_deimos/gain_ronoise
- class pypeit.pkg.pypeitdata.PypeItDataPath(subdirs, remote_host=None)
Bases: object
Convenience class that enables a general interface between a pypeit data directory and the rest of the code, regardless of whether or not the directory is expected to leverage the cache system for package-installed (as opposed to GitHub/source-installed) versions.
- Parameters:
subdirs (str, Path) – The subdirectories within the main pypeit/data directory that contain the data.
remote_host (str, optional) – The remote host for the data. By definition, all files in this data path must have the same host. Currently must be None, 'github', or 's3_cloud'. If None, all the files in this path are expected to be local to any pypeit installation.
- Variables:
- __truediv__(p)
Instantiate a new path object that points to a subdirectory using the (true) division operator, /.
This operation should only be used for contents that exist in the user's distribution. I.e., any file that is distributed using the cache should not use this; use get_file_path() instead.
- Parameters:
p (str, Path) – A sub-directory or file within the path directory. If this is a sub-directory, a new PypeItDataPath object is returned. If a file, a Path object is returned.
- Returns:
Path to contents of path, where the output type is based on the type of p.
- Return type:
PypeItDataPath, Path
- Raises:
PypeItPathError – Raised if the requested contents do not exist.
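The division-operator behavior described above can be sketched with a toy wrapper class. This is an illustration under stated assumptions, not the PypeItDataPath source: `DataDir` and `DataDirError` are hypothetical stand-ins for PypeItDataPath and PypeItPathError.

```python
from pathlib import Path
import tempfile


class DataDirError(Exception):
    """Hypothetical stand-in for PypeItPathError."""


class DataDir:
    """Toy directory wrapper; not the PypeItDataPath implementation."""

    def __init__(self, path):
        self.path = Path(path)

    def __truediv__(self, p):
        target = self.path / p
        if target.is_dir():
            # Sub-directories are wrapped in a new object of the same class.
            return DataDir(target)
        if target.is_file():
            # Files are returned as plain Path objects.
            return target
        raise DataDirError(f"{target} does not exist")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "sub" / "file.txt").write_text("x\n")

    data = DataDir(root)
    sub = data / "sub"
    leaf = sub / "file.txt"

    sub_is_wrapper = isinstance(sub, DataDir)
    leaf_is_path = isinstance(leaf, Path)
    try:
        data / "missing"
        missing_raises = False
    except DataDirError:
        missing_raises = True
```

Because the operator raises for anything that is not on disk, it cannot be used to name a file that still has to be downloaded.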
- static _get_file_path_return(f, return_format, format=None)
Convenience function that formats the return of get_file_path(), depending on whether or not the “format” of the file is requested.
- Parameters:
f (Path) – The file path to return.
return_format (bool) – If True, parse and return the file suffix (e.g., ‘fits’). If False, only the file path is returned.
format (str) – If return_format is True, override (i.e. do not parse) the format from the file name, and use this string instead. Ignored if None or return_format is False.
- Returns:
The file path and, if requested, the file format. See the arguments list.
- Return type:
Path, tuple
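A minimal sketch of the return convention described above follows. The function name `path_with_format` is hypothetical. Taking the format from the final suffix in lower case is an assumption of this sketch and may differ from what the library does, for example for compressed files.

```python
from pathlib import Path


def path_with_format(f, return_format, format=None):
    """Return ``f``, or ``(f, format)`` when ``return_format`` is True."""
    f = Path(f)
    if not return_format:
        # ``format`` is ignored when no format is requested.
        return f
    if format is not None:
        # An explicit format overrides anything parsed from the name.
        return f, format
    # Assumption: the format is the last suffix, without the dot.
    return f, f.suffix.lstrip(".").lower()


only_path = path_with_format("spec.fits", False, format="ascii")
parsed = path_with_format("spec.fits", True)
overridden = path_with_format("spec.dat", True, format="ascii")
```

Here `only_path` is a bare Path, `parsed` pairs the path with `'fits'`, and `overridden` pairs the path with `'ascii'` regardless of the file name.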
- get_file_path(data_file, force_update=False, to_pkg=None, return_format=False, return_none=False)
Return the path to a file.
The file must either exist locally or be downloadable from the host. To define a path to a file that does not meet these criteria, use self.path / data_file.
If data_file is a valid path to a file or is a file within path, the full path is returned. Otherwise, it is assumed that the file is accessible remotely in the GitHub repository and can be downloaded using fetch_remote_file(). Note, data_file must be a file, not a subdirectory within path.
Throughout the code base, this is the main function that should be used to obtain paths to files within path. I.e., for any data in the pypeit/data directory, developers should be using p = path.get_file_path(file) instead of p = path / file to define the path to a data file.
- Parameters:
force_update (bool, optional) – If the file is in the cache, force astropy.utils.data.download_file to update the cache by downloading the latest version.
to_pkg (str, optional) – If the file is in the cache, this argument affects how the cached file is connected to the package installation. If 'symlink', a symbolic link is created in the package directory tree that points to the cached file. If 'move', the cached file is moved (not copied) from the cache into the package directory tree. If anything else (including None), no operation is performed; no warning is issued if the value of to_pkg is not one of these three options (None, 'symlink', or 'move'). This argument is ignored if data_file is a valid path or a file within path.
return_format (bool, optional) – If True, the returned object is a tuple that includes the file path and its format (e.g., 'fits'). If False, only the file path is returned.
return_none (bool, optional) – If True, return None if the file does not exist. If False, an error is raised if the file does not exist.
- Returns:
The file path and, if requested, the file format; see return_format.
- Return type:
Path, tuple
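The three to_pkg outcomes described above ('symlink', 'move', or no operation) can be sketched with the standard library alone. This is an illustration, not the library's code: `connect_to_package` is a hypothetical helper, and both the choice of return value and the silent overwrite of an existing target are assumptions of the sketch. Creating symbolic links may require extra privileges on some platforms.

```python
from pathlib import Path
import shutil
import tempfile


def connect_to_package(cached_file, package_dir, to_pkg=None):
    """Link or move a cached file into a package directory tree."""
    cached_file = Path(cached_file)
    target = Path(package_dir) / cached_file.name
    if to_pkg not in ("symlink", "move"):
        # Any other value, including None, is silently a no-op.
        return cached_file
    if target.is_symlink() or target.exists():
        # Assumption of this sketch: replace whatever is already there.
        target.unlink()
    if to_pkg == "symlink":
        target.symlink_to(cached_file)
    else:
        # Moved, not copied: the file leaves the cache directory.
        shutil.move(str(cached_file), str(target))
    return target


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "cache"
    pkg = Path(tmp) / "pkg"
    cache.mkdir()
    pkg.mkdir()
    a, b, c = (cache / n for n in ("a.fits", "b.fits", "c.fits"))
    for f in (a, b, c):
        f.write_text(f.name)

    linked = connect_to_package(a, pkg, to_pkg="symlink")
    moved = connect_to_package(b, pkg, to_pkg="move")
    untouched = connect_to_package(c, pkg, to_pkg="copy")

    link_ok = linked.is_symlink() and a.is_file()
    move_ok = moved.is_file() and not b.exists()
    noop_ok = untouched == c and not (pkg / "c.fits").exists()
```

The unrecognized value `'copy'` takes the no-op branch without any warning, mirroring the documented behavior for values other than None, 'symlink', and 'move'.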
- glob(pattern)
Search for all contents of path that match the provided search string; see Path.glob.
Important
This method only works for files that are on-disk in the correct directory. It does not return any files that are in the cache or that have not yet been downloaded into the cache.
- Parameters:
pattern (str) – Search string with wild cards.
- Returns:
Generator object that provides contents matching the search string.
- Return type:
generator
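The limitation noted above follows from how Path.glob works: it only inspects the directory it is called on. The sketch below uses two temporary directories as hypothetical stand-ins for a package data directory and a cache; the file names are invented for illustration.

```python
from pathlib import Path
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "pkg"
    cache = Path(tmp) / "cache"
    pkg.mkdir()
    cache.mkdir()
    (pkg / "on_disk.fits").write_text("packaged\n")
    (cache / "cached_only.fits").write_text("cached\n")

    # Path.glob returns a generator over matches in ``pkg`` only;
    # the file that lives in ``cache`` is never seen.
    matches = sorted(p.name for p in pkg.glob("*.fits"))
```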
- class pypeit.pkg.pypeitdata.PypeItDataPaths
Bases: object
List of hardwired data path objects, primarily for developers.
The top-level directory for all attributes is pypeit/data. All of these directories should, at minimum, include a README file that is version-controlled and hosted by GitHub. I.e., the code assumes these paths exist, and maintaining a version-controlled README ensures that is true, even if the directory is empty otherwise.
- defined_paths = {'arc_plot': {'host': None, 'path': 'arc_lines/plots'}, 'arclines': {'host': None, 'path': 'arc_lines'}, 'extinction': {'host': None, 'path': 'extinction'}, 'filters': {'host': None, 'path': 'filters'}, 'line_lists': {'host': None, 'path': 'line_lists'}, 'linelist': {'host': None, 'path': 'arc_lines/lists'}, 'nist': {'host': 'github', 'path': 'arc_lines/NIST'}, 'pixelflat': {'host': 'github', 'path': 'pixelflats'}, 'reid_arxiv': {'host': 'github', 'path': 'arc_lines/reid_arxiv'}, 'sensfunc': {'host': 'github', 'path': 'sensfuncs'}, 'skisim': {'host': 'github', 'path': 'skisim'}, 'sky_spec': {'host': None, 'path': 'sky_spec'}, 'spectrographs': {'host': None, 'path': 'spectrographs'}, 'standards': {'host': 'github', 'path': 'standards'}, 'static_calibs': {'host': None, 'path': 'static_calibs'}, 'tel_model': {'host': None, 'path': 'telluric/models'}, 'telgrid': {'host': 's3_cloud', 'path': 'telluric/atm_grids'}, 'tests': {'host': 'github', 'path': 'tests'}}
Dictionary providing the metadata for all the paths defined by the class.
- classmethod github_paths()
Return the subset of paths hosted on GitHub.
- Returns:
A dictionary with the same format as defined_paths, but only includes those paths hosted on GitHub.
- Return type:
dict
- classmethod remote_paths()
Return the subset of paths with data hosted remotely.
- Returns:
A dictionary with the same format as defined_paths, but only includes those paths with data hosted remotely.
- Return type:
dict
- classmethod s3_paths()
Return the subset of paths hosted on AWS S3.
- Returns:
A dictionary with the same format as defined_paths, but only includes those paths hosted on AWS S3.
- Return type:
dict
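The three class methods above all amount to filtering defined_paths by its 'host' entry. A minimal sketch of that filtering follows, using three entries copied from the dictionary. The helper `select_by_host` is hypothetical, and treating "remote" as "host is not None" is an assumption based on the remote_host description.

```python
# Three entries copied from defined_paths, one per host value.
defined_paths = {
    "arclines": {"host": None, "path": "arc_lines"},
    "nist": {"host": "github", "path": "arc_lines/NIST"},
    "telgrid": {"host": "s3_cloud", "path": "telluric/atm_grids"},
}


def select_by_host(paths, host):
    """Return the entries of ``paths`` whose 'host' equals ``host``."""
    return {key: meta for key, meta in paths.items() if meta["host"] == host}


github = select_by_host(defined_paths, "github")
s3 = select_by_host(defined_paths, "s3_cloud")
# Assumption: a path is "remote" when it has any host at all.
remote = {k: v for k, v in defined_paths.items() if v["host"] is not None}
```

Each result keeps the same key and metadata layout as the input, so it can be used wherever the full dictionary is expected.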