pypeit.pypeitdata module

PypeIt uses the astropy.utils.data caching system to limit the size of its package distribution in PyPI by enabling on-demand downloading of reference files needed for specific data-reduction steps. This module provides the classes used to access the data files in the code base.

The cache module implements the low-level functions used to interface with the PypeIt cache. To get the location of your PypeIt cache (by default ~/.pypeit/cache), you can run:

import astropy.config.paths
print(astropy.config.paths.get_cache_dir('pypeit'))

Every time PypeIt is imported, a new instance of PypeItDataPaths is created, and this instance is used to set paths to PypeIt data files. “Data files” in this context essentially refers to anything in the pypeit/data directory tree; see the attributes of PypeItDataPaths for the list of directories that can be directly accessed.

Some of the files in these directories are included in the package distribution, but most are not. Regardless, the PypeItDataPaths object should be used to define the relevant file paths. For example, to access a given NIST line list, one would run:

from pypeit import dataPaths
thar = dataPaths.nist.get_file_path('ThAr_vacuum.ascii')

All of the attributes of PypeItDataPaths are PypeItDataPath objects, such that the code above is effectively equivalent to:

from pypeit.pypeitdata import PypeItDataPath
thar = PypeItDataPath('arc_lines/NIST').get_file_path('ThAr_vacuum.ascii')

Although PypeItDataPath objects can be treated similarly to Path objects, you should always use the get_file_path() function to access the relevant file path. Behind the scenes, this function looks for the requested file in your package distribution and/or downloads the file to your cache before returning the appropriate path.

Data directories that MUST exist as part of the package distribution are:

  • pypeit/data/spectrographs/keck_deimos/gain_ronoise

class pypeit.pypeitdata.PypeItDataPath(subdirs, remote_host=None)[source]

Bases: object

Convenience class that enables a general interface between a pypeit data directory and the rest of the code, regardless of whether or not the directory is expected to leverage the cache system for package-installed (as opposed to GitHub/source-installed) versions.

Parameters:
  • subdirs (str, Path) – The subdirectories within the main pypeit/data directory that contain the data.

  • remote_host (str, optional) – The remote host for the data. By definition, all files in this data path must have the same host. Currently must be None, 'github', or 's3_cloud'. If None, all the files in this path are expected to be local to any pypeit installation.

host

String representing the remote host.

Type:

str

subdirs

The subdirectory path within the pypeit/data directory.

Type:

str

data

Path to the top-level data directory on the user’s system.

Type:

Path

path

Path to the specific data directory.

Type:

Path

__repr__()[source]

Provide a string representation of the path. Mimics pathlib.

__truediv__(p)[source]

Instantiate a new path object that points to a subdirectory using the (true) division operator, /.

This operation should only be used for contents that exist in the user's distribution. That is, any file that is distributed using the cache should not be accessed this way; use get_file_path() instead.

Parameters:

p (str, Path) – A sub-directory or file within the path directory. If this is a sub-directory, a new PypeItDataPath object is returned. If a file, a Path object is returned.

Returns:

Path to contents of path, where the output type is based on the type of p.

Return type:

PypeItDataPath, Path

Raises:

PypeItPathError – Raised if the requested contents do not exist.
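The type-dependent return of the / operator can be illustrated with a small standalone sketch. It does not use PypeIt itself: DataDir and MissingPathError are hypothetical stand-ins for PypeItDataPath and PypeItPathError, and the directory contents are created on the fly purely for illustration.

```python
import tempfile
from pathlib import Path


class MissingPathError(Exception):
    """Hypothetical stand-in for PypeItPathError."""


class DataDir:
    """Hypothetical, simplified stand-in for PypeItDataPath."""

    def __init__(self, path):
        self.path = Path(path)

    def __truediv__(self, p):
        target = self.path / p
        if target.is_dir():
            # Sub-directory: wrap it in a new object of the same class.
            return DataDir(target)
        if target.is_file():
            # File: return a plain pathlib.Path.
            return target
        # Neither a directory nor a file on disk.
        raise MissingPathError(f"{target} does not exist")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "lists").mkdir()
    (root / "README").write_text("placeholder")

    data = DataDir(root)
    sub_is_datadir = isinstance(data / "lists", DataDir)
    readme_is_path = isinstance(data / "README", Path)
    try:
        data / "missing.fits"
        raised = False
    except MissingPathError:
        raised = True
```

Because the operator only inspects what is on disk, a file that lives in the cache would trigger the error branch in this sketch, which is consistent with the advice to use get_file_path() for cached files.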

static _get_file_path_return(f, return_format, format=None)[source]

Convenience function that formats the return of get_file_path(), depending on whether or not the “format” of the file is requested.

Parameters:
  • f (Path) – The file path to return.

  • return_format (bool) – If True, parse and return the file suffix (e.g., ‘fits’). If False, only the file path is returned.

  • format (str) – If return_format is True, override (i.e. do not parse) the format from the file name, and use this string instead. Ignored if None or return_format is False.

Returns:

The file path and, if requested, the file format. See the arguments list.

Return type:

Path, tuple

static _parse_format(f)[source]

Parse the file format, ignoring gz extensions.

Parameters:

f (Path) – File path to parse.

Returns:

The extension of the file that should indicate its format. Any '.gz' extensions are stripped.

Return type:

str
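As a rough illustration of the two static helpers described above, the following sketch reproduces the documented behavior with pathlib only. The function names parse_format and format_return are hypothetical, and the handling of files with no extension other than '.gz' (an empty string here) is an assumption, not something the documentation specifies.

```python
from pathlib import Path


def parse_format(f):
    """Return the extension that indicates the file format, skipping '.gz'."""
    suffixes = [s for s in Path(f).suffixes if s != ".gz"]
    # Drop the leading dot; an empty string means no usable extension
    # (assumed behavior for this sketch).
    return suffixes[-1].lstrip(".") if suffixes else ""


def format_return(f, return_format, format=None):
    """Return the path alone, or a (path, format) tuple if requested."""
    if not return_format:
        return f
    # An explicit format overrides whatever the file name suggests.
    return f, (parse_format(f) if format is None else format)


path_only = format_return(Path("spec.fits.gz"), False)
path_and_format = format_return(Path("spec.fits.gz"), True)
overridden = format_return(Path("lines.dat"), True, format="ascii")
```

With these definitions, path_and_format pairs the path with 'fits' rather than 'gz', and overridden carries 'ascii' regardless of the '.dat' extension.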

static check_isdir(path: Path) → Path[source]

Check that the hardwired directory exists.

Parameters:

path (Path) – The path to check. This must be a directory (not a file).

Returns:

The input path is returned if it is valid.

Return type:

Path

Raises:

PypeItPathError – Raised if the path does not exist or is not a directory.
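A minimal sketch of this kind of check, written against pathlib alone, is shown here. MissingPathError is a hypothetical stand-in for PypeItPathError, and the function body is an assumption about how such a check could be written, not a copy of the PypeIt source.

```python
import tempfile
from pathlib import Path


class MissingPathError(Exception):
    """Hypothetical stand-in for PypeItPathError."""


def check_isdir(path: Path) -> Path:
    # Path.is_dir() is False both for missing paths and for regular files.
    if not path.is_dir():
        raise MissingPathError(f"{path} does not exist or is not a directory")
    return path


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    a_file = root / "README"
    a_file.write_text("placeholder")

    # A valid directory is handed back unchanged.
    same_object = check_isdir(root) is root

    # A regular file and a missing path are both rejected.
    errors = []
    for bad in (a_file, root / "missing"):
        try:
            check_isdir(bad)
        except MissingPathError:
            errors.append(bad.name)
```

Returning the input path lets the check be used inline when assigning an attribute, e.g. `path = check_isdir(candidate)`.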

get_file_path(data_file, force_update=False, to_pkg=None, return_format=False, return_none=False, quiet=False)[source]

Return the path to a file.

The file must either exist locally or be downloadable from the host. To define a path to a file that does not meet these criteria, use self.path / data_file.

If data_file is a valid path to a file or is a file within path, the full path is returned. Otherwise, it is assumed that the file is accessible remotely in the GitHub repository and can be downloaded using fetch_remote_file(). Note, data_file must be a file, not a subdirectory within path.

Throughout the code base, this is the main function that should be used to obtain paths to files within path. I.e., for any data in the pypeit/data directory, developers should be using p = path.get_file_path(file) instead of p = path / file to define the path to a data file.

Parameters:
  • data_file (str, Path) – File name or path. See above.

  • force_update (bool, optional) – If the file is in the cache, force astropy.utils.data.download_file to update the cache by downloading the latest version.

  • to_pkg (str, optional) – If the file is in the cache, this argument affects how the cached file is connected to the package installation. If 'symlink', a symbolic link is created in the package directory tree that points to the cached file. If 'move', the cached file is moved (not copied) from the cache into the package directory tree. If anything else (including None), no operation is performed; no warning is issued if the value of to_pkg is not one of these three options (None, 'symlink', or 'move'). This argument is ignored if data_file is a valid path or a file within path.

  • return_format (bool, optional) – If True, the returned object is a tuple that includes the file path and its format (e.g., 'fits'). If False, only the file path is returned.

  • return_none (bool, optional) – If True, return None if the file does not exist. If False, an error is raised if the file does not exist.

  • quiet (bool, optional) – If True, suppress messages.

Returns:

The file path and, if requested, the file format; see return_format.

Return type:

Path, tuple
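The three to_pkg behaviors described in the parameter list can be pictured with the standalone sketch below. It does not call PypeIt: connect_to_pkg is a hypothetical helper, the cache and package directories are temporary stand-ins, and the file names are invented. Creating symbolic links may require extra privileges on some platforms (notably Windows).

```python
import shutil
import tempfile
from pathlib import Path


def connect_to_pkg(cached_file, pkg_dir, to_pkg=None):
    """Hypothetical helper mimicking the documented to_pkg options."""
    target = pkg_dir / cached_file.name
    if to_pkg == "symlink":
        # Link in the package tree that points back to the cached file.
        target.symlink_to(cached_file)
    elif to_pkg == "move":
        # Relocate the cached file; nothing is left behind in the cache.
        shutil.move(str(cached_file), str(target))
    # Any other value (including None) is silently ignored.


with tempfile.TemporaryDirectory() as tmp:
    cache, pkg = Path(tmp) / "cache", Path(tmp) / "pkg"
    cache.mkdir()
    pkg.mkdir()
    for name in ("a.fits", "b.fits", "c.fits"):
        (cache / name).write_text("data")

    connect_to_pkg(cache / "a.fits", pkg, to_pkg="symlink")
    connect_to_pkg(cache / "b.fits", pkg, to_pkg="move")
    connect_to_pkg(cache / "c.fits", pkg, to_pkg="copy")  # unrecognized: no-op

    linked = (pkg / "a.fits").is_symlink() and (cache / "a.fits").exists()
    moved = (pkg / "b.fits").is_file() and not (cache / "b.fits").exists()
    ignored = not (pkg / "c.fits").exists() and (cache / "c.fits").exists()
```

The practical difference is where the data ends up: a symlink keeps the file in the cache, whereas a move makes the package tree the only copy.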

glob(pattern)[source]

Search for all contents of path that match the provided search string; see Path.glob.

Important

This method only works for files that are on-disk in the correct directory. It does not return any files that are in the cache or that have not yet been downloaded into the cache.

Parameters:

pattern (str) – Search string with wild cards.

Returns:

Generator object that provides contents matching the search string.

Return type:

generator
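Since this method defers to Path.glob, its on-disk-only behavior can be seen with pathlib directly. The sketch below uses a temporary directory and invented file names; it is only meant to show that matching is limited to what is physically present in the directory.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    # Only these files exist on disk; anything still sitting in a cache
    # elsewhere would be invisible to the search.
    for name in ("ThAr_lines.dat", "HeI_lines.dat", "README"):
        (data_dir / name).write_text("")

    # The result is a lazy iterable, so consume it (here via sorted)
    # while the directory still exists.
    matches = sorted(p.name for p in data_dir.glob("*_lines.dat"))
```

Here matches contains the two '*_lines.dat' names and omits README.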

class pypeit.pypeitdata.PypeItDataPaths[source]

Bases: object

List of hardwired data path objects, primarily for developers.

The top-level directory for all attributes is pypeit/data. All of these directories should, at minimum, include a README file that is version-controlled and hosted by GitHub. I.e., the code assumes these paths exist, and maintaining a version-controlled README ensures that is true, even if the directory is empty otherwise.

defined_paths = {'arc_plot': {'host': None, 'path': 'arc_lines/plots'}, 'arclines': {'host': None, 'path': 'arc_lines'}, 'extinction': {'host': None, 'path': 'extinction'}, 'filters': {'host': None, 'path': 'filters'}, 'linelist': {'host': None, 'path': 'arc_lines/lists'}, 'nist': {'host': 'github', 'path': 'arc_lines/NIST'}, 'pixelflat': {'host': 'github', 'path': 'pixelflats'}, 'reid_arxiv': {'host': 'github', 'path': 'arc_lines/reid_arxiv'}, 'sensfunc': {'host': 'github', 'path': 'sensfuncs'}, 'skisim': {'host': 'github', 'path': 'skisim'}, 'sky_spec': {'host': None, 'path': 'sky_spec'}, 'spectrographs': {'host': None, 'path': 'spectrographs'}, 'standards': {'host': 'github', 'path': 'standards'}, 'static_calibs': {'host': None, 'path': 'static_calibs'}, 'tel_model': {'host': None, 'path': 'telluric/models'}, 'telgrid': {'host': 's3_cloud', 'path': 'telluric/atm_grids'}, 'tests': {'host': 'github', 'path': 'tests'}}

Dictionary providing the metadata for all the paths defined by the class.

classmethod github_paths()[source]

Return the subset of paths hosted on GitHub.

Returns:

A dictionary with the same format as defined_paths, but only includes those paths hosted on GitHub.

Return type:

dict

classmethod remote_paths()[source]

Return the subset of paths with data hosted remotely.

Returns:

A dictionary with the same format as defined_paths, but only includes those paths with data hosted remotely.

Return type:

dict

classmethod s3_paths()[source]

Return the subset of paths hosted on AWS S3.

Returns:

A dictionary with the same format as defined_paths, but only includes those paths hosted on AWS S3.

Return type:

dict
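Conceptually, the three class methods above select entries of defined_paths by their 'host' value. The sketch below shows that selection on a three-entry excerpt of the dictionary; the helper names paths_for_host and remote_only are hypothetical, and the filtering logic is an assumption about how such subsets could be built, not the PypeIt implementation.

```python
# Excerpt of the documented defined_paths dictionary.
defined_paths = {
    'linelist': {'host': None, 'path': 'arc_lines/lists'},
    'nist': {'host': 'github', 'path': 'arc_lines/NIST'},
    'telgrid': {'host': 's3_cloud', 'path': 'telluric/atm_grids'},
}


def paths_for_host(paths, host):
    """Keep only the entries served by the given host."""
    return {key: meta for key, meta in paths.items() if meta['host'] == host}


def remote_only(paths):
    """Keep only the entries with any remote host (host is not None)."""
    return {key: meta for key, meta in paths.items() if meta['host'] is not None}


github = paths_for_host(defined_paths, 'github')
s3 = paths_for_host(defined_paths, 's3_cloud')
remote = remote_only(defined_paths)
```

Each result keeps the same {'host': ..., 'path': ...} structure as the input, so it can be passed to any code that expects the defined_paths format.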