pypeit.data.utils module

Data utilities for built-in PypeIt data files

Note

If the hostname URL for the telluric atmospheric grids on S3 changes, the only place that needs to change is the file s3_url.txt.


Implementation Documentation

This module defines the organization scheme for the pypeit/data files needed by the PypeIt package. Any routine in the package that needs to load a data file stored in this directory should use the paths supplied by this module rather than calling, e.g., importlib.resources.files or otherwise accessing the package directory structure directly. That way, if the structure of this directory changes, only this module needs to be modified; the remainder of the package can remain ignorant of those changes and continue to use the paths supplied here.

Furthermore, all paths returned by this module are pathlib.Path objects rather than plain strings, so the full pathlib.Path interface is available on them.

Most of the package data files (by number) are distributed with the PypeIt package and are accessed via the Paths class. For instance, the NIR spectrophotometry for Vega is accessed via:

vega_file = data.Paths.standards / 'vega_tspectool_vacuum.dat'

For some directories, however, the size of the included files is large enough that it was beginning to cause problems with distributing the package via PyPI. For these specific directories, the data is still stored in the GitHub repository but is not distributed with the PyPI package. In order to access and use these files, we use the AstroPy download/cache system, and specific functions (get_*_filepath()) are required to interact with these files. Currently, the directories covered by the AstroPy download/cache system are:

  • arc_lines/reid_arxiv

  • skisim

  • sensfuncs

From time to time, it may be necessary to add files or directories to the AstroPy download/cache system, which requires a particular sequence of steps. Because the caching routines look for remote-hosted data files in either the develop tree or a tagged version tree (e.g., 1.8.0) of the repository, any new files must already be present in the repo before a new get_*_filepath() routine can be tested. The order of operations is:

  1. Add any new remote-hosted files to the GitHub repo via a separate PR that also modifies MANIFEST.in to exclude these files from the distributed package.

  2. Create a new get_*_filepath() function in this module, following the example of one of the existing functions. Elsewhere in PypeIt, load the needed file by invoking the new get_*_filepath() function. An example of this can be found in pypeit/core/flux_calib.py where get_skisim_filepath() is called to locate sky transmission files.
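The pattern behind step 2 can be sketched with the standard library alone. This is a minimal illustration, not the module's actual code: get_demo_filepath(), fake_fetch(), and the file names are hypothetical, and the remote download is replaced by a stand-in callable so the example runs offline.

```python
import tempfile
from pathlib import Path
from typing import Callable


def get_demo_filepath(filename: str, pkg_dir: Path,
                      fetch: Callable[[str], Path]) -> Path:
    """Return the packaged copy if present, else fall back to ``fetch``."""
    local = pkg_dir / filename
    if local.is_file():
        return local
    return fetch(filename)


# Demo setup: one file "shipped" with the package, one only available remotely.
pkg_dir = Path(tempfile.mkdtemp())
cache_dir = Path(tempfile.mkdtemp())
(pkg_dir / "shipped.dat").write_text("packaged")


def fake_fetch(filename: str) -> Path:
    """Stand-in for a remote download that lands in the cache directory."""
    target = cache_dir / filename
    target.write_text("downloaded")
    return target


shipped = get_demo_filepath("shipped.dat", pkg_dir, fake_fetch)
remote = get_demo_filepath("remote_only.dat", pkg_dir, fake_fetch)
```

Callers only ever see a pathlib.Path, so they do not need to know whether the file came from the package directory or from the cache.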

If new package-included data are added that are not very large (total directory size < a few MB), it is not necessary to use the AstroPy cache/download system. In this case, simply add the directory path to the Paths class and access the enclosed files similarly to the Vega example above.

class pypeit.data.utils.Paths[source]

Bases: object

List of hardwired paths within the pypeit.data module

Each @property method returns a pathlib.Path object

_data = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/pypeit/envs/release/lib/python3.9/site-packages/pypeit/data')
class property arc_plot: Path

Path subclass for non-Windows systems.

On a POSIX system, instantiating a Path should return this object.

class property arclines: Path

static check_isdir(path: Path) Path[source]

Check that the hardwired directory exists

If it does, return the directory path; otherwise, raise an error.
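A registry of this kind can be sketched as follows. The names DemoPaths and classproperty, the temporary data root, and the choice of NotADirectoryError are all assumptions made for illustration; the source does not state which exception the real check_isdir() raises or how its class properties are implemented.

```python
import tempfile
from pathlib import Path


class classproperty:
    """Minimal read-only class-level property (illustrative helper)."""

    def __init__(self, getter):
        self.getter = getter

    def __get__(self, instance, owner):
        return self.getter(owner)


class DemoPaths:
    """Hypothetical stand-in for a registry of hard-wired data paths."""

    # Placeholder data root; a real registry would point at the package data.
    _data = Path(tempfile.mkdtemp())

    @staticmethod
    def check_isdir(path: Path) -> Path:
        """Return ``path`` if it is a directory, otherwise raise."""
        if not path.is_dir():
            # The exception type is an assumption made for this sketch.
            raise NotADirectoryError(f"Missing data directory: {path}")
        return path

    @classproperty
    def standards(cls) -> Path:
        return cls.check_isdir(cls._data / "standards")


# Create the demo directory, then build a file path with the "/" operator.
(DemoPaths._data / "standards").mkdir()
vega_file = DemoPaths.standards / "vega_tspectool_vacuum.dat"
```

Validating the directory inside the property means a missing data directory fails at the point of access rather than later, when a file is opened.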

class property data: Path

class property extinction: Path

class property filters: Path

class property linelist: Path

class property nist: Path

class property reid_arxiv: Path

class property sensfuncs: Path

class property skisim: Path

class property sky_spec: Path

class property spectrographs: Path

class property standards: Path

class property static_calibs: Path

class property tel_model: Path

class property telgrid: Path

pypeit.data.utils.fetch_remote_file(filename: str, filetype: str, remote_host: str = 'github', install_script: bool = False, force_update: bool = False, full_url: str | None = None) Path[source]

Use astropy.utils.data to fetch file from remote or cache

The function download_file() will first look in the local cache (the option cache=True is used with this function to retrieve downloaded files from the cache, as needed) before downloading the file from the remote server.

The remote file can be forcibly downloaded through the use of force_update.
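The cache-first behavior and the effect of force_update can be illustrated with a small offline sketch. This is not the astropy implementation: cached_fetch(), fake_download(), and the example.invalid URL are hypothetical, and the cache here is simply a directory keyed by the base filename.

```python
import tempfile
from pathlib import Path
from typing import Callable


def cached_fetch(url: str, cache_dir: Path,
                 download: Callable[[str], bytes],
                 force_update: bool = False) -> Path:
    """Return the cached copy of ``url``, downloading only when needed."""
    target = cache_dir / url.rsplit("/", 1)[-1]
    if target.is_file() and not force_update:
        return target  # cache hit: no download
    target.write_bytes(download(url))
    return target


calls = []


def fake_download(url: str) -> bytes:
    """Stand-in for a network download; records each request."""
    calls.append(url)
    return b"payload"


demo_cache_dir = Path(tempfile.mkdtemp())
url = "https://example.invalid/data/sensfuncs/demo.fits"
first = cached_fetch(url, demo_cache_dir, fake_download)
second = cached_fetch(url, demo_cache_dir, fake_download)   # served from cache
third = cached_fetch(url, demo_cache_dir, fake_download, force_update=True)
```

In this sketch the second call never touches the downloader, while the third call refreshes the cached copy even though one already exists.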

Parameters:
  • filename (str) – The base filename to search for

  • filetype (str) – The subdirectory of pypeit/data/ in which to find the file (e.g., arc_lines/reid_arxiv or sensfuncs)

  • remote_host (str, optional) – The remote host scheme. Currently only ‘github’ and ‘s3_cloud’ are supported. Defaults to ‘github’.

  • install_script (bool, optional) – This function is being called from an install script (i.e., pypeit_install_telluric) – relates to warnings displayed. Defaults to False.

  • force_update (bool, optional) – Force astropy.utils.data.download_file() to update the cache by downloading the latest version. Defaults to False.

  • full_url (str, optional) – The full URL (i.e., skip _build_remote_url()). Defaults to None.

Returns:

The local path to the desired file in the cache

Return type:

pathlib.Path

pypeit.data.utils.get_extinctfile_filepath(extinction_file: str) Path[source]

Return the full path to the extinction file

Unlike other get_*_filepath() functions, the extinction files are included with the PyPI distribution since they are small text files. The purpose of this function is to be able to load in user-installed extinction files from observatories not already included. Users may self-install such files using the pypeit_install_extinctfile script.

Parameters:

extinction_file (str) – The base filename of the extinction file to be located

Returns:

The full path to the extinction file

Return type:

pathlib.Path

pypeit.data.utils.get_linelist_filepath(linelist_file: str) Path[source]

Return the full path to the linelist file

Users should be able to use their own arc line lists for wavelength calibration without modifying the distributed version of the package. The AstroPy download/cache system already used by this module provides this functionality.

Using the script pypeit_install_linelist, custom arc line lists can be installed into the PypeIt cache (nominally ~/.pypeit/cache), and are not placed into the package directory itself.

Given the line list filename, this function checks first for the existence of the file in the package directory, then checks the PypeIt cache. For all built-in line lists, this function returns the file location within the package directory. For user-supplied lists that were installed using the script, this function returns the location within the cache.

The cache keeps a hash of the file URL, which contains the PypeIt version number. As users update to newer versions, the linelist files must be reinstalled using the included script.

Parameters:

linelist_file (str) – The base filename of the linelist file to be located

Returns:

The full path to the linelist file

Return type:

pathlib.Path

pypeit.data.utils.get_reid_arxiv_filepath(arxiv_file: str) tuple[pathlib.Path, str][source]

Return the full path to the reid_arxiv file

In an attempt to reduce the size of the PypeIt package as distributed on PyPI, the reid_arxiv files are no longer distributed with the package. The collection of files is hosted remotely, and only the reid_arxiv files needed by a particular user are downloaded to the local machine.

This function checks for the local existence of the reid_arxiv file and, if it is not found, downloads it from the remote server using AstroPy’s download_file() function. The file downloaded in this fashion is kept in the PypeIt cache (nominally ~/.pypeit/cache) and is not placed into the package directory itself.

The cache keeps a hash of the file URL, which contains the PypeIt version number. As users update to newer versions, the reid_arxiv files will be downloaded again (matching the new version #) to catch any changes.
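Why a version change triggers a fresh download can be seen in a short sketch. The URL layout, the use of SHA-256, the file name, and the version strings below are assumptions for illustration only; the source states only that the cache keeps a hash of the file URL and that the URL contains the PypeIt version number.

```python
import hashlib


def build_url(version: str, filetype: str, filename: str) -> str:
    """Compose a hypothetical versioned URL for a remote data file."""
    return (f"https://example.invalid/pypeit/{version}"
            f"/pypeit/data/{filetype}/{filename}")


def cache_key(url: str) -> str:
    """Hash the URL to obtain a cache key (hash choice is illustrative)."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


filetype = "arc_lines/reid_arxiv"
old_key = cache_key(build_url("1.8.0", filetype, "demo_arxiv.fits"))
same_key = cache_key(build_url("1.8.0", filetype, "demo_arxiv.fits"))
new_key = cache_key(build_url("1.9.0", filetype, "demo_arxiv.fits"))

# Same version -> same key (cache hit); new version -> new key (cache miss).
print(old_key == same_key, old_key == new_key)
```

Because the key is derived from the full URL, a file cached under one version is simply not found when the same file is requested under another.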

As most users will need only a small number of reid_arxiv files for their particular reductions, the remote fetch will only occur once per file (per version of PypeIt).

Parameters:

arxiv_file (str) – The base filename of the reid_arxiv file to be located

Returns:

The full path and the format of the reid_arxiv file:

  • reid_path (Path): The full path to the reid_arxiv file

  • arxiv_fmt (str): The extension of the reid_arxiv file (format)

Return type:

tuple

pypeit.data.utils.get_sensfunc_filepath(sensfunc_file: str, symlink_in_pkgdir: bool = False) Path[source]

Return the full path to the sensfunc file

In an attempt to reduce the size of the PypeIt package as distributed on PyPI, the sensfunc files are no longer distributed with the package. The collection of files is hosted remotely, and only the sensfunc files needed by a particular user are downloaded to the local machine.

This function checks for the local existence of the sensfunc file and, if it is not found, downloads it from the remote server using AstroPy’s download_file() function. The file downloaded in this fashion is kept in the PypeIt cache (nominally ~/.pypeit/cache) and is not placed into the package directory itself.

The cache keeps a hash of the file URL, which contains the PypeIt version number. As users update to newer versions, the sensfunc files will be downloaded again (matching the new version #) to catch any changes.

As most users will need only a small number of sensfunc files for their particular reductions, the remote fetch will only occur once per file (per version of PypeIt).

Parameters:
  • sensfunc_file (str) – The base filename of the sensfunc file to be located

  • symlink_in_pkgdir (bool, optional) – Create a symlink (with the canonical filename) in the package directory pointing to the cached downloaded file. Defaults to False.

Returns:

The full path to the sensfunc file

Return type:

pathlib.Path
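The effect of symlink_in_pkgdir can be sketched with pathlib. The helper link_into_pkgdir(), the temporary directories, and the file names are hypothetical; how the real function names or replaces the link is not specified in the source. The sketch assumes a platform where creating symlinks is permitted.

```python
import tempfile
from pathlib import Path


def link_into_pkgdir(cached_file: Path, pkg_dir: Path,
                     canonical_name: str) -> Path:
    """Create (or refresh) a symlink in ``pkg_dir`` pointing at the cache."""
    link = pkg_dir / canonical_name
    if link.is_symlink() or link.exists():
        link.unlink()  # drop a stale link before re-creating it
    link.symlink_to(cached_file)
    return link


# Demo: a "cached" file with an opaque name, linked under a readable name.
demo_cache = Path(tempfile.mkdtemp())
demo_pkg = Path(tempfile.mkdtemp())
cached = demo_cache / "contents"
cached.write_text("sensitivity function data")

link = link_into_pkgdir(cached, demo_pkg, "demo_sensfunc.fits")
```

The link gives the cached file its canonical filename inside the package directory without copying the data.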

pypeit.data.utils.get_skisim_filepath(skisim_file: str) Path[source]

Return the full path to the skisim file

In an attempt to reduce the size of the PypeIt package as distributed on PyPI, the skisim files are no longer distributed with the package. The collection of files is hosted remotely, and only the skisim files needed by a particular user are downloaded to the local machine.

This function checks for the local existence of the skisim file and, if it is not found, downloads it from the remote server using AstroPy’s download_file() function. The file downloaded in this fashion is kept in the PypeIt cache (nominally ~/.pypeit/cache) and is not placed into the package directory itself.

The cache keeps a hash of the file URL, which contains the PypeIt version number. As users update to newer versions, the skisim files will be downloaded again (matching the new version #) to catch any changes.

As most users will need only a small number of skisim files for their particular reductions, the remote fetch will only occur once per file (per version of PypeIt).

Parameters:

skisim_file (str) – The base filename of the skisim file to be located

Returns:

The full path to the skisim file

Return type:

pathlib.Path

pypeit.data.utils.get_telgrid_filepath(telgrid_file: str) Path[source]

Return the full path to the telgrid file

Atmospheric telluric grid files are not part of the PypeIt package itself due to their large (~4-8 GB) size. These files are hosted remotely (see the PypeIt documentation), and only the telgrid files needed by a particular user are downloaded to the local machine.

This function checks for the local existence of the telgrid file and, if it is not found, downloads it from the remote server using AstroPy’s download_file() function. The file downloaded in this fashion is kept in the PypeIt cache (nominally ~/.pypeit/cache) and is not placed into the package directory itself.

As most users will need only a small number of telgrid files for their particular reductions, the remote fetch will only occur once per file.

Parameters:

telgrid_file (str) – The base filename of the telgrid file to be located

Returns:

The full path to the telgrid file

Return type:

pathlib.Path

pypeit.data.utils.load_sky_spectrum(sky_file: str) XSpectrum1D[source]

Load a sky spectrum into an XSpectrum1D object

NOTE: This is where the path to the data directory is added!

Todo

Try to eliminate the XSpectrum1D dependency

Parameters:

sky_file (str) – The filename (NO PATH) of the sky file to use.

Returns:

Sky spectrum

Return type:

(linetools.spectra.xspectrum1d.XSpectrum1D)

pypeit.data.utils.load_telluric_grid(filename: str)[source]

Load a telluric atmospheric grid

NOTE: This is where the path to the data directory is added!

Parameters:

filename (str) – The filename (NO PATH) of the telluric atmospheric grid to use.

Returns:

Telluric Grid FITS HDU list

Return type:

(astropy.io.fits.HDUList)

pypeit.data.utils.load_thar_spec()[source]

Load the archived ThAr spectrum

NOTE: This is where the path to the data directory is added!

This function takes no arguments.

Returns:

ThAr Spectrum FITS HDU list

Return type:

(astropy.io.fits.HDUList)

pypeit.data.utils.search_cache(pattern_str: str) list[pathlib.Path][source]

Search the cache for items matching a pattern string

This function searches the PypeIt cache for files whose URL keys contain the input pattern_str, and returns the local filesystem path to those files.

Parameters:

pattern_str (str) – The filename pattern to match

Returns:

The list of local paths for the objects whose normal filenames match the pattern_str.

Return type:

list
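The matching step can be sketched as a filter over a URL-to-path mapping, of the kind astropy.utils.data.cache_contents() returns. Whether the real function obtains its mapping that way is an assumption; search_url_keys(), the URLs, and the local paths below are hypothetical.

```python
from pathlib import Path
from typing import Mapping


def search_url_keys(cache_map: Mapping[str, str],
                    pattern_str: str) -> list[Path]:
    """Return local paths whose URL key contains ``pattern_str``."""
    return [Path(local) for url, local in cache_map.items()
            if pattern_str in url]


# Hypothetical cache contents: URL key -> local file path.
demo_cache = {
    "https://example.invalid/pypeit/1.8.0/pypeit/data/sensfuncs/demo_a.fits":
        "/tmp/cache/aaa/contents",
    "https://example.invalid/pypeit/1.8.0/pypeit/data/skisim/demo_b.dat":
        "/tmp/cache/bbb/contents",
}

matches = search_url_keys(demo_cache, "sensfuncs")
```

The match is made against the URL key rather than the local path, because the cached files themselves are stored under opaque names.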

pypeit.data.utils.write_file_to_cache(filename: str, cachename: str, filetype: str, remote_host: str = 'github')[source]

Use astropy.utils.data to save local file to cache

This function writes a local file to the PypeIt cache as if it came from a remote server. This is useful for being able to use locally created or separately downloaded files in place of PypeIt-distributed versions.

Parameters:
  • filename (str) – The filename of the local file to save

  • cachename (str) – The name of the cached version of the file

  • filetype (str) – The subdirectory of pypeit/data/ in which to find the file (e.g., arc_lines/reid_arxiv or sensfuncs)

  • remote_host (str, optional) – The remote host scheme. Currently only ‘github’ and ‘s3_cloud’ are supported. Defaults to ‘github’.