pypeit.cache module

PypeIt uses the astropy.utils.data caching system to limit the size of its package distribution in PyPI by enabling on-demand downloading of reference files needed for specific data-reduction steps. This module provides the low-level utility functions that interface with the cache.

Access to the data files are handled in the code base using the PypeItDataPaths object instantiated every time PypeIt is imported.

To get the location of your pypeit cache (by default ~/.pypeit/cache) you can run:

import astropy.config.paths
print(astropy.config.paths.get_cache_dir('pypeit'))

Note

If the hostname URL for the telluric atmospheric grids on S3 changes, the only place that needs to change is the file pypeit/data/s3_url.txt.

pypeit.cache._build_remote_url(f_name, f_type, remote_host=None)[source]

Build the remote URL for the astropy.utils.data functions

This function keeps the URL-creation in one place. In the event that files are moved from GitHub or S3_Cloud, this is the only place that would need to be changed.

Parameters:

f_name (str) – The base filename to search for
f_type (str) – The subdirectory of pypeit/data/ in which to find the file (e.g., arc_lines/reid_arxiv or sensfuncs)
remote_host (str, optional) – The remote host scheme. Currently only ‘github’ and ‘s3_cloud’ are supported. Defaults to None.

Returns:

url (str) – The URL of the f_name of f_type on server remote_host
sources (list or None) – For ‘s3_cloud’, the list of URLs to actually try, passed to astropy.utils.data.download_file, used in the event that the S3 location changes. We maintain the static URL for the name to prevent re-downloading large data files in the event the S3 location changes (but the file itself is unchanged). If None (e.g. for ‘github’), then astropy.utils.data.download_file is unaffected, and the url (above) is what controls the download.

pypeit.cache._get_s3_hostname()[source]

Get the current S3 hostname from the package file

Since the S3 server hostname used to hold package data such as telluric atmospheric grids may change periodically, we keep the current hostname in a separate file (pypeit/data/s3_url.txt), and pull the current version from the PypeIt release branch whenever needed.

Note

When/if the S3 URL changes, the release branch version of pypeit/data/s3_url.txt can be updated easily with a hotfix PR, and this routine will pull it.

If GitHub cannot be reached, the routine uses the version of pypeit/data/s3_url.txt included with the package distribution.

Returns:: The current hostname URL of the S3 server holding package data
Return type:: str

pypeit.cache.fetch_remote_file(filename, filetype, remote_host='github', install_script=False, force_update=False, full_url=None, return_none=False)[source]

Use astropy.utils.data to fetch file from remote or cache

The function download_file() will first look in the local cache (the option cache=True is used with this function to retrieve downloaded files from the cache, as needed) before downloading the file from the remote server.

The remote file can be forcibly downloaded through the use of force_update.

Parameters:

filename (str) – The base filename to search for
filetype (str) – The subdirectory of pypeit/data/ in which to find the file (e.g., arc_lines/reid_arxiv or sensfuncs)
remote_host (str, optional) – The remote host scheme. Currently only ‘github’ and ‘s3_cloud’ are supported. Defaults to ‘github’.
install_script (bool, optional) – This function is being called from an install script (i.e., pypeit_install_telluric) – relates to warnings displayed. Defaults to False.
force_update (bool, optional) – Force astropy.utils.data.download_file to update the cache by downloading the latest version. Defaults to False.
full_url (str, optional) – The full url. If None, use _build_remote_url()). Defaults to None.
return_none (bool, optional) – Return None if the file is not found. Defaults to False.

Returns:

The local path to the desired file in the cache

Return type:

Path

pypeit.cache.git_branch()[source]

Return the name/hash of the currently checked out branch

Returns:: Branch name or hash. Defaults to “develop” if PypeIt is not currently in a repository or pygit2 is not installed.
Return type:: str

pypeit.cache.git_most_recent_tag()[source]

Return the version number for the most recent tag and the date of its last commit.

Returns:: The version number and a ISO format string with the date of the last commit included in the tag. If pygit2 is not installed or no tags are found, the returned version is the same as pypeit.__version__ and the date is None.
Return type:: tuple

pypeit.cache.git_remote_path()[source]

The main path to the GitHub repository.

This defaults to the main repository if the repository cannot be defined (see git_repo()) or if the “origin” remote URL cannot be determined.

Returns:: Remote path
Return type:: str

pypeit.cache.git_repo()[source]: Get a reference to the local repository, if possible.

pypeit.cache.github_contents(repo, branch, path, recursive=True)[source]

(Recursively) Acquire a listing of the contents of a repository directory.

Parameters:

repo (github.Repository) – Repository to search
branch (str) – Name of the branch or commit hash
path (str) – Path relative to the top-level directory of the repository to search.
recursive (bool, optional) – Flag to search the directory recursively. If False, subdirectory names are included in the list of returned objects. If True, subdirectories are removed from the listing and replaced by their contents; in this case the list of all objects should only include repository files.

Returns:

A list of github.ContentFile objects with the repo contents.

Return type:

list

pypeit.cache.list_cache_contents(contents)[source]

Print the list of cache contents

Parameters:: contents (dict) – A dictionary with key-value pairs that provide the original source url (key) and the path to the local file (value). This can be generated using search_cache().

pypeit.cache.parse_cache_url(url)[source]

Parse a URL from the cache into its relevant components.

Parameters:

url (str) – URL of a file in the pypeit cache. A valid cache URL must include either 'github' or 's3.cloud' in its address.

Returns:

host (str) – Host name, either 'github' or 's3_cloud'. None if the url is not valid.
fork (str) – Fork name. None if the url is not valid or if the host is 's3_cloud'.
branch (str) – Branch name. None if the url is not valid or if the host is 's3_cloud'.
dir (str) – Directory name. None if the url is not valid.
file (str) – File name. None if the url is not valid.

pypeit.cache.remove_from_cache(cache_url=None, pattern=None, allow_multiple=False)[source]

Remove a previously downloaded file from the pypeit-specific astropy.utils.data cache.

To specify the file, the full URL can be provided or a name used in a cache search.

Parameters:

cache_url (list, str, optional) – One or more URLs in the cache to be deleted (if they exist in the cache). If allow_multiple is False, this must be a single string.
pattern (str, optional) – A pattern to use when searching the cache for the relevant file(s). If allow_mulitple is False, this must return a single file, otherwise the function will issue a warning and nothing will be deleted.
allow_multiple (bool, optional) – If the search pattern yields multiple results, remove them all.

pypeit.cache.search_cache(pattern, path_only=True)[source]

Search the cache for items matching a pattern string.

This function searches the PypeIt cache for files whose URL keys contain the input pattern, and returns the local filesystem path to those files.

Parameters:

pattern (str) – The pattern to match within the file name of the source url. This can be None, meaning that the full contents of the cache is returned. However, note that setting pattern to None and path_only=True may not be very useful given the abstraction of the file names.
path_only (bool, optional) – Only return the path(s) to the files found in the cache. If False, a dictionary is returned where each key is the source url, and the value is the local path.

Returns:

If path_only is True, this is a list of local paths for the objects whose normal filenames match the pattern. Otherwise, this is a dictionary with keys matching the original source url, and the value set to the local path.

Return type:

list, dict

pypeit.cache.write_file_to_cache(filename, cachename, filetype, remote_host='github')[source]

Use astropy.utils.data to save local file to cache

This function writes a local file to the PypeIt cache as if it came from a remote server. This is useful for being able to use locally created or separately downloaded files in place of PypeIt-distributed versions.

Parameters:

filename (str) – The filename of the local file to save
cachename (str) – The name of the cached version of the file
filetype (str) – The subdirectory of pypeit/data/ in which to find the file (e.g., arc_lines/reid_arxiv or sensfuncs)
remote_host (str, optional) – The remote host scheme. Currently only ‘github’ and ‘s3_cloud’ are supported. Defaults to ‘github’.