pypeit.cache module
PypeIt uses the astropy.utils.data caching system to limit the size of its package distribution in PyPI by enabling on-demand downloading of reference files needed for specific data-reduction steps. This module provides the low-level utility functions that interface with the cache.
Access to the data files is handled in the code base using the PypeItDataPaths object, which is instantiated every time PypeIt is imported.
To get the location of your pypeit cache (by default ~/.pypeit/cache), you can run:
import astropy.config.paths
print(astropy.config.paths.get_cache_dir('pypeit'))
Note
If the hostname URL for the telluric atmospheric grids on S3 changes, the only place that needs to change is the file pypeit/data/s3_url.txt.
- pypeit.cache._build_remote_url(f_name: str, f_type: str, remote_host: str = None)[source]
Build the remote URL for the astropy.utils.data functions
This function keeps the URL-creation in one place. In the event that files are moved from GitHub or S3_Cloud, this is the only place that would need to be changed.
- Parameters:
- Returns:
url (str) – The URL of the f_name of f_type on server remote_host
sources (list or None) – For ‘s3_cloud’, the list of URLs to actually try, passed to astropy.utils.data.download_file, used in the event that the S3 location changes. We maintain the static URL for the name to prevent re-downloading large data files in the event the S3 location changes (but the file itself is unchanged). If None (e.g. for ‘github’), then astropy.utils.data.download_file is unaffected, and the url (above) is what controls the download.
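
A minimal sketch of the pattern described above, in which a single function owns URL construction and returns both a stable URL (used as the cache key) and an optional list of sources to actually try. The function name `build_remote_url_sketch`, the `s3_hostname` argument, the two base URLs, and the path layout are all illustrative assumptions, not PypeIt's actual values.

```python
from typing import List, Optional, Tuple

# Hypothetical placeholder hosts; the real values live inside PypeIt.
GITHUB_BASE = "https://github.example.org/pypeit/PypeIt/raw/release"
S3_STATIC_BASE = "https://s3.cloud.example.org/pypeit"


def build_remote_url_sketch(
    f_name: str,
    f_type: str,
    remote_host: str,
    s3_hostname: Optional[str] = None,
) -> Tuple[str, Optional[List[str]]]:
    """Return (url, sources) for file ``f_name`` of type ``f_type``."""
    if remote_host == "github":
        # For GitHub the URL itself controls the download; no extra sources.
        return f"{GITHUB_BASE}/pypeit/data/{f_type}/{f_name}", None
    if remote_host == "s3_cloud":
        # The static URL names the file in the cache, so a change of S3
        # location does not force a re-download of an unchanged file.
        url = f"{S3_STATIC_BASE}/{f_type}/{f_name}"
        # The sources list holds the location(s) actually tried.
        host = s3_hostname or S3_STATIC_BASE
        return url, [f"{host}/{f_type}/{f_name}"]
    raise ValueError(f"Unsupported remote host: {remote_host}")
```

Keeping the cache key separate from the download location is what allows the S3 hostname to change without invalidating files that are already cached.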
- pypeit.cache._get_s3_hostname() → str [source]
Get the current S3 hostname from the package file
Since the S3 server hostname used to hold package data such as telluric atmospheric grids may change periodically, we keep the current hostname in a separate file (pypeit/data/s3_url.txt), and pull the current version from the PypeIt release branch whenever needed.
Note
When/if the S3 URL changes, the release branch version of pypeit/data/s3_url.txt can be updated easily with a hotfix PR, and this routine will pull it.
If GitHub cannot be reached, the routine uses the version of pypeit/data/s3_url.txt included with the package distribution.
- Returns:
The current hostname URL of the S3 server holding package data
- Return type:
str
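
The remote-first, packaged-fallback behavior described above can be sketched as follows. This is not PypeIt's implementation: the names `read_remote_text` and `get_hostname_sketch`, and the injectable `fetch` argument (added here so the logic can be exercised without a network), are assumptions made for illustration.

```python
import urllib.request
from pathlib import Path
from typing import Callable


def read_remote_text(url: str, timeout: float = 10.0) -> str:
    """Fetch a small text file over HTTP(S)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def get_hostname_sketch(
    remote_url: str,
    packaged_file: Path,
    fetch: Callable[[str], str] = read_remote_text,
) -> str:
    """Prefer the remote copy of the hostname file; fall back to the local one."""
    try:
        text = fetch(remote_url)
    except OSError:
        # urllib's URLError and socket timeouts are OSError subclasses, so
        # this branch covers "the remote server cannot be reached".
        text = packaged_file.read_text()
    return text.strip()
```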
- pypeit.cache.fetch_remote_file(filename: str, filetype: str, remote_host: str = 'github', install_script: bool = False, force_update: bool = False, full_url: str = None, return_none: bool = False) → Path [source]
Use astropy.utils.data to fetch file from remote or cache
The function download_file() will first look in the local cache (the option cache=True is used with this function to retrieve downloaded files from the cache, as needed) before downloading the file from the remote server.
The remote file can be forcibly downloaded through the use of force_update.
- Parameters:
filename (str) – The base filename to search for
filetype (str) – The subdirectory of pypeit/data/ in which to find the file (e.g., arc_lines/reid_arxiv or sensfuncs)
remote_host (str, optional) – The remote host scheme. Currently only ‘github’ and ‘s3_cloud’ are supported. Defaults to ‘github’.
install_script (bool, optional) – This function is being called from an install script (i.e., pypeit_install_telluric); this relates to the warnings displayed. Defaults to False.
force_update (bool, optional) – Force astropy.utils.data.download_file to update the cache by downloading the latest version. Defaults to False.
full_url (str, optional) – The full url. If None, use _build_remote_url(). Defaults to None.
return_none (bool, optional) – Return None if the file is not found. Defaults to False.
- Returns:
The local path to the desired file in the cache
- Return type:
Path
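
To make the interaction between the cache, force_update, and return_none concrete, here is a toy in-memory model of the cache-first logic. The real function delegates to astropy.utils.data.download_file; the name `fetch_sketch`, the dictionary cache, the injected `download` callable, and the choice to treat an `OSError` as "file not found" are assumptions of this sketch.

```python
from pathlib import Path
from typing import Callable, Dict, Optional


def fetch_sketch(
    url: str,
    cache: Dict[str, Path],
    download: Callable[[str], Path],
    force_update: bool = False,
    return_none: bool = False,
) -> Optional[Path]:
    """Return the local path for ``url``, downloading only when needed."""
    if not force_update and url in cache:
        # Cache hit: no network access is required.
        return cache[url]
    try:
        cache[url] = download(url)
    except OSError:
        if return_none:
            # The caller asked for None instead of an exception.
            return None
        raise
    return cache[url]
```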
- pypeit.cache.git_branch()[source]
Return the name/hash of the currently checked out branch
- Returns:
Branch name or hash. Defaults to “develop” if PypeIt is not currently in a repository or pygit2 is not installed.
- Return type:
- pypeit.cache.git_most_recent_tag()[source]
Return the version number for the most recent tag and the date of its last commit.
- Returns:
The version number and an ISO format string with the date of the last commit included in the tag. If pygit2 is not installed or no tags are found, the returned version is the same as pypeit.__version__ and the date is None.
- Return type:
- pypeit.cache.github_contents(repo, branch, path, recursive=True)[source]
(Recursively) Acquire a listing of the contents of a repository directory.
- Parameters:
repo (github.Repository) – Repository to search
branch (str) – Name of the branch or commit hash
path (str) – Path relative to the top-level directory of the repository to search.
recursive (bool, optional) – Flag to search the directory recursively. If False, subdirectory names are included in the list of returned objects. If True, subdirectories are removed from the listing and replaced by their contents; in this case the list of all objects should only include repository files.
- Returns:
A list of github.ContentFile objects with the repo contents.
- Return type:
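
The effect of the recursive flag can be illustrated with a toy directory tree built from nested dictionaries. This sketch does not use the GitHub API; the function name `list_contents_sketch`, the tree representation, and the trailing `/` used to mark directories are assumptions chosen for illustration.

```python
from typing import Any, Dict, List


def list_contents_sketch(
    tree: Dict[str, Any], path: str = "", recursive: bool = True
) -> List[str]:
    """List entries of ``tree``; nested dicts play the role of directories."""
    listing: List[str] = []
    for name, node in tree.items():
        full = f"{path}/{name}" if path else name
        if isinstance(node, dict):
            if recursive:
                # Replace the subdirectory with its (recursive) contents.
                listing.extend(list_contents_sketch(node, full, recursive=True))
            else:
                # Keep the subdirectory itself in the listing.
                listing.append(full + "/")
        else:
            listing.append(full)
    return listing
```

With recursive=True the result contains only files; with recursive=False the caller sees one level and must descend into subdirectories itself.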
- pypeit.cache.parse_cache_url(url)[source]
Parse a URL from the cache into its relevant components.
- Parameters:
url (str) – URL of a file in the pypeit cache. A valid cache URL must include either 'github' or 's3.cloud' in its address.
- Returns:
A tuple of four strings parsed from the URL. If the URL is not considered a valid cache URL, all elements of the tuple are None. The parsed elements of the url are: (1) the host name, which will be either 'github' or 's3_cloud', (2) the branch name, which will be None when the host is 's3_cloud', (3) the subdirectory of pypeit/data/ in which to find the file (e.g., arc_lines/reid_arxiv or sensfuncs), and (4) the file name.
- Return type:
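
A rough sketch of how such a parser might be structured is given below. The URL layouts assumed in the comments, the example hosts, and the name `parse_cache_url_sketch` are hypothetical; only the shape of the return value (four strings, or four None values for an invalid URL) follows the description above.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse

Parsed = Tuple[Optional[str], Optional[str], Optional[str], Optional[str]]
INVALID: Parsed = (None, None, None, None)


def parse_cache_url_sketch(url: str) -> Parsed:
    """Split a cache URL into (host, branch, subdirectory, file name)."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if "github" in url:
        # Assumed layout: .../<branch>/pypeit/data/<subdir...>/<file>
        if "data" not in parts:
            return INVALID
        i = parts.index("data")
        if i < 2 or len(parts) < i + 3:
            return INVALID
        return "github", parts[i - 2], "/".join(parts[i + 1:-1]), parts[-1]
    if "s3.cloud" in url:
        # Assumed layout: /<bucket>/<subdir...>/<file>; no branch on S3.
        if len(parts) < 3:
            return INVALID
        return "s3_cloud", None, "/".join(parts[1:-1]), parts[-1]
    return INVALID
```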
- pypeit.cache.remove_from_cache(cache_url=None, pattern=None, allow_multiple=False)[source]
Remove a previously downloaded file from the pypeit-specific astropy.utils.data cache.
To specify the file, the full URL can be provided or a name used in a cache search.
- Parameters:
cache_url (list, str, optional) – One or more URLs in the cache to be deleted (if they exist in the cache). If allow_multiple is False, this must be a single string.
pattern (str, optional) – A pattern to use when searching the cache for the relevant file(s). If allow_multiple is False, this must return a single file, otherwise the function will issue a warning and nothing will be deleted.
allow_multiple (bool, optional) – If the search pattern yields multiple results, remove them all.
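
The guard provided by allow_multiple can be modeled with a plain dictionary standing in for the cache. The function name `remove_matching_sketch`, the dictionary cache, and the returned list of removed URLs are assumptions of this sketch rather than features of the real function.

```python
import warnings
from typing import Dict, List


def remove_matching_sketch(
    cache: Dict[str, str], pattern: str, allow_multiple: bool = False
) -> List[str]:
    """Remove cache entries whose URL contains ``pattern``; return removed URLs."""
    matches = [url for url in cache if pattern in url]
    if len(matches) > 1 and not allow_multiple:
        # Refuse to guess which file was meant: warn and delete nothing.
        warnings.warn(
            f"Pattern {pattern!r} matched {len(matches)} files; nothing removed."
        )
        return []
    for url in matches:
        del cache[url]
    return matches
```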
- pypeit.cache.search_cache(pattern: str, path_only=True)[source]
Search the cache for items matching a pattern string.
This function searches the PypeIt cache for files whose URL keys contain the input pattern, and returns the local filesystem path to those files.
- Parameters:
pattern (str) – The pattern to match within the file name of the source url. This can be None, meaning that the full contents of the cache are returned. However, note that setting pattern to None and path_only=True may not be very useful given the abstraction of the file names.
path_only (bool, optional) – Only return the path(s) to the files found in the cache. If False, a dictionary is returned where each key is the source url, and the value is the local path.
- Returns:
If path_only is True, this is a list of local paths for the objects whose normal filenames match the pattern. Otherwise, this is a dictionary with keys matching the original source url, and the value set to the local path.
- Return type:
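
The two return shapes can be seen in a small model where the cache is a dictionary mapping source URLs to local paths. The name `search_sketch`, the example URLs, and the opaque local paths are made up for illustration; matching against the whole URL is also an assumption of this sketch.

```python
from typing import Dict, List, Optional, Union


def search_sketch(
    cache: Dict[str, str], pattern: Optional[str] = None, path_only: bool = True
) -> Union[List[str], Dict[str, str]]:
    """Return cached entries whose source URL contains ``pattern``."""
    hits = {
        url: path
        for url, path in cache.items()
        if pattern is None or pattern in url
    }
    # Either just the local paths, or the full url -> path mapping.
    return list(hits.values()) if path_only else hits


# Hypothetical cache contents: local names are opaque, unlike the URLs.
example_cache = {
    "https://h.example.org/sensfuncs/keck_a.fits": "cache/3f2a/contents",
    "https://h.example.org/arc_lines/reid_arxiv/keck_b.fits": "cache/9c41/contents",
}
print(search_sketch(example_cache, "sensfuncs"))
print(search_sketch(example_cache, None, path_only=False))
```

Because the local paths carry no readable file name, requesting the dictionary form is the practical way to see which cached file is which when pattern is None.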
- pypeit.cache.write_file_to_cache(filename: str, cachename: str, filetype: str, remote_host: str = 'github')[source]
Use astropy.utils.data to save local file to cache
This function writes a local file to the PypeIt cache as if it came from a remote server. This is useful for being able to use locally created or separately downloaded files in place of PypeIt-distributed versions.
- Parameters:
filename (str) – The filename of the local file to save
cachename (str) – The name of the cached version of the file
filetype (str) – The subdirectory of pypeit/data/ in which to find the file (e.g., arc_lines/reid_arxiv or sensfuncs)
remote_host (str, optional) – The remote host scheme. Currently only ‘github’ and ‘s3_cloud’ are supported. Defaults to ‘github’.
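
The idea of storing a local file "as if it came from a remote server" amounts to filing it under the URL it would have had. The sketch below models this with a hash-named directory per URL; the function name `import_to_cache_sketch`, the SHA-256 key, and the `<hash>/contents` layout are assumptions of this sketch, not a description of the astropy cache internals.

```python
import hashlib
import shutil
from pathlib import Path


def import_to_cache_sketch(local_file: Path, url_key: str, cache_dir: Path) -> Path:
    """Copy ``local_file`` into ``cache_dir`` under a name derived from ``url_key``."""
    # The URL, not the local file name, determines where the copy is stored,
    # so a later lookup by URL finds this file without any download.
    digest = hashlib.sha256(url_key.encode("utf-8")).hexdigest()
    target = cache_dir / digest / "contents"
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(local_file, target)
    return target
```

In this model, filename corresponds to `local_file`, while cachename, filetype, and remote_host would together determine `url_key`.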