pypeit.core.pca module

Implement principal-component-analysis tools.

pypeit.core.pca.fit_pca_coefficients(coeff, order, ivar=None, weights=None, function='legendre', lower=3.0, upper=3.0, maxrej=1, maxiter=25, coo=None, minx=None, maxx=None, debug=False)[source]

Fit a parameterized function to a set of PCA coefficients, primarily for the purpose of predicting coefficients at intermediate locations.

The coefficients of each PCA component are fit by a low-order polynomial, where the abscissa is set by the coo argument (see robust_fit()).

Note

This is a general function that is not specific to PCA; it is essentially a wrapper for robust_fit().

Parameters:
  • coeff (numpy.ndarray) – PCA component coefficients. If the PCA decomposition used \(N_{\rm comp}\) components for \(N_{\rm vec}\) vectors, the shape of this array must be \((N_{\rm vec}, N_{\rm comp})\). The array can be 1D with shape \((N_{\rm vec},)\) if there was only one PCA component.

  • order (int, numpy.ndarray) – The order, \(o\), of the function used to fit the PCA coefficients. Can be a single number for all PCA components, or an array with an order specific to each component. If the latter, the shape must be \((N_{\rm comp},)\).

  • ivar (numpy.ndarray, optional) – Inverse variance in the PCA coefficients to use during the fit; see the invvar parameter of robust_fit(). If None, fit is not error weighted. If a vector with shape \((N_{\rm vec},)\), the same error will be assumed for all PCA components (i.e., ivar will be expanded to match the shape of coeff). If a 2D array, the shape must match coeff.

  • weights (numpy.ndarray, optional) – Weights to apply to the PCA coefficients during the fit; see the weights parameter of robust_fit(). If None, the weights are uniform. If a vector with shape \((N_{\rm vec},)\), the same weights will be assumed for all PCA components (i.e., weights will be expanded to match the shape of coeff). If a 2D array, the shape must match coeff.

  • function (str, optional) – Type of function used to fit the data.

  • lower (float, optional) – Number of standard deviations used for rejecting data below the mean residual. If None, no rejection is performed. See robust_fit().

  • upper (float, optional) – Number of standard deviations used for rejecting data above the mean residual. If None, no rejection is performed. See robust_fit().

  • maxrej (int, optional) – Maximum number of points to reject during fit iterations. See robust_fit().

  • maxiter (int, optional) – Maximum number of rejection iterations allowed. To force no rejection iterations, set to 0.

  • coo (numpy.ndarray, optional) – Floating-point array with the independent coordinates to use when fitting the PCA coefficients. If None, simply uses a running number. Shape must be \((N_{\rm vec},)\).

  • minx (float, optional) – Minimum value used to rescale the independent axis data. If None, the minimum value of coo is used. See robust_fit().

  • maxx (float, optional) – Maximum value used to rescale the independent axis data. If None, the maximum value of coo is used. See robust_fit().

  • debug (bool, optional) – Show plots useful for debugging.

Returns:

One or more PypeItFit instances, one per PCA component, that model the PCA component coefficients as a function of the reference coordinates. These can be used to predict new vectors that follow the PCA model at a new coordinate; see pca_predict().

Return type:

numpy.ndarray
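
The per-component fit described above can be illustrated with a minimal sketch. It uses plain NumPy Legendre fits rather than robust_fit(), performs no rejection iterations, and ignores ivar and weights; the function name fit_coefficients_sketch and the test data are hypothetical and are not part of PypeIt.

```python
import numpy as np
from numpy.polynomial import legendre


def fit_coefficients_sketch(coeff, order, coo=None):
    """Fit one Legendre series per PCA component (hypothetical helper, no rejection)."""
    coeff = np.asarray(coeff, dtype=float)
    if coeff.ndim == 1:
        # A single PCA component: promote (N_vec,) to (N_vec, 1).
        coeff = coeff[:, None]
    nvec, ncomp = coeff.shape
    # A scalar order applies to every component; an array gives one order per component.
    if np.isscalar(order):
        orders = np.full(ncomp, order, dtype=int)
    else:
        orders = np.asarray(order, dtype=int)
    # With coo=None, the abscissa is simply a running number.
    x = np.arange(nvec, dtype=float) if coo is None else np.asarray(coo, dtype=float)
    # Rescale the abscissa to [-1, 1] using its minimum and maximum.
    minx, maxx = x.min(), x.max()
    xs = 2.0 * (x - minx) / (maxx - minx) - 1.0
    fits = [legendre.legfit(xs, coeff[:, i], orders[i]) for i in range(ncomp)]
    return fits, (minx, maxx)


# Hypothetical coefficients for 8 vectors and 2 components.
coo = np.linspace(0.0, 10.0, 8)
coeff = np.column_stack([2.0 + 0.5 * coo, 1.0 - 0.1 * coo**2])
fits, (minx, maxx) = fit_coefficients_sketch(coeff, [1, 2], coo=coo)

# Evaluate the fitted series back at the input coordinates.
xs = 2.0 * (coo - minx) / (maxx - minx) - 1.0
model = np.column_stack([legendre.legval(xs, c) for c in fits])
```

Because the hypothetical coefficients are exact polynomials of the fitted orders, the evaluated model reproduces the input to numerical precision.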

pypeit.core.pca.pca_decomposition(vectors, npca=None, pca_explained_var=99.0, mean=None)[source]

Perform principal-component analysis (PCA) for a set of 1D vectors.

The vectors are first passed to an unconstrained PCA to determine the growth curve of the accounted variance as a function of the PCA component. If specifying a number of PCA components to use (see npca), this yields the percentage of the variance accounted for in the analysis. If instead specifying the target variance percentage (see pca_explained_var), this is used to determine the number of PCA components to use in the final analysis.

Note

This is a fully generalized convenience function for a specific use of sklearn.decomposition.PCA. When used within PypeIt, the vectors to decompose typically have the length of the spectral axis. This means that, within PypeIt, arrays are typically transposed when passed to this function.
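
As an illustration of how a target variance percentage can set the number of components, the sketch below builds the cumulative explained-variance curve from the singular values of the centered data. It uses np.linalg.svd in place of sklearn.decomposition.PCA, and the input data and the searchsorted selection rule are assumptions made for the example; the exact selection logic inside pca_decomposition() may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 6 vectors of 50 pixels built from 2 patterns plus small noise.
pix = np.linspace(0.0, 1.0, 50)
amp = rng.normal(size=(6, 2))
vectors = amp @ np.vstack([pix, pix**2]) + 1e-4 * rng.normal(size=(6, 50))

# Subtract the mean of each vector, then the per-pixel mean.
vec_mean = vectors.mean(axis=1)
centered = vectors - vec_mean[:, None]
centered = centered - centered.mean(axis=0)

# The squared singular values are proportional to the variance of each component.
s = np.linalg.svd(centered, compute_uv=False)
var_ratio = s**2 / np.sum(s**2)
var_growth = np.cumsum(var_ratio)

# Smallest number of components whose cumulative variance reaches the target.
pca_explained_var = 99.0  # a percentage, not a fraction
npca = int(np.searchsorted(var_growth, pca_explained_var / 100.0) + 1)
```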

Parameters:
  • vectors (numpy.ndarray) – A 2D array with vectors to analyze with shape \((N_{\rm vec}, N_{\rm pix})\). All vectors must be the same length and cannot be masked.

  • npca (int, optional) – The number of PCA components to keep, which must not exceed \(N_{\rm vec}\). If npca equals \(N_{\rm vec}\), no PCA compression occurs. If None, npca is automatically determined by calculating the minimum number of components required to explain a given percentage of variance in the data (see pca_explained_var).

  • pca_explained_var (float, optional) – The percentage (i.e., not the fraction) of the variance in the data accounted for by the PCA used to truncate the number of PCA coefficients to keep (see npca). Ignored if npca is provided directly.

  • mean (numpy.ndarray, optional) – The mean value of each vector to subtract from the data before performing the PCA. If None, this is determined directly from the data. Shape must be \((N_{\rm vec},)\).

Returns:

  • The coefficients of each PCA component, coeffs. Shape is \((N_{\rm vec},N_{\rm comp})\).

  • The PCA component vectors, components. Shape is \((N_{\rm comp},N_{\rm pix})\).

  • The mean offset of the PCA decomposition for each pixel, pca_mean. Shape is \((N_{\rm pix},)\).

  • The mean offset applied to each vector before the PCA, vec_mean. Shape is \((N_{\rm vec},)\).

To reconstruct the PCA representation of the input vectors, compute:

np.dot(coeffs, components) + pca_mean[None,:] + vec_mean[:,None]

Return type:

Four numpy.ndarray objects (returned as a tuple)
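
The shapes of the four returned arrays and the reconstruction expression can be checked with a small stand-in decomposition. This sketch uses np.linalg.svd instead of calling pca_decomposition(), with random hypothetical input; only the variable names coeffs, components, pca_mean and vec_mean follow the description above.

```python
import numpy as np

rng = np.random.default_rng(1)
nvec, npix, npca = 5, 40, 2
vectors = rng.normal(size=(nvec, npix))  # hypothetical input vectors

# Mean of each vector, removed before the decomposition.
vec_mean = vectors.mean(axis=1)                      # (N_vec,)
vec_pca = vectors - vec_mean[:, None]

# Per-pixel mean removed by the PCA itself.
pca_mean = vec_pca.mean(axis=0)                      # (N_pix,)
centered = vec_pca - pca_mean[None, :]

# Rows of vt are the orthonormal component vectors, ordered by variance.
u, s, vt = np.linalg.svd(centered, full_matrices=False)
components = vt[:npca]                               # (N_comp, N_pix)
coeffs = centered @ components.T                     # (N_vec, N_comp)

# Truncated PCA representation of the input vectors.
recon = np.dot(coeffs, components) + pca_mean[None, :] + vec_mean[:, None]

# Keeping every component recovers the input exactly.
full = np.dot(centered @ vt.T, vt) + pca_mean[None, :] + vec_mean[:, None]
```

With fewer components than the rank of the data, recon is an approximation whose residual is set by the discarded singular values; full matches the input to numerical precision.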

pypeit.core.pca.pca_predict(x, pca_coeffs_model, pca_components, pca_mean, mean)[source]

Use a model of the PCA coefficients to predict vectors at the specified coordinates.

Parameters:
  • x (float, numpy.ndarray) – One or more trace coordinates at which to sample the PCA coefficients and produce the PCA-driven model. As used within PypeIt, this is typically the spatial pixel coordinate or echelle order number.

  • pca_coeffs_model (numpy.ndarray) – An array of PypeItFit objects, one per PCA component, used to calculate the PCA coefficients at the provided position, x. See fit_pca_coefficients().

  • pca_components (numpy.ndarray) – Vectors with the PCA components. Shape must be \((N_{\rm comp}, N_{\rm pix})\).

  • pca_mean (numpy.ndarray) – The mean offset of the PCA decomposition for each pixel. Shape is \((N_{\rm pix},)\).

  • mean (float, numpy.ndarray) – The mean offset of each trace coordinate to use for the PCA prediction. This is typically identical to x, and its shape must match x.

Returns:

PCA-constructed vectors, one per position x. Shape is either \((N_{\rm pix},)\) or \((N_{\rm x},N_{\rm pix})\), depending on the input shape/type of x.

Return type:

numpy.ndarray
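
The prediction step can be sketched as evaluating each coefficient model at x and combining the result with the components and the two mean offsets. In this sketch the coefficient models are plain Legendre series evaluated directly at x (an assumption; the actual function takes PypeItFit objects), and predict_sketch and all input values are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre


def predict_sketch(x, coeff_series, pca_components, pca_mean, mean):
    """Predict vectors at x from Legendre coefficient models (hypothetical helper)."""
    scalar_input = np.ndim(x) == 0
    x = np.atleast_1d(x).astype(float)
    mean = np.atleast_1d(mean).astype(float)
    # Evaluate each component's coefficient model at x: shape (N_x, N_comp).
    coeffs = np.column_stack([legendre.legval(x, np.asarray(c)) for c in coeff_series])
    # Combine coefficients with the components and add back both mean offsets.
    vectors = np.dot(coeffs, pca_components) + pca_mean[None, :] + mean[:, None]
    # A scalar x yields a single vector of shape (N_pix,).
    return vectors[0] if scalar_input else vectors


# Two hypothetical components over four pixels.
pca_components = np.array([[1.0, 0.0, 0.0, 0.0],
                           [0.0, 2.0, 0.0, 0.0]])
pca_mean = np.full(4, 0.1)
# Coefficient models: 1 + 2x for the first component, constant 0.5 for the second.
coeff_series = [np.array([1.0, 2.0]), np.array([0.5])]

single = predict_sketch(0.5, coeff_series, pca_components, pca_mean, 0.5)
several = predict_sketch(np.array([0.0, 0.5]), coeff_series, pca_components,
                         pca_mean, np.array([0.0, 0.5]))
```

Here mean is set equal to x, following the note that the two are typically identical.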