xmca.array.MCA¶
- class xmca.array.MCA(*fields)¶
Bases:
object
Perform MCA on two
numpy.ndarray
.MCA is a more general form of Principal Component Analysis (PCA) for two input fields (left, right). If both data fields are the same, it is equivalent to PCA.
- __init__(*fields)¶
Load data fields and store information about data size/shape.
- Parameters
left (ndarray) – Left input data. First dimension needs to be time.
right (ndarray, optional) – Right input data. First dimension needs to be time. If none is provided, automatically, right field is assumed to be the same as left field. In this case, MCA reducdes to normal PCA. The default is None.
Examples
Let left and right be some geophysical fields (e.g. SST and SLP). To perform PCA on left use:
>>> from xmca.array import MCA >>> pca = MCA(left) >>> pca.solve() >>> exp_var = pca.explained_variance() >>> pcs = pca.pcs() >>> eofs = pca.eofs()
To perform MCA on left and right use:
>>> mca = MCA(left, right) >>> mca.solve() >>> exp_var = mca.explained_variance() >>> pcs = mca.pcs() >>> eofs = mca.eofs()
Methods
__init__
(*fields)Load data fields and store information about data size/shape.
apply_weights
([left, right])Apply weights to the left and/or right field.
bootstrapping
(n_runs[, n_modes, axis, ...])Perform Monte Carlo bootstrapping on model.
Return the correlation matrix of PCs.
eofs
([n, scaling, phase_shift, rotated])Return the first n EOFs.
explained_variance
([n])Return the CF of the first n modes.
fields
([original_scale])Return left (and right) input field.
heterogeneous_patterns
([n, phase_shift])homogeneous_patterns
([n, phase_shift])load_analysis
(path[, fields, eofs, ...])Load a model.
norm
([n, sorted])Return L2 norm of first n loaded singular vectors.
Normalize each time series by its standard deviation.
pcs
([n, scaling, phase_shift, rotated])Return the first n PCs.
plot
(mode[, threshold, phase_shift, ...])Plot results for mode.
predict
([left, right, n, scaling, phase_shift])Predict PCs of new data.
reconstructed_fields
([mode, original_scale])rotate
(n_rot[, power, tol])Perform Promax rotation on the first n EOFs.
rotation_matrix
([inverse_transpose])Return the rotation matrix used for rotation.
rule_n
(n_runs[, n_modes])Apply Rule N by Overland and Preisendorfer, 1982.
rule_north
([n])Uncertainties of singular values based on North's rule of thumb.
save_plot
(mode[, path, plot_kwargs, save_kwargs])Create and save a plot to local disk.
scf
([n])Return the SCF of the first n modes.
set_field_names
([left, right])Set the name of the left and/or right field.
singular_values
([n])Return the first n singular_values.
solve
([complexify, extend, period])Call the solver to perform EOF analysis/MCA.
spatial_amplitude
([n, scaling, rotated])Return the spatial amplitude fields for the first n EOFs.
spatial_phase
([n, phase_shift, rotated])Return the spatial phase fields for the first n EOFs.
summary
()Return meta information of the performed analysis.
temporal_amplitude
([n, scaling, rotated])Return the temporal amplitude time series for the first n PCs.
temporal_phase
([n, phase_shift, rotated])Return the temporal phase function for the first n PCs.
truncate
(n)Truncate the solution to the first n modes.
variance
([n, sorted])Return variance of first n loaded singular vectors.
- apply_weights(left=None, right=None)¶
Apply weights to the left and/or right field.
- Parameters
left (ndarray) – Weights for left field.
right (ndarray) – Weights for right field.
Examples
Let left and right be some input data (e.g. SST and precipitation).
>>> left = np.random.randn(100, 30) >>> right = np.random.randn(100, 40)
Call constructor, apply weights and then solve:
>>> left_weights = np.random.randn(1, 30) # some random weights >>> right_weights = np.random.randn(1, 40) >>> mca = MCA(left, right) >>> mca.apply_weights(lw, rw) >>> mca.solve()
- bootstrapping(n_runs, n_modes=20, axis=0, on_left=True, on_right=False, block_size=1, replace=True, strategy='standard', disable_progress=False)¶
Perform Monte Carlo bootstrapping on model.
Monte Carlo simulations allow to assess the signifcance of the obtained singular values and hence modes by re-performing the analysis on synthetic sample data. Using bootstrapping the synthetic data is created by resampling the original data.
- Parameters
n_runs (int) – Number of Monte Carlo simulations.
n_modes (int) – Number of modes to be returned. By default return all modes.
axis (int) – Whether to resample along time (axis=0) or in space (axis=1). The default is 0.
on_left (bool) – Whether to resample the left field. True by default.
on_right (bool) – Whether to resample the right field. False by default.
block_size (int) – Resamples blocks of data of size block_size. This is particular useful when there is strong autocorrelation (e.g. annual cycle) which would be destroyed under standard bootstrapping. This procedure is known as moving-block bootstrapping. By default block size is 1.
replace (bool) – Whether to resample with (bootstrap) or without replacement (permutation). True by default (bootstrap).
strategy (['standard', 'iterative']) – Whether to perform standard or iterative permutation. Standard permutation typically is overly conservative since it estimates the entire singular value spectrum at once. Iterative approach is more realistic taking into account each singular value before estimating the next one. The iterative approach usually takes much more time. Consult Winkler et al. (2020) for more details on the iterative approach.
disable_progress (bool) – Whether to disable progress bar or not. By default False.
- Returns
2d-array containing the singular values for each Monte Carlo simulation.
- Return type
np.ndarray
References
Efron, B., Tibshirani, R.J., 1993. An Introduction to the Bootstrap. Chapman and Hall. 436 pp.
Winkler, A. M., Renaud, O., Smith, S. M. & Nichols, T. E. Permutation inference for canonical correlation analysis. NeuroImage 220, 117065 (2020).
- correlation_matrix()¶
Return the correlation matrix of PCs.
For non-rotated and Varimax-rotated solutions the correlation matrix is equivalent to the unit matrix.
- Returns
Correlation matrix.
- Return type
ndarray
- eofs(n=None, scaling='None', phase_shift=0, rotated=True)¶
Return the first n EOFs.
Depending on the model the PCs may be real or complex, rotated or unrotated. Depending on the rotation type (Varimax/Proxmax), the PCs may be correlated.
- Parameters
n (int, optional) – Number of EOFs to be returned. The default is None.
scaling ({'None', 'eigen', 'max', 'std'}, optional) – Scale by square root of eigenvalues (‘eigen’), maximum value (‘max’) or standard deviation (‘std’).
phase_shift (float, optional) – If complex, apply a phase shift to the EOFs. Default is 0.
rotated (boolean, optional) – When rotation was performed, True returns the rotated PCs while False returns the unrotated (original) EOFs. Default is True.
- Returns
EOFs associated to left and right input field.
- Return type
dict[ndarray, ndarray]
- explained_variance(n=None)¶
Return the CF of the first n modes.
The covariance fraction (CF) is a measure of importance of each mode. It is calculated as the singular values divided by the sum of all singluar values.
- Parameters
n (int, optional) – Number of modes to return. The default is all.
- Returns
Fraction of described covariance of each mode.
- Return type
ndarray
- fields(original_scale=False)¶
Return left (and right) input field.
- Parameters
original_scale (boolean, optional) – If True, decenter and denormalize (if normalized) the input fields to obtain the original unit scale. Default is False.
- Returns
Fields associated to left and right input field.
- Return type
dict[ndarray, ndarray]
- load_analysis(path, fields=None, eofs=None, singular_values=None)¶
Load a model.
This method allows to load a models which was saved by .save_analysis().
- Parameters
path (str) – Location of the .info file created by .save_analysis().
fields (ndarray) – The original input fields.
eofs (ndarray) – The obtained EOFs.
singular_values (ndarray) – The obtained singular values.
- norm(n=None, sorted=True)¶
Return L2 norm of first n loaded singular vectors.
- Parameters
n (int, optional) – Number of modes to return. The default will return all modes.
- Returns
L2 norm associated to each mode and vector.
- Return type
dict[str, ndarray]
- normalize()¶
Normalize each time series by its standard deviation.
- pcs(n=None, scaling='None', phase_shift=0, rotated=True)¶
Return the first n PCs.
Depending on the model the PCs may be real or complex, rotated or unrotated. Depending on the rotation type (Varimax/Proxmax), the PCs may be correlated.
- Parameters
n (int, optional) – Number of PCs to be returned. By default return all.
scaling ({'None', 'eigen', 'max', 'std'}, optional) – Scale PCs by square root of eigenvalues (‘eigen’), maximum value (‘max’) or standard deviation (‘std’).
phase_shift (float, optional) – If complex, apply a phase shift to the PCs. Default is 0.
rotated (boolean, optional) – When rotation was performed, True returns the rotated PCs while False returns the unrotated (original) PCs. Default is True.
- Returns
PCs associated to left and right input field.
- Return type
dict[ndarray, ndarray]
- plot(mode, threshold=0, phase_shift=0, cmap_eof=None, cmap_phase=None, figsize=(8.3, 5.0))¶
Plot results for mode.
- Parameters
mode (int, optional) – Mode to plot. The default is 1.
threshold (int, optional) – Amplitude threshold below which the fields are masked out. The default is 0.
phase_shift (float, optional) – If complex, apply a phase shift to the shown results. Default is 0.
cmap_eof (str or Colormap) – The colormap used to map the spatial patterns. The default is ‘Blues’.
cmap_phase (str or Colormap) – The colormap used to map the spatial phase function. The default is ‘twilight’.
figsize (tuple) – Figure size provided to plt.figure().
- predict(left=None, right=None, n=None, scaling='None', phase_shift=0)¶
Predict PCs of new data.
left and right are projected on the left and right singular vectors. If rotation was performed, the predicted PCs will be rotated as well.
- Parameters
left (ndarray) – Description of parameter left.
right (ndarray) – Description of parameter right.
n (int) – Number of PC modes to return. If None, return all modes. The default is None.
scaling ({'None', 'eigen', 'max', 'std'}, optional) – Scale PCs by square root of eigenvalues (‘eigen’), maximum value (‘max’) or standard deviation (‘std’).
phase_shift (float, optional) – If complex, apply a phase shift to the temporal phase. Default is 0.
- Returns
Predicted PCs associated to left and right input field.
- Return type
dict[ndarray, ndarray]
- rotate(n_rot, power=1, tol=1e-08)¶
Perform Promax rotation on the first n EOFs.
Promax rotation (Hendrickson & White 1964) is an oblique rotation which seeks to find simple structures in the EOFs. It transforms the EOFs via an orthogonal Varimax rotation (Kaiser 1958) followed by the Promax equation. If power=1, Promax reduces to Varimax rotation. In general, a Promax transformation breaks the orthogonality of EOFs and introduces some correlation between PCs.
- Parameters
n_rot (int) – Number of EOFs to rotate. For values below 2, nothing is done.
power (int, optional) – Power of Promax rotation. The default is 1 (= Varimax).
tol (float, optional) – Tolerance of rotation process. The default is 1e-5.
- Raises
ValueError – If number of rotations are <2.
- Returns
- Return type
None.
- rotation_matrix(inverse_transpose=False)¶
Return the rotation matrix used for rotation.
For non-rotated solutions the rotation matrix equals the unit matrix.
- Parameters
inverse_transpose (boolean) – If True, return the inverse transposed of the rotation matrix. For orthogonal rotations (Varimax) it is the same as the rotation matrix. The default is False.
- Returns
Rotation matrix
- Return type
ndarray
- rule_n(n_runs, n_modes=None)¶
Apply Rule N by Overland and Preisendorfer, 1982.
The aim of Rule N is to provide a rule of thumb for the significance of the obtained singular values via Monte Carlo simulations of uncorrelated Gaussian random variables. The obtained singular values are scaled such that their sum equals the sum of true singular value spectrum.
- Parameters
n_runs (int) – Number of Monte Carlo simulations.
n_modes (int) – Number of singular values to return.
- Returns
Singular values obtained by Rule N.
- Return type
DataArray
References
- Overland, J.E., Preisendorfer, R.W., 1982. A significance test for
principal components applied to a cyclone climatology. Mon. Weather Rev. 110, 1–4.
- rule_north(n=None)¶
Uncertainties of singular values based on North’s rule of thumb.
In case of compex PCA/MCA, the rule of thumb includes another factor of spqrt(2) according to Horel 1984.
- Parameters
n (int, optional) – Number of modes to be returned. By default return all.
- Returns
Uncertainties associated to singular values.
- Return type
ndarray
References
North, G, T L. Bell, R Cahalan, and F J. Moeng. 1982. “Sampling Errors in the Estimation of Empirical Orthogonal Functions.” Monthly Weather Review 110. https://doi.org/10.1175/1520-0493(1982)110<0699:SEITEO>2.0.CO;2.
Horel, JD. 1984. “Complex Principal Component Analysis: Theory and Examples.” Journal of Climate and Applied Meteorology 23 (12): 1660–73. https://doi.org/10.1175/1520-0450(1984)023<1660:CPCATA>2.0.CO;2.
- save_plot(mode, path=None, plot_kwargs={}, save_kwargs={})¶
Create and save a plot to local disk.
- Parameters
mode (int) – Mode to plot.
path (str) – Path where to save the plot. If none is provided, an automatic name will be generated based on the mode number.
plot_kwargs (dict) – Additional parameters provided to xmca.array.plot.
save_kwargs (dict) – Additional parameters provided to matplotlib.pyplot.savefig.
- scf(n=None)¶
Return the SCF of the first n modes.
The squared covariance fraction (SCF) is a measure of importance of each mode. It is calculated as the squared singular values divided by the sum of all squared singluar values.
- Parameters
n (int, optional) – Number of modes to return. The default is all.
- Returns
Fraction of described squared covariance of each mode.
- Return type
ndarray
- set_field_names(left='left', right='right')¶
Set the name of the left and/or right field.
Field names will be reflected when results are plotted or saved.
- Parameters
left (string) – Name of the left field. (the default is ‘left’).
right (string) – Name of the right field. (the default is ‘right’).
- singular_values(n=None)¶
Return the first n singular_values.
- Parameters
n (int, optional) – Number of singular_values to return. The default is 5.
- Returns
Singular values of the obtained solution.
- Return type
ndarray
- solve(complexify=False, extend=False, period=1)¶
Call the solver to perform EOF analysis/MCA.
Under the hood the method performs singular value decomposition on the covariance matrix.
- Parameters
complexify (boolean, optional) – Use Hilbert transform to complexify the input data fields in order to perform complex PCA/MCA. Default is false.
extend (['exp', 'theta', False], optional) – If specified, time series are extended by fore/backcasting based on either an exponential or a Theta model. Artificially extending the time series sometimes helps to reduce spectral leakage inherent to the Hilbert transform when time series are not stationary. Only used for complex time series i.e. when omplexify=True. Default is False.
period (float, optional) – If Theta model, it represents the number of time steps for a season. If exponential model, it represents the number of time steps for the exponential to decrease to 1/e. If no extension is selected, this parameter has no effect. Default is 1.
- spatial_amplitude(n=None, scaling='None', rotated=True)¶
Return the spatial amplitude fields for the first n EOFs.
- Parameters
n (int, optional) – Number of amplitude fields to be returned. If None, return all fields. The default is None.
scaling ({'None', 'max'}, optional) – Scale by maximum value (‘max’). The default is None.
rotated (boolean, optional) – When rotation was performed, True returns the rotated spatial amplitudes while False returns the unrotated (original) spatial amplitudes. Default is True.
- Returns
Spatial amplitude fields associated to left and right field.
- Return type
dict[ndarray, ndarray]
- spatial_phase(n=None, phase_shift=0, rotated=True)¶
Return the spatial phase fields for the first n EOFs.
- Parameters
n (int, optional) – Number of phase fields to return. If None, all fields are returned. The default is None.
phase_shift (float, optional) – If complex, apply a phase shift to the spatial phase. Default is 0.
rotated (boolean, optional) – When rotation was performed, True returns the rotated spatial phases while False returns the unrotated (original) spatial phases. Default is True.
- Returns
Spatial phase fields associated to left and right field.
- Return type
dict[ndarray, ndarray]
- summary()¶
Return meta information of the performed analysis.
- temporal_amplitude(n=None, scaling='None', rotated=True)¶
Return the temporal amplitude time series for the first n PCs.
- Parameters
n (int, optional) – Number of amplitude series to be returned. If None, return all series. The default is None.
scaling ({'None', 'max'}, optional) – Scale by maximum value (‘max’). The default is None.
rotated (boolean, optional) – When rotation was performed, True returns the rotated temporal amplitudes while False returns the unrotated (original) temporal amplitudes. Default is True.
- Returns
amplitudes – Temporal ampliude series associated to left and right field.
- Return type
dict[ndarray, ndarray]
- temporal_phase(n=None, phase_shift=0, rotated=True)¶
Return the temporal phase function for the first n PCs.
- Parameters
n (int, optional) – Number of phase functions to return. If none, return all series. The default is None.
phase_shift (float, optional) – If complex, apply a phase shift to the temporal phase. Default is 0.
rotated (boolean, optional) – When rotation was performed, True returns the rotated temporal phases while False returns the unrotated (original) temporal phases. Default is True.
- Returns
amplitudes – Temporal phase function associated to left and right field.
- Return type
dict[ndarray, ndarray]
- truncate(n)¶
Truncate the solution to the first n modes.
This may be helpful when the full model takes up to much space to be saved.
- Parameters
n (int) – Number of modes to be retained.
- variance(n=None, sorted=True)¶
Return variance of first n loaded singular vectors.
- Parameters
n (int, optional) – Number of modes to return. The default will return all modes.
- Returns
Variance of each mode and vector.
- Return type
dict[str, ndarray]