xmca.xarray.xMCA¶
- class xmca.xarray.xMCA(*fields)¶
Bases:
xmca.array.MCA
Perform MCA on two
xarray.DataArray
.MCA is a more general form of Principal Component Analysis (PCA) for two input fields (left, right). If both data fields are the same, it is equivalent to PCA.
- __init__(*fields)¶
Load data fields and store information about data size/shape.
- Parameters
left (ndarray) – Left input data. First dimension needs to be time. The default is None.
right (ndarray, optional) – Right input data. First dimension needs to be time. If none is provided, automatically, right field is assumed to be the same as left field. In this case, MCA reducdes to normal PCA. The default is None.
Examples
Let left and right be some geophysical fields (e.g. SST and SLP). To perform PCA on left use:
>>> from xmca.array import MCA >>> pca = MCA(left) >>> pca.solve() >>> exp_var = pca.explained_variance() >>> pcs = pca.pcs() >>> eofs = pca.eofs()
To perform MCA on left and right use:
>>> mca = MCA(left, right) >>> mca.solve() >>> exp_var = mca.explained_variance() >>> pcs = mca.pcs() >>> eofs = mca.eofs()
Methods
__init__
(*fields)Load data fields and store information about data size/shape.
Apply area correction to higher latitudes.
apply_weights
(**weights)Apply weights to the left and/or right field.
bootstrapping
(n_runs[, n_modes, axis, ...])Perform Monte Carlo bootstrapping on model.
Return the correlation matrix of PCs.
eofs
([n, scaling, phase_shift, rotated])Return the first n EOFs.
explained_variance
([n])Return the CF of the first n modes.
fields
([original_scale])Return left (and right) input field.
heterogeneous_patterns
([n, phase_shift])Return left and right heterogeneous correlation maps.
homogeneous_patterns
([n, phase_shift])Return left and right homogeneous correlation maps.
load_analysis
(path[, engine])Load a model.
norm
([n, sorted])Return L2 norm of first n loaded singular vectors.
Normalize each time series by its standard deviation.
pcs
([n, scaling, phase_shift, rotated])Return first n PCs.
plot
(mode[, threshold, phase_shift, ...])Plot results for mode.
predict
([left, right, n, scaling, phase_shift])Predict PCs of new data.
reconstructed_fields
([mode, original_scale])Reconstruct original input fields based on specified `mode`s.
rotate
(n_rot[, power, tol])Perform Promax rotation on the first n EOFs.
rotation_matrix
([inverse_transpose])Return the rotation matrix used for rotation.
rule_n
(n_runs[, n_modes])Apply Rule N by Overland and Preisendorfer, 1982.
rule_north
([n])Uncertainties of singular values based on North's rule of thumb.
save_analysis
([path, engine])Save analysis including netcdf data and meta information.
save_plot
(mode[, path, plot_kwargs, save_kwargs])Create and save a plot to local disk.
scf
([n])Return the SCF of the first n modes.
set_field_names
([left, right])Set the name of the left and/or right field.
singular_values
([n])Return first n singular values of the SVD.
solve
([complexify, extend, period])Call the solver to perform EOF analysis/MCA.
spatial_amplitude
([n, scaling, rotated])Return the spatial amplitude fields for the first n EOFs.
spatial_phase
([n, phase_shift, rotated])Return the spatial phase fields for the first n EOFs.
summary
()Return meta information of the performed analysis.
temporal_amplitude
([n, scaling, rotated])Return the temporal amplitude functions for the first n PCs.
temporal_phase
([n, phase_shift, rotated])Return the temporal phase function for the first n PCs.
truncate
(n)Truncate the solution to the first n modes.
variance
([n, sorted])Return variance of first n loaded singular vectors.
- apply_coslat()¶
Apply area correction to higher latitudes.
- apply_weights(**weights)¶
Apply weights to the left and/or right field.
- Parameters
left (ndarray) – Weights for left field.
right (ndarray) – Weights for right field.
Examples
Let left and right be some input data (e.g. SST and precipitation).
>>> left = np.random.randn(100, 30) >>> right = np.random.randn(100, 40)
Call constructor, apply weights and then solve:
>>> left_weights = np.random.randn(1, 30) # some random weights >>> right_weights = np.random.randn(1, 40) >>> mca = MCA(left, right) >>> mca.apply_weights(lw, rw) >>> mca.solve()
- bootstrapping(n_runs, n_modes=20, axis=0, on_left=True, on_right=False, block_size=1, replace=True, strategy='standard', disable_progress=False)¶
Perform Monte Carlo bootstrapping on model.
Monte Carlo simulations allow to assess the signifcance of the obtained singular values and hence modes by re-performing the analysis on synthetic sample data. Using bootstrapping the synthetic data is created by resampling the original data.
- Parameters
n_runs (int) – Number of Monte Carlo simulations.
n_modes (int) – Number of modes to be returned. By default return all modes.
axis (int) – Whether to resample along time (axis=0) or in space (axis=1). The default is 0.
on_left (bool) – Whether to resample the left field. True by default.
on_right (bool) – Whether to resample the right field. False by default.
block_size (int) – Resamples blocks of data of size block_size. This is particular useful when there is strong autocorrelation (e.g. annual cycle) which would be destroyed under standard bootstrapping. This procedure is known as moving-block bootstrapping. By default block size is 1.
replace (bool) – Whether to resample with (bootstrap) or without replacement (permutation). True by default (bootstrap).
strategy (['standard', 'iterative']) – Whether to perform standard or iterative permutation. Standard permutation typically is overly conservative since it estimates the entire singular value spectrum at once. Iterative approach is more realistic taking into account each singular value before estimating the next one. The iterative approach usually takes much more time. Consult Winkler et al. (2020) for more details on the iterative approach.
disable_progress (bool) – Whether to disable progress bar or not. By default False.
- Returns
2d-array containing the singular values for each Monte Carlo simulation.
- Return type
np.ndarray
References
Efron, B., Tibshirani, R.J., 1993. An Introduction to the Bootstrap. Chapman and Hall. 436 pp.
Winkler, A. M., Renaud, O., Smith, S. M. & Nichols, T. E. Permutation inference for canonical correlation analysis. NeuroImage 220, 117065 (2020).
- correlation_matrix()¶
Return the correlation matrix of PCs.
For non-rotated and Varimax-rotated solutions the correlation matrix is equivalent to the unit matrix.
- Returns
Correlation matrix.
- Return type
ndarray
- eofs(n=None, scaling='None', phase_shift=0, rotated=True)¶
Return the first n EOFs.
- Parameters
n (int, optional) – Number of EOFs to return If none, all EOFs are returned. The default is None.
scaling ({None, 'eigen', 'max', 'std'}, optional) – Scale by square root of singular values (‘eigen’), maximum value (‘max’) or standard deviation (‘std’). The default is None.
phase_shift (float, optional) – If complex, apply a phase shift to the EOFs. Default is 0.
rotated (boolean, optional) – When rotation was performed, True returns the rotated EOFs while False returns the unrotated (original) EOFs. Default is True.
- Returns
EOFs associated to left and right input field.
- Return type
dict[DataArray, DataArray]
- explained_variance(n=None)¶
Return the CF of the first n modes.
The covariance fraction (CF) is a measure of importance of each mode. It is calculated as the singular values divided by the sum of singular values.
- Parameters
n (int, optional) – Number of modes to return. The default is None.
- Returns
Fraction of described covariance of each mode.
- Return type
DataArray
- fields(original_scale=False)¶
Return left (and right) input field.
- Parameters
original_scale (boolean, optional) – If True, decenter and denormalize (if normalized) the input fields to obtain the original unit scale. Default is False.
- Returns
Fields associated to left and right input field.
- Return type
dict[ndarray, ndarray]
- heterogeneous_patterns(n=None, phase_shift=0)¶
Return left and right heterogeneous correlation maps.
- Parameters
n (int, optional) – Number of patterns (modes) to be returned. If None then all patterns are returned. The default is None.
phase_shift (float, optional) – If complex, apply a phase shift to the heterogeneous patterns. Default is 0.
- Returns
Heterogeneous patterns associated to left and right input field.
- Return type
dict[DataArray, DataArray]
- homogeneous_patterns(n=None, phase_shift=0)¶
Return left and right homogeneous correlation maps.
- Parameters
n (int, optional) – Number of patterns (modes) to be returned. If None then all patterns are returned. The default is None.
phase_shift (float, optional) – If complex, apply a phase shift to the homogeneous patterns. Default is 0.
- Returns
Homogeneous patterns associated to left and right input field.
- Return type
dict[DataArray, DataArray]
- load_analysis(path, engine='h5netcdf')¶
Load a model.
This method allows to load a models which was saved by .save_analysis().
- Parameters
path (str) – Location of the .info file created by .save_analysis().
fields (ndarray) – The original input fields.
eofs (ndarray) – The obtained EOFs.
singular_values (ndarray) – The obtained singular values.
- norm(n=None, sorted=True)¶
Return L2 norm of first n loaded singular vectors.
- Parameters
n (int, optional) – Number of modes to return. By default will return all modes.
- Returns
L2 norm associated to each mode and vector.
- Return type
dict[str, DataArray]
- normalize()¶
Normalize each time series by its standard deviation.
- pcs(n=None, scaling='None', phase_shift=0, rotated=True)¶
Return first n PCs.
- Parameters
n (int, optional) – Number of PCs to return. If none, then all PCs are returned.
None. (The default is) –
scaling ({'None', 'eigen', 'max', 'std'}, optional) – Scale by square root of singular values (‘eigen’), maximum value (‘max’) or standard deviation (‘std’). The default is None.
phase_shift (float, optional) – If complex, apply a phase shift to the PCs. Default is 0.
rotated (boolean, optional) – When rotation was performed, True returns the rotated PCs while False returns the unrotated (original) PCs. Default is True.
- Returns
PCs associated to left and right input field.
- Return type
dict[DataArray, DataArray]
- plot(mode, threshold=0, phase_shift=0, cmap_eof=None, cmap_phase=None, figsize=(8.3, 5.0), resolution='110m', projection=None, orientation='horizontal', land=True)¶
Plot results for mode.
- Parameters
mode (int, optional) – Mode to plot. The default is 1.
threshold ([0,1], optional) – Threshold of max-normalised amplitude below which the fields are masked out. The default is 0.
cmap_eof (str or Colormap) – The colormap used to map the spatial patterns. The default is Blues.
cmap_phase (str or Colormap) – The colormap used to map the spatial phase function. The default is twilight.
figsize (tuple) – Size of figure. Default is (8.3, 5.0).
resolution ({None, '110m', '50m', '10m'}) – A named resolution to use from the Natural Earth dataset. Currently can be one of 110m, 50m, and 10m. If None, no coastlines will be drawn. Default is 110m.
projection (cartopy.crs projection,) – dict of {str : cartopy.crs projection} Projection can be either a valid cartopy projection (cartopy.crs) or a dictionary of different projections with keys ‘left’ and ‘right’. The default is cartopy.crs.PlateCaree.
orientation ({'horizontal', 'vertical'}) – Orientation of the plot. Default is horizontal.
land (boolean) – Turn coloring of land surface on/off.
- Returns
matplotlib.figure.Figure – Figure instance.
dict of matplotlib.axes._subplots.AxesSubplot – Dictionary of axes for left (and right) field containing pcs, eofs and, if complex, phase.
- predict(left=None, right=None, n=None, scaling='None', phase_shift=0)¶
Predict PCs of new data.
left and right are projected on the left and right singular vectors. If rotation was performed, the predicted PCs will be rotated as well.
- Parameters
left (ndarray) – Description of parameter left.
right (ndarray) – Description of parameter right.
n (int) – Number of PC modes to return. If None, return all modes. The default is None.
scaling ({'None', 'eigen', 'max', 'std'}, optional) – Scale PCs by square root of eigenvalues (‘eigen’), maximum value (‘max’) or standard deviation (‘std’).
phase_shift (float, optional) – If complex, apply a phase shift to the temporal phase. Default is 0.
- Returns
Predicted PCs associated to left and right input field.
- Return type
dict[ndarray, ndarray]
- reconstructed_fields(mode=slice(1, None, None), original_scale=True)¶
Reconstruct original input fields based on specified `mode`s.
- Parameters
mode (int, slice) – Modes to be considered for reconstructing the original fields. The default is slice(1, None) which returns the original fields based on all modes.
- Returns
Left and right reconstructed fields.
- Return type
dict[DataArray, DataArray]
- rotate(n_rot, power=1, tol=1e-08)¶
Perform Promax rotation on the first n EOFs.
Promax rotation (Hendrickson & White 1964) is an oblique rotation which seeks to find simple structures in the EOFs. It transforms the EOFs via an orthogonal Varimax rotation (Kaiser 1958) followed by the Promax equation. If power=1, Promax reduces to Varimax rotation. In general, a Promax transformation breaks the orthogonality of EOFs and introduces some correlation between PCs.
- Parameters
n_rot (int) – Number of EOFs to rotate.
power (int, optional) – Power of Promax rotation. The default is 1 (= Varimax).
tol (float, optional) – Tolerance of rotation process. The default is 1e-5.
- Raises
ValueError – If number of rotations are <2.
- Returns
- Return type
None.
- rotation_matrix(inverse_transpose=False)¶
Return the rotation matrix used for rotation.
For non-rotated solutions the rotation matrix equals the unit matrix.
- Parameters
inverse_transpose (boolean) – If True, return the inverse transposed of the rotation matrix. For orthogonal rotations (Varimax) it is the same as the rotation matrix. The default is False.
- Returns
Rotation matrix
- Return type
ndarray
- rule_n(n_runs, n_modes=None)¶
Apply Rule N by Overland and Preisendorfer, 1982.
The aim of Rule N is to provide a rule of thumb for the significance of the obtained singular values via Monte Carlo simulations of uncorrelated Gaussian random variables. The obtained singular values are scaled such that their sum equals the sum of true singular value spectrum.
- Parameters
n_runs (int) – Number of Monte Carlo simulations.
n_modes (int) – Number of singular values to return.
- Returns
Singular values obtained by Rule N.
- Return type
DataArray
References
- Overland, J.E., Preisendorfer, R.W., 1982. A significance test for
principal components applied to a cyclone climatology. Mon. Weather Rev. 110, 1–4.
- rule_north(n=None)¶
Uncertainties of singular values based on North’s rule of thumb.
In case of compex PCA/MCA, the rule of thumb includes another factor of spqrt(2) according to Horel 1984.
- Parameters
n (int, optional) – Number of modes to be returned. By default return all.
- Returns
Uncertainties associated to singular values.
- Return type
ndarray
References
North, G, T L. Bell, R Cahalan, and F J. Moeng. 1982. “Sampling Errors in the Estimation of Empirical Orthogonal Functions.” Monthly Weather Review 110. https://doi.org/10.1175/1520-0493(1982)110<0699:SEITEO>2.0.CO;2.
Horel, JD. 1984. “Complex Principal Component Analysis: Theory and Examples.” Journal of Climate and Applied Meteorology 23 (12): 1660–73. https://doi.org/10.1175/1520-0450(1984)023<1660:CPCATA>2.0.CO;2.
- save_analysis(path=None, engine='h5netcdf')¶
Save analysis including netcdf data and meta information.
- Parameters
path (str, optional) – Path to storage location. If none is provided, the analysis is saved into ./xmca/<left_name>(_<right_name>)/.
engine (str) – h5netcdf is needed for complex values. Otherwise standard netcdf works as well (the default is ‘h5netcdf’).
- save_plot(mode, path=None, plot_kwargs={}, save_kwargs={})¶
Create and save a plot to local disk.
- Parameters
mode (int) – Mode to plot.
path (str) – Path where to save the plot. If none is provided, an automatic name will be generated based on the mode number.
plot_kwargs (dict) – Additional parameters provided to xmca.array.plot.
save_kwargs (dict) – Additional parameters provided to matplotlib.pyplot.savefig.
- scf(n=None)¶
Return the SCF of the first n modes.
The squared covariance fraction (SCF) is a measure of importance of each mode. It is calculated as the squared singular values divided by the sum of squared singular values.
- Parameters
n (int, optional) – Number of modes to return. The default is None.
- Returns
Fraction of described squared covariance of each mode.
- Return type
DataArray
- set_field_names(left='left', right='right')¶
Set the name of the left and/or right field.
Field names will be reflected when results are plotted or saved.
- Parameters
left (string) – Name of the left field. (the default is ‘left’).
right (string) – Name of the right field. (the default is ‘right’).
- singular_values(n=None)¶
Return first n singular values of the SVD.
- Parameters
n (int, optional) – Number of singular values to return. If None, all singular values are returned. The default is None.
- Returns
Singular values of the SVD.
- Return type
DataArray
- solve(complexify=False, extend=False, period=1)¶
Call the solver to perform EOF analysis/MCA.
Under the hood the method performs singular value decomposition on the covariance matrix.
- Parameters
complexify (boolean, optional) – Use Hilbert transform to complexify the input data fields in order to perform complex PCA/MCA. Default is false.
extend (['exp', 'theta', False], optional) – If specified, time series are extended by fore/backcasting based on either an exponential or a Theta model. Artificially extending the time series sometimes helps to reduce spectral leakage inherent to the Hilbert transform when time series are not stationary. Only used for complex time series i.e. when omplexify=True. Default is False.
period (float, optional) – If Theta model, it represents the number of time steps for a season. If exponential model, it represents the number of time steps for the exponential to decrease to 1/e. If no extension is selected, this parameter has no effect. Default is 1.
- spatial_amplitude(n=None, scaling='None', rotated=True)¶
Return the spatial amplitude fields for the first n EOFs.
- Parameters
n (int, optional) – Number of amplitude fields to return. If none, all fields are returned. The default is None.
scaling ({'None', 'max'}, optional) – Scale by maximum value (‘max’). The default is None.
rotated (boolean, optional) – When rotation was performed, True returns the rotated spatial amplitudes while False returns the unrotated (original) spatial amplitudes. Default is True.
- Returns
Spatial amplitudes associated to left and right input field.
- Return type
dict[DataArray, DataArray]
- spatial_phase(n=None, phase_shift=0, rotated=True)¶
Return the spatial phase fields for the first n EOFs.
- Parameters
n (int, optional) – Number of phase fields to return. If none, all fields are returned. The default is None.
scaling ({None, 'max', 'std'}, optional) – Scale by maximum value (‘max’) or standard deviation (‘std’). The default is None.
phase_shift (float, optional) – If complex, apply a phase shift to the spatial phases. Default is 0.
rotated (boolean, optional) – When rotation was performed, True returns the rotated spatial phases while False returns the unrotated (original) spatial phases. Default is True.
- Returns
Spatial phases associated to left and right input field.
- Return type
dict[DataArray, DataArray]
- summary()¶
Return meta information of the performed analysis.
- temporal_amplitude(n=None, scaling='None', rotated=True)¶
Return the temporal amplitude functions for the first n PCs.
- Parameters
n (int, optional) – Number of amplitude functions to return. If none, all functions are returned. The default is None.
scaling ({'None', 'max'}, optional) – Scale by maximum value (‘max’). The default is None.
rotated (boolean, optional) – When rotation was performed, True returns the rotated temporal amplitudes while False returns the unrotated (original) temporal amplitudes. Default is True.
- Returns
PCs associated to left and right input field.
- Return type
dict[DataArray, DataArray]
- temporal_phase(n=None, phase_shift=0, rotated=True)¶
Return the temporal phase function for the first n PCs.
- Parameters
n (int, optional) – Number of phase functions to return. If none, all functions are returned. The default is None.
phase_shift (float, optional) – If complex, apply a phase shift to the temporal phases. Default is 0.
rotated (boolean, optional) – When rotation was performed, True returns the rotated temporal phases while False returns the unrotated (original) temporal phases. Default is True.
- Returns
Temporal phases associated to left and right input field.
- Return type
dict[DataArray, DataArray]
- truncate(n)¶
Truncate the solution to the first n modes.
This may be helpful when the full model takes up to much space to be saved.
- Parameters
n (int) – Number of modes to be retained.
- variance(n=None, sorted=True)¶
Return variance of first n loaded singular vectors.
- Parameters
n (int, optional) – Number of modes to return. By default will return all modes.
- Returns
Variance associated to each mode and vector.
- Return type
dict[str, DataArray]