xmca.xarray.xMCA

class xmca.xarray.xMCA(*fields)

Bases: xmca.array.MCA

Perform MCA on two xarray.DataArray.

MCA is a more general form of Principal Component Analysis (PCA) for two input fields (left, right). If both data fields are the same, it is equivalent to PCA.

__init__(*fields)

Load data fields and store information about data size/shape.

Parameters
  • left (ndarray) – Left input data. First dimension needs to be time. The default is None.

  • right (ndarray, optional) – Right input data. First dimension needs to be time. If none is provided, automatically, right field is assumed to be the same as left field. In this case, MCA reducdes to normal PCA. The default is None.

Examples

Let left and right be some geophysical fields (e.g. SST and SLP). To perform PCA on left use:

>>> from xmca.array import MCA
>>> pca = MCA(left)
>>> pca.solve()
>>> exp_var = pca.explained_variance()
>>> pcs = pca.pcs()
>>> eofs = pca.eofs()

To perform MCA on left and right use:

>>> mca = MCA(left, right)
>>> mca.solve()
>>> exp_var = mca.explained_variance()
>>> pcs = mca.pcs()
>>> eofs = mca.eofs()

Methods

__init__(*fields)

Load data fields and store information about data size/shape.

apply_coslat()

Apply area correction to higher latitudes.

apply_weights(**weights)

Apply weights to the left and/or right field.

bootstrapping(n_runs[, n_modes, axis, ...])

Perform Monte Carlo bootstrapping on model.

correlation_matrix()

Return the correlation matrix of PCs.

eofs([n, scaling, phase_shift, rotated])

Return the first n EOFs.

explained_variance([n])

Return the CF of the first n modes.

fields([original_scale])

Return left (and right) input field.

heterogeneous_patterns([n, phase_shift])

Return left and right heterogeneous correlation maps.

homogeneous_patterns([n, phase_shift])

Return left and right homogeneous correlation maps.

load_analysis(path[, engine])

Load a model.

norm([n, sorted])

Return L2 norm of first n loaded singular vectors.

normalize()

Normalize each time series by its standard deviation.

pcs([n, scaling, phase_shift, rotated])

Return first n PCs.

plot(mode[, threshold, phase_shift, ...])

Plot results for mode.

predict([left, right, n, scaling, phase_shift])

Predict PCs of new data.

reconstructed_fields([mode, original_scale])

Reconstruct original input fields based on specified `mode`s.

rotate(n_rot[, power, tol])

Perform Promax rotation on the first n EOFs.

rotation_matrix([inverse_transpose])

Return the rotation matrix used for rotation.

rule_n(n_runs[, n_modes])

Apply Rule N by Overland and Preisendorfer, 1982.

rule_north([n])

Uncertainties of singular values based on North's rule of thumb.

save_analysis([path, engine])

Save analysis including netcdf data and meta information.

save_plot(mode[, path, plot_kwargs, save_kwargs])

Create and save a plot to local disk.

scf([n])

Return the SCF of the first n modes.

set_field_names([left, right])

Set the name of the left and/or right field.

singular_values([n])

Return first n singular values of the SVD.

solve([complexify, extend, period])

Call the solver to perform EOF analysis/MCA.

spatial_amplitude([n, scaling, rotated])

Return the spatial amplitude fields for the first n EOFs.

spatial_phase([n, phase_shift, rotated])

Return the spatial phase fields for the first n EOFs.

summary()

Return meta information of the performed analysis.

temporal_amplitude([n, scaling, rotated])

Return the temporal amplitude functions for the first n PCs.

temporal_phase([n, phase_shift, rotated])

Return the temporal phase function for the first n PCs.

truncate(n)

Truncate the solution to the first n modes.

variance([n, sorted])

Return variance of first n loaded singular vectors.

apply_coslat()

Apply area correction to higher latitudes.

apply_weights(**weights)

Apply weights to the left and/or right field.

Parameters
  • left (ndarray) – Weights for left field.

  • right (ndarray) – Weights for right field.

Examples

Let left and right be some input data (e.g. SST and precipitation).

>>> left = np.random.randn(100, 30)
>>> right = np.random.randn(100, 40)

Call constructor, apply weights and then solve:

>>> left_weights = np.random.randn(1, 30)  # some random weights
>>> right_weights = np.random.randn(1, 40)
>>> mca = MCA(left, right)
>>> mca.apply_weights(lw, rw)
>>> mca.solve()
bootstrapping(n_runs, n_modes=20, axis=0, on_left=True, on_right=False, block_size=1, replace=True, strategy='standard', disable_progress=False)

Perform Monte Carlo bootstrapping on model.

Monte Carlo simulations allow to assess the signifcance of the obtained singular values and hence modes by re-performing the analysis on synthetic sample data. Using bootstrapping the synthetic data is created by resampling the original data.

Parameters
  • n_runs (int) – Number of Monte Carlo simulations.

  • n_modes (int) – Number of modes to be returned. By default return all modes.

  • axis (int) – Whether to resample along time (axis=0) or in space (axis=1). The default is 0.

  • on_left (bool) – Whether to resample the left field. True by default.

  • on_right (bool) – Whether to resample the right field. False by default.

  • block_size (int) – Resamples blocks of data of size block_size. This is particular useful when there is strong autocorrelation (e.g. annual cycle) which would be destroyed under standard bootstrapping. This procedure is known as moving-block bootstrapping. By default block size is 1.

  • replace (bool) – Whether to resample with (bootstrap) or without replacement (permutation). True by default (bootstrap).

  • strategy (['standard', 'iterative']) – Whether to perform standard or iterative permutation. Standard permutation typically is overly conservative since it estimates the entire singular value spectrum at once. Iterative approach is more realistic taking into account each singular value before estimating the next one. The iterative approach usually takes much more time. Consult Winkler et al. (2020) for more details on the iterative approach.

  • disable_progress (bool) – Whether to disable progress bar or not. By default False.

Returns

2d-array containing the singular values for each Monte Carlo simulation.

Return type

np.ndarray

References

Efron, B., Tibshirani, R.J., 1993. An Introduction to the Bootstrap. Chapman and Hall. 436 pp.

Winkler, A. M., Renaud, O., Smith, S. M. & Nichols, T. E. Permutation inference for canonical correlation analysis. NeuroImage 220, 117065 (2020).

correlation_matrix()

Return the correlation matrix of PCs.

For non-rotated and Varimax-rotated solutions the correlation matrix is equivalent to the unit matrix.

Returns

Correlation matrix.

Return type

ndarray

eofs(n=None, scaling='None', phase_shift=0, rotated=True)

Return the first n EOFs.

Parameters
  • n (int, optional) – Number of EOFs to return If none, all EOFs are returned. The default is None.

  • scaling ({None, 'eigen', 'max', 'std'}, optional) – Scale by square root of singular values (‘eigen’), maximum value (‘max’) or standard deviation (‘std’). The default is None.

  • phase_shift (float, optional) – If complex, apply a phase shift to the EOFs. Default is 0.

  • rotated (boolean, optional) – When rotation was performed, True returns the rotated EOFs while False returns the unrotated (original) EOFs. Default is True.

Returns

EOFs associated to left and right input field.

Return type

dict[DataArray, DataArray]

explained_variance(n=None)

Return the CF of the first n modes.

The covariance fraction (CF) is a measure of importance of each mode. It is calculated as the singular values divided by the sum of singular values.

Parameters

n (int, optional) – Number of modes to return. The default is None.

Returns

Fraction of described covariance of each mode.

Return type

DataArray

fields(original_scale=False)

Return left (and right) input field.

Parameters

original_scale (boolean, optional) – If True, decenter and denormalize (if normalized) the input fields to obtain the original unit scale. Default is False.

Returns

Fields associated to left and right input field.

Return type

dict[ndarray, ndarray]

heterogeneous_patterns(n=None, phase_shift=0)

Return left and right heterogeneous correlation maps.

Parameters
  • n (int, optional) – Number of patterns (modes) to be returned. If None then all patterns are returned. The default is None.

  • phase_shift (float, optional) – If complex, apply a phase shift to the heterogeneous patterns. Default is 0.

Returns

Heterogeneous patterns associated to left and right input field.

Return type

dict[DataArray, DataArray]

homogeneous_patterns(n=None, phase_shift=0)

Return left and right homogeneous correlation maps.

Parameters
  • n (int, optional) – Number of patterns (modes) to be returned. If None then all patterns are returned. The default is None.

  • phase_shift (float, optional) – If complex, apply a phase shift to the homogeneous patterns. Default is 0.

Returns

Homogeneous patterns associated to left and right input field.

Return type

dict[DataArray, DataArray]

load_analysis(path, engine='h5netcdf')

Load a model.

This method allows to load a models which was saved by .save_analysis().

Parameters
  • path (str) – Location of the .info file created by .save_analysis().

  • fields (ndarray) – The original input fields.

  • eofs (ndarray) – The obtained EOFs.

  • singular_values (ndarray) – The obtained singular values.

norm(n=None, sorted=True)

Return L2 norm of first n loaded singular vectors.

Parameters

n (int, optional) – Number of modes to return. By default will return all modes.

Returns

L2 norm associated to each mode and vector.

Return type

dict[str, DataArray]

normalize()

Normalize each time series by its standard deviation.

pcs(n=None, scaling='None', phase_shift=0, rotated=True)

Return first n PCs.

Parameters
  • n (int, optional) – Number of PCs to return. If none, then all PCs are returned.

  • None. (The default is) –

  • scaling ({'None', 'eigen', 'max', 'std'}, optional) – Scale by square root of singular values (‘eigen’), maximum value (‘max’) or standard deviation (‘std’). The default is None.

  • phase_shift (float, optional) – If complex, apply a phase shift to the PCs. Default is 0.

  • rotated (boolean, optional) – When rotation was performed, True returns the rotated PCs while False returns the unrotated (original) PCs. Default is True.

Returns

PCs associated to left and right input field.

Return type

dict[DataArray, DataArray]

plot(mode, threshold=0, phase_shift=0, cmap_eof=None, cmap_phase=None, figsize=(8.3, 5.0), resolution='110m', projection=None, orientation='horizontal', land=True)

Plot results for mode.

Parameters
  • mode (int, optional) – Mode to plot. The default is 1.

  • threshold ([0,1], optional) – Threshold of max-normalised amplitude below which the fields are masked out. The default is 0.

  • cmap_eof (str or Colormap) – The colormap used to map the spatial patterns. The default is Blues.

  • cmap_phase (str or Colormap) – The colormap used to map the spatial phase function. The default is twilight.

  • figsize (tuple) – Size of figure. Default is (8.3, 5.0).

  • resolution ({None, '110m', '50m', '10m'}) – A named resolution to use from the Natural Earth dataset. Currently can be one of 110m, 50m, and 10m. If None, no coastlines will be drawn. Default is 110m.

  • projection (cartopy.crs projection,) – dict of {str : cartopy.crs projection} Projection can be either a valid cartopy projection (cartopy.crs) or a dictionary of different projections with keys ‘left’ and ‘right’. The default is cartopy.crs.PlateCaree.

  • orientation ({'horizontal', 'vertical'}) – Orientation of the plot. Default is horizontal.

  • land (boolean) – Turn coloring of land surface on/off.

Returns

  • matplotlib.figure.Figure – Figure instance.

  • dict of matplotlib.axes._subplots.AxesSubplot – Dictionary of axes for left (and right) field containing pcs, eofs and, if complex, phase.

predict(left=None, right=None, n=None, scaling='None', phase_shift=0)

Predict PCs of new data.

left and right are projected on the left and right singular vectors. If rotation was performed, the predicted PCs will be rotated as well.

Parameters
  • left (ndarray) – Description of parameter left.

  • right (ndarray) – Description of parameter right.

  • n (int) – Number of PC modes to return. If None, return all modes. The default is None.

  • scaling ({'None', 'eigen', 'max', 'std'}, optional) – Scale PCs by square root of eigenvalues (‘eigen’), maximum value (‘max’) or standard deviation (‘std’).

  • phase_shift (float, optional) – If complex, apply a phase shift to the temporal phase. Default is 0.

Returns

Predicted PCs associated to left and right input field.

Return type

dict[ndarray, ndarray]

reconstructed_fields(mode=slice(1, None, None), original_scale=True)

Reconstruct original input fields based on specified `mode`s.

Parameters

mode (int, slice) – Modes to be considered for reconstructing the original fields. The default is slice(1, None) which returns the original fields based on all modes.

Returns

Left and right reconstructed fields.

Return type

dict[DataArray, DataArray]

rotate(n_rot, power=1, tol=1e-08)

Perform Promax rotation on the first n EOFs.

Promax rotation (Hendrickson & White 1964) is an oblique rotation which seeks to find simple structures in the EOFs. It transforms the EOFs via an orthogonal Varimax rotation (Kaiser 1958) followed by the Promax equation. If power=1, Promax reduces to Varimax rotation. In general, a Promax transformation breaks the orthogonality of EOFs and introduces some correlation between PCs.

Parameters
  • n_rot (int) – Number of EOFs to rotate.

  • power (int, optional) – Power of Promax rotation. The default is 1 (= Varimax).

  • tol (float, optional) – Tolerance of rotation process. The default is 1e-5.

Raises

ValueError – If number of rotations are <2.

Returns

Return type

None.

rotation_matrix(inverse_transpose=False)

Return the rotation matrix used for rotation.

For non-rotated solutions the rotation matrix equals the unit matrix.

Parameters

inverse_transpose (boolean) – If True, return the inverse transposed of the rotation matrix. For orthogonal rotations (Varimax) it is the same as the rotation matrix. The default is False.

Returns

Rotation matrix

Return type

ndarray

rule_n(n_runs, n_modes=None)

Apply Rule N by Overland and Preisendorfer, 1982.

The aim of Rule N is to provide a rule of thumb for the significance of the obtained singular values via Monte Carlo simulations of uncorrelated Gaussian random variables. The obtained singular values are scaled such that their sum equals the sum of true singular value spectrum.

Parameters
  • n_runs (int) – Number of Monte Carlo simulations.

  • n_modes (int) – Number of singular values to return.

Returns

Singular values obtained by Rule N.

Return type

DataArray

References

  • Overland, J.E., Preisendorfer, R.W., 1982. A significance test for

    principal components applied to a cyclone climatology. Mon. Weather Rev. 110, 1–4.

rule_north(n=None)

Uncertainties of singular values based on North’s rule of thumb.

In case of compex PCA/MCA, the rule of thumb includes another factor of spqrt(2) according to Horel 1984.

Parameters

n (int, optional) – Number of modes to be returned. By default return all.

Returns

Uncertainties associated to singular values.

Return type

ndarray

References

North, G, T L. Bell, R Cahalan, and F J. Moeng. 1982. “Sampling Errors in the Estimation of Empirical Orthogonal Functions.” Monthly Weather Review 110. https://doi.org/10.1175/1520-0493(1982)110<0699:SEITEO>2.0.CO;2.

Horel, JD. 1984. “Complex Principal Component Analysis: Theory and Examples.” Journal of Climate and Applied Meteorology 23 (12): 1660–73. https://doi.org/10.1175/1520-0450(1984)023<1660:CPCATA>2.0.CO;2.

save_analysis(path=None, engine='h5netcdf')

Save analysis including netcdf data and meta information.

Parameters
  • path (str, optional) – Path to storage location. If none is provided, the analysis is saved into ./xmca/<left_name>(_<right_name>)/.

  • engine (str) – h5netcdf is needed for complex values. Otherwise standard netcdf works as well (the default is ‘h5netcdf’).

save_plot(mode, path=None, plot_kwargs={}, save_kwargs={})

Create and save a plot to local disk.

Parameters
  • mode (int) – Mode to plot.

  • path (str) – Path where to save the plot. If none is provided, an automatic name will be generated based on the mode number.

  • plot_kwargs (dict) – Additional parameters provided to xmca.array.plot.

  • save_kwargs (dict) – Additional parameters provided to matplotlib.pyplot.savefig.

scf(n=None)

Return the SCF of the first n modes.

The squared covariance fraction (SCF) is a measure of importance of each mode. It is calculated as the squared singular values divided by the sum of squared singular values.

Parameters

n (int, optional) – Number of modes to return. The default is None.

Returns

Fraction of described squared covariance of each mode.

Return type

DataArray

set_field_names(left='left', right='right')

Set the name of the left and/or right field.

Field names will be reflected when results are plotted or saved.

Parameters
  • left (string) – Name of the left field. (the default is ‘left’).

  • right (string) – Name of the right field. (the default is ‘right’).

singular_values(n=None)

Return first n singular values of the SVD.

Parameters

n (int, optional) – Number of singular values to return. If None, all singular values are returned. The default is None.

Returns

Singular values of the SVD.

Return type

DataArray

solve(complexify=False, extend=False, period=1)

Call the solver to perform EOF analysis/MCA.

Under the hood the method performs singular value decomposition on the covariance matrix.

Parameters
  • complexify (boolean, optional) – Use Hilbert transform to complexify the input data fields in order to perform complex PCA/MCA. Default is false.

  • extend (['exp', 'theta', False], optional) – If specified, time series are extended by fore/backcasting based on either an exponential or a Theta model. Artificially extending the time series sometimes helps to reduce spectral leakage inherent to the Hilbert transform when time series are not stationary. Only used for complex time series i.e. when omplexify=True. Default is False.

  • period (float, optional) – If Theta model, it represents the number of time steps for a season. If exponential model, it represents the number of time steps for the exponential to decrease to 1/e. If no extension is selected, this parameter has no effect. Default is 1.

spatial_amplitude(n=None, scaling='None', rotated=True)

Return the spatial amplitude fields for the first n EOFs.

Parameters
  • n (int, optional) – Number of amplitude fields to return. If none, all fields are returned. The default is None.

  • scaling ({'None', 'max'}, optional) – Scale by maximum value (‘max’). The default is None.

  • rotated (boolean, optional) – When rotation was performed, True returns the rotated spatial amplitudes while False returns the unrotated (original) spatial amplitudes. Default is True.

Returns

Spatial amplitudes associated to left and right input field.

Return type

dict[DataArray, DataArray]

spatial_phase(n=None, phase_shift=0, rotated=True)

Return the spatial phase fields for the first n EOFs.

Parameters
  • n (int, optional) – Number of phase fields to return. If none, all fields are returned. The default is None.

  • scaling ({None, 'max', 'std'}, optional) – Scale by maximum value (‘max’) or standard deviation (‘std’). The default is None.

  • phase_shift (float, optional) – If complex, apply a phase shift to the spatial phases. Default is 0.

  • rotated (boolean, optional) – When rotation was performed, True returns the rotated spatial phases while False returns the unrotated (original) spatial phases. Default is True.

Returns

Spatial phases associated to left and right input field.

Return type

dict[DataArray, DataArray]

summary()

Return meta information of the performed analysis.

temporal_amplitude(n=None, scaling='None', rotated=True)

Return the temporal amplitude functions for the first n PCs.

Parameters
  • n (int, optional) – Number of amplitude functions to return. If none, all functions are returned. The default is None.

  • scaling ({'None', 'max'}, optional) – Scale by maximum value (‘max’). The default is None.

  • rotated (boolean, optional) – When rotation was performed, True returns the rotated temporal amplitudes while False returns the unrotated (original) temporal amplitudes. Default is True.

Returns

PCs associated to left and right input field.

Return type

dict[DataArray, DataArray]

temporal_phase(n=None, phase_shift=0, rotated=True)

Return the temporal phase function for the first n PCs.

Parameters
  • n (int, optional) – Number of phase functions to return. If none, all functions are returned. The default is None.

  • phase_shift (float, optional) – If complex, apply a phase shift to the temporal phases. Default is 0.

  • rotated (boolean, optional) – When rotation was performed, True returns the rotated temporal phases while False returns the unrotated (original) temporal phases. Default is True.

Returns

Temporal phases associated to left and right input field.

Return type

dict[DataArray, DataArray]

truncate(n)

Truncate the solution to the first n modes.

This may be helpful when the full model takes up to much space to be saved.

Parameters

n (int) – Number of modes to be retained.

variance(n=None, sorted=True)

Return variance of first n loaded singular vectors.

Parameters

n (int, optional) – Number of modes to return. By default will return all modes.

Returns

Variance associated to each mode and vector.

Return type

dict[str, DataArray]