Tutorial¶

The package contains two classes xmca.array.MCA for numpy.ndarray and xmca.xarray.xMCA for xarray.DataArray. Depending on which data type you work with, import the respective module:

from xmca.array import MCA  # numpy
from xmca.xarray import xMCA  # xarray

For this tutorial we use North American surface temperatures from 2013 to 2014 shipped with xarray. We arbitrarily separate the data into two domains, west and east.

import xarray as xr  # only needed to obtain test data

# split data arbitrarily into west and east coast
data = xr.tutorial.open_dataset('air_temperature').air
west = data.sel(lon=slice(200, 260))
east = data.sel(lon=slice(260, 360))

Note

xMCA only accepts DataArray, not Dataset.

Warning

The time coordinate needs to be called time, the spatial coordinates lat and lon. In particular ERA5 data sets have their spatial coordinates labeled as longitude and latitude. Consider renaming via e.g. era5 = era5.rename({'longitude':'lon', 'latitude':'lat'}).

PCA/MCA¶

Performing standard MCA is straightforward. Simply run:

mca = xMCA(west, east)
mca.solve()

The singular values (= eigenvalues), spatial patterns (EOFs) and the expansion coefficients (PCs) can then be obtained via

svals = mca.singular_values()
expvar = mca.explained_variance()

eofs  = mca.eofs()
pcs   = mca.pcs()

Note

To perform PCA/EOF analysis instead of MCA, simply provide one instead of two fields, e.g. pca = xMCA(west).

Retrieve the homogeneous and heterogeneous patterns through

hom_patterns = mca.homogeneous_patterns()
het_patterns  = mca.heterogeneous_patterns()

Note

All methods that provide quantities for both left and right field (pcs, eofs etc.) return dictionaries with keys left and right for the respective field. If PCA is performed, only left exists.

Pre-processing¶

By calling the constructor, the input fields are centered (mean removed). Additionally, by using normalize the input fields are standardized. In order to weight the input fields according to their area on a sphere, that is each grid point is multiplied by \(\sqrt{ \cos (\varphi_i)}\) with \(\varphi\) being the latitude, apply_coslat can be used.

mca = xMCA(west, east)
mca.normalize()
mca.apply_coslat()
mca.solve()

Warning

Always call apply_coslat after normalize since the latter nullifies the latitude weighting.

Rotated MCA¶

The package provides two rotation schemes:

Varimax orthogonal rotation [1] and
Promax oblique rotation [2]

Both are available through rotate where specifying the number of rotated EOFs (n_rot) and the Promax parameter (power) defines the type of rotation. For example, to perform Varimax-rotated MCA on the first 10 EOFs run:

mca = xMCA(west, east)
mca.solve()
mca.rotate(n_rot=10, power=1)

Now retrieving the rotated PCs/EOFs works the same as before for standard MCA. Note that the singular values are not affected by the rotation. If you want to obtain the (co-)variance associated to each mode, use the variance method, which is also used to determine the explained variance after rotation.

svals = mca.singular_values()  # same as before rotation
variance = mca.variance()  # different from singular values
rotated_expvar = mca.explained_variance()  # different after rotation

rotated_eofs  = mca.eofs()
rotated_pcs   = mca.pcs()

If for some reason you want to obtain the unrotated PCs/EOFs after the model has been rotated, the rotated parameter comes in handy.

unrotated_eofs  = mca.eofs(rotated=False)
unrotated_pcs   = mca.pcs(rotated=False)

Note

For standard MCA, the results of singular_values and variance are identical. In fact, for standard MCA each singular value is indeed just the covariance explained by the respective mode.

Complex MCA¶

Complex PCA/MCA [3, 4] provides a means to investigate lagged correlations and spatially moving patterns. Performing complex MCA is similarly straightforward:

mca = xMCA(west, east)
mca.solve(complexify=True)

However, when the input data is not stationary, spectral leakage inherent to the Hilbert transform can sometimes produce strong boundary effects which affect the obtained PCs. One approach to mitigate these effects is by artificially extending the input time series before the Hilbert transform and then truncating it afterwards. Here, two different extension methods are provided:

Theta model extension [5] (èxtend='theta')
Exponential decay superimposed on a linear trend (èxtend='exp')

Both approaches require an additional parameter ``period```which has to be chosen a priori.

As a result of complex MCA, the EOFs and PCs have a real and imaginary part. This allows to compute the spatial amplitude and phase function as well as the temporal amplitude and phase function:

s_amp = mca.spatial_amplitude()
s_phase = mca.spatial_phase()
t_amp = mca.temporal_amplitude()
t_phase = mca.temporal_phase()

Note

By combining solve(complexify=True) and rotate, one can perform complex rotated PCA/MCA.

Visualization¶

The package provides a plot method to visually inspect the individual modes.

mca.set_field_names('West', 'East')
pkwargs = {'orientation' : 'vertical'}
mca.plot(mode=1, **pkwargs)

Result of default plot method after performing MCA on T2m of North American west and east coast showing mode 1.

Some fine-tuning of the plot for better optics:

from cartopy.crs import EqualEarth  # for different map projections

# map projections for "left" and "right" field
projections = {
    'left': EqualEarth(),
    'right': EqualEarth()
}

pkwargs = {
    "figsize"     : (8, 5),
    "orientation" : 'vertical',
    'cmap_eof'    : 'BrBG',  # colormap amplitude
    "projection"  : projections,
}
mca.plot(mode=3, **pkwargs)

Result of plot method with improved optics after performing MCA on T2mof North American west and east coast showing mode 3.

You can save the plot to your local disk as a .png file via

skwargs={'dpi':200, 'transparent':True}
mca2.save_plot(mode=3, plot_kwargs=pkwargs, save_kwargs=skwargs)

Evaluation¶

xmca provides some methods to assess the significance of the obtained modes:

North’s rule of thumb
Rule N
Bootstrapping

North’s Rule of Thumb¶

We can obtain the error estimates of the singular values via rule_north.

svals_err_north = mca.rule_north().to_dataframe()
svals_diff = svals.to_dataframe().diff(-1)
cutoff = np.argmax((svals_diff - (2 * svals_err_north)) < 0)  # 10

According to North’s Rule of Thumb [6], mode 10 is the first “effective multiplet”, that is given the sample size it cannot be resolved from the neighboring singular value.

Rule N¶

The aim of Rule N [7] is to provide a rule of thumb for the significance of the obtained singular values via Monte Carlo simulations of uncorrelated Gaussian random variables. The obtained singular values are scaled such that their sum equals the sum of true singular value spectrum. Under these assumptions we can use rule_n choosing the number of Monte Carlo simulations.

svals_rule_n = mca.rule_n(n_runs=100)
median = svals_rule_n.median('run')
q99 = svals_rule_n.quantile(.99, dim='run')
cutoff = np.argmax((svals < q99).values)  # 10

Here we defined the cutoff as the mode where the 99th quantile of the Rule N distribution exceeds the sampled singular value. In this case the cutoff according to Rule N is at mode 10.

Bootstrapping¶

bootstrapping provides a flexible method to perform a wide range of different Monte Carlo simulations. By specifying axis, samples are either drawn along time (0) or along space (1). The parameters on_left and on_right specify which field should be re-sampled. When both are True, samples are drawn from the joint distribution of both fields. For serially correlated variables, the block_size parameter provides the possibility to run moving-block bootstraps by re-sampling blocks of data. By default, re-sampling is performed with replacement. This can be turned off, however, by choosing replace=False which basically means permutation.

Here we perform bootstrapping (with replacement) along space of the joint distribution in order to assess the confidence interval of the singular values. This procedure suggest that the first 4 modes are significant.

svals_boot = mca.bootstrapping(100, on_left=True, on_right=True, axis=1, replace=False)
boot_q01 = svals_boot.quantile(0.01, 'run')
boot_q99 = svals_boot.quantile(0.99, 'run').shift({'mode' : -1})
cutoff = np.argmax((boot_q01 - boot_q99).dropna('mode').values < 0)  # 4

Note

You usually want to run large numbers of Monte Carlo simulations. A typical rule of thumb is \(40N\) where N is the number of observations (time steps) in your input data.

Saving an analysis¶

It is possible to save and load the model via save_analysis in a provided path. A info file info.xmca is then created in this directory which can be loaded via load_analysis.

mca.save_analysis('my_analysis')

new = xMCA()
new.load_analysis('my_analysis/info.xmca')

Warning

The original input fields are saved along with the singular vectors, allowing to call pcs, heterogeneous_patterns etc. However, evaluation results are not saved, in particular no bootstrap results.

1: Kaiser, H. F. The varimax criterion for analytic rotation in factor analysis. Psychometrika 23, 187–200 (1958).
2: Hendrickson, A. E. & White, P. O. Promax: A Quick Method for Rotation to Oblique Simple Structure. British Journal of Statistical Psychology 17, 65–70 (1964).
3: Horel, J. Complex Principal Component Analysis: Theory and Examples. J. Climate Appl. Meteor. 23, 1660–1673 (1984).
4: Rieger, N., Corral, Á., Olmedo, E. & Turiel, A. Lagged teleconnections of climate variables identified via complex rotated Maximum Covariance Analysis. submitted (2021).
5: Assimakopoulos, V. & Nikolopoulos, K. The theta model: a decomposition approach to forecasting. International Journal of Forecasting 16, 521–530 (2000).
6: North, G., L. Bell, T., Cahalan, R. & J. Moeng, F. Sampling Errors in the Estimation of Empirical Orthogonal Functions. Monthly Weather Review 110, (1982).
7: Overland, J.E., Preisendorfer, R.W., 1982. A significance test for principal components applied to a cyclone climatology. Mon. Weather Rev. 110, 1–4.
8: Efron, B., Tibshirani, R.J., 1993. An Introduction to the Bootstrap. Chapman and Hall. 436 pp.