Tutorial ---------- The package contains two classes ``xmca.array.MCA`` for ``numpy.ndarray`` and ``xmca.xarray.xMCA`` for ``xarray.DataArray``. Depending on which data type you work with, import the respective module: .. code:: py from xmca.array import MCA # numpy from xmca.xarray import xMCA # xarray For this tutorial we use North American surface temperatures from 2013 to 2014 shipped with ``xarray``. We arbitrarily separate the data into two domains, ``west`` and ``east``. .. code:: py import xarray as xr # only needed to obtain test data # split data arbitrarily into west and east coast data = xr.tutorial.open_dataset('air_temperature').air west = data.sel(lon=slice(200, 260)) east = data.sel(lon=slice(260, 360)) .. note:: ``xMCA`` only accepts ``DataArray``, not ``Dataset``. .. warning:: The time coordinate needs to be called `time`, the spatial coordinates ``lat`` and ``lon``. In particular ERA5 data sets have their spatial coordinates labeled as ``longitude`` and ``latitude``. Consider renaming via e.g. ``era5 = era5.rename({'longitude':'lon', 'latitude':'lat'})``. PCA/MCA ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Performing standard MCA is straightforward. Simply run: .. code:: py mca = xMCA(west, east) mca.solve() The singular values (= eigenvalues), spatial patterns (EOFs) and the expansion coefficients (PCs) can then be obtained via .. code:: py svals = mca.singular_values() expvar = mca.explained_variance() eofs = mca.eofs() pcs = mca.pcs() .. note:: To perform PCA/EOF analysis instead of MCA, simply provide one instead of two fields, e.g. ``pca = xMCA(west)``. Retrieve the homogeneous and heterogeneous patterns through .. code:: py hom_patterns = mca.homogeneous_patterns() het_patterns = mca.heterogeneous_patterns() .. note:: All methods that provide quantities for both left and right field (``pcs``, ``eofs`` etc.) return dictionaries with keys ``left`` and ``right`` for the respective field. If PCA is performed, only ``left`` exists. Pre-processing ~~~~~~~~~~~~~~ By calling the constructor, the input fields are centered (mean removed). Additionally, by using ``normalize`` the input fields are standardized. In order to weight the input fields according to their area on a sphere, that is each grid point is multiplied by :math:`\sqrt{ \cos (\varphi_i)}` with :math:`\varphi` being the latitude, ``apply_coslat`` can be used. .. code:: py mca = xMCA(west, east) mca.normalize() mca.apply_coslat() mca.solve() .. warning:: Always call ``apply_coslat`` **after** ``normalize`` since the latter nullifies the latitude weighting. Rotated MCA ~~~~~~~~~~~ The package provides two rotation schemes: 1. *Varimax* orthogonal rotation [[#]_] and 2. *Promax* oblique rotation [[#]_] Both are available through ``rotate`` where specifying the number of rotated EOFs (``n_rot``) and the Promax parameter (``power``) defines the type of rotation. For example, to perform Varimax-rotated MCA on the first 10 EOFs run: .. code:: py mca = xMCA(west, east) mca.solve() mca.rotate(n_rot=10, power=1) Now retrieving the rotated PCs/EOFs works the same as before for standard MCA. Note that the singular values are not affected by the rotation. If you want to obtain the (co-)variance associated to each mode, use the ``variance`` method, which is also used to determine the explained variance after rotation. .. code:: py svals = mca.singular_values() # same as before rotation variance = mca.variance() # different from singular values rotated_expvar = mca.explained_variance() # different after rotation rotated_eofs = mca.eofs() rotated_pcs = mca.pcs() If for some reason you want to obtain the unrotated PCs/EOFs after the model has been rotated, the ``rotated`` parameter comes in handy. .. code:: py unrotated_eofs = mca.eofs(rotated=False) unrotated_pcs = mca.pcs(rotated=False) .. note:: For standard MCA, the results of ``singular_values`` and ``variance`` are identical. In fact, for standard MCA each singular value is indeed just the covariance explained by the respective mode. Complex MCA ~~~~~~~~~~~ Complex PCA/MCA [[#]_, [#]_] provides a means to investigate lagged correlations and spatially moving patterns. Performing complex MCA is similarly straightforward: .. code:: py mca = xMCA(west, east) mca.solve(complexify=True) However, when the input data is not stationary, spectral leakage inherent to the Hilbert transform can sometimes produce strong boundary effects which affect the obtained PCs. One approach to mitigate these effects is by artificially extending the input time series before the Hilbert transform and then truncating it afterwards. Here, two different extension methods are provided: 1. Theta model extension [[#]_] (``èxtend='theta'``) 2. Exponential decay superimposed on a linear trend (``èxtend='exp'``) Both approaches require an additional parameter ``period```which has to be chosen a priori. As a result of complex MCA, the EOFs and PCs have a real and imaginary part. This allows to compute the spatial amplitude and phase function as well as the temporal amplitude and phase function: .. code:: py s_amp = mca.spatial_amplitude() s_phase = mca.spatial_phase() t_amp = mca.temporal_amplitude() t_phase = mca.temporal_phase() .. note:: By combining ``solve(complexify=True)`` and ``rotate``, one can perform complex rotated PCA/MCA. Visualization ~~~~~~~~~~~~~ The package provides a ``plot`` method to visually inspect the individual modes. .. code:: py mca.set_field_names('West', 'East') pkwargs = {'orientation' : 'vertical'} mca.plot(mode=1, **pkwargs) .. figure:: ../../figs/xmca-example-mode1.png :alt: Result of default plot method after performing MCA on T2m of North American west and east coast showing mode 1. Some fine-tuning of the plot for better optics: .. code:: py from cartopy.crs import EqualEarth # for different map projections # map projections for "left" and "right" field projections = { 'left': EqualEarth(), 'right': EqualEarth() } pkwargs = { "figsize" : (8, 5), "orientation" : 'vertical', 'cmap_eof' : 'BrBG', # colormap amplitude "projection" : projections, } mca.plot(mode=3, **pkwargs) .. figure:: ../../figs/xmca-example-mode3.png :alt: Result of plot method with improved optics after performing MCA on T2mof North American west and east coast showing mode 3. You can save the plot to your local disk as a ``.png`` file via .. code:: py skwargs={'dpi':200, 'transparent':True} mca2.save_plot(mode=3, plot_kwargs=pkwargs, save_kwargs=skwargs) Evaluation ~~~~~~~~~~ ``xmca`` provides some methods to assess the significance of the obtained modes: 1. North's rule of thumb 2. Rule N 3. Bootstrapping North's Rule of Thumb ===================== We can obtain the error estimates of the singular values via ``rule_north``. .. code:: py svals_err_north = mca.rule_north().to_dataframe() svals_diff = svals.to_dataframe().diff(-1) cutoff = np.argmax((svals_diff - (2 * svals_err_north)) < 0) # 10 According to North's Rule of Thumb [[#]_], mode 10 is the first "effective multiplet", that is given the sample size it cannot be resolved from the neighboring singular value. Rule N ====== The aim of Rule N [[#]_] is to provide a rule of thumb for the significance of the obtained singular values via Monte Carlo simulations of uncorrelated Gaussian random variables. The obtained singular values are scaled such that their sum equals the sum of true singular value spectrum. Under these assumptions we can use ``rule_n`` choosing the number of Monte Carlo simulations. .. code:: py svals_rule_n = mca.rule_n(n_runs=100) median = svals_rule_n.median('run') q99 = svals_rule_n.quantile(.99, dim='run') cutoff = np.argmax((svals < q99).values) # 10 Here we defined the cutoff as the mode where the 99th quantile of the Rule N distribution exceeds the sampled singular value. In this case the cutoff according to Rule N is at mode 10. Bootstrapping ========================= ``bootstrapping`` provides a flexible method to perform a wide range of different Monte Carlo simulations. By specifying ``axis``, samples are either drawn along time (``0``) or along space (``1``). The parameters ``on_left`` and ``on_right`` specify which field should be re-sampled. When both are ``True``, samples are drawn from the *joint* distribution of both fields. For serially correlated variables, the ``block_size`` parameter provides the possibility to run *moving-block bootstraps* by re-sampling blocks of data. By default, re-sampling is performed with replacement. This can be turned off, however, by choosing ``replace=False`` which basically means permutation. Here we perform bootstrapping (with replacement) along space of the joint distribution in order to assess the confidence interval of the singular values. This procedure suggest that the first 4 modes are significant. .. code:: py svals_boot = mca.bootstrapping(100, on_left=True, on_right=True, axis=1, replace=False) boot_q01 = svals_boot.quantile(0.01, 'run') boot_q99 = svals_boot.quantile(0.99, 'run').shift({'mode' : -1}) cutoff = np.argmax((boot_q01 - boot_q99).dropna('mode').values < 0) # 4 .. note:: You usually want to run large numbers of Monte Carlo simulations. A typical rule of thumb is :math:`40N` where N is the number of observations (time steps) in your input data. Saving an analysis ~~~~~~~~~~~~~~~~~~ It is possible to save and load the model via ``save_analysis`` in a provided path. A info file *info.xmca* is then created in this directory which can be loaded via ``load_analysis``. .. code:: py mca.save_analysis('my_analysis') new = xMCA() new.load_analysis('my_analysis/info.xmca') .. warning:: The original input fields are saved along with the singular vectors, allowing to call ``pcs``, ``heterogeneous_patterns`` etc. However, evaluation results are **not saved**, in particular no bootstrap results. .. [#] Kaiser, H. F. The varimax criterion for analytic rotation in factor analysis. Psychometrika 23, 187–200 (1958). .. [#] Hendrickson, A. E. & White, P. O. Promax: A Quick Method for Rotation to Oblique Simple Structure. British Journal of Statistical Psychology 17, 65–70 (1964). .. [#] Horel, J. Complex Principal Component Analysis: Theory and Examples. J. Climate Appl. Meteor. 23, 1660–1673 (1984). .. [#] Rieger, N., Corral, Á., Olmedo, E. & Turiel, A. Lagged teleconnections of climate variables identified via complex rotated Maximum Covariance Analysis. submitted (2021). .. [#] Assimakopoulos, V. & Nikolopoulos, K. The theta model: a decomposition approach to forecasting. International Journal of Forecasting 16, 521–530 (2000). .. [#] North, G., L. Bell, T., Cahalan, R. & J. Moeng, F. Sampling Errors in the Estimation of Empirical Orthogonal Functions. Monthly Weather Review 110, (1982). .. [#] Overland, J.E., Preisendorfer, R.W., 1982. A significance test for principal components applied to a cyclone climatology. Mon. Weather Rev. 110, 1–4. .. [#] Efron, B., Tibshirani, R.J., 1993. An Introduction to the Bootstrap. Chapman and Hall. 436 pp.