11,874
Views
3
CrossRef citations to date
0
Altmetric
Research Article

Bispectral analysis of nonlinear interaction, predictability and stochastic modelling with application to ENSO

ORCID Icon & ORCID Icon
Pages 1-30 | Received 26 Aug 2020, Accepted 16 Dec 2020, Published online: 07 Jan 2021

Abstract

Non-Gaussianity and nonlinearity have been shown to be ubiquitous characteristics of El Niño Southern Oscillation (ENSO) with implication on predictability, modelling, and assessment of extremes. These topics are investigated through the analysis of third-order statistics of El Niño 3.4 index in the period 1870–2018, namely bicovariance and bispectrum. Likewise, the spectral decomposition of variance, the bispectrum provides a spectral decomposition of skewness. Positive and negative bispectral contributions identify modes contributing respectively to La Niñas and El Niños, mostly in the period range 2–6 years. The ENSO bispectrum also shows statistically significant features associated with nonlinearity. The analysis of bicovariance reveals a nonlinear correlation between the Boreal Spring and following Winter, coming from an asymmetry of the persistence of El Niño, contributing hence to a reduction of Spring Predictability Barrier. The positive skewness and main features of the ENSO bicovariance and bispectrum are shown to be well reproduced by fitting a bilinear stochastic model. This model shows improved forecasts, with respect to benchmark linear models, especially of the amplitude of extreme El Niños. This study is relevant, particularly in a changing climate, to better characterize and predict ENSO extremes coming from non-Gaussianity and nonlinearity.

1. Introduction

Earth's weather and climate vary on a wide range of spatio-temporal scales. Whereas external orbital variations are believed to be the dominant driving force for macro-climate (on millennial time scales), weather, macro-weather (Lovejoy, Citation2018) and climate variations (on shorter time scales) are mainly the result of complex nonlinear interactions between very many degrees-of-freedom (Sura and Hannachi, Citation2015) and also due to many climate sub-components with different time scales The atmospheric (and climate) system is an excellent example of high-dimensional and highly complex dynamical system. One outstanding and ubiquitous feature of the large scale (and low frequency) atmospheric (and climate) variability is non-Gaussianity (Franzke et al., Citation2007; Proistosescu et al., Citation2016; Pires and Hannachi, Citation2017; Hannachi and Iqbal, Citation2019). For instance, Sura and Sardeshmukh (Citation2008) show that sea surface temperature (SST) has non-Gaussian probability distribution function (PDF) with particular tail extrema. Many processes, e.g. subgrid scales, large-scale teleconnections and nonlinearity can lead to various kinds of uncertainties, which can affect the accuracy of our understanding, such as forecasts. Stochastic modelling can help overcome some of the previous problems and improve accuracies (Berner et al., Citation2017).

Although non-Gaussianity and nonlinearity can be considered as two distinct aspects of weather, macro-weather and climate (e.g., Sura and Sardeshmukh, Citation2008; Sura and Hannachi, Citation2015), observations do suggest that these are two inter-related characteristic features of the system (e.g., Woollings et al., Citation2010; Hannachi et al., Citation2017; and references therein). Some systems can exhibit weak (deterministic) nonlinearity, but with strong non-Gaussian statistics, which may be explained by a multiplicative or state-dependent noise as detailed, e.g. in Sura and Sardeshmukh (Citation2008), Sardeshmukh and Sura Citation2009, see also Hannachi et al. (2017) for a review. In contrast, non-Gaussianity can also result from systems with strong nonlinearity and additive noise, see e.g., Hannachi et al. (2017) for a review and further references. Notwithstanding this link between nonlinearity and non-Gaussianity, it is sometimes helpful to disentangle the contributions of nonlinearity and stochastic noise to the system PDF. There is a general consensus that understanding non-Gaussian statistics of weather and climate is important for a number of reasons, not least for weather and climate prediction, planning and risk assessment. Extreme events, for instance, which are important in planning and risk assessment, depend closely on the structure of the non-Gaussian PDF. Various sources exist, which contribute to the non-Gaussianity. Sura and Hannachi (Citation2015) provide a detailed account of the different sources and mechanisms contributing to the observed non-Gaussianity of the atmospheric large-scale and low-frequency variability. They discussed, in particular, nonlinear regime dynamics, multiplicative noise, cross-frequency coupling, jet stream meandering and nonlinear boundary layer drags. They investigated specifically non-Gaussianity in geopotential heights, jet stream latitude and SST fields, with particular focus on skewness and kurtosis. Skewness and kurtosis are simple measures of non-Gaussianity, in the time domain, that does not directly reflect frequencies. A convenient way to examine nonlinearity and/or non-Gaussianity is to use high (e.g., third and fourth) order statistics in the frequency domain, namely, e.g. bispectrum and trispectrum (e.g. Brillinger and Rosenblatt, Citation1967; Nikias and Raghuveer, Citation1987). Spectral analysis of stationary time series is well rooted in the study of time series x(t),t=0,1,2 (supposed to be zero-mean and unit variance for simplicity) from weather and climate and other fields. The duality between the time and spectral domains allows to investigate the distribution of, e.g. variance and skewness, as a function of spectral bins. For example, the duality between the second-order cumulant, or autocovariance function γx(·) and the power spectrum Γx(f) reveals that the integral of the latter is precisely the variance of the time series, i.e., E(x2)=γx(0)=1212Γx(f)df.

This means, in particular, that the power spectrum can be viewed as a decomposition of the variance by frequency bins. Likewise, the duality between the bispectrum Γ3,x(f1,f2)  and the third-order moment or bicovariance γx(τ1,τ2)E[x(t)x(t+τ1)x(t+τ2)] means that the skewness is the integral of the bispectrum, E(x3)=γx(0,0)=1/21/21/21/2Re[Γ3,x(f1,f2)]df1df2.

This also implies that the bispectrum can be viewed as a decomposition of the skewness by bins of frequency pairs. This can be used to identify nonlinearly interacting pairs of spectral bins contributing to the skewness which are often attributed to phase locking between components at frequency triplets f1,f2, f1+f2 producing triadic resonance (Pires and Perdigão, Citation2015).

Bispectral analysis has been used in signal processing to study nonlinearity detection and bicoherency in a number of different fields including econometry (Ashley et al., Citation1986; Rusticelli et al., Citation2008), acoustics (Richardson and Hodgkiss, Citation1994), and Earth Sciences (Müller, Citation1987; Biswas et al., Citation1995; Hocke and Kämpfer, Citation2008).

Here we focus on the bispectral analysis of El Niño Southern Oscillation (ENSO), the main atmosphere–ocean interannual mode (Neelin et al., Citation1998; Wang et al., Citation2016). ENSO aspects, like nonlinearity and complexity (Timmermann, Citation2003; Frauen et al., Citation2014; Berner et al., Citation2018; Bianucci et al., Citation2018), non-Gaussianity (Burgers and Stephenson, Citation1999; Chunzai, Citation2018; Boucharel et al., Citation2009; Hannachi et al., Citation2017) and also its modelling (Kondrashov et al., Citation2005) have been extensively studied, both from a physical and signal-processing perspective. In particular, some ENSO bispectral analysis was performed by Timmermann et al. (Citation2001, Citation2018), to fit a low-order nonlinear dynamical model of El Niño index and by Schulte et al. (Citation2020) to compute the cross-bi-coherency and synchronization with the Indian Monsoon.

The present study aims to perform a systematic and thorough analysis of third-order statistics, both in the time and spectral domains to infer and improve the understanding of ENSO non-Gaussianity through skewness, nonlinearity (e.g. Hinich, Citation1982; Cox, Citation1991), and nonlinear predictability on time scales ranging from seasons to years. We will emphasize, in particular, the role of nonlinear lagged correlation with the intra-annual time scale in overcoming of the Spring predictability barrier of ENSO (Duan and Wei, Citation2013). The bispectrum of El Niño index and its statistical significance are thus computed, looking for the most relevant wave-triad contributing to the skewness, which are associated with ENSO extremes by a phase synchronization, see e.g. Jajcay et al. (Citation2018). Then, we check how simple stochastic models driven by lagged correlated additive multiplicative (CAM) noise (Monahan, Citation2020) can produce the main bispectral features of ENSO and El Niño/La-Niña asymmetry (Martinez-Villalobos et al., Citation2019).

The manuscript starts with data description with a synthesis of exploratory statistics (Section 2). Then the autocorrelation function and spectrum (Section 3) are shown. In Section 4, we present the tests of nonlinearity and non-Gaussianity based on the bicovariance function. Section 5 details the bispectrum properties and its estimation and statistical significance applied to ENSO. Section 6 develops a minimal bilinear stochastic model fitting data. A summary and conclusion are given in the last section. Most of the technical details and symbols are put in appendices. The details presented in the main text and appendices, some of them conventional, make the paper convenient particularly for didactive purposes and ready to apply to other time-series beyond ENSO.

2. Data

The raw data used are monthly anomalies (with respect to the 1981–2010 period) of El Niño 3.4 sea surface temperature (SST) Index, obtained by area averaged SST in the geographical region 5S–5N by 170–120 W, taken in the 119-year period 1870–2018 and extracted from the periodically updated website https://www.esrl.noaa.gov/psd/gcos_wgsp/Timeseries/Data/nino34.long.anom.data (Rayner et al., Citation2003). Raw data exhibit a positive trend of 0.19 °C/century, associated to oceanic global warming. It reveals also a decadal-scale variability (observed in the 50 yr running means) in phase with the Pacific Decadal Oscillation (PDO), also known as a long-lived El Niño pattern (Zhang et al., Citation1997). After the removal of the linear trend, we get a detrended zero-centred time-series xcent of the index, with a standard deviation σ(xcent)=0.77°C. The skewness is sk(xcent)=0.44, associated to the prevalence of El Niños to La Niñas (Burgers and Stephenson, Citation1999) and an excess kurtosis (ekurt) of 0.46. The monthly dependence of statistics is particularly striking in the skewness and excess kurtosis, ranging from a positively skewed leptokurtic probability distribution function (PDF) in NH winter (sk=1.81, ekurt=2.93 in January), favouring extreme El Niños, up to a platykurtic, nearly unskewed PDF in the NH Spring (sk=0.01, ekurt=0.65 in May). Spring has weaker extreme El Niños compared to Winter, and the extrapolation of the Spring’s El Niño state to the next Winter is rather unskilful i.e. the Spring predictability barrier (Duan and Wei, 2013). The intra-seasonal variability is quite small, compared to the inter-seasonal and interannual variability. Short-range persistence is quite high leading to 1- and 2-month lag-correlations of 0.92 and 0.85. Three-monthly seasonal averages (JFM, AMJ, JAS, OND) are constructed with an overall std σ=0.74°C, sk=0.46 and ekurt=0.41, peaking again during the NH or Boreal Winter’s (sk=1.04, ekurt=1.35  in JFM) (Stuecker et al., Citation2013; Stein et al., Citation2014). The statistics are quite similar to those obtained with more commonly used trimesters (e.g. for DJF, we get sk=1.07, ekurt=1.68. The remaining three seasons are much less skewed with sk[0.25,0.28] and a rather small ekurt except for the platykurtic behaviour in Boreal Spring (AMJ), with ekurt=0.51. All the analysis that follows is performed on the standardized time series, hereafter denoted as x(t),with sample size N=596 trimesters (trm), being shown in .

Fig. 1. Detrended and standardized time series in the period 1870–2018, of the three-monthly average anomalies (with respect to the annual cycle) of El Niño 3.4 index (black) and its low-pass (red) and high-pass (green) components with a cutoff frequency of 0.08 cycles per trimester roughly corresponding, respectively, to inter- and intra-triennial timescales variability.

Fig. 1. Detrended and standardized time series in the period 1870–2018, of the three-monthly average anomalies (with respect to the annual cycle) of El Niño 3.4 index (black) and its low-pass (red) and high-pass (green) components with a cutoff frequency of 0.08 cycles per trimester roughly corresponding, respectively, to inter- and intra-triennial timescales variability.

3. Autocorrelation and spectrum

3.1. Autocorrelation function

We start by evaluating second-order lagged moments of the time series. The autocorrelation function Cx(τ)1Nτt=0Nτ1x(t)x(t+τ)  is shown in . It has oscillatory behaviour with typical ENSO timescales (3–4 years). In order to test the null hypothesis H0 of a vanishing autocorrelation, taking into account the serial correlation, we use an approximate effective sample size NeffN[1Cx(1)][1+Cx(1)]66 (Wilks, Citation2011). Then, H0  is rejected if |Cx(τ)|2>[qα2(Neff)]2/{[qα2(Neff)]2+Neff2} (von Storch and Zwiers, Citation1999) where qα2(Neff) is the α2-th quantile of a t-Student PDF with Neff degrees of freedom. From , some local extremes and significant correlations appear also at decadal lags.

Fig. 2. Autocorrelation function of El Niño 3.4 index (solid black) along with the 95% (dashed black) and 90% (dotted black) confidence interval. The autocorrelation function of the AR(5) model fitting data is also shown (solid red).

Fig. 2. Autocorrelation function of El Niño 3.4 index (solid black) along with the 95% (dashed black) and 90% (dotted black) confidence interval. The autocorrelation function of the AR(5) model fitting data is also shown (solid red).

3.2. The spectrum and its estimation

A regularly sampled stationary time-series x(t) can be decomposed using discrete Fourier transform (DFT) as: Xx(f)=t=0N1x(t)exp(2πift), for every frequency f in cycles per sampling period and can be reconstructed through the inverse Fourier transform (FT) as x(t)=1NK=N2N2Xx(kN)exp(2πikNt) (for N even). The sample spectrum or periodogram is the DFT of the autovariance function: (1) Sx(f)=1N|Xx(f)|2=τ=(N1)N1Cx(τ)exp(2πifτ)0(1) whereas the asymptotic spectrum is given by: (2) Γx(f)=limNE[Sx(f)](2)

where E(·) is the expectation operator over realizations of the stochastic process. According to the Wiener–Khintchine theorem (Equation2) is the FT of the autocovariance function i.e.: (3) γx(τ)=1212Γx(f)exp(2πifτ)df.(3)

EquationEquation (3) yields, in particular, the variance of the time series (4) γx(0)=1212Γx(f)df1N[Sx(0)+2k=1N2Sx(kN)],(4) where Γx(f)df in (Equation4) provides the contribution from the spectrum to the time series variance within the frequency interval [f,f+df].

For a stationary process, the DFTs Xx(f), estimated at different frequencies are nearly uncorrelated (von Storch and Zwiers, Citation1999), and the periodogram (Equation1), is known to be inconsistent, and hence a smoothed estimator is often used: (5) Ŝx(f)=τ=(N1)N1Cx(τ)λ(τ)exp(2πifτ)(5) where λ(τ)λ̂(τM) where λ̂(·) is a symmetric standardized positive lag window function (Priestley, Citation1981), characterized by a standardized bandwidth b11λ̂(u)2du. The window scale length M is chosen, based on a trade-off between spectral resolution (b1/M), low values of the variance (var[Ŝx(f)][Γx(f)2MNb1]) and bias (E[Ŝx(f)]Γx(f)λ̂(0)Γx(f)4π2M2) of (5) (Jenkins and Watts, Citation1968), where double apostrophe represents second derivatives. The 90% confidence interval of Γx(f) is given by [ Ŝx(f)νq95%(χν2), Ŝx(f)νq5%(χν2)], where ν=2Nb1M  is the number of degrees of freedom of the χν2.

We have also computed, the maximum entropy estimator of the spectrum by fitting an autoregressive (AR) model of generic order p, via the Yule–Walker equations. The order is chosen by minimizing the Akaike Information Criterion, AIC (Akaike, Citation1974). The goal of computing the theoretical maximum entropy spectrum is to build a null hypothesis spectrum Γx,0(f). Therefore, in order to objectively check if the spectral peaks of (1) and (5) are truly distinguishable and significantly larger than those of Γx,0(f),  at a significance level α, the estimated spectra (1,5) have to be compared to the threshold Γx,0(f)q1α(χν2)ν (Wilks, Citation2011) with ν=2Nb1M for (5) and ν=1 for (1) corresponding to M=N, and a standardized bandwidth equal to the Nyquist frequency.

3.3. El Niño index spectrum

3.3.1. Periodogram and smoothed spectrum

The periodogram (Equation1) of the time series is shown in . There is evidence of sharp and well separated peaks near the frequencies 0.021, 0.045, 0.071, 0.087, 0.105, and 0.168 cycles per trimester (cpt), corresponding (in the same order) to periods of 12.2 years, linked to decadal variability (Sun and Yu, Citation2009; Kravtsov, Citation2012), 5.62 years (Kim, Citation2002), and 3.53, 2.87, 2.38 and 1.49 years in the range of the ‘Quasi-quadrennial and quasi-biennial variability’ (Jiang et al., Citation1995). These peaks are also found by Deser et al. (Citation2010) that are responsible for variations in the ENSO frequency, intensity, propagation, and predictability (e.g., An and Wang, Citation2000; Fedorov and Philander, Citation2000; Timmermann, Citation2003; An and Jin, Citation2004; Yeh and Kirtman, Citation2004; Wang, Citation2018).

Fig. 3. Empirical periodogram (thin green); Smoothed spectrum (thick black) and corresponding 95% confidence interval (thin black); theoretical AR(5) spectrum (red), and 10% significance level for the periodogram (red dashed) and for the smoothed spectrum (red dotted). Units are (°C)2/cpt. The top axis presents periods in years.

Fig. 3. Empirical periodogram (thin green); Smoothed spectrum (thick black) and corresponding 95% confidence interval (thin black); theoretical AR(5) spectrum (red), and 10% significance level for the periodogram (red dashed) and for the smoothed spectrum (red dotted). Units are (°C)2/cpt. The top axis presents periods in years.

For the smoothed spectrum (Equation5), we chose the Bartlett–Priestley window-lag function: λ̂(u)=3(πu)2[sin(πu)πucos(πu)]​​​(u0);λ̂(0)=1 (Priestley, Citation1981) for which b1=0.855 and λ̂(0)=1.97. In order to resolve the main spectral peaks, the bandwidth b1M  must be smaller than the minimum difference between consecutive leading frequencies, i.e. b1M<0.016  cpt, hence M>b10.01652 trimesters. To further increase spectral resolution, we show in the estimator (5) using M=80 (20 years), corresponding to ν13. In this case, the standard deviation of the filtered estimator is about 39% of the true spectrum. The 90% confidence interval for Γx(f) is [0.52Ŝx(f),2.62Ŝx(f)] which is quite large, a consequence of the high window length M. The local maxima of the filtered estimator lie close to the high spikes. Moreover, the periodogram lies well within the 95% confidence interval of the true spectrum (see thin black lines of ).

3.3.2. Maximum entropy spectrum

An AR(5) model (see the AIC in ) is fitted here: (6) x˜(t)=1.18x˜(t1)0.46x˜(t2)0.09x˜(t3)+0.17x˜(t4)0.12x˜(t5)+0.524w(t),(6) where w(t) is a standard Gaussian white noise. The autocorrelation function of the AR(5) model () closely captures the empirical one for lags up to 12–20 trm but misses low-frequency oscillations. The corresponding (theoretical) maximum entropy spectrum is shown in showing a spectral bump in the range (3–5 years period) and with no other spectral peaks. also shows the thresholds of significance (at α = 10%) for the periodogram and for the smoothed spectrum.

Table 1. Values of the Akaike Information Criterium AIC(p)=Nln(2πeσw,p2)+2p, where σw,p2 is the residual noise variance of an AR(p) model fitting.

The peaks with periods 5.62, 3.53, and 1.49 years are significant, both in the periodogram (Equation1) and the smoothed spectrum (Equation5) whereas the peak 2.87 years is significant in the periodogram (Equation1) but only marginally significant in the smoothed spectrum (Equation5).

4. Bicovariance, non-Gaussianity and nonlinearity

4.1. General properties

The bicovariance function γx(τ1,τ2) is a generalization of the autocovariance function, given by third-order cumulants between lagged values of x(t), which for a zero mean stationary process writes as γx(τ1,τ2)=E[x(t)x(t+τ1)x(t+τ2)] and is estimated here, for τ1,τ20 as: (7) Cx(τ1,τ2)=1Nmax(τ1,τ2)t=0Nmax(τ1,τ2)1x(t)x(t+τ1)x(t+τ2).(7)

Stationarity implies time invariance of lagged statistics leading to several identities reflecting the symmetry of the bicovariance (Rao and Gabr, Citation1984), namely: (8) γx(τ1,τ2)=γx(τ2,τ1)=γx(τ1,τ2τ1)=γx(τ2,τ1τ2)=γx(τ2τ1,τ1)==γx(τ1τ2,τ2)(8) leading to the partition into 6 sectors of the plane (τ1,τ2) as illustrated in . We stress here that the origin of a non-vanishing bicovariance is the non-Gaussianity of the process x(t). In fact, for a Gaussian, (necessarily linear) process, the bicovariance vanishes because cumulants of order 3 (the equivalent to lagged skewness) or higher vanish. For finite time series, Gaussianity is rejected if the absolute value of skewness is large enough.

Fig. 4. Symmetries (a) of the bicovariance function in the delay plan (adapted from Rao and Gabr, Citation1984) and bicorrelation of El Niño 3.4 index (b). Thick black contours show the 5% significance level.

Fig. 4. Symmetries (a) of the bicovariance function in the delay plan (adapted from Rao and Gabr, Citation1984) and bicorrelation of El Niño 3.4 index (b). Thick black contours show the 5% significance level.

For a nonvanishing third-order cumulant, the bicorrelation provides a measure of predictability (for lags Δ0), coming from nonlinear correlation: (9) cor[x(t+Δ),x(t)x(tτ1)]=Cx(τ1,τ1+Δ)var[x(t)x(t+τ1)].(9)

Local maxima and minima of bicorrelation can thus provide sources of predictability due to non-Gaussianity and/or nonlinearity. However, we must note that part of the nonlinear correlation (Equation9), in particular at τ1=0, is due to skewness, and hence to eliminate that contribution, we must consider the nonlinear component or residual xnl(t)=x(t)2[skx x(t)+1], of the predictor x(t)2 after removing the linear regression with x(t), where skx  is a regression constant equal to the skewness of x.

Another useful advantage of bicorrelation is the Cox (Citation1991) test of nonlinearity, that we will apply bellow, also based on skewness. More general tests exist, which analyse nonlinear predictability originating from a set of past values (Granger and Anderson, Citation1978). Under the null hypothesis of linearity, the test TCox(Δ)cor[x(t+Δ),xnl(t)]2 must vanish for all lags. Threshold values of TCox under an inexistent predictability hypothesis are obtained by randomly shuffling x(t) and xnl(t) (10,000 times) and then computing the 95% quantile (denoted as TCox95%) of the sorted values of TCox(0). Nonlinearity of El Niño is thus accepted if TCox(Δ)>TCox95% at a 5% significance level.

4.2. Results for El Niño index

4.2.1. Bicorrelation

The bicorrelation (), exhibits fluctuations of the order of 12–20 trimesters similar to the autocorrelation (). The maximum value of the bicorrelation coincides with the skewness: Cx(0,0)=0.46.

In order to test the departure of the bicorrelation from the Gaussian hypothesis, we have generated 10,000, N-sized simulations with the AR(5) and computed uncertainties. The 90% (95%) quantiles of |Cx˜(0,0)|  are equal to 0.24 (0.29), which are both below the observed skewness 0.46, hence the null hypothesis of Gaussianity is rejected at the 5% significance level. For τ1τ2 (diagonals of the bicovariance graph) or (τ10  and/or τ20), the 95% (90%) quantiles are ∼0.14 (0.16) and 0.10 (0.12) elsewhere. The rejection regions rejecting of the null hypothesis (at the 5% significance level) appear within thick black contours in .

From inspection of , we verify that the deepest bicovariance minimum Cx(0,4)=0.40, is significant at the 5% significance level, corresponding to a nonlinear correlation (EquationEq. (9)) of cor[x(t+4),x(t)2]=0.24 with τ1=0,Δ=4 trm. This implies that an extreme El Niño or La Niña (high x(t)2 than average) favours the occurrence of La Niña occurrence four trimesters later (negative x(t+4)), whereas mild conditions favour El Niño. Another local bicorrelation minimum Cx(8,8)=0.15 (10% significance level) implies that La Niña event favours a strong El Niño or La Niña 2 years later (8 trm).

4.2.2. Linear and nonlinear predictability

As regards El Niño predictability skill, shows the linear and nonlinear correlations: cor[x(t+Δ),x(t)] and cor[x(t+Δ),xnl(t)], with xnl(t) defined in Section 4.1 and for forecast lags Δ up to 20 trm. The linear correlations of El Niño 3.4 are significant at 10% level for lags Δ3 trm ( thick black line) whereas the nonlinear correlations show significant values for lags 3 trmΔ5 trm ( thick red line). Those correlations are also evaluated for the stronger El Niño season i.e., the JFM trimester (t+Δ in JFM) ( thin black line for linear and thin red line for nonlinear correlations respectively). We note here the presence of the Spring predictability barrier of El Niño (Duan and Wei, 2013) i.e., the barely weak linear extrapolation (i.e. persistence) of El Niño index from current Spring (AMJ) to the next Winter (JFM), as shown by the small value (∼0.05) of cor[x(t+Δ),x(t) | t+Δ=JFM] where | means 'conditionated to' and Δ=3. However, the Spring barrier is reduced if we include the nonlinear term in the forecast. In fact, the nonlinear correlation evaluated at JFM ( thin red line) for forecast lags Δ=35 trm (–0.25 to −0.4) is statistically significant, e.g. a negative value of cor[x(t+3),x(t)2  |  t=AMJ, t+3=JFM] between x(t) in Spring and x(t+3) in next Winter (see ).

Fig. 5. (a) Linear (thick black) and nonlinear (thick red) correlations (cor[x(t+Δ),x(t)] and cor[x(t+Δ),xnl(t)] respectively) of El Niño 3.4 index. The same correlations restricted to forecast trimester JFM (black thin and red thin respectively). Horizontal dashed lines show the 10% significance level interval [1.64Neff3,1.64Neff3]. (b) AMJ versus following year JFM El Niño and best quadratic fitting. (c) Cox test TCox(Δ) (thick) and the 95% (dotted) and 99% (dashed) confidence level threshold of nonrejection of the linearity hypothesis.

Fig. 5. (a) Linear (thick black) and nonlinear (thick red) correlations (cor[x(t+Δ),x(t)] and cor[x(t+Δ),xnl(t)] respectively) of El Niño 3.4 index. The same correlations restricted to forecast trimester JFM (black thin and red thin respectively). Horizontal dashed lines show the 10% significance level interval [−1.64Neff−3,1.64Neff−3]. (b) AMJ versus following year JFM El Niño and best quadratic fitting. (c) Cox test TCox(Δ) (thick) and the 95% (dotted) and 99% (dashed) confidence level threshold of nonrejection of the linearity hypothesis.

This is corroborated by the conditional expectations of the Winter signal conditioned to the previous Spring signal. First, we have E[x(t+3)  | x(t)<1]=0.38 (see left sector of ). We argue that strong trade-winds (or persistent eastern wind bursts in the Eastern Pacific), favouring strong La Niña Spring conditions tend to persist over some trimesters eventually reaching the next Winter season (e.g. JFM of years 1989, 1974, and 2011).

On the other hand, we have E[x(t+3)  | 1<x(t)<1]=0.12 (central sector of ) corresponding to (Spring) near climatological conditions in the Eastern Pacific. Here, the tendency is to favour a Winter with El Niño predominance in agreement with Boreal Winter phase-locking. In particular, strong Winter El Niños (e.g. 1983, 1998, and 2016) were preceded by quite mild Spring conditions. Finally, from E[x(t+3) |  x(t)>1]=0.19, strong El Niño Spring conditions (associated with westerly winds anomalies) tend to reverse in the next trimesters. This suggests that the nonlinear predictability is a consequence of the asymmetric persistence of El Niño signal and Pacific trade winds in Spring, as a function of their intensity and phase. A possible mechanism for this is seasonal growth rate dependence on the ENSO regime and feedbacks controlling SST (Yishuai et al., Citation2020), and the seasonal dependence of easterly wind bursts from Spring to Autumn (Fan et al., Citation2019).

Finally, we compute the nonlinear forecast score TCox(Δ) for lags up to 80 trm (). Clearly, the nonlinearity is especially significant at certain lags (4–8, 28, 36, 52, 72, 80 trm) where TCox(Δ) is even larger than the quantile TCox99% of nonrejection of the linearity hypothesis. Those lags are related to phase synchronization between Fourier frequencies, namely those with periods τ1 and τ2=2τ1. Lags are generally close to multiples of the half-period of the shorter oscillation, i.e. Δ=nτ12, nϵN. For instance, from , the Fourier spectral peaks at periods τ1=10.087=11.5 trm and τ2=10.0452τ1 justify the peak of TCox(Δ) at lag Δ=5τ1229 trm . This frequency relationship is known as quadratic phase coupling (Biswas et al., Citation1995) and examples in relation to decadal variability of ENSO are given in Timmermann (Citation2003).

5. Bispectrum

5.1. General properties

5.1.1. Bispectrum background

The bicovariance of El Niño time series () exhibits certain features and periodicities in the lag-time domain. Therefore, the two-dimensional Fourier transform of the bicovariance, i.e. bispectrum (Brillinger and Rosenblatt, Citation1967), can provide a dual complementary information about the most relevant spectral interactions contributing to the bicovariance and skewness of the time series making easier the physical interpretation of such interactions.

The bispectrum is the two-dimensional version of polyspectra (Brillinger, Citation1965), providing relevant information on non-Gaussian processes. The bispectrum Γ3,x(f1,f2)  is given by: (10) Γ3,x(f1,f2)τ1=τ2=γx(τ1,τ2)exp[2πi(f1τ1+f2τ2)],(10) where its real and imaginary parts are discrete Fourier transforms (DFT) of the symmetric and antisymmetric parts of the bicovariance, respectively, with the last one vanishing if the underlying stochastic process is reversible (Weiss, Citation1975).

For the simplest case of a purely random noise w(t), the bicovariance is a spike at the origin i.e. γw(τ1,τ2)=E(w3)δ(τ1)δ(τ2) where δ(·) is the Kronecker delta, yielding a flat, constant and real bispectrum Γ3,w(f1,f2)=E(w3).

5.1.2. Bispectrum and spectral components

The asymptotic bispectrum of a N-sized time series writes in terms of DFTs Xx(·) of the time series, taken at triplets of frequencies (multiples of the minimum frequency 1N) (Hinich, Citation1982) at f1,f2 and f3=[(f1+f2+12)mod(1)12] as: (11) Γ3,x(f1,f2)=limN1NE[Xx(f1)Xx(f2)Xx(f3)*](11) where (*) stands for conjugate complex. EquationEq. (11) shows that the bispectrum comes from the interaction between Fourier components at three frequencies f1,f2 and f3=f1+f2<1/2 in the area outside of the aliasing region, i.e. lower than the Nyquist frequency. When f1+f2>1/2, then f3=1(f1+f2) becomes an aliased frequency (Hinich and Wolinsky, Citation1988).

EquationEquation (11) can still be interpreted in terms of the amplitude and phase of the DFT in polar form, i.e. Xx(f)Ax(f)e[iΘx(f)] . By denoting Ax(f1)Ax(f2)Ax(f3)Ax,123,  and ei[Θx(f1)+Θx(f2)Θx(f3)]eiΘx,123, and applying the product expectation decomposition to (11), we get (Kovach et al., 2018): (12) Γ3,x(f1,f2)=limN1NE[Ax,123]E[eiΘx,123] + limN1Ncov(Ax,123,eiΘx,123),(12) where the first r.h.s. term of (Equation12) depends on phase synchronization of the three Fourier components and the second term is a covariance between amplitudes and phases, vanishing for linear processes. In fact, the Volterra representation of a linear process writes as a convolution: x(t)=kα(k)w(tk), where w(t) is a purely random noise. Its DFT is Xx(f)=Xα(f)Xw(f)=Xα(f)Aw(f)e[iΘw(f)]  where  Xα(f),Xw(f)  are DFTs, respectively of the sequence α(·) and of the noise. The independence between Aw(f) and Θw(f) yields Γ3,x(f1,f2)= Xα(f1) Xα(f2) Xα(f3)* E(w3),  restricted to the synchronization term (Nikias and Raghuveer, Citation1987), where E(w3)=limN1NE[Aw,123]E[eiΘw,123], showing hence the intrinsic nonlinear origin of the covariance term of (12).

5.1.3. Properties of the bispectrum

Like the spectrum, the bispectrum of a real signal satisfies Γ3,x(f1,f2)=Γ3,x(f1,f2)*, and (Equation11) leads to 5 identities of symmetry, namely: (13) Γ3,x(f1,f2)=Γ3,x(f2,f1)=Γ3,x(f1,f1f2)=Γ3,x(f1f2,f2)=Γ3,x(f1f2,f1)=Γ3,x(f2,f1f2).(13)

This allows for a partition of the bispectrum domain (BD) into 12 polygonal regions in which the bispectrum can be reproduced from the Principal Domain (PD): the triangle of vertices (0,0), (0,1/2) and (1/3,1/3) (shown by triangle 1 in ). The periodicities are evident in the theoretical bispectrum in (real part) and (imaginary part) of a non-Gaussian AR(5) model of El Niño (presented later in Section 5.3.1).

Fig. 6. Principal domain (triangle 1) in the spectral plane and its symmetric replicas (a) (adapted from Rao and Gabr (Citation1984). Real (b) and Imaginary (c) parts of the bispectrum of the NGAR(5) model. Note the symmetries associated with the 12 sectors shown in panel a).

Fig. 6. Principal domain (triangle 1) in the spectral plane and its symmetric replicas (a) (adapted from Rao and Gabr (Citation1984). Real (b) and Imaginary (c) parts of the bispectrum of the NGAR(5) model. Note the symmetries associated with the 12 sectors shown in panel a).

The reconstruction of bicovariance is obtained through the inverse FT of (10): (14) γx(τ1,τ2)=12121212Γ3,x(f1,f2)exp[2πi(f1τ1+f2τ2)]df1df2,(14) which leads to the skewness decomposition in terms of positive or negative contributions given by the real parts along the PD: (15) E(x3)=γx(0,0)=12PDRe[Γ3,x(f1,f2)]df1df2.(15)

It is important to note here that, like the power spectrum (Equation4), (Equation15) implies that the element of the bispectrum Re[Γ3,x(f1,f2)]df1df2 provides the contribution to the skewness E(x3) from the bi-spectral bin [f1,f1+df1]×[f2,f2+df2].

Finally, since the square correlation is a predictability measure coming from third-order moments (see EquationEq. (9)), its overall sum can be distributed over the bispectral domain through the Parseval relationship: (16) τ1τ2[γx(τ1,τ2)]2=12121212|Γ3,x(f1,f2)|2df1df2=12PD|Γ3,x(f1,f2)|2df1df2.(16)

5.2. Bispectrum estimation

The estimation of bispectrum has been addressed by many authors (e.g. Brillinger and Rosenblatt, Citation1967; Raghuvver and Nikias, 1986; Nikias and Raghuveer, Citation1987). The empirical bispectrum or biperiodogram of a finite sample of length N is the two-dimensional DFT of the sampled bicovariance, which can also be expressed in terms of DFTs of the signal (see Section 3.1): (17) S3,x(f1,f2)τ1=(N1)N1τ2=(N1)NCx(τ1,τ2)exp[2πi(f1τ1+f2τ2)]=1NXx(f1)Xx(f2)Xx(f3)*(17) where f3=(f1+f2+12)mod(1)12. The inverse FT of EquationEq. (17) yields the empirical bicovariance: (18) Cx(τ1,τ2)=1N2k1=N2N2k2=N2N2S3,x(k1N,k2N)exp[2πi(k1Nτ1+k2Nτ2)](18)

Like the periodogram (Equation1), the estimator (Equation17) is not consistent, which can be overcome by (i) smoothing the sample bispectrum (Hinich, Citation1982) or dividing the sample into pieces and then averaging and smoothing bispectra (Lii and Rosenblatt, Citation1982); (ii) using multi-tapers (Birkelund and Hanssen, Citation1999, Citation2000) or (iii) smoothing the bicovariance function (indirect-method) (Rao and Gabr, Citation1984), which we use here.

The smoothed bispectrum is: (19) Ŝ3,x(f1,f2)=τ1=NNτ2=NNCx(τ1,τ2)Λ(τ1,τ2)exp[2πi(f1τ1+f2τ2)](19) where the 2D-lag window Λ(·) satisfies similar properties of symmetry as the bicovariance (8) and is taken to be Λ(τ1,τ2)=λ̂(τ1M2)λ̂(τ2M2)λ̂(τ1M2τ2M2) where λ̂( )  is the lag window function and M2 is the window length. The equivalent to bandwidth is given by b2M22 where b2=1/[λ̂(u)λ̂(v)λ̂(uv)]2du dv=1.19 (Appendix A).

5.3. Bispectrum estimation of El Niño index

5.3.1. Null hypothesis bispectrum

To reproduce the observed skewness, we can construct an AR process driven by a non-Gaussian noise as a null hypothesis H0.

The model we wish to fit is like model (Equation6) x˜(t)=τ=1pa(k)x˜(tk)+σww(t), but with a non-Gaussian w(t) white noise, σw2=E(x2)1212|A(f)|2df and A(f)=1τ=1pa(k)e(2πiτf). The bispectrum of such a linear process (Nikias and Raghuveer, Citation1987) is: (20) Γ3,x˜(f1,f2)=σw3E(w3)A(f1)A(f2)A(f1+f2)*,(20) with: (21) E(w3)=E(x3)σw312121212[A(f1)A(f2)A(f3)*]1df1df2(21)

By using the coefficients of the AR(5) model (Equation6), we get the approximations σw2=0.275 and E(w3)=0.895, hence σw3E(w3)=0.129. This null H0 is hereafter designated NGAR(5).

shows () and imaginary () parts of the bispectrum (20) of the NGAR(5) model over the global bifrequency domain. The real part is mostly positive whereas the imaginary part is formed by dipolar structures. Both parts reflect the symmetry shown in (13) and exhibit a positive band of maximum absolute values in the region near f1+f20.09  cpt and f1,f2[0.02,0.08] cpt, coinciding with the frequency range where the power of the AR(5) model (6) exceeds 3 (°C)2/cpt as seen in .

5.3.2. Empirical smoothed bispectrum

To estimate the empirical bispectrum using the smoothed estimator (19), we start by identifying the ideal window function M2, according to (A4) in Appendix A. The values of the bispectral fluctuations σŝ3,x and the average confidence interval half-size (cihs) of the bispectrum (given by the square root of the l.h.s. of (A4)) are given in for several values of M2. In order to separate peaks, the condition σŝ3,x>cihs must be satisfied. As expected, smaller bandwidths (larger M2) lead to larger bispectral fluctuations and larger bispectrum estimation errors via cish. From , the largest M2 around which σŝ3,x>cihs is M230, which is used below.

Table 2. Average fluctuations σŝ3,x of the bispectrum and the average confidence interval half-size (CIHS) for several values of the window length M2 (EquationEq. (A4)).

shows the real () and imaginary () parts of the smoothed bispectrum of the 3.4 El Niño index for the most relevant part of the first quadrant.

Fig. 7. Real (a) and Imaginary (b) part of the empirical bispectrum using a window lag (M=30). Squared amplitude of the smoothed bispectrum (c). In figures a–c, significant regions at 20% significance level (or lower) appear within thick contours. Real (d) and imaginary (e) parts of the standardized bispectrum deviation Tx,x˜(f1,f2)  of El Niño index, and corresponding sum of the squared real and imaginary parts (f). Bifrequencies for which the null hypothesis H0 is rejected at 20% significance levels (or lower) are color-shaded (|Tx,x˜)| >1.3 for each part). Values are restricted to the most significant part of bispectrum.

Fig. 7. Real (a) and Imaginary (b) part of the empirical bispectrum using a window lag (M2 =30). Squared amplitude of the smoothed bispectrum (c). In figures a–c, significant regions at 20% significance level (or lower) appear within thick contours. Real (d) and imaginary (e) parts of the standardized bispectrum deviation Tx,x˜(f1,f2)  of El Niño index, and corresponding sum of the squared real and imaginary parts (f). Bifrequencies for which the null hypothesis H0 is rejected at 20% significance levels (or lower) are color-shaded (|Tx,x˜)| >1.3 for each part). Values are restricted to the most significant part of bispectrum.

Both parts show several peaks. In particular, both present high absolute values for f1, f2[0.04,0.07]  cpt, as for null NGAR(5) model () but with much larger amplitude. As expected, the average diameter of peaks is of the order of b2M2=0.036 cpt. The integral of Re(Ŝ3,x) () is the estimated skewness as EquationEq. (15). According to EquationEq. (11), the superposition of Fourier components along the triplet of frequencies f1,f2, f3=f1+f2 where Re()  is positive (negative), will mostly generate extreme positive (negative) values, i.e. El Niño (La Niña) events. The integral of positive and negative values of Re(Ŝ3,x) over the frequency domain is 0.54 and −0.08 respectively (adding up to the observed skewness 0.46). The local maxima of the real part, mostly contributing to El Niño extremes, lie near the frequency triplet (f1=f2=0.05, f3=0.1 cpt) and the band (f1+f2=f3=0.165 cpt), corresponding to local maxima of the power spectrum (), (e.g. quadratic phase synchronization Jajcay et al., Citation2018). On the other hand, the local minima of the real part, mostly contributing to La Niña extremes, lie near the frequency triplets (f1=f2=0.018, f3=0.036 cpt) and (f1=0.05, f2=0.018, f3=0.063 cpt), which are again close to relative power maxima (see ).

shows the squared bispectrum amplitude, providing the bispectral contribution to the predictability through (16), which agree quite well with Timmermann (Citation2003). Its maxima occur near the local extremes of the real or imaginary parts of Ŝ3,x (). The most relevant region for nonlinear predictability occurs for frequencies satisfying f1+f2[0.07, 0.1] cpt. Another maximum is observed for f1+f2[0.16, 0.18] cpt, producing oscillations with periods of 5–6 trimesters, and suggests possible source of the high nonlinear predictability for lags 5–6 trm, as diagnosed by Cox test in .

5.3.3. Bispectrum and bicovariance near the origin

The bispectrum is relevant for the bicovariance behaviour near the origin. In fact, γx(τ1,τ2) can be approximated by a Taylor expansion: (22) γx(τ1,τ2)=γx(0,0)+p=1,2γxτpτp+12p,q=1,22γxτpτqτpτq+16p,q,r=1,23γxτpτqτrτpτqτr+O(τ1aτ2b); a+b=4(22) where derivatives are computed at τ1=τ2=0. Using EquationEq. (14), we get: (23) γxτp=BDIm[Γ3,x(f1,f2)]2πfpdf1df2(23) (24) 2γxτpτq=BDRe[Γ3,x(f1,f2)](2π)2fpfqdf1df2(24) (25) 3γxτpτqτr=BDIm[Γ3,x(f1,f2)](2π)3fpfqfrdf1df2.(25) where bicovariance symmetries lead to symmetries at the origin: γxτp=0, i.e. a local bicovariance extreme (see ) and 2γxτ1τ1=2γxτ2τ2, 3γxτ1τ1τ1=3γxτ2τ2τ2,3γxτ1τ1τ2=3γxτ2τ2τ1.

The partial derivatives at the origin, estimated with the smoothed El Niño bispectrum yield 2Cxτpτp=0.264 (p=1,2),  which explains most of the symmetric decrease of Cx(τ1,τ2) near the origin (see ). The term 3Cxτpτpτp=0.114, however explains the asymmetry of that decrease, which is stronger for positive than negative lags yielding the bicovariance minimum Cx(0,4)=0.4.

5.3.4. Bispectrum from frequency-band partitions

A coarse-grained description of the bispectrum can be achieved by classifying the triplets f1,f2,f3=f1+f2 into sets of frequencies forming a partition of [0,1/2]. Each triplet is then characterized by the number of frequencies in each set. We consider the simple partition of the frequency interval using a cutoff frequency 0<fcut<1/2, separating low (S) and high (F) frequencies, with the corresponding decomposition x(t)=s(t)+f(t) (see ). shows the 4 obtained subdomains namely, SSS, SSF, SFF and FFF, yielding an expansion into 4 terms of third-order statistics, e.g., (26) E(x3)=E(s3)+3E(s2f)+3E(sf2)+E(f3)=SSS+SSF+SFF+FFF,(26)

Fig. 8. (a) Subdomains SSS, SSF, SFF and FFF obtained from a frequency partition using a cut-off frequency fcut=0.08 cpt (∼3 years). (b) Contributing terms to El Niño 3.4 index skewness (EquationEq. (26)).

Fig. 8. (a) Subdomains SSS, SSF, SFF and FFF obtained from a frequency partition using a cut-off frequency fcut=0.08 cpt (∼3 years). (b) Contributing terms to El Niño 3.4 index skewness (EquationEq. (26)(26) E(x3)=E(s3)+3E(s2f)+3E(sf2)+E(f3)=SSS+SSF+SFF+FFF,(26) ).

shows the terms in the r.h.s. of EquationEq. (26) as a function of fcut. A reasonable criterion of discrimination among the different components is to choose fcut  that maximizes the sum |SSS|+|SSF|+|SFF|+|FFF|, which takes place at fcut=0.082 cpt (3.04 years). This yields inter- (slow S) and intra-triennial (fast F) variations with respective 82% and 38% explained variance, with a well-marked scale separation lying at a local minimum of the smoothed power spectrum (see ), and a minimum value of E(s3) (–0.066).

To see the impact of spectral decomposition on the different terms of skewness, we compare in the terms of EquationEq. (26), derived directly from the time series to those obtained from the partial integrals of the smoothed bispectrum (). shows that the values are quite close, except for the negative value of E(s3). The underestimation obtained from the smoothed bispectrum is suggested to be due to the weak resolution of low frequencies.

Table 3. Contributing terms to skewness, EquationEq. (26), obtained from time series and from partial integrals of the smoothed bispectrum: total, negative and positive contributions. The most relevant integral positive (negative) contributions are marked in bold, representing the most relevant types of interaction contributing to extreme El Niño (La Niña) events.

Extremes are classified according to the dominant term (SSS, SSF, SFF or FFF). From , extreme events of La Niña must be explained by the slow component s(t) (e.g. 1887, 1917, 1956, 2000) or by phase synchronization of s(t)  and f(t) (mostly of SFF type, e.g. 1973, 1988, 2008, and 2011) (). On the other hand, extreme events of El Niño, are mainly due to slow-fast component interactions, namely of SSF type (e.g. 1877, 1918, 1930, 1958, 2015) and SFF type (e.g. 1926, 1951, 1965, 1972, 1982–83, 1992, 1997, 2002) or from fast components only (FFF type) (e.g. 1923, 1977, 2006, 2009). However, some El Niño events (e.g. 1905, 1940, 1986–87), have occurred due to long persistence of the slow component (SSS type) (). Note also that there are few cases of phase polarity between fast and slow components (e.g. 1974) that lead to weak El Niño index.

In order to determine temporal changes of the bispectrum, we assess the third moment and its decomposition, EquationEq. (26), both in the full period (FULL) and along the three half-centuries: 1870–1919 (HC1), 1920–1969 (HC2 and 1970–2018 (HC3) (). Moreover, we evaluate the above statistics during El Niños (x(t)>0) and La Niñas (x(t)<0) to examine the variability of extremes and corresponding spectral contributions. For a given term in EquationEq. (26), for instance, SSS, its average E(SSS) decomposes as: E(SSS)=E(SSS)++E(SSS), where E(SSS)+=E(SSS  | x>0)prob(x>0) and E(SSS)=E(SSS  |  x0)prob(x0), giving the contributions to E(SSS) during El Niños and La Niñas, respectively. summarizes the results.

Table 4. Expected values (yellow) of the third moment of the normalized El Niño 3.4 index (denoted as sk) and its contributions SSS, SSF, SFF and FFF, as detailed in EquationEq. (26) for the full period and for the three half centuries 1870–1919 (HC1), 1920–1969 (HC2) and 1970–2018 (HC3). The expectations and their bispectral contributions are further decomposed into contributions conditioned to La Niñas (blue) and conditioned to El Niños (red). Note that sk=sk++sk (idem for skewness terms).

First, the most recent half century (HC3) has on average, the most extreme episodes of La Niña and El Niño, as observed from the high absolute values of sk and sk+. The amplification of La Ninãs comes mostly from a clear increase of self-slow interaction SSS=–0.06 and cross interaction SFF=–0.4 whereas amplification of El Niños comes from the enhancement of the SFF term (0.6), as compared to the previous two half centuries. This suggests changes and decadal variability of the ENSO skewness, its bicovariance and bispectrum (Wu and Hsieh, Citation2003), associated to changes in the preferential Fourier phase couplings (Schulte et al., Citation2019). This was accompanied by an ENSO regime shift, near 1970 towards more nonlinearity (An and Wang, Citation2000; An and Jin, Citation2004; An, Citation2009).

5.3.5. Statistical significance of the empirical bispectrum

It is important to check the acceptance or rejection of the null bispectrum H0 of NGAR(5) (). The spectral method used here is based on a variation of the Hinich (Citation1982) test of linearity. We anticipate that both the local (A6) and the integrated (A7) spectral-based tests of nonlinearity in the bi-frequency domain reject the null H0 in consistency with the nonlinearity Cox test in the time domain (see Section 4.2.2) and hence other sources of non-Gaussianity (e.g. deterministic nonlinearity and multiplicative noise) shall be necessary. Furthermore, as shown in Appendix A the asymptotic bias (A2), variance (A3) as well as the asymptotic Gaussian PDF (Rao and Gabr, Citation1984) of the smoothed estimator are not good approximations because of the small number of degrees freedom (Neff=66) of the time series and hence cannot be used to test the null H0. This could be alleviated by using, e.g. a very long run of a climate model simulation. We use a Monte-Carlo strategy by computing the statistics of bispectrum by generating 10,000 surrogates of the NGAR(5) model forced by a noise prescribed by its first three moments. In order to easily obtain noise realizations, we consider noises produced by polynomial independent standard Gaussian noises, by relating the coefficients of monomial expectations to the imposed noise moments. For instance, the first trial noise: σww(t)=aw1(t)+b[w12(t)1] where w1 is a standard Gaussian white noise, leads to σw2=a2+2b2=0.275 and σw3E(w3)=8b3+6a2b=0.129, yielding a=0.5122, and b=0.0794, and its excess kurtosis is 1.073.

Here, we choose: σww(t)=aw1(t)+b[w22(t)1] where w1,w2 are independent standard Gaussians with σw2=a2+2b2=0.275 and σw3E(w3)=8b3=0.129, yielding a=0.3840, and b=0.2525 and a excess kurtosis of 2.475. The results are quite robust to changes in the noise model. Other possible, though less practical, noises are generated by maximum entropy constrained by the four first moments (Pires et al., Citation2010). We then compute the Monte-Carlo ensemble average E[Ŝ3,x˜] and variance var[Ŝ3,x˜], of the real and imaginary parts of the smoothed bispectrum. Noise high-order moments (greater than 2) appear only to influence the high-order moments of the smoothed spectrum (e.g. skewness and kurtosis) which are not relevant for the linearity test devised here.

The deviation of El Niño smoothed bispectrum Ŝ3,x(f1,f2) () from that of NGAR(5) model is assessed by the test statistic (standardized deviation) (EquationEq. (A6)): Tx,x˜(f1,f2)Ŝ3,x(f1,f2)E[Ŝ3,x˜](var[Ŝ3,x˜])1/2 . Under H0, its real and imaginary parts are approximately standard Gaussian. We also limited the tests to frequencies with higher bispectrum amplitude, roughly corresponding to |f1|,|f2|<0.2 (as in ). shows the real () and imaginary () parts of Tx,x˜(f1,f2) where significant regions (at α=20% significance level). are color-shaded. shows that most peaks of Re(Ŝ3,x) () are significant. In particular, the low-frequency region (SSS type), producing La Niña events for |f1|,|f2|,|f3|<0.06 cpt, is significant (i.e. rejecting H0). The other positive and negative bispectrum extremes discussed in Section 5.3.2 are also significant at α=10%. The imaginary part of the test () and the squared amplitude () are also highly significant in most of the relevant regions of the bispectrum with significance levels reaching 5%. The most significant region of nonlinear predictability holds approximately for f1+f2  within [0.16, 0.18]  cpt where bispectrum is significant at α=5%, producing oscillations with periods of the order 5–6 trimesters. This suggests a possible source of the high nonlinear predictability for lags 3–6 trm, as diagnosed by Cox test (), and for the nonlinear curtailing of El Niño Spring barrier (see ).

The local test Tx,x˜(f1,f2)  computed on a frequency basis may lead, ambiguously, either to the rejection or to the acceptance of linearity, depending on f1,f2 (see ). This is suggested by the fact that finite N-sized samples generated by the non-Gaussian NGAR(5) model led to local frequency tests Tx˜,x˜(f1,f2) where linearity is falsely rejected (not shown). Therefore, in order to overcome this difficulty and enhance test robustness, we propose the integrated test of nonlinearity (A7) given by Tint x,x˜(f1,f2)L |Tx,x˜(f1,f2)|2 over a representative lattice L (). We found that nonlinearity cannot be rejected (at 5% level), thanks to the highly significant regions of El Niño bispectrum ( and 7a–c).

5.3.6. Normalized bispectrum and bicoherency

An independent test of linearity, beyond that of previous section and diagnostic of phase synchronization comes from the normalized bispectrum or bicoherence spectrum (Kim and Powers, 1979; Nikias and Raghuveer, Citation1987; Hinich and Wolinsky, Citation2005; Rao et al., Citation2012). It is obtained by prewhitening x(t) to yield a non-Gaussian white noise y(t), and reconstructed by inverse FT of Xy(f)Xx(f)[Γx(f)]12=Nei[Θx(f)]. The test is then: (27) Γ3,y(f1,f2) Γ3,x(f1,f2)( Γx(f1)Γx(f2)Γx(f3))12(27)

We stress that the phases of Xy(f) and Xx(f)  are the same (i.e. Θx(f)), leading to a nonvanishing correlation cor(x,y)=σx11212 Γx(f) df.

A linear process i.e. x(t)=kα(k)w(tk) where w is a white noise yields Γx(f)=| Xα(f)|2E(w2). By using the result of Section 5.1.2, the normalized bispectrum (Equation27) becomes Γ3,y(f1,f2)=sk(w)ei[Θα(f1)+Θα(f2)Θα(f3)] where Θα(·) is the phase of the DFT of sequence α(k). Therefore, the amplitude of Γ3,y  is uniform, i.e. |Γ3,y(f1,f2)|=sk(w) which is precisely the Hinich (Citation1982) null hypothesis of linearity.

In the case of El Niño, we get an estimated prewhitened non-Gaussian noise y(t) by using the more reliable theoretical maximum entropy NGAR(5) spectrum for the normalization in (27), instead of the empirical smoothed spectrum. The lag correlation of the resulting noise y(t) is very close to zero and thus can be considered a white noise. Its skewness is 0.282 and the excess kurtosis is 0.531. The correlation with the signal x(t) is quite high: 0.75, coming mainly from extreme events.

The smoothed normalized bispectrum, using the same window length M=M2=30, is hereby denoted Ŝ3,y. Significant peaks (at 20% significance level) of Ŝ3,y, both of the real () and imaginary () parts are located nearly at the same bifrequencies as the non-normalized bispectrum Ŝ3,x () though peaks are attenuated by normalization and a new peak appears at higher frequencies f1=f2=0.18 (2.5 years), (f3=0.36) cpts (2.8 years). The squared bicoherency |Ŝ3,y(f1,f2)| 2 () is clearly nonuniform, showing regions of nonacceptance of the null bispectrum of NGAR(5) and hence rejecting the linearity hypothesis.

Fig. 9. Real (a) and Imaginary (b) parts of the normalized smoothed bispectrum of El Niño index, (M2=30) and its squared amplitude (c). Regions statistically significant at 20% level are colour-shaded.

Fig. 9. Real (a) and Imaginary (b) parts of the normalized smoothed bispectrum of El Niño index, (M2=30) and its squared amplitude (c). Regions statistically significant at 20% level are colour-shaded.

6. Stochastic modelling

6.1. The method

6.1.1. The motivation

ENSO has been extensively studied to look for a deeper understanding of the underlying physics and complexity (see recent reviews of Chunzai, 2018; Timmermann et al., Citation2018 and references therein), improve predictability (see the review of Tang et al., Citation2018), as well as to get accurate statistics (e.g. pdf, extremes etc.). This was done through different dynamical models (physical-based deterministic models), statistical models (e.g. linear inverse modelling by Penland (Citation1996) and Privalsky and Muzylev, Citation2013) and models based on machine learning (Dijkstra et al., Citation2019).

However, even complex models may exhibit biases of various statistics. The present top-down approach of fitting simple stochastic models to observations, attempts learn signal-noise relationships from models, enabling parametrizations of the nonlinear and complex effects of nonobserved variables and hence reproducing a set of relevant stochastic properties (e.g. spectrum, bispectrum).

A number of stochastic univariate models of ENSO have been fitted such as: smooth transition autoregressive (STAR) models (Hall et al., 2001; Ubilava and Helmers, Citation2013), autoregressive conditional heteroscedasticity type models (ARCH) (Ahn and Kim, Citation2005), and threshold AR models (De Gooijer, Citation2017). This section aims at fitting a minimal univariate stochastic model for El Niño 3.4. index driven by a multiplicative Gaussian delayed noise, able to reproduce the observed empirical spectrum, skewness and bispectrum as well as assess the impact it has on predictability, compared to benchmark linear models.

6.1.2. The model formulation

The models fitted here, belong to a class in which the simulated scalar state x˜(t),  at integer t, is driven by a noise u(t)=σww(t) where w(t) is a standard Gaussian white noise and σw is a positive constant. The simulated state x˜(t) depends (through a function F) on: a) the simulated previous state values at t<0, represented in delay coordinate (Takens et al., Citation1981) by x˜(t1)[x˜(t2),x˜(t1)]T; b) the previous noise values at t<0: u(t1)[u(t2),u(t1)]T and c) a parameter vector θ. Throughout this section bold letters refer to vectors and italic to scalars. The model writes thus as: (28) x˜(t)=F[x˜(t1), u(t1), θ]+ u(t)(28)

The model (28) will also be used in forecast mode, and the τ-lag (τ1)  forecast valid at time at t, x(t,τ), is given by: (29) x(t,τ)=F[x(t1,τ1), ϵ(t1,τ), θ],(29) where x(t1,τ1)[,x(t2,τ2),x(t1,τ1)]T, with x(t,τ)=x(t) for τ0. The noise vector in (29) becomes ϵ(t1,τ)[,ε(t2,τ1),ε(t1,τ)]T with ε(t,τ)x(t,τ1)x(t,τ), where τ=max(1,τ).  For instance the one-step forecast is x(t,1)=F[x(t1,0), ϵ(t1,1), θ] where x(t1,0)=x(t1)[,x(t2),x(t1)]T  and ϵ(t1,1)=ϵ(t1)=[,ε(t2),ε(t1)]T with ε(t)x(t)x(t,1) is the one-step error forecasts, being thus consistent with (28).

The models of form (Equation28) include AR linear models, fitted in Section 3.3.2. However, in order to reproduce nonlinear and non-Gaussian ENSO behaviour, we have considered bilinear models (Haggan and Oyetunji, Citation1980; Rao and Gabr, Citation1984; Rao Citation1981), which differ from AR processes by the addition to them of lagged bilinear (BL) terms, denoted ARBL(p1, p2): (30) F[x˜(t1), u(t1), θ]=k=1p1akx˜(tlk)+k=1p2bkx˜(trk)u(tsk)+α,(30) characterized by its correlated-additive–multiplicative (CAM) noise (Usoro, Citation2015), where lk, rk,sk1 are lags, with θ(a1,,ap1,b1,,bp2,α). These models can produce non-Gaussian statistics and nonvanishing bicovariances and bispectra (Rao and Gabr, Citation1984). Note that the restricted case of lags rk=0, sk=0 (Monahan, Citation2020) are excluded from model of EquationEq. (30), to allow inverting u(t) from past values for forecasting. Note also that the case rk>sk leads to sub-diagonal bilinear models whereas the case rksk corresponds to diagonal/super-diagonal bilinear models. In the former case, for example, the intervening noise is independent of the most recent state whereas the latter case corresponds to nonlinear feedbacks, having in general nonvanishing time averages due to correlation between states and past noises. We stress that models (Equation30) have a nonlinear Volterra development in terms of lagged noises, which in a certain way parametrizes nonlinearities which are not present in a deterministic form in function F. However, other type of models could be fitted, e.g. adding quadratic terms in the deterministic forcing but which can lead to instabilities in simulations. Another difficulty with nonlinear deterministic part, is the very wide class of nonlinearity: which nonlinearity: if polynomial-what degree? However, we investigate it in another study.

6.1.3. Model fitting

In order to optimize predictability and reproduce data statistics, we apply a hybrid fitting algorithm that minimizes a cost function Jhyb  which is the weighted sum of the normalized one-step forecast residuals Jfitt  and the normalized squared distance between a set of observed and simulated statistics (average, autocovariance and bi-covariance) Jstat: (31) Jhyb(θ)=cfittJfitt(θ)+cstatJstat(θ,σw(θ))(31) where cfitt and cstat are positive weights and σw(θ)=σx(Jfitt(θ))1/2 is the RMS of the forecast residuals for a model using parameters θ with σx=1 (observations are standardized).

The term Jfitt  is given by the one-step forecast error: (32) Jfitt(θ)=1(NNγ)σx2t=NγN1[x(t)F[x(t1), ϵ(t1), θ]]2(32) where Nγmaxk,k(lk,rk,sk).

Iterative minimization algorithms of Jfitt(θ) for general bilinear models are discussed in Pham and Tran (Citation1981), (Rao and Gabr Citation1984), Grahn Citation1995, Guegan and Pham (Citation1989), Gabr Citation1998 and Falguerolles and Francis (Citation1992). Traditionally, the method of moments is used to obtain implicit relationships between the parameters and lagged moments (e.g. Sesay and Rao, Citation1988; Tang and Mohler, Citation1988; Kim et al., Citation1990), where, in most cases parameters are difficult to be expressed as a function of moments. Here we apply a method where statistics are estimated from a long simulation of (EquationEq. (28)) with initial conditions: x˜(t)=w(t)=0; t=Npast,,Npast+Nγ, and w(t)N(0,1);t=Npast+Nγ+1,…, Nsim. Statistics are computed for t=1,,Nsim with Nsim=30,000 and the initial Npast=1000 values are discarded from statistics as spin-up.

The term Jstat, involving the mean, autocovariance and bicovariance from observations, (Sobs,Cobs(·), and Bobs(·,·)), and simulations (Ssim, Csim(·),  and Bsim(·,·)), is (33) Jstat(θ,σw)=JS(θ,σw)+JC(θ,σw)+JB(θ,σw),(33) where (34) JS(θ,σw)=(SsimSobs)2σx2JC(θ,σw)=|τ|τmax[Csim(τ)Cobs(τ)]2b1λ(τ)2τmaxσx4  (34) (35) b1τmaxσx4 1/21/2[Ŝ2,x˜(f)Ŝ2,x(f)]2df,(35) and (36) JB(θ,σw)|τ1|,|τ2|τmax[Bsim(τ1,τ2)Bobs(τ1,τ2)]2b2λ(τ1,τ2)2τmax2σx6 b2τmax2σx6 1/21/21/21/2[Ŝ3,x˜(f1,f2)Ŝ3,x(f1,f2)]2df1df2,(36) with τmax=16.

The used window lag functions λ(τ) and λ(τ1,τ2) are scaled by M=M2=τmax. The Parseval Theorem applied to (35–36), shows that minimizing JC(θ,σw), JB(θ,σw) leads also to minimizing errors in the spectral domain.

In the analysis we compare two situations: cfitt=1,cstat=0 (simple fitting) and the hybrid fitting where cfitt=1; cstat=ΔJfitt/ΔJstat, given by the ratio of typical variations of Jfitt and Jstat (hybrid fitting). The optimal parameters issued from simple and hybrid fittings are hereby denoted θfitt and θhyb, respectively. Note that the term cstatJstat reduces overfitting and the domination of one term over the other, see Appendix B for the description of the minimization of Jhyb(θ). We compare both fittings in terms of Jfitt and Jstat for the AR(p1) model and several ARBL(p1, p2) models. The statistical significance is given through the normalized Akaike Information Criterion: NAIC=log(Jfitt)2dimθ/(NNγ) that penalizes the number of model terms.

6.2. Results for El Niño index

6.2.1. Fitting statistics

We analyse and evaluate a sequence of models (EquationEq. (30)), for p1=5, lk=k;k=1,,p1, with various p2 values using lags rk6,sk6 () with every new lag pair producing the largest Jfitt decrease.

shows that the mean residual squares Jfitt  decreases with increasing model complexity. For hybrid, NAIC decreases with increasing complexity (i.e., no overfitting). Moreover, the hybrid fitting is able to get improved statistics compared to that from the simple fitting (Jstat(θhyb)<Jstat(θfitt)), with reductions up to about one fifth for the ARBL(5,5) model. That reduction entails very tiny increments in the sum of residuals,  i.e. Jfitt(θhyb)>Jfitt(θfitt), of the order of 1%, but still keeping significant NAIC values.

Table 5. Statistics of the AR(5) and bilinear ARBL(5,1),…,ARBL(5,5) models with the corresponding bilinear-term lags (rp2,sp2)  along with the terms Jfitt and Jstat for simple and hybrid fitting in addition to the normalized AIC relative to hybrid fitting. Correlation skills corτ are also shown at lags τ=1,2,3 trimesters, assessed over the period 1970–2018 for models obtained by hybrid fitting calibrated in the period 1870–1969.

The estimated of the ARBL(5,5) model are shown in and (α,σw)=(0.0626, 0.5102).

Table 6. Coefficients and lags of the autoregressive and bilinear terms of the ARBL(5,5) model of El Niño index, obtained by hybrid fitting over the full period. The independent constant α=0.0626 and the additive driving noise is u(t)=0.5102 w(t).

6.2.2. Simulated autocovariance and spectrum

We note that the autoregressive part of the ARBL(5,5) model is quite similar to that of AR(5) model (EquationEq. (6)). The bilinear coefficients have smaller amplitude than the autoregressive coefficients. ARBL(5,5) contains two feedback terms corresponding to (rk,sk)=(2,4) and (1,3) whose nonzero mean values come from white noise squares and leading to a nonvanishing constant α. This parametrizes partially the bicovariance at the interseasonal scales which is at the origin of the nonlinear reduction of the Spring Predictability Barrier (see Section 4.2.2). The model recovers quite well the empirical autocovariance function (not shown), and also the smoothed spectrum () using the window lag M=30, with particularly similar peak in the band f0.050.07 cpt.

Fig. 10. Smoothed empirical (black) and ARBL(5,5) simulated (red) spectra using a window lag M = 30.

Fig. 10. Smoothed empirical (black) and ARBL(5,5) simulated (red) spectra using a window lag M = 30.

6.2.3. Simulated bicovariance and bispectrum

Most (∼80%) uncertainty in Jhyb(θhyb) comes from the bicovariance uncertainty through JB(θ,σw). The simulated bicovariance () reproduces very well the same pattern of the bicovariance inferred from observations (), at least up to lags 12 trimesters. The simulation yields a skewness equal to 0.26, which is comparable with the observed skewness (0.46). Any simpler model with only one bilinear term is unable to yield positive skewness and negative bicovariances Cx˜(0,3), Cx˜(0,4), which are fundamental to get skillful El Niño predictions from Spring to the next Winter.

Fig. 11. Bicorrelation of the ARBL(5,5) model, approaching the empirical bicorrelation of El Niño index (to be compared with ).

Fig. 11. Bicorrelation of the ARBL(5,5) model, approaching the empirical bicorrelation of El Niño index (to be compared with Fig. 4b).

The smoothed bispectrum of the ARBL(5,5) simulation, using a window lag M2=30, is shown in (real part) and (imaginary part), to be compared with the empirical one (). The real part () is quite well reproduced, with negative values in the low-frequency band |f1+f2|<0.06, and positive local maxima near f1=f2=0.045 cpts  and f1=f2=0.085 cpts (). The decomposition of skewness (EquationEq. (26)), (with fcut=0.06) yields the values −0.036, 0.062, 0.154 and 0.082 respectively for the SSS, SSF, SFF and SSS contributions, which agree approximately with observations (). The imaginary part of the simulated bispectrum () is also roughly well reproduced, showing negative values within the region f1+f2<0.06 () and positive elsewhere. The maximum is near f1=f20.07 cpts. Note that the existence of a single peak is related to the simulated single-peak spectrum.

Fig. 12. Real (a) and Imaginary (b) parts of smoothed bispectrum of the ARBL(5,5) using a window lag M2=30.

Fig. 12. Real (a) and Imaginary (b) parts of smoothed bispectrum of the ARBL(5,5) using a window lag M2=30.

We shall remark that the bilinear models chosen here, have a linear deterministic part with nonlinearity coming indirectly through the CAM noise. However, other type of models could be fitted, e.g. adding quadratic terms in the deterministic forcing but which can lead to other difficulties like instabilities and chaotic behaviour in simulations.

6.2.4. Predictability

In order to assess the predictability impact of the inclusion of bilinear terms in ARBL models, compared to the AR(5) model, we compute the correlation skill corτ, () between observations and predictions for lags for τ=1,2,3 trimesters. Models are optimized by hybrid fitting in the training period 1870–1969 (100 years) and predictions are evaluated in the validation period 1970–2018 (49 years), where the most intense La Niñas and El Niños have been observed (see Section 5.3.4). All the tested models’ predictions are skillful (corτ>0.5) for lags up to two trimesters.

shows that, in general correlation skills increase with increasing complexity. For instance, the correlation skill of model ARBL(5,5) is about 2%, 7% and 9% larger than AR(5), respectively for lags of 1, 2 and 3 trimesters. The presence of at least two bilinear terms suggests the reason behind the improvement of the 2-trimester forecasts by ARBL(5,5), with respect to AR(5), for which some extreme El Niños (e.g. 1973, 1983, 1998, 2016) are more accurately predicted. The ARBL(5,5) model has no explicit deterministic nonlinearities, which are common to physically-based models. Here, nonlinearities are parametrized through the bilinear terms. Despite the simplicity of the ARBL(5,5) model, its correlation skills (cor1=0.87, cor2=0.62) are not dramatically smaller, in the same period than those of physically-based models (cor1=0.850.91, cor2=0.680.80) and to those of a much more complex neural-network based model (cor1=0.92, cor2=0.80) as observed in of (Ham et al., Citation2019).

7. Discussion and conclusion

El Niño Southern Oscillation (ENSO) is one of the most important coupled atmosphere–ocean system, exhibiting time scales ranging from seasons to decades and beyond, with a particularly worldwide teleconnection. Using different stochastic and/or dynamic approaches, most studies have emphasized and shown its intrinsic complexity, nonlinearity and non-Gaussianity. Most of those studies limited their investigations to the second order statistics in addition to skewness and/or kurtosis.

Here, we follow the same line of research by performing a data-driven systematic study of the third-order stochastic moments, both in the time and spectral domains, applied to the standardized trimonthly-average El Niño 3.4 index with a trimester sampling. Within the time domain, this comprises the bicovariance γx(τ1,τ2)=E[x(t)x(t+τ1)x(t+τ2)], in addition to nonlinear correlations for testing nonlinearity and nonlinear predictability for forecasting horizons from seasons up to a few years. The study uses a 149-year period (1870–2018) time series with its statistically significant skewness of 0.46, peaking mostly at the boreal winter (1.04). The analysis of bicovariance maxima reveals high negative nonlinear correlation, implying that El Niño or La Niña extremes are likely to be followed by La Niña one year later, whereas mild conditions, on the other hand, favour El Niño occurrence.

The analysis of the nonlinear predictability, on seasonal time scales, shows that such nonlinear correlation is enhanced further when forecasts are issued at the NH Spring season (AMJ). This is linked to the persistence of many La Niñas starting in Spring up to next Winter and to the fact that strong Winter El Niños have only occurred under close climatological conditions in the previous boreal Spring. This strongly suggests that nonlinearity in the inter-seasonal timescale can contribute significantly to reduce the so-called El Niño Spring predictability barrier. Another equally important aspect is the fact that ENSO nonlinearity allows for the extension of predictability skill even for forecasting time of a few years, particularly when these forecasting time intervals satisfy phase synchronization and quadratic phase locking with certain dominant Fourier frequencies.

Similarly, within the spectral domain the bispectrum and bicoherence have been computed. As with power spectrum and variance, the bispectrum provides, in particular, the contribution of each bi-frequency bin to the observed skewness and squared bicovariance. This warrants the detection of combinations of El Niño Fourier components that mostly contribute to ENSO extremes by phase synchronization. The bispectrum also permits a test of nonlinearity in the spectral domain. The bispectrum has been estimated by a smoothed estimator using a window lag of 30 trimesters = 7.5 years, obtained from a trade-off between bispectrum bias, variance, and spectral resolution. To estimate the statistical significance of peaks, a conservative test has been adopted. The bispectrum of a non-Gaussian autoregressive null-hypothesis model NGAR(5) is tested and rejected at 5% significance level. We first obtained the coarse-grained spectral partition of the skewness by splitting the full signal into a slow component s(t) (with periods larger than 3 years) and a fast component f(t). The skewness is then decomposed into 4 components, namely SSS=0.066, SSF=0.263, SFF=0.185  and FFF=0.071, implying, in particular, that most El Niños result from interaction between inter- and intra-triennial timescales. Some decadal tendency towards SFF-type El Niños is apparent from the time series, which is consistent with the observed ENSO decadal variability. Note that if a maximum of the bispectrum real part is observed at (f1,f2) then a peak in the power spectrum is observed approximately at f3=f1+f2. In particular, the leading bispectral maxima contributing to El Niño occurs at f1=f2=0.05, f3=0.1 cpt cpt (periods of 5 and 2.5 years inside the SSF region) and near the band f1+f2=0.165 cpt, crossing the SFF and FFF regions, with a maximum at f1=f2=0.082, f3=0.165 cpt (periods of 2.9 and 1.5 years). On the other hand, minima contributing mostly to La Niña extremes, lie near f1=f2=0.018, f3=0.036 cpt (periods of 14 and 7 years) and (f1, f2)=(0.05,0.018), f3=0.063 cpt (periods of 5, 7 and 4 years), both inside the SSS region.

Lastly, a minimal stochastic model was constructed, which was able to reproduce the main features of the spectrum and bispectrum, and yielded robust improvement of the predictability skill, compared to an autoregressive AR(5) model. The model was selected from a large class of bilinear models with correlated-additive–multiplicative lagged noise. To gain predictability skill with the right stochastic properties, a hybrid fitting approach is used by minimizing a combination of forecasting squared residuals and squared deviations from empirical third-order statistics. The bilinear model yields forecast improvements, particularly at lags of 1, 2 and 3 trimesters with 2%, 7% and 9% of correlation skill increment respectively, suggested to be linked to the attenuation of El Niño predictability Spring barrier and to the more accurate prediction of super El Niños.

This study contributes to the understanding of ENSO predictability and modelling from the perspective of the bispectrum and phase synchronization. In a changing climate, this is especially relevant for the study and predictability of ENSO extremes resulting from resonant-type interaction. The study also provides the possibility to investigate other ENSO indices such as El-Nino Modoki or other Nino indices, and check whether other processes are at play. In particular, forecast skill of ENSO, based on the developed models are of great importance for seasonal (and longer) timescales forecasting. A systematic analysis of these topics is beyond the scope of this manuscript and is left for future research.

Acknowledgments

This research was developed at IDL with the support of the Portuguese Foundation for Science and Technology through the FCT – project UIDB/50019/2020 – IDL – Instituto Dom Luiz (IDL) and FCT – project JPIOCEANS/0001/2019 (ROADMAP: 'The Role of ocean dynamics and Ocean–Atmosphere interactions in Driving cliMAte variations and future Projections of impact–relevant extreme events'). We acknowledge the International Meteorological Institute (IMI) at Stockholm University for hosting Carlos Pires, and also the constructive discussion with Adam Monahan. Two anonymous referees provided constructive comments, which helped improve the manuscript. Many thanks for the undeniable support of the family.

Data availability statement

The data that support the findings of this study are available from the NOAA website: https://www.esrl.noaa.gov/psd/gcos_wgsp/Timeseries/Data/nino34.long.anom.data

Disclosure statement

The authors declare that there are no conflicts of interest regarding the publication of this paper.

References

Appendix A:

Statistics of the smoothed bispectrum estimator

Let us denote the error of the smoothed bispectrum estimator (Equation19) as: (A1) δŜ3,x(f1,f2)Ŝ3,x(f1,f2)Γ3,x(f1,f2).(A1)

EquationEquation (A1) follows (for large N,M2,N/M22), a complex Gaussian PDF (Brillinger, Citation1965) with independent real and imaginary parts. The estimator’s bias is approximately given by: (A2) E[δŜ3,x(f1,f2)]λ̂(0)4π2M22(2Γ3,xf12+2Γ3,xf222Γ3,xf1f2),(A2)

Asymptotically, the real and imaginary parts of the estimator (Equation19) have equal variances (Rao and Gabr, Citation1984), for different f1,f2,f3, given by (A3) var[Re(Ŝ3,x))]=var[Im(Ŝ3,x))]=0.5M22Nb2Γx(f1)Γx(f2)Γx(f3)(A3) which decreases both with N and the two-dimensional bandwidth b2M2 where 1b2=[λ̂(u)λ̂(v)λ̂(uv)]2du dv, and hence larger spectral resolution implies larger variance.

The optimal value of M2 minimizing the bispectrum MSE is frequency dependent (Rao and Gabr, Citation1984). Instead, we use an overall criterion such that typical bispectrum fluctuations σŜ3,x  are larger than the average confidence interval half-size (cish), which writes as: σŜ3,x2max[ Re(Ŝ3,xsk(x)3)2¯, Im(Ŝ3,x)2¯] (A4) >(Φ1α2)2M22Nb2Γx(f1)Γx(f2)Γx(f3)¯=cish2(A4) where the overbar means integration over the domain of bifrequencies, and Φq(·) is qth quantile of the standard normal. Then, for the best bispectral resolution we chose the largest M2 satisfying (A4).

Here, we are looking for a statistical test of the null bispectrum  of NGAR(5). For that, we need good bias and variance approximations of the smoothed bispectrum. To achieve this, even for a reduced number of temporal degrees of freedom (the case here), we have generated 10,000 Monte-Carlo N-sized time series or surrogates x˜(t) based on the null H0, and construced ensemble statistics. The bispectrum error is δŜ3,x˜(f1,f2)=Ŝ3,x˜(f1,f2)Γ3,x˜(f1,f2) with bias E[δŜ3,x˜]=E[Ŝ3,x˜]Γ3,x˜. Monte-Carlo biases of the NGAR(5) bispectrum () are positive (negative) near the local bispectrum minima (maxima) (see ), in agreement with the asymptotic bias (A2). Exception holds near the origin due possibly to under sampling of low frequencies.

Fig. A1. Real (a) and imaginary (b) parts of bias E[δŜ3,x˜]. Real (c) and imaginary (d) values of the ratio varrat(Ŝ3,x˜)  between empirical variances var(Ŝ3,x) and total asymptotic variance.

Fig. A1. Real (a) and imaginary (b) parts of bias E[δŜ3,x˜]. Real (c) and imaginary (d) values of the ratio varrat(Ŝ3,x˜)  between empirical variances var(Ŝ3,x) and total asymptotic variance.

Fig. A2. Values of the non-Gaussianity diagnostic Υx˜ (equal to 1 under Gaussianity) versus the 50%, 75%, 90%, 95% and 99% half-quantiles 12qp( |Tx˜,x˜|2) of the squared bispectrum normalized error, collected over all the 173 bifrequencies of lattice L. A linear adjustment is shown for each quantile.

Fig. A2. Values of the non-Gaussianity diagnostic Υx˜ (equal to 1 under Gaussianity) versus the 50%, 75%, 90%, 95% and 99% half-quantiles 12qp( |Tx˜,x˜|2) of the squared bispectrum normalized error, collected over all the 173 bifrequencies of lattice L. A linear adjustment is shown for each quantile.

Fig. A3. Position of bifrequencies in the set L used in the integral test of linearity.

Fig. A3. Position of bifrequencies in the set L used in the integral test of linearity.

The deviation between the asymptotic variance (A3) and the Monte-Carlo variance var[Ŝ3,x˜] (for the real and imaginary parts) is assessed through the ratio: (A5) varrat(Ŝ3,x˜)var(Ŝ3,x˜)M22Nb2Γx˜(f1)Γx˜(f2)Γx˜(f3),(A5) shown in . It can be verified from this figure that when the distance to the principal domain edges or vertices slightly surpasses the bandwidth b2M2=0.036 cpt, then the ratio is nearly constant, varying in the range 0.65–0.80, which is larger than the value 0.5 observed in asymptotic conditions, and hence (A3) underestimates the estimator’s variance (due to sampling (NeffN)).

In order to evaluate the distance between the estimated Ŝ3,x(f1,f2) and the H0 bispectrum, we compute the standardized error (real and imaginary parts): (A6) Tx,x˜(f1,f2)Ŝ3,x(f1,f2)E[Ŝ3,x˜][var(Ŝ3,x˜)]1/2 (A6)

Asymptotically, under H0, both real and imaginary parts of Tx˜,x˜ are independent standard Gaussian, and  |Tx˜,x˜|2Re(Tx,x˜)2+Im(Tx,x˜)2χ22. However, for finite samples, Tx˜,x˜ tends to be non-Gaussian and leptokurtic and hence the diagnostic Υx˜ var( |Tx˜,x˜|2) 2E( |Tx˜,x˜|2) gets values larger than 1. For the purpose of H0 tests, we also compute the Monte-Carlo quantiles qp( |Tx˜,x˜|2). Therefore, the H0 bispectrum is rejected at the frequency pair (f1,f2) if Tx,x˜(f1,f2) or |Tx,x˜|2 fall outside their H0 nonrejection intervals.

The non-Gaussianity assessment of Tx˜,x˜  is presented in , showing Υx˜ versus 12qp( |Tx˜,x˜|2) , for p= 50%, 75%, 90%, 95% and 99%, of the local test  |Tx˜,x˜|2. The leftmost points in are set to the Gaussian case (Υx˜=1) and quantiles are those of 12χ22. Moreover, quantiles are well adjusted by a linear fitting of Υx˜. Quantiles up to p= 95% are well approximated by those of 12χ22, but less for the whereas the 99%. Consequently, to test H0  by using Tx,x˜ and  |Tx,x˜|2, with 5% significance or larger, we can use the χ22.

The local test (A6) used in Section 5.3.5 is not very powerful because H0 can be rejected at random frequencies even for finite H0 samples. To overcome this an integrated test is used based on the total squared amplitudes: (A7) Tint x,x˜(f1,f2)L |Tx,x˜(f1,f2)|2,(A7) along nL  bifrequencies of a regularly sampled lattice L, with frequency step, Δf= 5N=0.0084 cpm, i.e. 14 of the two-dimensional (2D) bandwidth b2M2 to get a good bispectrum representation.  Tx,x˜ values become uncorrelated when the distance between bifrequencies is typically greater than the 2D bandwidth. The lattice L () covers the significant region, where Γx˜(f1)Γx˜(f2)Γx˜(f3)MaxPD(Γx˜(f1)Γx˜(f2)Γx˜(f3)εL=0.01, yielding nL=173 bifrequencies.

Under H0, the test (A7) has an average 2nL=346 and variance equal to 16527, much larger than var(χ2nL2)=692, due to correlated bispectrum errors and due to sampled bispectra very far from the theoretical bispectrum. The upper quantiles of Tintx˜ ,x˜ for probabilities 80%, 95%, 99% and 99.5% are 489, 563, 804 and 903 respectively, which are the rejection thresholds of NGAR(5) H0 bispectrum, used in Section 5.3.5.

Appendix B:

Minimization of the cost function of the hybrid fitting

B.1. Method

The minimization of Jhyb(θ)  is based on a Quasi–Newton method, and to prevent bad-conditioning, the guess correction is performed along the low-rank subspace spanned by the leading eigenvectors of the Hessian matrix H: (B1) θk+1=θkδ[Hlr(Jhyb(θk))]1gradlr(Jhyb(θk))(B1) where δ]0,1] weights the iteration step, Hlr= RlrDlrRlr is the low-rank Hessian diagonalisation, and gradlr=Rlrgrad is the low-rank gradient. The iterative procedure stops when the relative difference between consecutive values is less than 10–6. The gradient and the Hessian matrix are obtained by time-averages of auxiliary linear models that we describe bellow.

B.2. Gradient of the cost function

The cost function (Eq. (37)) of the hybrid fitting is: (B2) Jhyb(θ)=cfittJfitt(θ)+cstatJstat(θ,σw(θ))(B2) where σwθ=σx(Jfitt(θ))1/2 and θ=(θ1,,θdimθ). The partial derivative of Jhyb with respect to θi is (B3) Jhybθi=cfittJfittθi+cstat(Jstatθi+Jstatσwσwθi)(B3) where σwθi=σw2JfittJfittθi. The derivatives of Jfitt are (B4) Jfittθi=2(NNγ)σx2t=NNγN1ε(t)ε(t)θi ,(B4) where ϵ(t)=x(t)F[x(t1), ε(t1), θ] and ε(t)θi=(Fθi+Fε(t1)·ε(t1)θi)  with the dot standing for inner product along all components of the vector ε(t1)  of past innovations. Those derivatives are random variables governed by an AR linear model whose random coefficients depend on the time-series values. Therefore Jfittθi is approximated by a time average of a product containing the derivative ε(t)θi.

The derivatives of Jstat are (B5) Jstatθi=2(SsimSobs)σx2Nsimt=1Nsimx˜(t)θi+2b1τmaxσx4 |τ|τmaxλ(τ)2[Csim(τ)Cobs(τ)]Csim(τ)θi  +2b2τmax2σx6 |τ1|,|τ2|τmaxλ(τ1,τ2)2[Bsim(τ1,τ2)Bobs(τ1,τ2)]Bsim(τ1,τ2)θi  (B5) with (B6) Csim(τ)θi=1Nsimτt=1Nsimτ[x˜(t)x˜(t+τ)θi+x˜(t+τ)x˜(t)θi](B6) (B7) Bsim(τ1,τ2)θi=1Nmax(τ1,τ2)t=1Nmax(τ1,τ2)[x˜(t)x˜(t+τ1)x˜(t+τ2)θi+x˜(t)x˜(t+τ2)x˜(t+τ1)θi+x˜(t+τ1)x˜(t+τ2)x˜(t)θi](B7) where x˜(t)=F[x˜(t1), u(t1), θ]+σww(t) ,u(t1)=σww(t1) and x˜(t)θi=(Fθi+Fx˜(t1)·x˜(t1)θi) applied to a generic time. An identical formula applies to Jstatσw, depending on partial derivatives x˜(t)σw=(Fu(t1)·w(t1)+Fx˜(t1)·x˜(t1)σw)+w(t) where dot stands for inner products along components of the standard Gaussian past innovations w(t1) and components of the past states x˜(t1). Finally, we must see that Jstatθi is made by time-averages along the simulation period of a sum of products containing the random derivatives x˜(t)θi.

The particular form of the derivatives u(t)θi, x˜(t)θi,x˜(t)σw  depend on the stochastic model. In the case of bilinear models (EquationEq. (29)), Rao and Gabr (Citation1984) have obtained similar formulas, for t=0,,N1: (B8) u(t)ak=(x(tlk)+i=1p2bix(tri)u(tsi)ak )(B8) (B9) u(t)bk=(x(trk)u(tsk)+i=1p2bix(tri)u(tsi)ak )(B9) (B10) u(t)α=(1+i=1p2bix(tri)u(tsi)α ),(B10) using initial conditions as pseudo-observations , x(t)=u(t)=u(t)θi=0 for , t=Nγ1,,1. The derivatives of the states are, for t=0,,Nsim: (B11) x˜(t)ak=x(tlk)+i=1p1aix˜(tli)ak +i=1p2σwbix˜(tri)ak w(tsi) (B11) (B12) x˜(t)bk=σwx(trk)w(tsk)+i=1p1aix˜(tli)bk +i=1p2σwbix˜(tri)bk w(tsi) (B12) (B13) x˜(t)α=1+i=1p1aix˜(tli)α +i=1p2σwbix˜(tri)α w(tsi) (B13) (B14) x˜(t)σw=w(t)+i=1p1σwaix˜(tli)σw +i=1p2σwbix˜(tri)σw w(tsi),  (B14) using initial conditions as pseudo-observations , x˜(t)=α; w(t)=x˜(t)θi=0 ,x˜(t)α=1  for , t=Npast,,0, and the derivatives are then substituted into the derivatives of Jstat.

B.3. Hessian matrix of the cost function

The Hessian matrix is obtained from: (B15) 2Jhybθiθj=cfitt2Jfittθiθj+cstat[2Jstaθiθj+Jstatσw2σwθjθi+σwθi2Jstatθjσw+σwθj2Jstatθiσw+σwθiσwθj2Jstatσwσw] (B15) where 2σwθjθi=σw2Jfitt2Jfittθiθjσw4Jfitt2JfittθiJfittθj. Because the functions involved are sums of squares, we use the approximation of Marquardt (Citation1963), i.e., products of first derivatives. The second derivatives of Jfitt are the approximated as (B16) 2Jfittθiθj2(NNγ)σx2t=NNγN1u(t)θi u(t)θj(B16)

The second derivatives of Jstat are then approximated as: (B17) 2Jstatθiθj2σx4Nsim2[t=1Nsimx˜(t)θi][t=1Nsimx˜(t)θj]+2b1τmaxσx4 |τ|τmaxλ(τ)2Csim(τ)θiCsim(τ)θj  +2b2τmax2σx6 |τ1|,|τ2|τmaxλ(τ1,τ2)2Bsim(τ1,τ2)θiBsim(τ1,τ2)θj  (B17)

The same formula applies for mixed derivatives with respect to σw.

Appendix C:

Symbols and acronyms (by order of appearance)

 xraw==

Raw time series

 xcent==

Detrended and centered time series

σ(x)  sk(x)  ekurt(x)==

Standard deviation, skewness and excess kurtosis of signal x

x(t)==

Standardized time series

s(t), f(t)==

Slow and fast components of the standardized time series

N==

Time series size

E(·)= =

Expectation operator

Neff==

Number of effective degrees of freedom

Cx(τ)==

Empirical autocovariance

Xx(f)Ax(f)e[iΘx(f) ]==

DFT of the signal x(t) and its polar form

Sx(f)==

Periodogram

Γx(f)==

Exact spectrum

γx(τ)==

Exact autocovariance

Ŝx(f)==

Smoothed spectrum

λ(τ)==

Window function

λ̂(u)==

Standardized window function

M==

1D window length

b1==

Standardized 1D bandwidth

ν==

Degrees of freedom of the qui-square

qp(·)==

Quantile por probability p and PDF (·)

w(t)==

Zero centered white noise (not necessarily Gaussian nor purely random)

σw==

Standard deviation of the noise

σw,p2==

Noise variance from fitting data with an AR(p) model

γx(τ1,τ2)==

Exact bicovariance

Cx(τ1,τ2)==

Empirical bicovariance

Δ==

Forecast lag in unit time steps

xnl(t)==

Nonlinear component of x(t)

TCox Δ=

Cox test for lag Δ and quantile 95% under the linearity hypothesis

TCox – 95 %=

δ(·)=

Kronecker delta

Γ3,w(f1,f2)==

Exact bispectrum

Γ3,y(f1,f2)==

Normalized exact bispectrum

S3,x(f1,f2)==

Bi-periodogram

Λ(τ1,τ2)==

2D window function used in the smoothed bispectrum

Ŝ3,x==

Smoothed bispectrum

Ŝ3n,x=

Smoothed normalized bispectrum

δŜ3,x==

Error of the smoothed bispectrum estimator

varar==

Total smoothed bispectrum error variance under asymptotic conditions

(·)¯==

Average over the bifrequency domain

σŜ3,x==

Typical bispectrum fluctuations

x˜(t)==

Null hypothesis model process

H0==

Null hypothesis or the null hypothesis model

Φ·   ==

Standard Gaussian quantile function

Tx,x˜(f1,f2)==

Standardized bispectrum difference between a model x and x˜

varrat(Ŝ3,x˜)==

Ratio between empirical and total asymptotic bispectrum variance

Tint x,x˜==

Sum of squared amplitudes of Tx,x˜ over a certain set L , used to test H0 Δf

Δf==

Frequency step used in the lattice set L

εL==

Minimum threshold fraction of the asymptotic error variances accepted in L

nL==

Cardinal of L

A(f)==

Frequency function intervening in the AR model

fcut==

Cutoff frequency

Υx˜==

Diagnostic of non-Gaussianity of bispectrum errors

x˜(t)==

Simulated state at time t

u(t)==

Driving noise of a simulating model

x˜(t1)==

Vector of past simulated states

u(t1)==

Vector of past driving noises or innovations

Jfitt(θ) = =

Sum of the squared residuals as function of parameters in vector θ

ε(t)==

forecasting error at lag 1

ε(t1)==

Vector of past forecasting errors

Npast=

Size of the trail sector of a simulating time series

Nγ==

Maximum lag used in a simulating model

Nsim ==

Size of the sector of a simulating time series, used to compute statistics

Sobs   Ssim==

Averages from the observed (obs) and simulated (sim) time series

Cobs(τ)   Csim(τ)==

Noncentered lagged moments of second order for obs. and sim. data

Bobs (T1, T2)=

Noncentered lagged moments of third order for obs. and sim.

Bsim (T1, T2)=

JS, JC,JB==

Normalized squared deviation between the obs. and sim. statistics: average, lagged moments of second and third order respectively

Jstat==

Weighted sum of JS, JC,JB assessing the average deviations between obs. and sim. statistics.

Jhyb==

Weighted sum of Jfitt and Jstat with weights cfitt and cstat , to minimize in the hybrid fitting

corlag==

Correlation skill between observations and predictions at a certain lag

cpt =

Cycles per trimester

PD =

Principal domain, i.e. the triangle of vertices (0,0), (0,1/2) and (1/3,1/3) in the bifrequency domain.

NGAR(5)=

Non-Gaussian autoregressive model or order 5, used to build the null hypothesis bispectrum

NAIC =

Normalized Akaike Information Criterion