# 2 Blind Source Separation

Blind Source Separation (BSS) assumes that an individual source signal \(\boldsymbol{z}=(\boldsymbol{z}_1, \boldsymbol{z}_2,\dots,\boldsymbol{z}_p)'\) is mixed by a \(p\times p\) matrix \(\boldsymbol{\Omega}\) and thus produces the mixture \(\boldsymbol{x}=(\boldsymbol{x}_1, \boldsymbol{x}_2,\dots,\boldsymbol{x}_p)'\), where the signals themselves can be multidimensional. The mixing mechanism shall satisfy \(\boldsymbol x = \boldsymbol \mu + \boldsymbol\Omega \boldsymbol z\), where \(\boldsymbol\mu\) stands for a \(p\)-variate static location parameter, usually the mean value (Belouchrani et al., 1997). Without loss of generality, it can be further assumed that \(\boldsymbol x\) embeds the zero-mean property, which is always achievable through subtracting the mean from the mixture, leading to further simplified mathematical representation.

Relying on certain (assumed) property/properties in the mixing mechanism, there exist different approaches to solve the BSS problem in terms of identifying the mixing matrix and source signal series. Independent Component Analysis (ICA) is perhaps the most well-known methodology to tackle the BSS problem, which was introduced by Hérault and Ans (1984) and became widely popular in early 1990s. Since Central Limit Theorem suggests that sufficiently many independent signals together approximate the Gaussian distribution, ICA finds the independent components by maximizing the non-Gaussianity. Kurtosis and negative entropy (negentropy) are among the best measures for non-Gassianity (Comon, 1994; Hyvärinen & Oja, 2000; Tharwat, 2018). Another common BSS solution, Second Order Blind Identification (SOBI, Belouchrani et al., 1997) is based on autocovariance, which is conceptualized by the theorem that diagonal autocovariance matrices suggest uncorrelated components. Later sections will elaborate SOBI in further detail. SOBI differs from ICA mainly from two aspects: (1) ICA uses fourth-order statistics like kurtosis while SOBI uses second-order statistics; (2) SOBI seeks to recover uncorrelated components while ICA recovers independent components.

Albeit mathematical unnecessity, \(\boldsymbol{\Omega}\) is defined as a full-rank \(p\times p\) matrix. A decrease in dimension of \(\boldsymbol{\Omega}\) will allow fewer signal series (less-than-\(p\)-variate time series) generated from the same mixture \(\boldsymbol{x}\). However, it is not necessary due to the fact that the outcome carries real-world information, and the usefulness of a series would be more properly determined with the support of further evidence from both the outcome and source. Consequently, it is a justified choice to force the mixing matrix to be of full-rank.

The ultimate goal of BSS is to restore the source signals based on the observed data, and the above model assures that finding either the mixing matrix \(\boldsymbol{\Omega}\) or the unmixing matrix \(\boldsymbol{\Gamma}= \boldsymbol{\Omega}^{-1}\) suffices, where the full-rank assumption guarantees the existence of inverse matrix. For notation simplification and clarity, the following paragraphs shall only use \(\boldsymbol\Omega\).

## 2.1 Ambiguities and Assumptions

The BSS model underlies the impossibility to identify \(\boldsymbol\Omega\) and \(\boldsymbol z\) unambiguously because of merely one known item, and thus BSS solution tends to incur (Belouchrani et al., 1997),

- Permutation ambiguity: the \(a\)-th series of signal may be confused into \(b\)-th series in the restored one; however, they are always 1-1 correspondent after sign-correction. The permutation can also include sign ambiguity.
- Scale ambiguity: restored signals can be scaled since for any scale constant \(||a||:\ \boldsymbol x \equiv \big(\boldsymbol\Omega ||a||^{-1} \big) \big(||a|| \boldsymbol z \big)\). The scale constant is usually unknown.

The ambiguity can be summarised mathematically as \(\boldsymbol\Gamma = \boldsymbol{C \Omega}^{-1}\), where \(\boldsymbol C \in \mathcal C\) is a set of matrices that contain exactly one non-zero element in each row and column.

Figure 2.1 illustrates the ambiguity. The observed source is a 4-dimensional signal combining auto-regressive, moving average, sinusoidal and Electrocardiogram ECG-like time series. Comparing two plots, source series 2 is clearly mapped into series 1 in restored signals, while series 4 become series 2. The scale of the y-axis intimates the scale ambiguity, though the shape and waveform are highly analogous.

## 2.2 Stationary Time-Series Source Separation Using Autocovariance Matrices

Focusing on the autocovariance matrices, that is, second order statistics, the Second Order Source separation (SOS, Belouchrani et al., 1997; Miettinen et al., 2016) model seeks to extract original source signals based on the property of uncorrelatedness. Meanwhile, the source signals are assumed to be weakly stationary, implying that the autocovariances vary only on the lag \(\tau\) but not on the time point \(t\), which are mathematically expressed as \(\mathbb E (\boldsymbol z_t \boldsymbol z_{t+\tau}')\) being invariant given \(\tau\) for all \(t\). The SOS model states the discrete \(p\)-variate stochastic process \((\boldsymbol z_t)_{t=0,\pm1, \pm2, \dots}\) as (unobservable) source signals and the (observable) mixture \((\boldsymbol x_t)_{t=0,\pm1, \pm2, \dots}\) such that,

\[\begin{equation} \begin{aligned} \boldsymbol x_t = \boldsymbol{\Omega}\boldsymbol z_t, \text{ where } & \boldsymbol z_t \text{ satisfies } \\ (A1)\ & \mathbb E( \boldsymbol z_t) = \boldsymbol 0 \\ (A2)\ & \mathbb E( \boldsymbol z_t \boldsymbol z_t') = \boldsymbol I_p \\ (A3)\ & \mathbb E( \boldsymbol z_t \boldsymbol z_{t+\tau}') = \boldsymbol\Lambda_\tau \text{ diagonal for all } \tau = 1,2,\dots \end{aligned} \tag{2.1} \end{equation}\]

(Miettinen et al., 2016). The condition \((A1)\) simplifies the model with pre-centered observation; \((A2)\) further restrains the source signals in unit scale, solving the scale ambiguity; and \((A3)\) ensures both stationarity and uncorrelatedness. Given realization of the stochastic process \((\boldsymbol x_1, \boldsymbol x_2, \dots)\) (a \(p\)-variate time-series), this semi-parametric model can be solved using sample autocovariance matrices by joint optimization for the diagonal properties in \((A3)\) under the restriction of \((A2)\) in (2.1), which yield to a unique constrained optimization that can be solved using Lagrange method after proper whitening and/or normalization procedures (Miettinen et al., 2016).

The Second Order Blind Identification (SOBI) solves the BSS problem using second-order statistics when the signal and mixing mechanism obey the SOS model (2.1). The major second-order blind source separation approaches include Algorithm for Multiple Unknown Signals Extraction (AMUSE, Tong et al., 1990) and Second-Order Blind Identification (SOBI, Belouchrani et al., 1997). Nordhausen (2014) expanded the algorithm to non-stationary time series using locally stationary intervals and further robustifying the method with average spatial-sign autocovariances on such intervals.

### References

Belouchrani, A., Abed-Meraim, K., Cardoso, J.-F., & Moulines, E. (1997). A blind source separation technique using second-order statistics. *IEEE Transactions on Signal Processing*, *45*(2), 434–444.

Comon, P. (1994). Independent component analysis, a new concept? *Signal Processing*, *36*(3), 287–314.

Hérault, J., & Ans, B. (1984). Réseau de neurones à synapses modifiables: Décodage de messages sensoriels composites par apprentissage non supervisé et permanent. *Comptes Rendus Des Séances de L’Académie Des Sciences. Série 3, Sciences de La Vie*, *299*(13), 525–528.

Hyvärinen, A., & Oja, E. (2000). Independent component analysis: Algorithms and applications. *Neural Networks*, *13*(4-5), 411–430.

Miettinen, J., Illner, K., Nordhausen, K., Oja, H., Taskinen, S., & Theis, F. J. (2016). Separation of uncorrelated stationary time series using autocovariance matrices. *Journal of Time Series Analysis*, *37*(3), 337–354.

Nordhausen, K. (2014). On robustifying some second order blind source separation methods for nonstationary time series. *Statistical Papers*, *55*(1), 141–156.

Tharwat, A. (2018). Independent component analysis: An introduction. *Applied Computing and Informatics*.

Tong, L., Soon, V., Huang, Y., & Liu, R. (1990). AMUSE: A new blind identification algorithm. *Circuits and Systems, 1990., IEEE International Symposium on*, 1784–1787.