# 3 Time-Varying Second-Order Model Formulation and Yeredor’s Solution

Time-varying Second-Order Source Separation (TV-SOS, Yeredor, 2003) refers to the existence of subtle change in mixing matrix \(\boldsymbol{\Omega}\). Within the SOS model (2.1), linear time variation can be represented as \(\boldsymbol\Omega_t = (\boldsymbol I + t \boldsymbol{\mathcal E})\boldsymbol\Omega_0\), which is clearly non-constant despite \(\boldsymbol{\mathcal E}\) and \(\boldsymbol\Omega_0\) are. The initial mixing \(\boldsymbol \Omega_0\) is the \(p\times p\) mixing matrix at time \(t=0\); the time-varying factor \(\boldsymbol{\mathcal E}\) is another \(p\times p\) matrix that measures the scale of linear variation in mixing matrix over the change of time. In addition to this linear variation, other time-varying structures include periodical (Weisman & Yeredor, 2006), geometric curved (Kaftory & Zeevi, 2007), etc. Figure 3.1 illustrates the difference between an ordinary time-invariant mixture and linearly time-varying one. It can be discovered that the time-varying mixture has trends and it does not demonstrate the stationary property. In fact, the introduction of time-varying factor invalids the stationary property in the observation \(\boldsymbol{x}\) almost surely even though the source signals are stationary. This is because \(\boldsymbol{\mathcal E}\) changes the scale (second-order statistics) over time \(t\). Nevertheless, the aforementioned SOBI and the upcoming LTV-SOBI algorithms do not require such property on the observations; only the source signals are bind to stationarity.

## 3.1 TV-SOS Model and Assumptions

TV-SOS model serves as an extension to the SOS model (2.1) and is also a special case of general-TV-SOS where the time-dependent variation is assumed to be linear. Focusing on the realization of stochastic processes, an observable \(p\)-variate time series in TV-SOS satisfies

\[\begin{equation} \begin{aligned} \boldsymbol x_t = ( \boldsymbol I + t \boldsymbol{\mathcal E})\boldsymbol \Omega_0 \boldsymbol z_t, \text{ where }& \boldsymbol z_t \text{satisfies } \\ (B1)\ & \mathbb E( \boldsymbol z_t) = \boldsymbol 0 \\ (B2)\ & \text{Cov}( \boldsymbol z_t) = \boldsymbol I \\ (B3)\ & \text{Cov}( \boldsymbol z_t, \boldsymbol z_{t+\tau}') = \boldsymbol\Lambda_\tau \text{ diagonal for all } \tau = 1,2,\dots \\ (B4*)\ & \boldsymbol{\mathcal E} << \boldsymbol I \end{aligned} \tag{3.1} \end{equation}\]

Similar to SOS, the first assumption \((B1)\) is non-restrictive and achievable through data transformation; \((B2)\) is required to tackle the ambiguity of BSS, while \((B3)\) states both stationary and uncorrelated characteristics in source signals. Further, the diagonal elements in \(\boldsymbol\Lambda_\tau\) have to be different to ensure identifiability of each series. The final and optional assumption \((B4)\) ensures that the change of mixing is rather slow so that the meaningfulness of BSS is not greatly compromised, and this assumption can simplify BSS process in some algorithms (Yeredor, 2003). It should be noted that the model assumes uncorrelatedness instead of independence in pre-centered source signals, formally, the Pearson sample correlation between different series is always \(0\), or \(\text{Cov}(\boldsymbol x_i, \boldsymbol x_j) = \boldsymbol 0\) for all \(i\neq j\). It is a relatively less-restrictive condition as independence implies uncorrelatedness, while the opposite is not true in general (e.g. Papoulis & Pillai, 2002).

## 3.2 Yeredor’s TV-SOBI Algorithm

TV-SOBI is the original algorithm provided by Yeredor (2003) that solves the above TV-SOS model. While the SOS model can be identified by obtaining one single matrix \(\boldsymbol\Omega\), TV-SOS demands at least one more matrix to be effectively estimated, the linear time-varying mixing factor \(\boldsymbol{\mathcal E}\) in addition to the initial mixing matrix \(\boldsymbol{\Omega}_0\). Similar to model fitting in time series analysis, a prerequisite is that the desired lags must be chosen beforehand based on the data characteristics (for example with the help of `acf`

function; a proper choice of lags could be challenging, and this topic would not be elaborated in this thesis), and suppose \(L = \{\tau_1, \tau_2,\dots, \tau_{l}\}\) is the set of pre-defined lags. For convenience, let \(\tau\in\{0\} \bigcup L\). Yeredor’s algorithm first finds the approximate three-item expression of autocovariances as,

\[\begin{equation} \begin{aligned} \mathbb E(\boldsymbol x_t \boldsymbol x_{t+\tau}') & = \mathbb E[( \boldsymbol I + t \boldsymbol{\mathcal E}) \boldsymbol\Omega_0 \boldsymbol z_t \ \boldsymbol z_{t+\tau}' \boldsymbol\Omega_0' [ \boldsymbol I + (t + \tau) \boldsymbol{\mathcal E}]'] \\ &= \boldsymbol\Omega_0 \boldsymbol\Lambda_\tau \boldsymbol \Omega_0'+ t ( \boldsymbol{\mathcal E} \boldsymbol\Omega_0 \boldsymbol\Lambda_\tau \boldsymbol \Omega_0' + \boldsymbol\Omega_0 \boldsymbol\Lambda_\tau \boldsymbol \Omega_0' \boldsymbol{\mathcal E}') + t^2 ( \boldsymbol{\mathcal E} \boldsymbol\Omega_0 \boldsymbol\Lambda_\tau \boldsymbol \Omega_0' \boldsymbol{\mathcal E}') \\ &\ \ \ \ \ \ + \tau ( \boldsymbol{\mathcal E} \boldsymbol\Omega_0 \boldsymbol\Lambda_\tau \boldsymbol \Omega_0') + t \tau ( \boldsymbol{\mathcal E} \boldsymbol\Omega_0 \boldsymbol\Lambda_\tau \boldsymbol \Omega_0' \boldsymbol{\mathcal E}') \\ &= \underline {\boldsymbol\Omega_0 \boldsymbol\Lambda_\tau \boldsymbol \Omega_0'} + t (\underline{ \boldsymbol{\mathcal E} \boldsymbol\Omega_0 \boldsymbol\Lambda_\tau \boldsymbol \Omega_0' + \boldsymbol\Omega_0 \boldsymbol\Lambda_\tau \boldsymbol \Omega_0' \boldsymbol{\mathcal E}'}) \\ &\ \ \ \ \ \ + t(t+\tau) ( \underline{\boldsymbol{\mathcal E} \boldsymbol\Omega_0 \boldsymbol\Lambda_\tau \boldsymbol \Omega_0' \boldsymbol{\mathcal E}'}) + \tau ( \underline {\boldsymbol{\mathcal E} \boldsymbol\Omega_0 \boldsymbol\Lambda_\tau \boldsymbol \Omega_0'}) \\ & \approx \boldsymbol\Omega_0 \boldsymbol\Lambda_\tau \boldsymbol \Omega_0' + t ( \boldsymbol{\mathcal E} \boldsymbol\Omega_0 \boldsymbol\Lambda_\tau \boldsymbol \Omega_0' + \boldsymbol\Omega_0 \boldsymbol\Lambda_\tau \boldsymbol \Omega_0' \boldsymbol{\mathcal E}') + t^2 ( \boldsymbol{\mathcal E} \boldsymbol\Omega_0 \boldsymbol\Lambda_\tau \boldsymbol \Omega_0' \boldsymbol{\mathcal E}') \\ &:= \boldsymbol R ^{(1)}_\tau + t\, \boldsymbol R ^{(2)}_\tau + t^2\, \boldsymbol R ^{(3)}_\tau\ \end{aligned}, \tag{3.2} \end{equation}\]

where the shorthand notations are defined as \(\boldsymbol R ^{(1)}_\tau = \boldsymbol\Omega_0 \boldsymbol\Lambda_\tau \boldsymbol \Omega_0'\), \(\boldsymbol R ^{(2)}_\tau = \boldsymbol{\mathcal E} \boldsymbol\Omega_0 \boldsymbol\Lambda_\tau \boldsymbol \Omega_0' + \boldsymbol\Omega_0 \boldsymbol\Lambda_\tau \boldsymbol \Omega_0' \boldsymbol{\mathcal E}'\) and \(\boldsymbol R ^{(3)}_\tau =\boldsymbol{\mathcal E} \boldsymbol\Omega_0 \boldsymbol\Lambda_\tau \boldsymbol \Omega_0' \boldsymbol{\mathcal E}'\); those three items are the sample autocovariance decomposition of the observed mixture. Since \(\boldsymbol{\mathcal E}\) is assumed to be rather small in quantity, the items \(\tau ( \boldsymbol{\mathcal E} \boldsymbol\Omega_0 \boldsymbol\Lambda_\tau \boldsymbol \Omega_0')\) and \(t \tau ( \boldsymbol{\mathcal E} \boldsymbol\Omega_0 \boldsymbol\Lambda_\tau \boldsymbol \Omega_0' \boldsymbol{\mathcal E}')\) would also be insignificant due to their scale transformation by \(\boldsymbol{\mathcal E}\); Yeredor hence argues that they are negligible.

Then, Yeredor tried to estimate the \(p\times p\) matrices of \(\boldsymbol R ^{(1)}_\tau,\ \boldsymbol R ^{(2)}_\tau\) and \(\boldsymbol R ^{(3)}_\tau\) through a linear least-squares model. The model is achieved by casting matrix equation (3.2) to element-wise real-valued equations of

\[\begin{equation} \begin{bmatrix} \boldsymbol x_1 \boldsymbol x_{1+\tau}'\ [i,j] \\ \boldsymbol x_2 \boldsymbol x'_{2+\tau}\ [i,j] \\ \vdots \\ \boldsymbol x_{T-\tau} \boldsymbol x_{T}'\ [i,j]\end{bmatrix} = \begin{bmatrix} 1 & 1 & 1^2 \\ 1 & 2 & 2^2 \\ \vdots &\vdots &\vdots \\ 1 & T-\tau & T^2 \end{bmatrix} \begin{bmatrix} \boldsymbol R ^{(1)}_\tau [i,j] \\ \boldsymbol R ^{(2)}_\tau [i,j] \\ \boldsymbol R ^{(3)}_\tau [i,j] \end{bmatrix} + \text{residuals}. \tag{3.3} \end{equation}\]

The LS-estimation leads to

\[\begin{equation} \begin{aligned} \begin{bmatrix} \widehat{\boldsymbol R} ^{(1)}_\tau [i,j] \\ \widehat{\boldsymbol R} ^{(2)}_\tau [i,j] \\ \widehat{\boldsymbol R} ^{(3)}_\tau [i,j] \end{bmatrix} = & \begin{pmatrix}\begin{bmatrix} 1 & 1 & 1^2 \\ 1 & 2 & 2^2 \\ \vdots &\vdots &\vdots \\ 1 & T-\tau & T^2 \end{bmatrix}' \begin{bmatrix} 1 & 1 & 1^2 \\ 1 & 2 & 2^2 \\ \vdots &\vdots &\vdots \\ 1 & T-\tau & T^2 \end{bmatrix} \end{pmatrix} ^{-1} \\ & \ \ \begin{bmatrix} 1 & 1 & 1^2 \\ 1 & 2 & 2^2 \\ \vdots &\vdots &\vdots \\ 1 & T-\tau & T^2 \end{bmatrix}' \begin{bmatrix} \boldsymbol x_1 \boldsymbol x_{1+\tau}'\ [i,j] \\ \boldsymbol x_2 \boldsymbol x'_{2+\tau}\ [i,j] \\ \vdots \\ \boldsymbol x_{T-\tau} \boldsymbol x_{T}'\ [i,j]\end{bmatrix}. \end{aligned} \end{equation}\]

Yeredor (2003) proposed a practical approach to optimize the best solution for \(\boldsymbol \Omega_0\) and \(\boldsymbol{\mathcal E}\). Denote the affine transformation (aka. whitening) matrix \(\boldsymbol W = \big( \widehat{\boldsymbol R}^{(1)}_0 \big)^{-\frac 1 2}\). The idea is then to apply sequential Jacobi rotations to optimize

\[\begin{equation} \min\limits_{V,\Lambda_{\tau_1}, \dots, \Lambda_{\tau_l}} \bigg( \sum\limits_{\tau=\tau_1}^{\tau_l} || \boldsymbol W \widehat{\boldsymbol R}^{(1)}_\tau \boldsymbol W' - \boldsymbol {V \Lambda}_\tau \boldsymbol V'||^2 \bigg), \end{equation}\]

where \(\Lambda_{\tau_1}, \dots, \Lambda_{\tau_l}\) are diagonal. This diagonal property is sufficient for the optimization algrithm, and the values of \(\Lambda_{\tau_1}, \dots, \Lambda_{\tau_l}\) will become available right after the optimization. This procedure is similar to the joint diagnolization procedure that will be detailed in section 4.2. The \(\boldsymbol{\mathcal E}\) can be found through optimization,

\[\begin{equation} \min\limits_{ \boldsymbol{\mathcal E} } \bigg( \sum\limits_{\tau=\tau_1}^{\tau_l} || \widehat{\boldsymbol R}^{(2)}_\tau - \boldsymbol{\mathcal E} \widehat{\boldsymbol R}^{(1)}_\tau - \widehat{\boldsymbol R}^{(1)}_\tau \boldsymbol{\mathcal E}' ||^2 \bigg). \end{equation}\]

Finally, Yeredor’s TV-SOBI concludes with \(\widehat{\boldsymbol \Omega}_0 = \boldsymbol W ^{-1} \boldsymbol V\) and \(\widehat{ \boldsymbol{\mathcal E}}\). It can be noticed in the optimization steps that the value of \(\boldsymbol R ^{(3)}_{\tau}\) is not used. In fact, Yeredor even provided an alternative that excludes it from (3.2). Nonetheless, notice that \(\boldsymbol R ^{(3)}_{\tau}\) is closely associating with \(\boldsymbol R ^{(1)}_{\tau}\) and \(\boldsymbol R ^{(2)}_{\tau}\) as shown in equation (3.3), suggesting that inclusion or exclusion would truly impact the output of TV-SOBI. It is theoretically possible to include the expression of \(\boldsymbol R ^{(3)}_{\tau}\) in optimization procedures, but it would be mathematically too complicated and computationally too costly to find a solution, and the relatively naive approach described above suffices to solve the TV-SOS problem.

### References

Kaftory, R., & Zeevi, Y. Y. (2007). Probabilistic geometric approach to blind separation of time-varying mixtures. *International Conference on Independent Component Analysis and Signal Separation*, 373–380.

Papoulis, A., & Pillai, S. U. (2002). *Probability, random variables, and stochastic processes*. Tata McGraw-Hill Education.

Weisman, T., & Yeredor, A. (2006). Separation of periodically time-varying mixtures using second-order statistics. *International Conference on Independent Component Analysis and Signal Separation*, 278–285.

Yeredor, A. (2003). TV-sobi: An expansion of SOBI for linearly time-varying mixtures. *Proc. 4th International Symposium on Independent Component Analysis and Blind Source Separation (ICA’03), Nara, Japan*.