**4.4 Principal Component Analysis (PCA)**

The PCA network of Section 3.3.5, employing Sanger's
rule, is analyzed in this section. Recalling Sanger's rule [from
Equation (3.3.19)] and writing it in vector form for the continuous-time
case, we get

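$$\frac{d\mathbf{w}_i}{dt} = \rho\left[\, y_i\,\mathbf{x} - y_i \sum_{k=1}^{i} y_k\,\mathbf{w}_k \right] \tag{4.4.1}$$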

with *i* = 1, 2, ..., *m*, where
*m* is the number of units in the PCA network. We will assume,
without any loss of generality, that *m* = 2. This leads
to the following set of coupled learning equations for the two
units:

$$\frac{d\mathbf{w}_1}{dt} = \rho\left[\, y_1\,\mathbf{x} - y_1^2\,\mathbf{w}_1 \right] \tag{4.4.2}$$

and

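$$\frac{d\mathbf{w}_2}{dt} = \rho\left[\, y_2\,\mathbf{x} - y_2^2\,\mathbf{w}_2 - y_1 y_2\,\mathbf{w}_1 \right] \tag{4.4.3}$$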

Equation (4.4.2) is Oja's rule. It is independent
of unit 2 and thus converges to $\mathbf{c}^{(1)}$, the
principal eigenvector of the autocorrelation matrix **C** of the input
data (assuming zero-mean input vectors). Equation (4.4.3) is
Oja's rule with the added inhibitory term $-\rho\, y_1 y_2\, \mathbf{w}_1$.
Next, we will assume a sequential operation of the two-unit net
where unit 1 is allowed to fully converge before evolving unit
2. This mode of operation is permissible since unit 1 is independent
of unit 2.
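The sequential mode of operation described above can be sketched numerically. The following is a minimal illustration, not a reference implementation: it uses a discrete-time (per-sample) version of the two learning equations, and the data set, the eigenvalues `eigvals`, the learning rate `rho`, and the sample count are all assumed values chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy data: zero-mean inputs whose autocorrelation matrix has
# eigenvalues `eigvals` and eigenvectors given by the columns of Q.
eigvals = np.array([3.0, 1.5, 0.5])
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
X = (rng.standard_normal((20000, 3)) * np.sqrt(eigvals)) @ Q.T

rho = 0.005  # learning rate (assumed value)

# Phase 1: unit 1 follows Oja's rule, Equation (4.4.2), one sample at a time.
w1 = 0.1 * rng.standard_normal(3)
for x in X:
    y1 = w1 @ x
    w1 += rho * (y1 * x - y1**2 * w1)

# Phase 2: w1 is frozen; unit 2 follows Equation (4.4.3),
# i.e. Oja's rule plus the inhibitory term -rho * y1 * y2 * w1.
w2 = 0.1 * rng.standard_normal(3)
for x in X:
    y1 = w1 @ x
    y2 = w2 @ x
    w2 += rho * (y2 * x - y2**2 * w2 - y1 * y2 * w1)

# |cosine| of the angle between each weight vector and the matching eigenvector.
cos1 = abs(w1 @ Q[:, 0]) / np.linalg.norm(w1)
cos2 = abs(w2 @ Q[:, 1]) / np.linalg.norm(w2)
print("alignment of w1 with first eigenvector: ", cos1)
print("alignment of w2 with second eigenvector:", cos2)
```

The absolute value is used in the alignment measure because the rule can settle on either sign of an eigenvector.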

With the sequential update assumption, Equation
(4.4.3) becomes

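$$\frac{d\mathbf{w}_2}{dt} = \rho\left[\, y_2\,\mathbf{x} - y_2^2\,\mathbf{w}_2 - y_2\left(\mathbf{c}^{(1)T}\mathbf{x}\right)\mathbf{c}^{(1)} \right] \tag{4.4.4}$$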

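For clarity, we will drop the subscript on **w**.
Using $y = \mathbf{w}^T\mathbf{x}$, $\mathbf{C} = \langle \mathbf{x}\mathbf{x}^T \rangle$, and $\mathbf{C}\mathbf{c}^{(1)} = \lambda_1 \mathbf{c}^{(1)}$, the average learning equation for unit 2 is given by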

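$$\left\langle \frac{d\mathbf{w}}{dt} \right\rangle = \rho\left[\, \mathbf{C}\mathbf{w} - \left(\mathbf{w}^T\mathbf{C}\mathbf{w}\right)\mathbf{w} - \lambda_1\,\mathbf{c}^{(1)}\mathbf{c}^{(1)T}\mathbf{w} \right] \tag{4.4.5}$$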

which has equilibria satisfying

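$$\mathbf{C}\mathbf{w}^* - \left(\mathbf{w}^{*T}\mathbf{C}\mathbf{w}^*\right)\mathbf{w}^* - \lambda_1\,\mathbf{c}^{(1)}\mathbf{c}^{(1)T}\mathbf{w}^* = \mathbf{0} \tag{4.4.6}$$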

Hence, $\mathbf{w}^* = \mathbf{0}$ and $\mathbf{w}^* = \mathbf{c}^{(i)}$
with *i* = 2, 3, ..., *n* are solutions. Note that
the point $\mathbf{w}^* = \mathbf{c}^{(1)}$ is not an equilibrium. The
Hessian is given by

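$$\mathbf{H}(\mathbf{w}) = -\mathbf{C} + \left(\mathbf{w}^T\mathbf{C}\mathbf{w}\right)\mathbf{I} + 2\,\mathbf{w}\mathbf{w}^T\mathbf{C} + \lambda_1\,\mathbf{c}^{(1)}\mathbf{c}^{(1)T} \tag{4.4.7}$$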

Since $\mathbf{H}(\mathbf{0}) = -\mathbf{C} + \lambda_1\,\mathbf{c}^{(1)}\mathbf{c}^{(1)T}$ is not positive definite,
the equilibrium $\mathbf{w}^* = \mathbf{0}$
is not stable. For the remaining equilibria we have the Hessian
matrix

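$$\mathbf{H}\left(\mathbf{c}^{(i)}\right) = -\mathbf{C} + \lambda_i\,\mathbf{I} + 2\lambda_i\,\mathbf{c}^{(i)}\mathbf{c}^{(i)T} + \lambda_1\,\mathbf{c}^{(1)}\mathbf{c}^{(1)T} \tag{4.4.8}$$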

which is positive definite only at $\mathbf{w}^* = \mathbf{c}^{(2)}$,
assuming $\lambda_2 > \lambda_3$.
Thus, Equation (4.4.5) converges asymptotically to the unique
stable vector $\mathbf{c}^{(2)}$, which is the eigenvector
of **C** with the second largest eigenvalue $\lambda_2$.
Similarly, for a network with *m* interacting units according
to Equation (4.4.1), the *i*th unit (*i* = 1, 2, ..., *m*)
will extract the *i*th eigenvector of **C**.
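The equilibrium and stability claims above can be checked numerically. The sketch below assumes a small symmetric matrix **C** with hand-picked distinct eigenvalues (`lam`) and random orthonormal eigenvectors. The helper names `rhs`, `hessian`, and `is_pos_def` are introduced only for this illustration. `rhs` evaluates the right-hand side of Equation (4.4.5) without the factor ρ, and `hessian` evaluates the Hessian of the averaged dynamics in the standard form -**C** + (**w**ᵀ**Cw**)**I** + 2**ww**ᵀ**C** + λ₁**c**⁽¹⁾**c**⁽¹⁾ᵀ.

```python
import numpy as np

# Assumed example: eigenvalues in decreasing order and random orthonormal eigenvectors.
lam = np.array([3.0, 1.5, 0.5, 0.2])
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
C = Q @ np.diag(lam) @ Q.T
c = [Q[:, i] for i in range(4)]          # c[0] is the principal eigenvector

def rhs(w):
    """Right-hand side of the averaged equation for unit 2 (factor rho omitted)."""
    return C @ w - (w @ C @ w) * w - lam[0] * c[0] * (c[0] @ w)

def hessian(w):
    """Hessian of the averaged unit-2 dynamics evaluated at w."""
    n = len(w)
    return (-C + (w @ C @ w) * np.eye(n)
            + 2.0 * np.outer(w, w) @ C
            + lam[0] * np.outer(c[0], c[0]))

def is_pos_def(H, tol=1e-9):
    """True if all eigenvalues of the symmetric part of H exceed tol."""
    return bool(np.all(np.linalg.eigvalsh((H + H.T) / 2) > tol))

# Residual of the equilibrium condition at w = 0 and at each eigenvector.
print("residual at 0:", np.linalg.norm(rhs(np.zeros(4))))
for i in range(4):
    print(f"residual at c^({i + 1}):", np.linalg.norm(rhs(c[i])))

# Positive definiteness of the Hessian at each equilibrium.
print("H(0) positive definite:", is_pos_def(hessian(np.zeros(4))))
for i in range(1, 4):
    print(f"H(c^({i + 1})) positive definite:", is_pos_def(hessian(c[i])))
```

With distinct eigenvalues, only the second eigenvector should yield a positive definite Hessian, and the first eigenvector should give a nonzero residual.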

The unit-by-unit description presented here helps
simplify the explanation of the PCA net behavior. In practice, the
weight vectors **w***i*
approach their final values simultaneously, not one at a time.
The above analysis nevertheless still applies, asymptotically, to
the end points. Note that the simultaneous evolution of the **w***i*
is advantageous since it leads to faster learning than if the
units are trained one at a time.
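As a rough illustration of the simultaneous evolution, the averaged form of Equation (4.4.1) can be integrated for both units at once. The sketch below uses forward Euler with an assumed step size `dt`, an assumed matrix **C** with distinct eigenvalues, and small random initial weights. It takes the averages as ⟨*y_i* **x**⟩ = **Cw***i* and ⟨*y_i y_k*⟩ = **w***i*ᵀ**Cw***k*. It demonstrates only that both weight vectors reach the expected end points, not how fast they do so.

```python
import numpy as np

# Assumed example matrix C with distinct eigenvalues and random eigenvectors.
lam = np.array([3.0, 1.5, 0.5])
rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
C = Q @ np.diag(lam) @ Q.T

rho, dt, steps = 1.0, 0.01, 5000          # assumed integration settings
W = 0.1 * rng.standard_normal((2, 3))     # rows are w_1 and w_2

for _ in range(steps):
    Y = W @ C @ W.T                       # Y[i, k] = w_i^T C w_k = <y_i y_k>
    # Row i of np.tril(Y) @ W is the sum over k <= i of <y_i y_k> w_k.
    W = W + dt * rho * (W @ C - np.tril(Y) @ W)

# Overlaps |w_i . c^(j)| for i, j = 1, 2; close to the identity at convergence.
overlaps = np.abs(W @ Q[:, :2])
print(overlaps)
```

Both rows of `W` are updated in the same Euler step, so unit 2 is already adapting while unit 1 is still converging.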
