**4.2 Mathematical Theory of Learning in a Single Unit Setting**

In this section, instead of dealing separately with
the various learning rules proposed in the previous chapter, we
seek to study a single learning rule, called the general learning
equation (Amari, 1977a, 1990), which captures the salient features
of several of the single-unit learning rules. Two forms
of the general learning equation will be presented: a discrete-time
version, in which the weight vector evolves according to a discrete
dynamical system of the form **w**(*k*+1) = *g*(**w**(*k*)),
and a continuous-time version, in which the weight vector evolves
according to a smooth dynamical system of the form d**w**/d*t* = *g*(**w**(*t*)).
Statistical analysis of the continuous-time version of the general
learning equation is then performed for selected learning rules,
including the correlation, LMS, and Hebbian learning rules.

**4.2.1 General Learning Equation**

In terms of a learning signal *r* = *r*(**w**, **x**, *z*), which
depends on the weight vector **w**, the input **x**, and possibly
a teacher (desired output) signal *z*, the discrete-time version
of the general learning equation reads

**w**(*k*+1) = **w**(*k*) + ρ[−γ**w**(*k*) + *r*(**w**(*k*), **x**(*k*), *z*(*k*))**x**(*k*)]    (4.2.1)

and the continuous-time version

d**w**/d*t* = ρ[−γ**w** + *r*(**w**, **x**, *z*)**x**]    (4.2.2)

where ρ > 0 is the learning rate and γ ≥ 0 controls a weight-decay
(regularization) term.
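As a concrete illustration (not from the text), the discrete-time update **w**(*k*+1) = **w**(*k*) + ρ[−γ**w**(*k*) + *r***x**(*k*)] can be sketched in NumPy with the learning signal *r* passed in as a function; the function names, step sizes, data, and teacher below are all illustrative choices.

```python
import numpy as np

# Sketch of the discrete-time general learning equation
#   w(k+1) = w(k) + rho * (-gamma * w(k) + r(w, x, z) * x)
# with a pluggable scalar learning signal r(w, x, z).

def general_learning_step(w, x, z, r, rho, gamma):
    """One update of the general learning equation."""
    return w + rho * (-gamma * w + r(w, x, z) * x)

# Learning signals treated later in this section:
r_correlation = lambda w, x, z: z          # correlation rule
r_lms         = lambda w, x, z: z - w @ x  # LMS (error-correction) rule
r_hebbian     = lambda w, x, z: w @ x      # Hebbian rule

# Example: run the LMS signal on a noiseless linear teacher.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])        # illustrative teacher weights
w = np.zeros(3)
for _ in range(1000):
    x = rng.standard_normal(3)
    z = w_true @ x
    w = general_learning_step(w, x, z, r_lms, rho=0.05, gamma=0.0)
print(w)  # approaches w_true
```

Swapping in `r_correlation` or `r_hebbian` changes the rule without touching the update itself, which is the point of the general form.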

**4.2.2 Analysis of the Learning Equation**

If the learning signal *r* depends on **w** only through the unit's
output *y* = **w**^T**x**, Equation (4.2.2) can be written as the
stochastic gradient system

d**w**/d*t* = −ρ∇*J*(**w**, **x**, *z*)    (4.2.3)

with the instantaneous criterion function

*J*(**w**, **x**, *z*) = −∫₀^*y* *r*(**w**, **x**, *z*) d*y*′ + (γ/2)||**w**||²    (4.2.4)

Averaging Equation (4.2.3) over all training pairs {**x**, *z*}
gives the average learning equation, the gradient system

d**w**/d*t* = −ρ∇<*J*>    (4.2.6)

where <*J*> denotes the average of *J* over the training set.

The gradient system in Equation (4.2.6) has special
properties that make its dynamics rather simple to analyze.
First, note that the equilibria **w***
are solutions of ∇<*J*> = **0**. This
means that the equilibria **w***
are local minima, local maxima, and/or saddle points of <*J*>.
Furthermore, it is a well-established result that, for any ρ > 0,
these local minima are asymptotically stable points (attractors)
and that the local maxima are unstable points (Hirsch and Smale,
1974). Thus, one would expect the stochastic dynamics of the
system in Equation (4.2.3), with sufficiently small ρ, to approach
a local minimum of <*J*>.

In practice, discrete-time versions of the stochastic
dynamical system in Equation (4.2.3) are used for weight adaptation.
Here, the stability of the corresponding discrete-time average
learning equation (discrete-time gradient system) is ensured if
0 < ρ < 2/λ_max, where
λ_max is the largest eigenvalue
of the Hessian matrix **H** = ∇²<*J*>, evaluated
at the current point in the search space (the proof of this statement
is outlined in Problem 4.3.8). These discrete-time "learning
rules" and their associated average learning equations have
been extensively studied in a more general context than that of
neural networks. The book by Tsypkin (1971) gives an excellent
treatment of these iterative learning rules and their stability.
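A quick numerical check of the step-size bound (an illustration under assumed values, not from the text): gradient descent on a quadratic criterion <*J*>(**w**) = (1/2)**w**^T**Hw**, whose Hessian is the fixed matrix **H**, converges for ρ just below 2/λ_max and diverges just above it.

```python
import numpy as np

# Discrete-time gradient system w(k+1) = w(k) - rho * grad<J>(w(k))
# on <J>(w) = 0.5 * w^T H w, with H a fixed positive definite Hessian.
H = np.array([[3.0, 1.0],
              [1.0, 2.0]])
lam_max = np.linalg.eigvalsh(H).max()

def final_norm(rho, steps=200):
    w = np.array([1.0, 1.0])
    for _ in range(steps):
        w = w - rho * (H @ w)  # gradient step on <J>
    return np.linalg.norm(w)

print(final_norm(1.9 / lam_max))  # 0 < rho < 2/lam_max: converges to 0
print(final_norm(2.1 / lam_max))  # rho > 2/lam_max: diverges
```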

**4.2.3 Analysis of Some Basic Learning Rules**

Correlation Learning

For correlation learning, the learning signal is
*r*(**w**, **x**, *z*) = *z*, so Equation (4.2.2) gives the
stochastic equation

d**w**/d*t* = ρ(−γ**w** + *z***x**)    (4.2.8)

which leads to the average learning equation

d**w**/d*t* = ρ(−γ**w** + <*z***x**>)    (4.2.9)

Now, by setting d**w**/d*t* = **0**,
one arrives at the (only) equilibrium point

**w*** = (1/γ)<*z***x**>    (4.2.10)

The Hessian of the average criterion function is

**H** = ∇²<*J*> = γ**I**    (4.2.11)
This equation shows that the Hessian of <*J*>
is positive definite for γ > 0; i.e., its eigenvalues are strictly positive
or, equivalently, the eigenvalues of −ρ**H**
are strictly negative. This makes the system
locally asymptotically stable at the equilibrium solution **w***
by virtue of Liapunov's first method (see Gill et al., 1981; Dickinson,
1991). Thus, **w***
is a stable equilibrium of Equation (4.2.9). In fact, the positive
definite Hessian implies that **w***
is a minimum of <*J*>, and therefore the gradient
system converges globally and asymptotically
to **w***, its only
minimum, from any initial state. Thus, the trajectory **w**(*t*)
of the stochastic system in Equation (4.2.8) is expected to approach
and then fluctuate about the state **w*** = (1/γ)<*z***x**>.

From Equation (4.2.4), the underlying instantaneous
criterion function *J* is given by

*J* = −*zy* + (γ/2)||**w**||²    (4.2.12)

which may be minimized by maximizing the correlation
*zy* subject to the regularization term (γ/2)||**w**||².
Here, the regularization term is needed in order to keep the
solution bounded.
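A discretized simulation of the correlation rule (illustrative constants and data, not from the text) shows the weight vector settling near the predicted equilibrium **w*** = (1/γ)<*z***x**>:

```python
import numpy as np

# Discretized correlation rule: w <- w + rho * (-gamma * w + z * x),
# cycled over a fixed training set; compare against w* = (1/gamma) <z x>.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))          # training inputs
Z = X @ np.array([0.5, -1.0, 2.0])         # illustrative teacher signals
gamma, rho = 0.5, 0.005

w = np.zeros(3)
for _ in range(300):                       # passes over the training set
    for x, z in zip(X, Z):
        w = w + rho * (-gamma * w + z * x)

w_star = (X * Z[:, None]).mean(axis=0) / gamma  # (1/gamma) <z x>
print(w, w_star)  # w fluctuates about w*
```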

LMS Learning

For *r*(**w**, **x**, *z*)
= *z* − **w**^T**x**
(the output error due to input **x**) and γ = 0, Equation (4.2.2)
leads to the stochastic equation

d**w**/d*t* = ρ(*z* − **w**^T**x**)**x**    (4.2.13)

In this case, the average learning equation becomes

d**w**/d*t* = ρ<(*z* − **w**^T**x**)**x**>    (4.2.14)

with equilibria satisfying

<(*z* − **w***^T**x**)**x**> = **0**

or

<*z***x**> − <**xx**^T>**w*** = **0**    (4.2.15)

Let **C** denote the positive semidefinite autocorrelation
matrix <**xx**^T>, defined in Equation (3.3.4),
and let **P** = <*z***x**>. If **C** is nonsingular,
then **w*** = **C**^−1**P** is the equilibrium state. Note
that **w*** approaches
the minimum SSE solution in the limit of a large training set,
and that this analysis is identical to the analysis of the μ-LMS
rule in Chapter 3. Let us now check the stability of **w***.
The Hessian matrix is

**H** = ∇²<*J*> = **C**    (4.2.16)

which is positive definite if det **C** ≠ 0.
Therefore, **w*** = **C**^−1**P**
is the only (asymptotically) stable solution for Equation (4.2.14),
and the stochastic dynamics in Equation (4.2.13) are expected
to approach this solution.

Finally, note that with γ = 0 Equation
(4.2.4) leads to

*J* = −(*zy* − *y*²/2)

or, after adding the constant *z*²/2 (which does not affect the minimizing **w**),

*J* = (1/2)(*z* − **w**^T**x**)²    (4.2.17)

which is the instantaneous SSE (or MSE) criterion
function.
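The equilibrium **w*** = **C**^−1**P** can be checked numerically (illustrative data and step size, not from the text) against a discretized run of the stochastic LMS dynamics:

```python
import numpy as np

# Compare the LMS equilibrium w* = C^{-1} P, with C = <x x^T> and P = <z x>,
# to a discretized run of dw/dt = rho * (z - w^T x) x.
rng = np.random.default_rng(2)
X = rng.standard_normal((500, 4))
Z = X @ np.array([1.0, 0.0, -1.0, 2.0]) + 0.1 * rng.standard_normal(500)

C = (X.T @ X) / len(X)          # autocorrelation matrix <x x^T>
P = (X.T @ Z) / len(X)          # cross-correlation vector <z x>
w_star = np.linalg.solve(C, P)  # equilibrium C^{-1} P

w = np.zeros(4)
rho = 0.01
for _ in range(200):            # passes over the training set
    for x, z in zip(X, Z):
        w = w + rho * (z - w @ x) * x  # stochastic LMS step
print(w, w_star)  # w approaches and fluctuates about w*
```

Because the teacher signal here includes noise, **w*** is the least-squares fit rather than an exact interpolant, and the trajectory hovers near it rather than landing on it.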

Hebbian Learning

Here, upon setting *r*(**w**, **x**, *z*)
= *y* = **w**^T**x**,
Equation (4.2.2) gives the Hebbian rule with decay

d**w**/d*t* = ρ(−γ**w** + *y***x**)    (4.2.18)

whose average is

d**w**/d*t* = ρ(−γ**w** + **Cw**) = ρ(**C** − γ**I**)**w**    (4.2.19)

Setting d**w**/d*t* = **0** in Equation (4.2.19)
leads to the equilibria

**Cw*** = γ**w***    (4.2.20)

So if **C** happens to have γ
as an eigenvalue, then **w***
will be the eigenvector of **C** corresponding to γ.
In general, though, γ will not be an eigenvalue
of **C**, so Equation (4.2.19) will have only one equilibrium
at **w*** = **0**.
This equilibrium solution is asymptotically stable if γ
is greater than the largest eigenvalue of **C**, since this
makes the Hessian

**H** = ∇²<*J*> = γ**I** − **C**    (4.2.21)
positive definite. Now, employing Equation (4.2.4),
we get the instantaneous criterion function minimized by the Hebbian
rule in Equation (4.2.18):

*J* = −(1/2)*y*² + (γ/2)||**w**||²    (4.2.22)

The regularization term (γ/2)||**w**||²
is not adequate here to stabilize the Hebbian rule at a solution
which maximizes *y*².
However, other, more appropriate regularization terms can ensure
stability, as we will see in the next section.
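The stability condition can be seen in a discretized simulation (illustrative constants, not from the text): with γ above the largest eigenvalue of **C** the weights decay to **w*** = **0**, while below it they grow without bound.

```python
import numpy as np

# Discretized Hebbian rule with decay: w <- w + rho * (-gamma * w + y * x),
# whose average dynamics are dw/dt = rho * (C - gamma * I) w.
rng = np.random.default_rng(3)
X = rng.standard_normal((200, 3)) * np.array([2.0, 1.0, 0.5])  # anisotropic inputs
C = (X.T @ X) / len(X)
lam_max = np.linalg.eigvalsh(C).max()

def final_norm(gamma, rho=0.01, epochs=50):
    w = np.full(3, 0.1)
    for _ in range(epochs):
        for x in X:
            y = w @ x                           # unit output
            w = w + rho * (-gamma * w + y * x)  # Hebbian step with decay
    return np.linalg.norm(w)

print(final_norm(lam_max + 1.0))  # gamma > lam_max: decays toward 0
print(final_norm(lam_max - 1.0))  # gamma < lam_max: grows without bound
```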
