The characterization of topological feature preserving
maps has received special attention in the literature (Kohonen,
1982b; Cottrell and Fort, 1986; Ritter and Schulten, 1986 and
1988b; Tolat, 1990; Heskes and Kappen, 1993a; Lo et al., 1993;
Kohonen, 1993b). In particular, Takeuchi and Amari (1979) and
Amari (1980 and 1983) have extensively studied a continuous-time
dynamical version of this map to investigate the topological relation
between the self-organized map and the input space governed by
the density *p*(**x**), the resolution and stability of
the map, and convergence speed. The characterization of a general
feature map is difficult, and much of the analysis has been done
under simplifying assumptions.

In the following, a one-dimensional version of the
self-organizing feature map of Kohonen is characterized following
the approach of Ritter and Schulten (1986). A continuous dynamical
version of Kohonen's map is also described and analyzed.

**4.7.1 Characterization of Kohonen's Feature Map**

Consider the criterion function *J*(**w**)
defined by (Ritter and Schulten, 1988a)

(4.7.1)

where *i** is the label of the winning unit upon presentation of the stimulus (input) **x***k*, and the weighting factor in (4.7.1) is the neighborhood function that was introduced in Section 3.5. It can be seen that Equation (4.7.1)
is an extension of the competitive learning criterion function
of Equation (4.6.2). Performing gradient descent on (4.7.1) yields

(4.7.2)

which is just the batch-mode version of Kohonen's self-organizing rule in Equation (3.5.1). Thus, Kohonen's rule is a stochastic gradient-descent search and leads, on average and for a sufficiently small learning rate, to a local minimum of *J* in Equation (4.7.1).
These minima are given as solutions to

(4.7.3)

This equation is not easy to solve; the solutions depend on the choice of the neighborhood function and on the distribution *p*(**x**). Actually, we desire the global minimum of the criterion function *J*; local minima of *J* correspond to topological defects such as kinks in one-dimensional maps and twists in two-dimensional maps (Kohonen, 1989; Geszti, 1990).
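
In one common formulation (with Λ(*i*, *i**) denoting the neighborhood function and η the learning rate; these symbol choices are assumptions made here, not necessarily those of Equations (4.7.1) through (4.7.3)), the criterion, its gradient-descent update, and the resulting equilibrium condition read:

```latex
\begin{align}
J(\mathbf{w}) &= \tfrac{1}{2}\sum_{k}\sum_{i}\Lambda(i,i^{*})
   \bigl\lVert \mathbf{x}^{k}-\mathbf{w}_{i}\bigr\rVert^{2}
   \tag{4.7.1}\\
\Delta\mathbf{w}_{i} &= -\eta\,\frac{\partial J}{\partial\mathbf{w}_{i}}
   = \eta\sum_{k}\Lambda(i,i^{*})\bigl(\mathbf{x}^{k}-\mathbf{w}_{i}\bigr)
   \tag{4.7.2}\\
\sum_{k}\Lambda(i,i^{*})\bigl(\mathbf{x}^{k}-\mathbf{w}_{i}\bigr) &= \mathbf{0}
   \qquad\text{for all } i
   \tag{4.7.3}
\end{align}
```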

The analysis of feature maps becomes more tractable
if one replaces Equation (4.7.3) with a continuous version that
assumes a continuum of units and where the distribution *p*(**x**)
appears explicitly, namely

(4.7.4)

where **r*** = **r***(**x**)
is the coordinate vector of the winning unit upon the presentation
of input **x**.
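
In the same assumed notation, the continuum stationarity condition of Equation (4.7.4) plausibly takes the form:

```latex
\begin{equation}
\int \Lambda\bigl(\mathbf{r},\mathbf{r}^{*}(\mathbf{x})\bigr)\,
     \bigl(\mathbf{x}-\mathbf{w}(\mathbf{r})\bigr)\,p(\mathbf{x})\,d\mathbf{x}
     \;=\; \mathbf{0}
\tag{4.7.4}
\end{equation}
```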

An implicit partial differential equation for **w**
can be derived from Equation (4.7.4) [Ritter and Schulten, 1986].
However, for the case of two- or higher-dimensional maps, no
explicit solutions exist for **w**(**r**) given an arbitrary
*p*(**x**). On the other hand, solutions of Equation
(4.7.4) are relatively easy to find for the one-dimensional map
with scalar *r* and a given input distribution *p*(*x*).
Here, the equilibrium *w**
satisfies (Ritter and Schulten, 1986)

(4.7.5)

which, in turn, satisfies the implicit differential equation corresponding to Equation (4.7.4), given by (assuming a sharply peaked, symmetric neighborhood function):

(4.7.6)

In Equations (4.7.5) and (4.7.6), the term *p*(*w*) is given by *p*(*x*)|*x*=*w*. Equation (4.7.5) shows that the density of the units in *w* space around point *r* is proportional to [*p*(*w*)]^(2/3). This verifies the (approximate) density-preserving feature of the map. Ideally, however, we would have a unit density proportional to *p*(*w*) for zero distortion. Therefore, a self-organizing feature map tends to undersample high-probability regions and oversample low-probability ones.
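
For reference, the classical one-dimensional result of Ritter and Schulten (1986) underlying this discussion can be summarized as follows (a sketch in assumed notation, not a verbatim statement of Equations (4.7.5) and (4.7.6)):

```latex
% Stationary 1-D map (Ritter and Schulten, 1986): the local magnification obeys a
% 2/3 power law, so the unit density D(w) = (dw/dr)^{-1} satisfies
\frac{dw^{*}}{dr} \;\propto\; \bigl[p(w^{*})\bigr]^{-2/3}
\qquad\Longleftrightarrow\qquad
D(w) \;\propto\; \bigl[p(w)\bigr]^{2/3}.
% A distortion-free (faithful) representation of p would require D(w) \propto p(w);
% since 2/3 < 1, high-probability regions receive relatively fewer units and
% low-probability regions relatively more.
```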

Finally, one may obtain the equilibria *w** by solving Equation (4.7.6). The local stability of some of these equilibria is ensured (with a probability approaching 1) if the learning coefficient, a function of time *t*, is sufficiently small, positive, and decays according to the following necessary and sufficient conditions (Ritter and Schulten, 1988b)

(4.7.7a)

and

(4.7.7b)

In particular, a power-law decay proportional to *t*^(−α) with 0 < α ≤ 1 ensures convergence. For laws with α > 1 or for exponential decay laws, Equation (4.7.7a) is not fulfilled, and some residual error remains even in the limit *t* → ∞.
It can also be shown that during convergence, the map first becomes
untangled and fairly even, and then moves into a refinement phase
where it adapts to the details of *p*(*x*). Occasionally,
the "untangling" phase can slow convergence, because
some types of tangle (e.g., kinks and twists) can take a long
time to untangle. Geszti (1990) suggested the use of a strongly
asymmetric neighborhood function in order to speed up learning
by breaking the symmetry effects responsible for slow untangling
of kinks and twists.
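
As a concrete illustration of the one-dimensional map and the decaying learning coefficient discussed above (not an algorithm taken from this section), the following sketch trains a 1-D Kohonen map with a Gaussian neighborhood that shrinks over time and a power-law learning-rate decay; all function names and parameter values are arbitrary choices:

```python
import numpy as np

def train_1d_som(samples, n_units=50, n_steps=20000, eta0=0.5, alpha=0.8,
                 sigma0=10.0, sigma_min=0.5, seed=0):
    """Train a one-dimensional Kohonen map on scalar samples drawn from p(x)."""
    rng = np.random.default_rng(seed)
    w = rng.uniform(samples.min(), samples.max(), size=n_units)  # initial weights
    r = np.arange(n_units)                                       # unit coordinates
    for t in range(1, n_steps + 1):
        x = rng.choice(samples)                        # stimulus drawn from p(x)
        winner = int(np.argmin(np.abs(w - x)))         # best-matching unit i*
        eta = eta0 * t ** (-alpha)                     # power-law decay, 0 < alpha <= 1
        sigma = max(sigma0 * (1.0 - t / n_steps), sigma_min)  # shrinking neighborhood width
        h = np.exp(-0.5 * ((r - winner) / sigma) ** 2)         # Gaussian neighborhood
        w += eta * h * (x - w)                         # Kohonen's stochastic update
    return w

# Example: a non-uniform density p(x); the trained weights come out ordered and
# denser where p(x) is large, though not in exact proportion to p (cf. the 2/3 law).
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(0.0, 0.2, 8000), rng.uniform(-2.0, 2.0, 2000)])
print(np.round(np.sort(train_1d_som(data)), 2))
```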

**4.7.2 Self-Organizing Neural Fields**

Following Takeuchi and Amari (1979) and Amari (1980 and 1983), consider a neural field in which the units form a continuum indexed by the coordinate vector **r**. The dynamics of the neural field potential *u*(**r**, *t*) are given by

(4.7.8)

where

(4.7.9)

and *h* is a constant bias field. In Equation (4.7.8), it is assumed that, in the absence of any stimulation, the potential *u*(**r**, *t*) decays, with the time constant of the field dynamics, to the resting potential *h*. Also, it is assumed that this potential increases in proportion to the total stimulus *s*(**r**, **x**), which is the sum of the lateral stimuli and the input stimulus **w**T**x** due to the input signal **x** ∈ R*n*. A conceptual diagram for the above neural field is shown in Figure 4.7.2.
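
With τ denoting the field time constant, ω the lateral connection kernel, and *f* the unit activation function (assumed symbols), Equations (4.7.8) and (4.7.9) plausibly take the standard Amari neural-field form:

```latex
\begin{align}
\tau\,\frac{\partial u(\mathbf{r},t)}{\partial t}
  &= -\,u(\mathbf{r},t) \;+\; s(\mathbf{r},\mathbf{x}) \;+\; h
  \tag{4.7.8}\\
s(\mathbf{r},\mathbf{x})
  &= \int \omega(\mathbf{r}-\mathbf{r}')\,f\bigl(u(\mathbf{r}',t)\bigr)\,d\mathbf{r}'
     \;+\; \mathbf{w}(\mathbf{r})^{\mathsf T}\mathbf{x}
  \tag{4.7.9}
\end{align}
```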

In Equation (4.7.8), the rates of change of quantities other than the potential, such as the weights **w**, are assumed to be much slower than that of the neural field potential itself. The input signal (pattern) **x** is a random time sequence, and it is assumed that each pattern **x** is chosen according to the probability density *p*(**x**). Also, we assume that each input is applied to the neural field for a duration longer than the time constant of the neural field potential. On the other hand, the duration of a stimulus **x** is assumed to be much shorter than the time constant of the weights **w**. Thus, the potential distribution *u*(**r**, *t*) can be considered to change in a quasi-equilibrium manner denoted by *u*(**r**, **x**).

An initial excitation pattern applied to the neural
field changes according to the dynamics given in Equation (4.7.8),
and eventually converges to one of the equilibrium solutions.
Stable equilibrium solutions are the potential fields *u*(**r**)
which the neural field can retain persistently under a constant
input **x**. The equilibria of Equation (4.7.8) must satisfy ∂*u*(**r**, *t*)/∂*t* = 0, or

(4.7.10)

where *s*(**r**, *u**) is the total stimulus at equilibrium. When the lateral connection distribution, as a function of **r** − **r**', is strongly off-surround inhibitory, then for any given **x** only a local excitation pattern is aroused as a stable equilibrium satisfying Equation (4.7.10) (Amari, 1990). Here, a local excitation is a pattern in which the excitation is concentrated on units in a small local region; i.e., *u**(**r**) is positive only over a small neighborhood centered at a maximally excited unit **r**0.
Thus *u**(**r**, **x**) represents a mapping from the input space onto the neural field.
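
Setting the time derivative of the reconstructed field equation to zero gives the form Equation (4.7.10) presumably takes (same assumed notation as above):

```latex
\begin{equation}
u^{*}(\mathbf{r}) \;=\; s(\mathbf{r},u^{*}) \;+\; h,
\qquad
s(\mathbf{r},u^{*}) \;=\;
  \int \omega(\mathbf{r}-\mathbf{r}')\,f\bigl(u^{*}(\mathbf{r}')\bigr)\,d\mathbf{r}'
  \;+\; \mathbf{w}(\mathbf{r})^{\mathsf T}\mathbf{x}
\tag{4.7.10}
\end{equation}
```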

Let us now look at the dynamics of the self-organizing
process. We start by assuming a particular update rule for the
input weights of the neural field. One biologically plausible
update rule is the Hebbian rule:

(4.7.11)

where the output activity appearing in (4.7.11) is the neural field's equilibrium output activity due to input **x**. In Equation (4.7.11), we use the earlier assumption that the weights change much more slowly than the field potential. Next, we assume strong mixing in Equation (4.7.11), which allows us to write an expression for the average learning equation (after absorbing the time constants into the learning coefficient) as

where the averaging is over all possible **x**.
The equilibria of Equation (4.7.12) are given by

(4.7.13)

If we now transpose Equation (4.7.12) and multiply it by an arbitrary input vector **x**, we arrive at an equation for the change in the input stimulus

(4.7.14)

The vector inner product **x**T
**x**' represents the similarity of two input signals **x**
and **x**' and hence the topology of the signal space (Takeuchi
and Amari, 1979). Note how Equation (4.7.14) relates the topology
of the input stimulus set {**x**} with that of the neural field.
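
A plausible form of Equations (4.7.11) through (4.7.14), consistent with the surrounding description (τ' the weight time constant, *y**(**r**, **x**) = *f*(*u**(**r**, **x**)) the equilibrium output activity, ⟨·⟩ the average over **x**, and **x**' the arbitrary second input; the decay terms and all symbol names are assumptions made here), is:

```latex
\begin{align}
\tau'\,\frac{\partial\mathbf{w}(\mathbf{r})}{\partial t}
  &= -\,\mathbf{w}(\mathbf{r}) + \mathbf{x}\,y^{*}(\mathbf{r},\mathbf{x})
  \tag{4.7.11}\\
\frac{d\mathbf{w}(\mathbf{r})}{dt}
  &= -\,\mathbf{w}(\mathbf{r})
     + \bigl\langle \mathbf{x}\,y^{*}(\mathbf{r},\mathbf{x})\bigr\rangle_{\mathbf{x}}
  \tag{4.7.12}\\
\mathbf{w}^{*}(\mathbf{r})
  &= \bigl\langle \mathbf{x}\,y^{*}(\mathbf{r},\mathbf{x})\bigr\rangle_{\mathbf{x}}
  \tag{4.7.13}\\
\frac{d}{dt}\Bigl[\mathbf{w}(\mathbf{r})^{\mathsf T}\mathbf{x}'\Bigr]
  &= -\,\mathbf{w}(\mathbf{r})^{\mathsf T}\mathbf{x}'
     + \bigl\langle (\mathbf{x}^{\mathsf T}\mathbf{x}')\,
       y^{*}(\mathbf{r},\mathbf{x})\bigr\rangle_{\mathbf{x}}
  \tag{4.7.14}
\end{align}
```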

On the other hand, if one assumes a learning rule
where a unit **r** updates its input weight vector in proportion
to the correlation of its equilibrium potential *u**(**r**,
**x**) and the difference **x** - **w**(**r**), one
arrives at the average differential equation

(4.7.15)

This learning rule is equivalent to the averaged continuum version of Kohonen's self-organizing feature map in Equation (4.7.2), if one views the potential distribution *u**(**r**, **x**) as the weighting neighborhood function. Here, self-organization will emerge if the dynamics of the potential field evolve such that the quasi-stable equilibrium potential *u** starts out positive for all **r** and its excited region then monotonically and slowly shrinks in diameter as time proceeds. This may be accomplished through proper control of the bias field *h*, as described below.
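
In the same assumed notation, the rule described here, proportional to the correlation of *u**(**r**, **x**) with **x** − **w**(**r**), would read:

```latex
\begin{equation}
\tau'\,\frac{d\mathbf{w}(\mathbf{r})}{dt}
  \;=\; \bigl\langle\, u^{*}(\mathbf{r},\mathbf{x})\,
        \bigl(\mathbf{x}-\mathbf{w}(\mathbf{r})\bigr)\bigr\rangle_{\mathbf{x}}
\tag{4.7.15}
\end{equation}
```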

In general, it is difficult to solve Equations (4.7.8) and (4.7.12) [or (4.7.15)]. However, some properties of the formation of feature maps are revealed from these equations for the special, but revealing, case of a one-dimensional neural field (Takeuchi and Amari, 1979; Amari, 1980 and 1983). The dynamics of the potential field in Equation (4.7.8) for a one-dimensional neural field were analyzed in detail by Amari (1977b) and Kishimoto and Amari (1979) for a step activation function and for a continuous, monotonically non-decreasing activation function, respectively. It was shown that, with a lateral connection profile over *r* − *r*' as shown in Figure 4.7.1 and *f*(*u*) = step(*u*), there exist stable equilibrium solutions *u**(*r*) for the **x** = 0 case. The 0-solution potential field, *u**(*r*) ≡ 0, and the ∞-solution field, in which *u**(*r*) is positive for all *r*, are among these stable solutions. The 0-solution is stable if and only if *h* < 0. On the other hand, the ∞-solution is stable if and only if *h* > −2*A*(∞), where *A*(∞) is the limit of *A*(*a*) as *a* → ∞, with *A*(*a*) as defined below. Local excitations (also known as *a*-solutions), where *u**(*r*) is positive only over a finite interval [*a*1, *a*2] of the neural field, are also possible. An *a*-solution exists if and only if *h* + *A*(*a*) = 0. Here, *A*(*a*) is the definite integral defined by

(4.7.16)

and plotted in Figure 4.7.3. Amari (see also Krekelberg and Kok, 1993) also showed that a single *a*-solution can exist for the case of a non-zero input stimulus, and that the corresponding active region of the neural field is centered at the unit *r* receiving the maximum input. Furthermore, the width *a* of this active region shrinks monotonically as the bias field *h* is decreased. Thus, one may exploit the fact that the field potential/neighborhood function *u**(*r*, **x**) is controlled by the bias field *h* in order to control the convergence of the self-organizing process. Here, the uniform bias field *h* is started at a positive value *h* > −2*A*(∞) and is slowly decreased towards negative values. This, in turn, causes *u** to start at the ∞-solution, then gradually move through *a*-solutions of decreasing width *a* until, ultimately, *u** becomes the 0-solution. For further analysis of the self-organizing process in a neural field, the reader is referred to Zhang (1991).
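
Using ω for the one-dimensional lateral connection profile of Figure 4.7.1 (an assumed symbol), the definition in Equation (4.7.16) and the solution conditions quoted above can be summarized as:

```latex
\begin{equation}
A(a) \;=\; \int_{0}^{a}\omega(r)\,dr,
\qquad A(\infty) \;=\; \lim_{a\to\infty}A(a)
\tag{4.7.16}
\end{equation}
% 0-solution      (u* \equiv 0):                stable  iff  h < 0
% \infty-solution (u* > 0 everywhere):          stable  iff  h > -2A(\infty)
% a-solution      (local excitation, width a):  exists  iff  h + A(a) = 0
```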

Kohonen (1993a and 1993b) proposed a self-organizing map model for which he gives a physiological justification. The model is similar to Amari's self-organizing neural field, except that it uses a discrete two-dimensional array of units. The model assumes sharp self-excitatory ("self-on"), off-surround lateral interconnections, so that the neural activity of the map stabilizes with the unit receiving the maximum excitation active and all other units inactive. Kohonen's model employs unit potential dynamics similar to those of Equation (4.7.8). On the other hand, Kohonen uses a learning equation more complex than those in Equations (4.7.12) and (4.7.15). This equation is given for the *i*th unit weight vector by the pseudo-Hebbian learning rule

(4.7.17)

where the leading coefficient is a positive constant. The first term in Equation (4.7.17) models a natural "transient" neighborhood function. It represents a weighted sum of the output activities *y**l* of nearby units, where *h**il* describes the strength of the diffuse chemical effect of cell *l* on cell *i* and is a function of the distance between these units. This weighted sum of output activities replaces the output activity of the same cell, *y**i*, in Hebb's learning rule. On the other hand, the second term acts as a stabilizing term which models "forgetting" effects, or disturbance due to adjacent synapses. Typically, forgetting effects in *w**i* are proportional to the weight *w**i* itself. In addition, if the disturbance caused by synaptic site *r* is mediated through the postsynaptic potential, the forgetting effect must further be proportional to *w**ir* *x**r*. This dependence is modeled by the sum over *r* in the forgetting term; the summation is taken over a subset of the synapses of unit *i* that are located near the *j*th synapse *w**ij* and approximately act as one collectively interacting set. The major difference between the learning rules in Equations (4.7.17) and (4.7.12) is that, in the former, the "neighborhood" function is determined by a "transient" activity due to a diffusive chemical effect of nearby cell potentials, whereas in the latter it is determined by a stable region of the neural field potential.
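
Based on the description of its two terms, Equation (4.7.17) plausibly has the component form below, where α and β are positive constants, *h**il* is the distance-dependent diffusion kernel, *y**l* the output of unit *l*, and *S**j* the set of synapses of unit *i* located near synapse *j* (a hedged reconstruction, not a quotation of Kohonen's equation):

```latex
\begin{equation}
\frac{dw_{ij}}{dt}
  \;=\; \alpha\Bigl(\sum_{l} h_{il}\,y_{l}\Bigr)x_{j}
        \;-\; \beta\, w_{ij}\sum_{r\in S_{j}} w_{ir}\,x_{r}
\tag{4.7.17}
\end{equation}
```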

Under the assumption that the index *r* ranges over all components of the input signal **x**, and regarding the transient neighborhood term as a positive scalar independent of **w** and **x**, the vector form of Equation (4.7.17) takes the form of the Riccati differential equation

(4.7.18)

where the effective Hebbian and forgetting coefficients are positive constants. Now, multiplying both sides of Equation (4.7.18) by 2**w**T leads to the differential equation

(4.7.19)

Thus, for arbitrary **x** with **w**T**x** > 0, the solution of Equation (4.7.19) converges to a constant value of the squared norm of **w**, determined by the ratio of the Hebbian and forgetting coefficients. On the other hand, the solution for the direction of **w*** cannot be determined in closed form from the deterministic differential Equation (4.7.18). However, a solution for the expected value of **w** may be found if Equation (4.7.18) is treated as a stochastic differential equation with strong mixing, in accordance with the discussion of Section 4.3. Taking the expected value of both sides of Equation (4.7.18) and solving for its equilibrium points (by setting the time derivative to zero) gives

(4.7.20)

Furthermore, this equilibrium point can be shown
to be stable (Kohonen, 1989).
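
With the assumed coefficients α, β > 0 of the reconstructed Riccati form (the transient neighborhood factor absorbed into α), the norm argument can be made explicit:

```latex
\begin{align*}
\frac{d\mathbf{w}}{dt}
  &= \alpha\,\mathbf{x}-\beta\,(\mathbf{w}^{\mathsf T}\mathbf{x})\,\mathbf{w}
  && \text{(assumed Riccati form of (4.7.18))}\\
\frac{d\lVert\mathbf{w}\rVert^{2}}{dt}
  &= 2\,\mathbf{w}^{\mathsf T}\frac{d\mathbf{w}}{dt}
   = 2\,(\mathbf{w}^{\mathsf T}\mathbf{x})\bigl(\alpha-\beta\lVert\mathbf{w}\rVert^{2}\bigr)
  && \text{(cf. (4.7.19))}\\
  &\Longrightarrow\;
   \lVert\mathbf{w}\rVert^{2}\to\frac{\alpha}{\beta}
   \quad\text{whenever }\mathbf{w}^{\mathsf T}\mathbf{x}>0,\\
\mathbf{0}
  &= \alpha\,\langle\mathbf{x}\rangle
     -\beta\,\bigl\langle(\mathbf{w}^{\mathsf T}\mathbf{x})\,\mathbf{w}\bigr\rangle
  && \text{(equilibrium of the averaged equation, cf. (4.7.20))}
\end{align*}
```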

From the above analysis, it can be concluded that the synaptic weight vector **w** is automatically normalized to a fixed length, independent of the input signal **x**, and that **w** rotates such that its average direction becomes aligned with the mean of **x**. This is the expected result for a self-organizing map when a uniform, nondecreasing neighborhood function is used. In general, though, the neighborhood term is nonuniform and time varying, which makes the analysis of Equation (4.7.17) much more difficult.
