In this section, the capacity and retrieval characteristics
of the autoassociative DAMs introduced in the previous section
are analyzed. Correlation-recorded DAMs are considered first,
followed by projection-recorded DAMs.

It has been shown (Amari, 1977a; Hopfield, 1982;
Amit et al., 1985) that if most of the memories in a correlation-recorded
discrete Hopfield DAM, with *w**ii* = 0,
are to be remembered approximately (i.e., nonperfect retrieval
is allowed), then must not exceed 0.15. This value is the relative
capacity of the DAM. Another result on the capacity of this DAM
for the case of error-free memory recall by one-pass parallel
convergence is (in probability) given by the absolute capacity
(Weisbuch and Fogelman-Soulié, 1985; McEliece et al., 1987;
Amari and Maginu, 1988; Newman, 1988), expressed as the limit

(7.2.1)

Equation (7.2.1) indicates that the absolute capacity
approaches zero as *n* approaches infinity! Thus, the correlation-recorded
discrete Hopfield net is an inefficient DAM model. The absolute
capacity result in Equation (7.2.1) is derived below.

Assuming **y***k* = **x***k*
(autoassociative case) in Equation (7.1.7) and *wii* = 0,
*i* = 1, 2, ..., *n*, then
by direct recall from initial input **x**(0) = **x***h*
with **x***h* {**x***k*},
the *i*th bit of the retrieved state **x**(1) is given
by

(7.2.2)

Consider the quantity *C**i*(0) = -*xi*(0)*i*(0).
Now, if *C**i*(0)
is negative then *x**i*(0)
and *i*(0) have the
same sign and the cross-talk term *i*(0)
does no harm. On the other hand, if *C**i*(0)
is positive and larger than 1, then the one-pass retrieved bit
*x**i*(1) is
in error. Next, assume that the stored memories are random, with
equal probability for and for , independently for each *k*
and *i*. Hence, for large *m* and *n*, the *C**i*(0)
term is approximately distributed according to a normal distribution
N(, 2) = N(0, ).
To see this, we first note that , and equivalently *C**i*(0),
is times the sum of (*n* - 1)(*m* - 1)
independent, and uniformly distributed random bipolar binary numbers,
and thus it has a binomial distribution with zero mean and variance
. Thus, by virtue of the Central Limit Theorem, *C**i*(0)
approaches N(0, ) asymptotically as *m* and *n*
become large (Mosteller et al., 1970). Therefore, we may compute
the probability that *x**i*(1)
is in error, *P*error = Prob(*C**i*(0) > 1),
by integrating N(0, ) from 1 to , giving

(7.2.3)

where is the error function. Note that Equation
(7.2.3) is a special case of Equation (7.1.8) where *D*in
in Equation (7.1.8) is set to zero. Now, using the fact that
for a random variable *x* distributed according to N(0, 2),
, and if the *i*th bit *x**i*(1)
in Equation (7.2.2) is required to be retrievable with error probability
less than 0.0014, then the condition 3 < 1 or must
be satisfied. Therefore, *m* < 0.111*n*
is required for *P*error
0.0014. Similarly, Equation (7.2.3) can be solved for the requirement
*P*error 0.005
which gives .

On the other hand, if all memories **x***k*
are required to be equilibria of the DAM with a probability close
to 1, say 0.99, then an upper bound on can be derived by requiring
that all bits of all memories **x***k*
be retrievable with less than a 1 percent error; i.e., (1 - *P*error)*mn* > 0.99.
Employing the binomial expansion, this inequality may be approximated
as *P*error < .
Noting that this stringent error correction requirement necessitates
small values, Equation (7.2.3) can be approximated using the
asymptotic (*x* ) expansion . This approximation
can then be used to write the inequality *P*error <
as

(7.2.4)

Now, taking the natural logarithm (ln) of both sides
of the inequality in Equation (7.2.4) and eliminating all constants
and the ln factor (since they are dominated by higher-order
terms) results in the bound

(7.2.5)

By noting that ln *mn* < ln *n*2
(since *n* > *m*), a more stringent requirement
on in Equation (7.2.5) becomes

(7.2.6)

which represents the absolute capacity of a zero-diagonal
correlation-recorded DAM. The effects of a non-zero diagonal
on DAM capacity are treated in Section 7.4.3.

Another, more useful, DAM capacity measure gives
a bound on in terms of error correction and memory size (Weisbuch
and Fogelman-Soulié, 1985; McEliece et al., 1987). According
to this capacity measure, a correlation-recorded discrete Hopfield
DAM must have its pattern ration, , satisfy

(7.2.7)

in order that error-free one-pass retrieval of a
fundamental memory (say **x***k*)
from random key patterns lying inside the Hamming hypersphere
(centered at **x***k*)
of radius *n *( < ) is achieved with probability
approaching 1. Here, defines the radius of attraction of a fundamental
memory. In other words, is the largest normalized Hamming distance
from a fundamental memory within which almost all of the initial
states reach this fundamental memory, in one-pass. The inequality
in Equation (7.2.7) can be derived by starting with an equation
similar to Equation (7.2.2) with one difference that the input
**x**(0) is not one of the stored memories. Rather, a random
input **x**(0) is assumed that has an overlap with one of
the stored memories, say **x***h*.
Here, one can readily show that the *i*th retrieved bit
*x**i*(1) is
in error if and only if *C**i*(0) > 1 - 2.
The error probability for the *i*th bit is then given by
Equation (7.2.3) with the lower limit on the integral replaced
by 1 - 2.
This leads to which, by using a similar derivation to the one
leading to Equation (7.2.6), leads us to Equation (7.2.7).

The capacity analysis leading to Equations (7.2.6)
and (7.2.7) assumed a single parallel retrieval iteration, starting
from **x**(0) and retrieving **x**(1). This same analysis
can not be applied if one starts the DAM at **x**(*k*),
*k* =1, 2, ...; i.e., Equations (7.2.6) and
(7.2.7) are not valid for the second or higher DAM iterations.
In this case, the analysis is more complicated due to the fact
that **x**(*k*) becomes correlated with the stored memory
vectors, and hence the statistical properties of the noise term
*i*(*k*) in
Equation (7.2.2) are more difficult to determine (in fact, such
correlations depend on the whole history of **x**(*k* - *T*),
*T* = 0, 1, 2, ...). Amari and
Maginu (1988) (see also Amari and Yanai, 1993) analyzed the transient
dynamical behavior of memory recall under the assumption of a
normally distributed *i*(*k*)
[or *C**i*(*k*)]
with mean zero and variance 2(*k*).
This variance was calculated by taking the direct correlations
up to two steps between the bits of the stored memories and those
in **x**(*k*). Under these assumptions, the relative
capacity was found to be equal to 0.16. This theoretical value
is in good agreement with early simulations reported by Hopfield
(1982) and with the theoretical value of 0.14 reported by Amit
et al. (1985) using a method known as the replica method.

Komlós and Paturi (1988) showed that if Equation
(7.2.6) is satisfied, then the DAM is capable of error correction
when multiple-pass retrievals are considered. In other words,
they showed that each of the fundamental memories is an attractor
with a basin of attraction surrounding it. They also showed that
once initialized inside one of these basins of attraction, the
state converges (in probability) to the basin's attractor in order
ln(ln *n*) parallel steps. Burshtien (1993) took these
results a step further by showing that the radius of the basin
of attraction for each fundamental memory is , independent of
the retrieval mode (serial or parallel). He also showed that
a relatively small number of parallel iterations, asymptotically
independent of *n*, is required to recover a fundamental
memory even when is very close to (e.g., for = 0.499,
at most 20 iterations are required).

When initialized with a key input **x**(0) lying
outside the basins of attraction of fundamental memories, the
discrete Hopfield DAM converges to one of an exponential (in *n*)
number of spurious memories. These memories are linear combinations
of fundamental memories (Amit et al., 1985; Gardner, 1986; Komlos
and Paturi, 1988). Thus, a large number of undesirable spurious
states which compete with fundamental memories for basins of attraction
"volumes" are intrinsic to correlation-recorded discrete
Hopfield DAMs.

Next, the capacity and retrieval characteristics
of two analog (continuous-state) correlation-recorded DAMs (one
continuous-time and the other discrete-time parallel updated)
based on models introduced in the previous section are analyzed.
The first analog DAM considered here will be referred to as the
continuous-time DAM in the remainder of this section. It is obtained
from Equation (7.1.21) by setting **C** = = **I**
and = **0**. Its interconnection matrix **W**
is defined by the autocorrelation version of Equation (7.1.6)
with zero-diagonal. The second analog DAM is obtained from Equation
(7.1.29) with = **0**. It employs the normalized
correlation recording recipe for **W** with zero-diagonal as
for the continuous-time DAM. This later analog DAM will be referred
to as the discrete-time DAM in this section. Both DAMs employ
the tangent hyperbolic activation function with gain .

The dynamics of these two DAMs have been studied
in terms of the gain and the pattern ratio for unbiased random
bipolar stored memories (Amit et al., 1985, 1987; Marcus et al.,
1990; Shiino and Fukai, 1990; Kühn et al., 1991; Waugh et
al., 1993). Figure 7.2.1 shows two analytically derived phase
diagrams for the continuous-time and discrete-time DAMs, valid
in the limit of large *n* (Marcus et al., 1990; Waugh et
al., 1993). These diagrams indicate the type of attractors as
a function of activation function gain and pattern ratio . For
the continuous-time DAM, the diagram (Figure 7.2.1a) shows three
regions labeled origin, spin glass, and recall. In the recall
region, the DAM is capable of either exact or approximate associative
retrieval of stored memories. In other words, a set of 2*m*
attractors exist each having a large overlap (inner product) with
a stored pattern or its inverse (the stability of the inverse
of a stored memory is explored in Problem 7.1.13). This region
also contains attractors corresponding to spurious states that
have negligible overlaps with the stored memories (or their inverse).
In the spin glass region (named so because of the similarity
to dynamical behavior of simple models of magnetic material (spin
glasses) in statistical physics), the desired memories are no
longer attractors; hence, the only attractors are spurious states.
Finally, in the origin region, the DAM has the single attractor
state **x** = **0**.

Figure 7.2.1. Phase diagrams for correlation-recorded
analog DAMs with activation function for (a) continuous-time
and (b) parallel discrete-time updating. (Adapted from
C. M.
Marcus et al., 1990, with permission of the American Physical
Society.)

The boundary separating the recall and spin glass
regions determines the relative capacity of the DAM. For the
continuous-time DAM, and in the limit of high gain ( > 10),
this boundary asymptotes at which is essentially the relative
capacity of the correlation-recorded discrete Hopfield DAM analyzed
earlier in this section. This result supports the arguments presented
at the end of Section 7.1.2 on the equivalence of the macroscopic
dynamics of the discrete Hopfield net and the continuous-time,
high-gain Hopfield net.

The phase diagram for the discrete-time DAM is identical
to the one for the continuous-time DAM except for the presence
of a fourth region marked oscillation in Figure 7.2.1 *b*.
In this region, the stability condition in Equation (7.1.32)
is violated (note that the zero-diagonal autocorrelation weight
matrix can have negative eigenvalues. See Problem 7.2.1). Associative
retrieval and spurious states may still exist in this region (especially
if is small), but the DAM can also become trapped in period-2
limit cycles (oscillations). In both DAMs, error correction capabilities
cease to exist at activation gains close to or smaller than 1
even as approaches zero.

The boundary separating the recall and spin glass
regions has been computed by methods that combine the Liapunov
function approach with the statistical mechanics of disordered
systems. The boundary between the spin glass and origin regions
in Figure 7.2.1 (a) and (b) is given by the expression

(7.2.8)

This expression can be derived by performing local
stability analysis about the equilibrium point **x*** = **0**
that defines the origin region. This method is explored in Problems
7.2.2 and 7.2.3 for the continuous-time and discrete-time DAMs,
respectively.

The associative retrieval capability of the above
analog DAMs can vary considerably even within the recall region.
It has been shown analytically and empirically that the basin
of attraction of fundamental memories increases substantially
as the activation function gain decreases with fixed pattern ration
, for values of . Waugh et al. (1991, 1993) showed that the number
of local minima (and thus spurious states) in the Liapunov function
of the above DAMs increases exponentially as exp(*ng*) where
. Here, *g* is monotonically increasing in . Therefore,
even a small decrease in can lead to substantial reduction in
the number of local minima, especially the shallow ones which
corresponds to spurious memories. The reason behind the improved
DAM performance as gain decreases is that the Liapunov function
becomes smoother, so that shallow local minima are eliminated.
Since the fundamental memories tend to lie in wide, deep basins,
essentially all of the local minima eliminated correspond to spurious
memories. This phenomenon is termed deterministic annealing and
it is reminiscent of what happens as temperature increases in
simulated annealing (the reader is referred to the next chapter
for a discussion of annealing methods in the context of neural
networks).

The capacity and performance of autoassociative
correlation-recorded DAMs can be greatly improved if projection
recording is used to store the desired memory vectors (recall
Equation (7.1.12), with **Y** = **X**). Here,
any set of memories can be memorized without errors as long as
they are linearly independent (note that linear independence restricts
*m* to be less than or equal to *n*). In particular,
projection DAMs are well suited for memorizing unbiased random
vectors **x***k* {-1, +1}*n*
since it can be shown that the probability of *m* (*m* < *n*)
of these vectors to be linearly independent approaches 1 in the
limit of large *n* (Komlós, 1967). In the following,
the retrieval properties of projection-recorded discrete-time
DAMs are analyzed. More specifically, the two versions of discrete-state
DAMs, serially updated and parallel updated, and the parallel
updated continuous-state DAM are discussed. For the remainder
of this section, these three DAMs will be referred to as the serial
binary, parallel binary, and parallel analog projection DAMs,
respectively. The following analysis assumes the usual unbiased
random bipolar binary memory vectors.

The relation between the radius of attraction of
fundamental memories and the pattern ratio is a desirable measure
of DAM retrieval/error correction characteristics. For correlation-recorded
binary DAMs, such a relation has been derived analytically for
single-pass retrieval and is given by Equation (7.2.7). On the
other hand, deriving similar relations for multiple-pass retrievals
and/or more complex recording recipes (such as projection recording)
is a much more difficult problem. In such cases, numerical simulations
with large *n* values (typically equal to several hundred)
are a viable tool (e.g., see Kanter and Sompolinsky, 1987; Amari
and Maginu, 1988). Figure 7.2.2, reported byKanter and Sompolinsky
(1987), shows plots of the radius of attraction of fundamental
memories versus pattern ration generated by numerical simulation
for serial binary and parallel binary projections DAMs (multiple-pass
retrieval is assumed).

Figure 7.2.2. Measurements of as a function of
by computer simulation for projection-recorded binary DAMs. The
lines are guides to the eye. The typical size of the statistical
fluctuations is indicated. Lines tagged by *a* refer to
a zero-diagonal projection matrix **W**. Lines tagged by *b*
refer to a standard projection matrix. Solid lines refer to serial
update with a specific order of updating as described in the text.
Dashed lines refer to parallel update. (Adapted from I. Kanter
and H. Sompolinsky, 1987, with permission of the American Physical
Society.)

There are two pairs of plots, labeled *a* and
*b*, in this figure. The pair labeled *a* corresponds
to the case of a zero-diagonal projection weight matrix **W**.
Whereas pair *b* corresponds to the case of a projection
**W** matrix with preserved diagonal. The solid and dashed
lines in Figure 7.2.2 represent serial and parallel retrievals,
respectively.

According to these simulations, forcing the self-coupling
terms *w**ii*
in the diagonal of **W** to zero has a drastic effect on the
size of the basin of attraction. Note that the error correction
capability of fundamental memories ceases as approaches and then
exceeds for both serial and parallel DAMs for the nonzero-diagonal
case (Problem 7.2.8 explores this phenomenon). On the other hand,
the corresponding DAMs with zero-diagonal projection matrix continue
to have substantial error correction capabilities even after
exceed , but ultimately lose these capabilities at . A common
feature of these discrete projection DAMs is the monotonic decrease
of from 0.5 to 0 as increases from 0 to 1. Empirical results
show that, inside the basin of attraction of stable states, the
flows to the states are fast with a maximum of 10 to 20 parallel
iterations (corresponding to starting at the edge of the basin).
These results are similar to the theoretical ones reported for
parallel updated correlation-recorded DAMs (Burshtien, 1993).

For the above parallel updated zero-diagonal projection
DAM, simulations show that in almost all cases where the retrieval
did not result in a fundamental memory, it resulted in a limit
cycle of period two as opposed to spurious memories. For the
preserved-diagonal projection DAM, simulations with finite *n*
and show that no oscillations exist (Youssef and Hassoun, 1989).
(The result of Problem 7.2.10 taken in the limit of
can serve as an analytical proof for the nonexistence of oscillations
for the case of large *n*).

The zero-diagonal serial projection DAM has the
best performance depicted by the solid line *a* in Figure
7.2.2. In this case an approximate linear relation between and
can be deduced from this figure as

(7.2.9)

Here, the serial update strategy used employs an
updating such that the initial updates are more likely to reduce
the Hamming distance (i.e., increase the overlap) between the
DAM's state and the closest fundamental memory rather than increase
it. In the simulations, the initial state **x**(0) tested
had its first *n* bits identical to one of the stored vectors,
say **x**1, while the
remaining bits were chosen randomly. Thus the region represents
the bits where errors are more likely to occur. The serial update
strategy used above allowed the units corresponding to the initially
random bits {*x**i*(0), }
to update their states before the ones having the correct match
with **x**1 (i.e.,
units corresponding to {*x**i*(0),
}). However, in practical applications, this update strategy
may not be applicable (unless we have partial input keys that
match a segment of one of the stored memories such as a partial
image) and hence, a standard serial update strategy (e.g., updating
the *n* states in some random or unbiased deterministic order)
may be employed. Such standard serial updating leads to reduced
error correction behavior compared to the particular serial update
employed in the above simulations. The performance, though, would
still be better than that of a parallel updated DAM. Spurious
memories do exist in the above projection DAMs. These spurious
states are mixtures of the fundamental memories (just as in the
correlation-recorded discrete Hopfield DAM) at very small values.
Above 0.1, mixture states disappear. Instead, most of
the spurious states have very little overlap with individual fundamental
memories.

Lastly, consider the parallel analog projection
DAM with zero-diagonal interconnection matrix **W**. This
DAM has the phase diagram shown in Figure 7.2.3 showing origin,
recall, and oscillation phases, but no spin glass phase (Marcus
et al., 1990). The absence of the spin glass phase does not imply
that this DAM does not have spurious memories; just as for the
correlation-recorded discrete Hopfield DAM, there are many spurious
memories within the recall and oscillation regions which have
small overlap with fundamental memories, especially for large
. However, there is no region where only spurious memories exist.
Note also that in the oscillation region, all fundamental memories
exist as stable equilibria states with basins of attraction defined
around each of them. The radius of this basin decreases (monotonically)
as increases until ultimately all such memories lose their basins
of attraction.

Figure 7.2.3. Phase diagram for the parallel updated
analog projection DAM (with zero-diagonal **W**). (Adapted
from C. M. Marcus et al., 1990, with permission of the American
Physical Society.)

According to Equation (7.1.32), oscillations are
possible in the dynamics of the present analog DAM when , where
min is the minimal eigenvalue
of the interconnection matrix **W**. It can be shown (Kanter
and Sompolinsky, 1987) that a zero-diagonal projection matrix
which stores *m* unbiased random memory vectors **x***k* {-1, +1}*n*
has the extremal eigenvalues and . Therefore, the oscillation
region in the phase diagram of Figure 7.2.3 is defined by . Also,
by following an analysis similar to the one outlined in Problem
7.2.3, it can be shown that the origin point loses its stability
when for and for > 0.5. With these expressions,
it can be easily seen that oscillation-free associative retrieval
is possible up to if the gain is equal to 2. Adding a positive
diagonal element to **W** shifts the extremal eigenvalues
min and max
to and , respectively, and thus increases the value of for oscillation-free
associative retrieval of fundamental memories to a maximum value
of , which exists at = 2. The recall regions for several
values of are shown in Figure 7.2.4. Here, one should note that
the increase in the size of the recall region does not necessarily
imply increased error correction. On the contrary, a large diagonal
term greatly reduces the size of the basins of attraction of fundamental
memories as was seen earlier for the binary projection DAM. The
reader is referred to Section 7.4.3 for further exploration into
the effects of the diagonal term on DAM performance.

Figure 7.2.4. The recall region of a parallel updated
analog projection DAM for various values of diagonal element .
(Adapted from C. M. Marcus et al., 1990, with permission of the
American Physical Society.)