As compared to the above models, a number of more
sophisticated DAMs have been proposed in the literature. Some
of these DAMs are improved variations of the ones discussed above.
Others, though, are substantially different models with interesting
behavior. The following is a sample of such DAMs [for a larger
sample of DAM models and a thorough analysis, the reader is referred
to Hassoun (1993)].

**7.4.1 Brain-State-in-a-Box (BSB) DAM**

The "brain-state-in-a-box" (BSB) model
( Anderson et al., 1977is one of the earliest DAM models. It
is a discrete-time continuous-state parallel updated DAM whose
dynamics are given by

x(k + 1) = F[αWx(k) + λx(k) + I]     (7.4.1)

where the input key is presented as the initial state **x**(0) of the DAM. Here, λ**x**(*k*), with 0 ≤ λ ≤ 1, is a decay term of the state **x**(*k*), and α is a positive constant which represents the feedback gain. The vector **I** = [*I*1 *I*2 ... *I**n*]T represents a scaled external input (bias) to the system, which persists for all time *k*. Some particular choices for **I** are **I** = **0** (i.e., no external bias) or **I** = **x**(0) (i.e., the key is maintained as a persistent bias). The operation *F*(·) is a piece-wise linear operator which maps the *i*th component of its argument vector according to:

[F(u)]_i = +1 if u_i > 1;   u_i if −1 ≤ u_i ≤ 1;   −1 if u_i < −1     (7.4.2)

The BSB model gets its name from the fact that the
state of the system is continuous and constrained to be in the
hypercube [-1, +1]*n*.

When operated as a DAM, the BSB model typically
employs an interconnection matrix **W** given by the correlation
recording recipe to store a set of *m* *n*-dimensional
bipolar binary vectors as attractors (located at corners of the
hypercube [-1, +1]*n*).
Here, one normally sets **I** = **0** and assumes the input
to the DAM (i.e., **x**(0)) to be a noisy vector which may
be anywhere in the hypercube [-1, +1]*n*.
The performance of this DAM with random stored vectors, large *n*, and *m* << *n* has been studied through numerical simulations by Anderson (1993). These simulations particularly address the effects of the model parameters (such as the feedback gain α and the decay λ) on memory retrieval.
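This retrieval procedure is easy to simulate. The following Python sketch (an illustration added here, not part of the original text) uses a zero-diagonal correlation-recorded **W**, no decay of the feedback (λ = 1), no external bias, and an arbitrary feedback gain α = 0.3; the problem sizes are likewise arbitrary.

```python
import numpy as np

def bsb_retrieve(W, x0, alpha=0.3, n_steps=50):
    """Iterate the BSB dynamics x(k+1) = F[x(k) + alpha*W*x(k)] (lambda = 1, no bias),
    where F is the piecewise-linear limiter that clips each component to [-1, +1]."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_steps):
        x = np.clip(x + alpha * (W @ x), -1.0, 1.0)
    return x

# Correlation-record m bipolar memories (zero diagonal), then retrieve from a noisy key.
rng = np.random.default_rng(0)
n, m = 100, 5
X = rng.choice([-1.0, 1.0], size=(n, m))             # columns are the stored memories
W = (X @ X.T) / n
np.fill_diagonal(W, 0.0)                             # zero self-connections

key = X[:, 0].copy()
key[rng.choice(n, size=15, replace=False)] *= -1.0   # corrupt 15 of the 100 components

x_final = bsb_retrieve(W, key)
print("overlap with the stored memory:", float(X[:, 0] @ np.sign(x_final)) / n)
```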

The stability of the BSB model in Equation (7.4.1) with symmetric **W**, **I** = **0**, and λ = 1 has been analyzed by several researchers including Golden (1986), Greenberg (1988), Hui and Żak (1992), and Anderson (1993). In this case, the model reduces to

x(k + 1) = F[x(k) + αWx(k)]     (7.4.3)

Golden (1986, 1993) analyzed the dynamics of the
system in Equation (7.4.3) and found that it behaves as a gradient
descent system that minimizes the energy

E(x) = −(1/2) x^T W x     (7.4.4)

He also proved that the dynamics in Equation (7.4.3) always converge to a local minimum of *E*(**x**) if **W** is symmetric and either λmin ≥ 0 (i.e., **W** is positive semidefinite) or α < 2/|λmin|, where λmin is the smallest eigenvalue of **W**. Under these conditions, the stable equilibria of this model are restricted to the surface and/or vertices of the hypercube. It is interesting to note here that when this BSB DAM employs correlation recording (with the diagonal of **W** preserved), it always converges to a minimum of *E*(**x**) because of the positive semidefinite symmetric nature of the autocorrelation matrix. The following example illustrates the dynamics for a two-state zero-diagonal correlation-recorded BSB DAM.

**Example 7.4.1:** Consider the problem of designing a simple BSB DAM which is capable of storing the memory vector **x** = [+1 -1]T. One possible way of recording this DAM with **x** is to employ the normalized correlation recording recipe of Equation (7.1.6). After forcing the diagonal elements to zero, this recording results in the symmetric weight matrix

W = [  0    -0.5
     -0.5    0  ]

This matrix has the two eigenvalues λmin = -0.5 and λmax = 0.5.
The energy function for this DAM is given by Equation (7.4.4) and is plotted in Figure 7.4.1. The figure shows two minima of equal energy at the state [+1 -1]T and its complement state [-1 +1]T, and two maxima of equal energy at [+1 +1]T and [-1 -1]T. Simulations using the BSB dynamics of Equation (7.4.3) are shown in Figure 7.4.2 for a number of initial states **x**(0). Using α = 1 and α = 0.3 resulted in convergence to one of the two minima of *E*(**x**), as depicted in Figures 7.4.2a and 7.4.2b, respectively. The basins of attraction of these stable states are equal in size and are separated by the line *x*2 = *x*1. Note that the values of α used here satisfy the condition α < 2/|λmin| = 4. The effects of violating this condition on the stability of the DAM are shown in Figure 7.4.3, where α was set equal to five. The figure depicts a limit cycle, an oscillation between the two states of maximum energy, [-1 -1]T and [+1 +1]T. This limit cycle was generated by starting from **x**(0) = [0.9 0.7]T. Starting from **x**(0) = [0.9 0.6]T leads to convergence to the desired state [+1 -1]T, as depicted by the lower state trajectory in Figure 7.4.3. It is interesting to note how this state was reached by bouncing back and forth off the boundaries of the state space [-1, +1]2.
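The behavior described in this example is easily reproduced numerically. The short sketch below (added here for illustration) iterates Equation (7.4.3) with the weight matrix of the example; for α = 1 and α = 0.3 the state settles into one of the two energy minima, while for α = 5 (starting from **x**(0) = [0.9 0.7]T) the last two states alternate between the two energy maxima.

```python
import numpy as np

W = np.array([[0.0, -0.5],
              [-0.5, 0.0]])            # zero-diagonal autocorrelation matrix of Example 7.4.1

def energy(x):
    return -0.5 * x @ W @ x            # Equation (7.4.4)

def bsb_step(x, alpha):
    return np.clip(x + alpha * (W @ x), -1.0, 1.0)   # Equation (7.4.3)

for alpha in (1.0, 0.3, 5.0):
    trajectory = [np.array([0.9, 0.7])]
    for _ in range(50):
        trajectory.append(bsb_step(trajectory[-1], alpha))
    last_two = [np.round(s, 2) for s in trajectory[-2:]]
    print(f"alpha = {alpha}: last two states {last_two}, E = {energy(trajectory[-1]):.2f}")
```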

Figure 7.4.1. A plot of the energy function *E*(**x**)
for the BSB DAM of Example 7.4.1. There are two minima with energy
*E* = -0.5
at states [+1 -1]T
and [-1 +1]T,
and two maxima with energy *E* = 0.5 at [+1 +1]T
and [-1 -1]T.

Figure 7.4.2. State space trajectories of a two-state BSB DAM, which employs a zero-diagonal autocorrelation weight matrix to store the memory vector **x** = [+1 -1]T. The resulting weight matrix is symmetric with λmin = -0.5 and λmax = 0.5. (a) α = 1, and (b) α = 0.3. Circles indicate state transitions. The lines are used as guides to the eye.

Figure 7.4.3. State space trajectories of the BSB
DAM of Figure 7.4.2, but with α = 5. The limit cycle
(top trajectory) was obtained by starting from **x**(0) = [0.9 0.7]T.
The converging dynamics (bottom trajectory) was obtained by starting
from **x**(0) = [0.9 0.6]T.

Greenberg (1988) showed the following interesting
BSB DAM property. He showed that all vertices of a BSB DAM are
attractors (asymptotically stable equilibria) if

w_ii > Σ_{j≠i} |w_ij|,   i = 1, 2, ..., n     (7.4.5)

Equation (7.4.5) defines what is referred to as a
"strongly" row diagonal dominant matrix **W**. As
an example, it is noted that the BSB DAM with **W** = **I**
has its vertices as attractors. For associative memories, though,
it is not desired to have all vertices (2^*n* of them) of the hypercube as attractors. Therefore, a row diagonally
dominant weight matrix is to be avoided (recall that the interconnection
matrix in a DAM is usually treated by forcing its diagonal to
zero).

A more general result concerning the stability of the vertices of the BSB model in Equation (7.4.1), with λ = 1 and a constant bias vector **I**, was reported by Hui and Żak (1992, 1993). They showed that if a (row) diagonal dominance condition on **W**, involving the bias inputs, holds for *i* = 1, 2, ..., *n*, then all vertices of the bipolar hypercube are asymptotically stable equilibrium points. Here, **W** need not be symmetric and *I**i* is an arbitrary constant bias input to the *i*th unit. Hui and Żak also showed that, if **W** is symmetric, a hypercube vertex **x*** which satisfies the condition

x_i* [αWx* + I]_i > 0,   i = 1, 2, ..., n     (7.4.6)

is a stable equilibrium. Here, [·]*i* signifies the *i*th component of the argument vector. Equation (7.4.6) is particularly useful in characterizing the capacity of a zero-diagonal correlation-recorded BSB DAM where *m* unbiased and independent random vectors **x***k* ∈ {-1, +1}*n* are stored. Let **x***h* be one of these vectors and substitute it in Equation (7.4.6). Assuming the DAM is receiving no bias (**I** = **0**), the inequality in Equation (7.4.6) becomes

α x_i^h [Wx^h]_i > 0,   i = 1, 2, ..., n     (7.4.7)

or

x_i^h [Wx^h]_i > 0,   i = 1, 2, ..., n     (7.4.8)

since α > 0. Thus, the vertex **x***h*
is an attractor if

(n − 1)/n + (1/n) x_i^h Σ_{k≠h} Σ_{j≠i} x_i^k x_j^k x_j^h > 0,   i = 1, 2, ..., n     (7.4.9)

or, equivalently

x_i^h ( (1/n) Σ_{k≠h} Σ_{j≠i} x_i^k x_j^k x_j^h ) > −(n − 1)/n,   i = 1, 2, ..., n     (7.4.10)

where the term inside the parentheses is the cross-talk term. In Section 7.2.1, it was determined that the probability that the *n* inequalities in Equation (7.4.10) hold for all *m* memories approaches 1, in the limit of large *n*, if *m* < *n*/(4 ln *n*). Hence, it is concluded that the absolute capacity of the BSB DAM for storing random bipolar binary vectors is identical to that of the discrete Hopfield DAM when correlation recording is used with zero self-coupling (i.e., *w**ii* = 0 for all *i*). In fact, the present capacity result is stronger than the absolute capacity result of Section 7.2.1; when *m* is below this limit, the condition of Equation (7.4.6) is satisfied and, therefore, all **x***k* vectors are not merely fixed points but stable equilibria.
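This prediction can be checked with a small numerical experiment. The sketch below (illustrative sizes only) applies the component-wise stability test x_i^h [Wx^h]_i > 0 used above to every stored vector of a zero-diagonal correlation-recorded BSB DAM, and estimates how often all *m* memories pass it; the fraction drops from one toward zero as *m* grows.

```python
import numpy as np

def fraction_all_memories_stable(n, m, trials=20, seed=1):
    """Fraction of random trials in which every stored vector x^h satisfies
    x_i [W x^h]_i > 0 for all i, for a zero-diagonal correlation-recorded W."""
    rng = np.random.default_rng(seed)
    successes = 0
    for _ in range(trials):
        X = rng.choice([-1.0, 1.0], size=(n, m))
        W = (X @ X.T) / n
        np.fill_diagonal(W, 0.0)
        if np.all(X * (W @ X) > 0):          # element-wise test for every memory and unit
            successes += 1
    return successes / trials

for n, m in [(200, 5), (200, 15), (200, 25)]:
    print(f"n={n}, m={m}: fraction of trials with all memories stable =",
          fraction_all_memories_stable(n, m))
```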

**7.4.2 Non-Monotonic Activations DAM**

First, consider the zero-diagonal correlation-recorded
discrete Hopfield DAM discussed earlier in this chapter. The
retrieval dynamics of this DAM show some strange dynamical behavior.
When initialized with a vector **x**(0) that has an overlap
*p* with one of the stored random memories, say **x**1,
the DAM state **x**(*k*) initially evolves towards **x**1
but does not always converge or stay close to **x**1
as shown in Figure 7.4.4. It has been shown (Amari and Maginu, 1988) that the overlap *p*(*k*), when started with *p*(0) less than some critical value *p**c*, initially increases but soon starts to decrease and ultimately stabilizes at a value less than 1. In this case, the DAM converges to a spurious memory, as depicted schematically by the trajectory on the right in Figure 7.4.4. The value of *p**c* increases (from zero) monotonically with the pattern ratio α = *m*/*n* and increases sharply from about 0.5 to 1 as α becomes larger than 0.15, the DAM's relative capacity (note that *p**c* can also be written as 1 − 2ρ, where ρ is the radius of attraction of a fundamental memory as in Section 7.2.1). This peculiar phenomenon can be explained by first noting the effects of the overlaps *p*(*k*) = (1/*n*)(**x**1)T**x**(*k*) and *q**h*(*k*) = (1/*n*)(**x***h*)T**x**(*k*), *h* = 2, ..., *m*, on the *i*th unit weighted-sum *u**i*(*k*) given by

u_i(k) = Σ_{j≠i} w_ij x_j(k) = (1/n) Σ_{h=1}^{m} Σ_{j≠i} x_i^h x_j^h x_j(k)     (7.4.11)

or, when written in terms of *p*(*k*) and
*q**h*(*k*),

(7.4.12)

Note the effects of the overlap terms *q**h*(*k*)
on the value of *u**i*(*k*).
The higher the overlaps with memories other than **x**1,
the larger the value of the cross-talk term [the summation term
in Equation (7.4.12)] which, in turn, drives |*u**i*(*k*)|
to large values. Morita (1993) showed, using simulations, that
both the sum of squares of the overlaps with all stored memories
except **x**1, defined
as

s(k) = Σ_{h=2}^{m} q_h^2(k)     (7.4.13)

and *p*²(*k*) initially increase with *k*. Then, one of two scenarios might occur. In the first scenario, *s*(*k*) begins to decrease and *p*²(*k*) continues to increase until it reaches 1; i.e., **x**(*k*) stabilizes at **x**1. In the second scenario, on the other hand, *s*(*k*) continues to increase and may attain values larger than 1 while *p*²(*k*) decreases.
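The competition between *p*(*k*) and *s*(*k*) is easy to observe in a simulation such as the sketch below (network size, loading, and initial overlap are arbitrary choices added here for illustration). With the DAM loaded above its relative capacity, one typically sees both quantities rise initially, after which the state either locks onto **x**1 or drifts to a spurious memory with *p*(*k*) collapsing.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 400, 80                               # pattern ratio m/n = 0.2, above 0.15
X = rng.choice([-1.0, 1.0], size=(n, m))
W = (X @ X.T) / n
np.fill_diagonal(W, 0.0)

x = X[:, 0].copy()
x[rng.choice(n, size=int(0.15 * n), replace=False)] *= -1.0   # initial overlap p(0) = 0.7

for k in range(10):
    q = (X.T @ x) / n                        # overlaps with all stored memories
    p, s = q[0], float(np.sum(q[1:] ** 2))   # p(k) and s(k) of Equation (7.4.13)
    print(f"k={k}:  p = {p:+.3f}   s = {s:.3f}")
    u = W @ x
    x = np.where(u >= 0, 1.0, -1.0)          # parallel sgn update of the Hopfield DAM
```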

Figure 7.4.4. Schematic representation of converging
trajectories in a correlation-recorded discrete Hopfield DAM.
When the distance (overlap) between **x**(0) and **x**1
is larger (smaller) than some critical value, the DAM converges
to a spurious memory (right hand side trajectory). Otherwise,
the DAM retrieves the fundamental memory **x**1.
(From *Neural Networks*, **6**, M. Morita, Associative
Memory With Nonmonotone Dynamics, pp. 115-126, Copyright 1993,
with kind permission from Pergamon Press Ltd., Headington Hill
Hall, Oxford OX3 0BW, UK.)

The above phenomenon suggests a method for improving DAM performance by modifying the dynamics of the Hopfield DAM such that the state is forced to move in a direction that reduces *s*(*k*) but not *p*²(*k*).
One such method is to reduce the influence of units with large
|*u**i*| values.
Such neurons actually cause the increase in *s*(*k*).
The influence of a unit *i* with large |*u**i*|, say |*u**i*| > θ > 0, can be reduced by reversing the sign of *x**i*.
This method can be implemented using the "partial reverse"
dynamics (Morita, 1993) given by:

x(k + 1) = F[Wx(k) − λWG(Wx(k))]     (7.4.14)

where λ > 0 and *F* and *G*
are activation functions which operate componentwise on their
vector arguments. Here, *F* is the sgn activation function
and *G* is defined by

g(u) = +1 if u > θ;   0 if −θ ≤ u ≤ θ;   −1 if u < −θ     (7.4.15)

where *u* is a component of the vector **u** = **Wx**(*k*).
The values of the parameters θ and λ must be determined with care. Empirically, a threshold of θ = 2.7, together with a suitably chosen λ, has been used. These parameters are chosen so that the number of units which satisfy |*u**i*| > θ is small when **x**(*k*) is close to any of the stored memories, provided the pattern ratio α is not too large. It should be noted that
Equation (7.4.14) does not always converge to a stable equilibrium.
Numerical simulations show that a DAM employing this partial
reverse method has several advantages over the same DAM, but with
pure sgn activations. These advantages include a smaller critical
overlap *p**c*
(i.e., wider basins of attraction for fundamental memories), faster
convergence, a lower rate of convergence to spurious memories, and error correction capability at higher pattern ratios.
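As a rough illustration of the partial reverse idea (not a verbatim transcription of Morita's equations), the following sketch reduces the influence of over-excited units by subtracting a scaled copy of their signs from the state before the next parallel sgn update; the value of λ is an arbitrary assumption, while θ = 2.7 follows the empirical choice mentioned above.

```python
import numpy as np

def partial_reverse_step(W, x, lam=0.5, theta=2.7):
    """One parallel update in the spirit of the partial reverse method: units whose
    weighted sum exceeds theta in magnitude have their contribution partially reversed."""
    u = W @ x
    g = np.where(np.abs(u) > theta, np.sign(u), 0.0)   # non-zero only for over-excited units
    u_corrected = W @ (x - lam * g)                    # field with those units partially reversed
    return np.where(u_corrected >= 0, 1.0, -1.0)

# Usage: x_next = partial_reverse_step(W, x)
```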

Continuous Model

Consider the continuous-time DAM in Equation (7.1.21) with its parameter matrices set to the identity and with no external bias. Namely,

du/dt = −u + WF(u)     (7.4.16)

where **u** is the vector of unit activities and **W** is the usual zero-diagonal normalized autocorrelation matrix, **W** = (1/*n*) Σ*h* **x***h*(**x***h*)T with its diagonal forced to zero. Here, the partial reverse method described
above cannot be applied directly due to the continuous DAM dynamics.
One can still capture the essential elements of this method,
though, by designing the activation function such that it reduces
the influence of unit *i* if |*u**i*|
is very large. This can be achieved by employing the non-monotone
activation function shown in Figure 7.4.5 with the following analytical
form (Morita et al., 1990 a, b):

f(u) = [(1 − e^(−c u)) / (1 + e^(−c u))] · [(1 − κ e^(c′(|u| − h))) / (1 + e^(c′(|u| − h)))]     (7.4.17)

where *c*, *c*′, κ, and *h* are positive constants with typical values of 50, 15, 1, and 0.5, respectively. This non-monotone
activation function operates to keep the variance of |*u**i*|
from growing too large, and hence implements a similar effect
to the one implemented by the partial reverse method. Empirical
results show that this DAM has an absolute capacity that is proportional
to *n* with substantial error correction capabilities. Also,
this DAM almost never converges to spurious memories when retrieval
of a fundamental memory is not successful; instead, the DAM state
continues to wander (chaotically) without reaching any equilibrium
(Morita, 1993).
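A minimal simulation of the continuous non-monotone DAM, using simple Euler integration of Equation (7.4.16) and the activation of Equation (7.4.17) in the form given above, is sketched below; the step size, network size, loading level, and noise level are arbitrary illustrative choices.

```python
import numpy as np

def f_nonmonotone(u, c=50.0, c_prime=15.0, kappa=1.0, h=0.5):
    """Non-monotone activation of Equation (7.4.17): approximately sgn(u) for |u| < h,
    but reversing sign once |u| substantially exceeds h."""
    z = c_prime * (np.abs(u) - h)
    a = np.tanh(c * u / 2.0)                        # equals (1 - exp(-c u)) / (1 + exp(-c u))
    b = (np.exp(-z) - kappa) / (np.exp(-z) + 1.0)   # equals (1 - kappa exp(z)) / (1 + exp(z))
    return a * b

rng = np.random.default_rng(7)
n, m = 300, 90                                      # pattern ratio 0.3, well above 0.15
X = rng.choice([-1.0, 1.0], size=(n, m))
W = (X @ X.T) / n
np.fill_diagonal(W, 0.0)

u = X[:, 0] * 1.0
u[rng.choice(n, size=45, replace=False)] *= -1.0    # 15 percent of the key's bits in error

dt = 0.05
for _ in range(600):                                # Euler integration of du/dt = -u + W f(u)
    u += dt * (-u + W @ f_nonmonotone(u))

print("overlap with x^1:", float(X[:, 0] @ np.sign(u)) / n)
```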

Figure 7.4.5. Non-monotonic activation function generated from Equation (7.4.17) with the typical parameter values given in the text.

Figure 7.4.6 gives simulation-based capacity curves (Yoshizawa et al., 1993a) depicting plots of the critical overlap *p**c* (i.e., 1 − 2ρ, where ρ is the radius of attraction of a fundamental memory) versus the pattern ratio α for three DAMs with the dynamics in Equation (7.4.16). Two of the DAMs (curves A and B) employ a zero-diagonal autocorrelation interconnection matrix; the first uses a sigmoidal activation function (curve A) and the second uses the non-monotone activation function of Equation (7.4.17) (curve B). The third DAM (curve C) employs projection recording with preserved **W** diagonal and a sigmoidal activation function. As expected, the DAM with sigmoid activation (curve A) loses its associative retrieval capabilities and the ability to retain the stored memories as fixed points (designated by the dashed portion of curve A) as α approaches 0.15. On the other hand, and with the same interconnection matrix, the non-monotone activation DAM exhibits good associative retrieval for a wide range of pattern ratios, even when α exceeds 0.5 (for example, Figure 7.4.6 (curve B) predicts a basin of attraction radius ρ ≈ 0.22 at α = 0.5. This means that proper retrieval is possible from initial states having 22 percent or less random errors with respect to any one of the stored memories). It is interesting to note that this performance exceeds that of a projection-recorded DAM with sigmoid activation function, represented by curve C. Note, though, that the performance of the zero-diagonal projection-recorded discrete DAM with serial update of states (refer to Section 7.2.2 and Figure 7.2.2) exceeds that of the non-monotone activations correlation-recorded DAM. Still, the demonstrated
retrieval capabilities of the non-monotone activations DAM are
impressive. The non-monotone dynamics can thus be viewed as extracting
and using intrinsic information from the autocorrelation matrix
which the "sigmoid dynamics" is not capable of utilizing.
For a theoretical treatment of the capacity and stability of
the non-monotone activations DAM, the reader is referred to Yoshizawa
et al. (1993a, b). Nishimori and Opris (1993) reported a discrete-time discrete-state version of this model in which a limiting (infinitely steep) form of the nonmonotonic activation function of Figure 7.4.5 is used with κ = 1.0 and an arbitrary threshold *h* > 0. They showed that a maximum capacity proportional to *n* is possible for a proper choice of *h*, and gave a complete characterization of this model's capacity versus the parameter *h*.

Figure 7.4.6. Simulation generated capacity/error
correction curves for the continuous-time DAM of Equation (7.4.16).
Curves A and B represent the cases of zero-diagonal correlation-recorded
DAM with sigmoidal activation function and non-monotone activation
function, respectively. Curve C is for a projection recorded
(preserved diagonal) DAM with sigmoidal activation function which
is given for comparison purposes. (From *Neural Networks*,
**6**, S. Yoshizawa et al., Capacity of Associative Memory
Using a Nonmonotonic Neuron Model, pp. 167-176, with kind permission
from Pergamon Press Ltd., Headington Hill Hall, Oxford OX3 0BW,
UK.)

**7.4.3 Hysteretic Activations DAM**

The hysteretic activations DAM is a discrete Hopfield-type DAM in which each unit employs a hysteretic activation function. The retrieval dynamics take the form

x_i(k + 1) = f_i[ Σ_{j≠i} w_ij x_j(k) ],   i = 1, 2, ..., n     (7.4.18)

where the activation function *f**i* is given by:

f_i(u) = sgn[ u + δ x_i(k) ],   δ > 0     (7.4.19)

That is, the current state *x**i*(*k*) shifts the switching threshold of unit *i* by ∓δ, so that a unit tends to retain its present state; the resulting transfer characteristics are shown in Figure 7.4.7.

Figure 7.4.7. Transfer characteristics of a unit with hysteretic activation function.

In the previous section, it was noted that when
the state of a DAM is not far from a fundamental memory, degradation
of the associative retrieval process is caused by the units moving
from the right states to the wrong ones. The motivation behind
the hysteretic property is that it causes units with proper response
to preserve their current state longer, thus increasing the chance
for the DAM to correct its "wrong" states and ultimately
converge to the closest fundamental memory. But, simultaneously,
there are units moving from the wrong states to the right ones,
and hysteresis tends to prevent these transitions as well. It
has been shown (Yanai and Sawada, 1990) that, for a proper choice of δ, the former process is more effective than the latter, and associative retrieval can be enhanced. For small δ, Yanai and Sawada (1990) showed that the absolute capacity of a zero-diagonal correlation-recorded DAM with hysteretic units is given by the limit

m_max = (1 + δ)^2 n / (4 ln n),   n → ∞     (7.4.20)

This result can be derived by first noting that the *i*th bit transition for the above DAM may be described by Equation (7.2.2) with *x**i*(0) replaced by (1 + δ)*x**i*(0), and following a derivation similar to that in Equations (7.2.3) through (7.2.6). By comparing Equations (7.4.20) and (7.2.6), we find that hysteresis leads to a substantial increase in the number of memorizable vectors as compared to using no hysteresis at all. Yanai and Sawada also showed that the relative capacity increases with δ (e.g., the relative capacity increases from 0.15 at δ = 0 to about 0.25 at larger δ). This implies that the basins of attraction of fundamental memories increase in size when hysteresis is employed. Empirical results suggest that a value of δ slightly higher than the pattern ratio α = *m*/*n* (with α << 1) leads to the largest basin of attraction size around fundamental memories.

Hysteresis can arise from allowing non-zero diagonal elements *w**ii*, *i* = 1, 2, ..., *n*. To see this, consider a discrete DAM with a normalized autocorrelation matrix **W** = (1/*n*) Σ*h* **x***h*(**x***h*)T (diagonal retained). Note that this matrix has diagonal elements *w**ii* = *m*/*n*, *i* = 1, 2, ..., *n*. Now, the update rule for the *i*th unit is

x_i(k + 1) = sgn[ Σ_{j≠i} w_ij x_j(k) + (m/n) x_i(k) ]     (7.4.21)

Comparing Equation (7.4.21) to Equation (7.4.19) of a hysteretic activation DAM reveals that the two DAMs are mathematically equivalent if δ is set equal to *m*/*n* (it is interesting to note that *m*/*n* is, approximately, the empirically optimal value for the hysteretic parameter δ). Therefore, it is concluded that preserving the original diagonal in a correlation-recorded DAM is advantageous in terms of the quality of associative retrieval when *m*/*n* << 1 (see Problem 7.4.8 for further explanation).
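The equivalence noted here can be verified directly: for the same input, the field seen by a hysteretic unit (with δ = *m*/*n*) acting on the zero-diagonal matrix coincides with the field seen by an ordinary unit acting on the matrix with its diagonal retained. A small sketch (sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 200, 30
X = rng.choice([-1.0, 1.0], size=(n, m))
W = (X @ X.T) / n                          # normalized autocorrelation matrix, w_ii = m/n
W0 = W - np.diag(np.diag(W))               # the same matrix with its diagonal forced to zero

x = X[:, 0].copy()
x[rng.choice(n, size=30, replace=False)] *= -1.0   # a noisy version of the first memory

delta = m / n
u_hysteretic = W0 @ x + delta * x          # field of a hysteretic unit on the zero-diagonal DAM
u_retained = W @ x                         # field of an ordinary unit with the diagonal retained
print("maximum field difference:", float(np.max(np.abs(u_hysteretic - u_retained))))
# The difference is at the level of floating-point round-off, so the sgn updates coincide.
```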

The advantages of small positive self-connections have also been demonstrated for projection-recorded discrete DAMs. Krauth et al. (1988) demonstrated that using a small positive diagonal element with the projection-recorded discrete DAM increases the radius of attraction of fundamental memories if the DAM is substantially loaded (i.e., for α greater than about 0.4). For example, they found numerically that for α = 0.5, using a diagonal term of about 0.075 instead of zero increases the basins of attraction of fundamental memories by about 50%. For the projection DAM, though, retaining the original diagonal leads to relatively large values for the self-connections if the pattern ratio is large (for *m* unbiased random memories we have *w**ii* ≈ *m*/*n* = α, *i* = 1, 2, ..., *n*). This greatly reduces the basin of attraction size for fundamental memories, as shown empirically in Section 7.2.2. All this suggests that the best approach is to give all self-connections a small positive value much smaller than 1.

**7.4.4 Exponential Capacity DAM**

Up to this point, the best DAM considered has a
capacity proportional to *n*, the number of units in the
DAM. However, very high capacity DAMs (e.g., *m* exponential
in *n*) can be realized if one is willing to consider more
complex memory architectures than the ones considered so far.
An exponential capacity DAM was proposed by Chiueh and Goodman
(1988, 1991). This DAM can store up to *c*^*n* (*c* > 1) random vectors **x***h* ∈ {-1, +1}*n*,
*h* = 1, 2, ..., *m*, with
substantial error correction abilities. The exponential DAM is
described next.

Consider the architecture in Figure 7.4.8. This
architecture describes a two-layer dynamic autoassociative memory.
The dynamics are nonlinear due to the nonlinear operations *G*
and *F*. The output nonlinearity *F* implements sgn
activations which gives the DAM a discrete-state nature. The
matrix **X** is the matrix whose columns are the desired memory
vectors **x***h*,
*h* = 1, 2, ..., *m*. This
DAM may update its state **x**(*k*) in either serial or
parallel mode. The parallel updated dynamics are given in vector
form as:

x(k + 1) = F[ X G( X^T x(k) ) ]     (7.4.22)

and in component form as:

x_i(k + 1) = sgn[ Σ_{h=1}^{m} x_i^h g( (x^h)^T x(k) ) ],   i = 1, 2, ..., n     (7.4.23)

where *g* is the scalar version of the operator
*G*. Here, *g* is normally assumed to be a continuous
monotone non-decreasing function over [-*n*, +*n*].
The recording of a new memory vector **x***h*
is simply done by augmenting the matrices **X**T
and **X** with (**x***h*)T
and **x***h*,
respectively (this corresponds to allocating two new *n*-input
units, one in each layer, in the network of Figure 7.4.5. The
choice of the first layer activation functions, *g*, plays
a critical role in determining the capacity and dynamics of this
DAM. To see this, assume first a simple linear activation function
*g*() = . This assumption reduces the dynamics
in Equation (7.4.22) to

x(k + 1) = F[ X X^T x(k) ]     (7.4.24)

which is simply the dynamics of the correlation-recorded
discrete DAM.

On the other hand, if one chooses *g*(*u*) = (*n* + *u*)^*q*, where *q* is an integer greater than 1, a higher-order DAM results with polynomial capacity *m* proportional to *n*^*q* for large *n* (Psaltis and Park, 1986). Exponential capacity is also possible with a proper choice of *g*. Such choices include *g*(*u*) = (*n* − *u*)^−*a*, *a* > 1 (Dembo and Zeitouni, 1988; Sayeh and Han, 1987) and *g*(*u*) = *a*^*u*, *a* > 1 (Chiueh and Goodman, 1988).
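A compact way to experiment with these choices of *g* is sketched below (sizes and the base *a* = 2 are arbitrary illustrative choices). With the linear *g* the update collapses to the correlation-recorded DAM and fails at this loading, while the exponential *g* typically still retrieves the stored memory even though *m* is much larger than *n*.

```python
import numpy as np

def rcam_step(X, x, g):
    """One parallel update of Equation (7.4.22): x <- sgn(X g(X^T x)).
    The columns of X are the stored memories and g acts componentwise."""
    x_new = np.sign(X @ g(X.T @ x))
    x_new[x_new == 0] = 1.0
    return x_new

rng = np.random.default_rng(4)
n, m = 40, 500                                 # many more memories than units
X = rng.choice([-1.0, 1.0], size=(n, m))

x = X[:, 0].copy()
x[rng.choice(n, size=6, replace=False)] *= -1.0   # 15 percent of the key's bits flipped

g_linear = lambda u: u                         # reduces to the correlation-recorded DAM
g_exp = lambda u: 2.0 ** u                     # exponential DAM with base a = 2 (illustrative)

for name, g in [("linear g", g_linear), ("exponential g", g_exp)]:
    y = rcam_step(X, x, g)
    print(f"{name}: overlap with x^1 after one pass = {float(X[:, 0] @ y) / n:.2f}")
```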

The choice *g*(*u*) = *a*^*u* results in an "exponential" DAM with capacity *m* = *c*^*n* and with error correction capability. Here, *c* is a function of *a* and of the radius of attraction ρ of fundamental memories, as depicted in Figure 7.4.9. According to this figure, the exponential DAM is capable of achieving the ultimate capacity of a binary-state DAM, namely *m* = 2^*n*, in the limit of large *a*. As one might determine intuitively, though, this DAM has no error correction abilities at such loading levels. For relatively small *a*, one may come up with an approximate linear relation between *c* and ρ. For example, Figure 7.4.9 gives *c* ≈ 1.2 − 0.4ρ for one of the small values of *a* shown. Hence, an exponential DAM with this nonlinearity can store up to (1.2 − 0.4ρ)^*n* random memories if it is desired that all such memories have basins of attraction of radius ρ > 0 (here, one-pass retrieval is assumed).

Figure 7.4.9. Relation between the base constant *c* and the fundamental memories' basin of attraction radius ρ, for various values of the nonlinearity parameter *a*, for an exponential DAM. This DAM has a storage capacity of *c*^*n*.
(Adapted from T. D. Chiueh and R. M. Goodman, 1991, Recurrent
Correlation Associative Memories, *IEEE Transactions on Neural
Networks*, **2**(2), pp. 275-284, ©1991 IEEE.)

Chiueh and Goodman (1991) showed that a sufficient
condition for the dynamics in Equation (7.4.23) to be stable in
both serial and parallel update modes is that the activation function
*g*(*u*) be continuous and monotone non-decreasing over [-*n*, +*n*]. This condition can easily be shown to hold for all choices of *g* indicated above.

A Heteroassociative DAM (HDAM) is shown in the block diagram of Figure 7.4.10 (Okajima et al., 1987). It consists of two processing paths which form a closed loop. The first processing path computes a vector **y** ∈ {-1, +1}*L* from an input **x** ∈ {-1, +1}*n* according to the parallel update rule

y = F[ W_1 x ]     (7.4.26)

or its serial (asynchronous) version where one and
only one unit updates its state at a given time. Here, *F*
is usually the sgn activation operator. Similarly, the second
processing path computes a vector **x** according to

x = F[ W_2 y ]     (7.4.27)

or its serial version. The vector **y** in Equation
(7.4.27) is the same vector generated by Equation (7.4.26). Also,
because of the feedback employed, **x** in Equation (7.4.26)
is given by Equation (7.4.27).

The HDAM can be operated in either parallel or serial
retrieval modes. In the parallel mode, the HDAM starts from an
initial state **x**(0) [**y**(0)], computes its state **y
**(**x**) according to Equation (7.4.26) [Equation (7.4.27)],
and then updates state **x** (**y**) according to Equation
(7.4.27) [Equation (7.4.26)]. This process is iterated until
convergence; i.e., until state **x** (or equivalently **y**)
ceases to change. On the other hand, in the serial update mode,
only one randomly chosen component of the state **x** or **y**
is updated at a given time.

Various methods have been proposed for storing a
set of heteroassociations {**x***k*, **y***k*},
*k* = 1, 2, ..., *m* in the
HDAM. In most of these methods, the interconnection matrices
**W**1 and **W**2
are computed independently by requiring that all one-pass associations
**x***k* **y***k*
and **y***k* **x***k*,
respectively, are perfectly stored. Here, it is assumed that
the set of associations to be stored forms a one-to-one mapping;
otherwise, perfect storage becomes impossible. Examples of such
HDAM recording methods include the use of projection recording
(Hassoun, 1989 a, b) and Householder transformation-based recording
(Leung and Cheung, 1991). These methods require the linear independence
of the vectors **x***k*
(also **y***k*)
for which a capacity of *m* = min(*n*, *L*)
is achievable. One drawback of these techniques, though, is
that they do not guarantee the stability of the HDAM; i.e., convergence
to spurious cycles is possible. Empirical results show (Hassoun,
1989b) that parallel updated projection-recorded HDAMs exhibit
significant oscillatory behavior only at memory loading levels
close to the HDAM capacity.

Kosko (1987, 1988), independently, proposed a heteroassociative memory with the architecture of the HDAM, but with the restriction **W**1 = **W** and **W**2 = **W**T. This memory is known as a bidirectional associative memory
(BAM). The interesting feature of a BAM is that it is stable
for any choice of the real-valued interconnection matrix **W**
and for both serial and parallel retrieval modes. This can be
shown by starting from the BAM's bounded Liapunov (energy) function

E(x, y) = −y^T W x     (7.4.28)

and showing that each serial or parallel state update decreases *E* (see Problem 7.4.14). One can also prove BAM
stability by noting that a BAM can be converted to a discrete
autoassociative DAM (discrete Hopfield DAM) with state vector
**x**' = [**x**T **y**T]T
and interconnection matrix **W**' given by

W′ = [ 0     W^T
       W     0   ]     (7.4.29)

Now, since **W**' is a symmetric zero-diagonal
matrix, the autoassociative DAM is stable if serial update is
assumed as was discussed in Section 7.1.2 (also, see Problem 7.1.18).
Therefore, the serially updated BAM is stable. One may also
use this equivalence property to show the stability of the parallel updated BAM (note that a parallel updated BAM is not equivalent to the non-stable parallel updated discrete Hopfield DAM; this is because either **x** or **y**, but not both, is updated in parallel at each step).

From above, it can be concluded that the BAM always
converges to a local minimum of its energy function defined in
Equation (7.4.28). It can be shown (Wang et al., 1991) that these
local minima include all those that correspond to associations
{**x***k*, **y***k*}
which are successfully loaded into the BAM (i.e., associations
which are equilibria of the BAM dynamics.)

The simplest storage recipe for storing the associations {**x***k*, **y***k*} as BAM equilibrium points is the correlation recording recipe of Equation (7.1.2). This recipe guarantees the BAM requirement
that the forward path and backward path interconnection matrices
**W**1 and **W**2
are the transpose of each other, since

W_1 = Σ_{k=1}^{m} y^k (x^k)^T     (7.4.30)

and

W_2 = Σ_{k=1}^{m} x^k (y^k)^T = W_1^T     (7.4.31)

However, some serious drawbacks of using the correlation
recording recipe are low capacity and poor associative retrievals;
when *m* random associations are stored in a correlation-recorded
BAM, the condition *m* << min(*n*, *L*)
must be satisfied if good associative performance is desired (Hassoun,
1989b; Simpson, 1990). Heuristics for improving the performance
of correlation-recorded BAMs can be found in Wang et al. (1990).
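To make the preceding discussion concrete, the following sketch (with arbitrary sizes and noise level) builds a correlation-recorded BAM with *m* << min(*n*, *L*); the forward and backward matrices are transposes of each other by construction, and a few alternating passes typically recover the stored pair from a noisy key.

```python
import numpy as np

def sgn(v, prev):
    """Componentwise sgn with ties keeping the previous value."""
    return np.where(v > 0, 1.0, np.where(v < 0, -1.0, prev))

rng = np.random.default_rng(5)
n, L, m = 100, 60, 4                          # m << min(n, L)
X = rng.choice([-1.0, 1.0], size=(n, m))      # stored x^k patterns (columns)
Y = rng.choice([-1.0, 1.0], size=(L, m))      # associated y^k patterns (columns)
W = Y @ X.T                                   # correlation recording: W1 = W, W2 = W^T

x = X[:, 0].copy()
x[rng.choice(n, size=10, replace=False)] *= -1.0   # noisy input key
y = np.ones(L)

for _ in range(10):                           # alternate forward and backward passes
    y = sgn(W @ x, y)                         # forward path:  y <- F[W x]
    x = sgn(W.T @ y, x)                       # backward path: x <- F[W^T y]

print("recovered x^1:", bool(np.array_equal(x, X[:, 0])),
      "| recovered y^1:", bool(np.array_equal(y, Y[:, 0])))
```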

Before leaving this section, it should be noted
that the above models of associative memories are by no means exhaustive. A number of other interesting models have been reported
in the literature (interested readers may find the volume edited
by Hassoun (1993) useful in this regard.) Some of these models
are particularly interesting because of connections to biological
memories (e.g., Kanerva, 1988 and 1993; Alkon et al., 1993).