7.4 Other DAM Models

Beyond the models discussed above, a number of more sophisticated DAMs have been proposed in the literature. Some of these DAMs are improved variations of the ones already discussed. Others, though, are substantially different models with interesting behavior. The following is a sample of such DAMs [for a larger sample of DAM models and a thorough analysis, the reader is referred to Hassoun (1993)].

7.4.1 Brain-State-in-a-Box (BSB) DAM

The "brain-state-in-a-box" (BSB) model ( Anderson et al., 1977is one of the earliest DAM models. It is a discrete-time continuous-state parallel updated DAM whose dynamics are given by

x(k + 1) = F[αx(k) + βWx(k) + I]          (7.4.1)

where the input key is presented as the initial state x(0) of the DAM. Here, αx(k), with 0 < α ≤ 1, is a decay term of the state x(k), and β is a positive constant which represents feedback gain. The vector I = [I1 I2 ... In]T represents a scaled external input (bias) to the system, which persists for all time k. Some particular choices for I are I = 0 (i.e., no external bias) or I = x(0) (i.e., the input key persists as a bias). The operation F(·) is a piecewise-linear operator which maps the ith component of its argument vector according to:

F(ui) = +1 if ui > 1;  ui if −1 ≤ ui ≤ 1;  −1 if ui < −1          (7.4.2)

The BSB model gets its name from the fact that the state of the system is continuous and constrained to be in the hypercube [-1, +1]n.
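
To make the update concrete, the following is a minimal sketch of one parallel BSB iteration in Python; the function name and the default parameter values are illustrative choices, not from the original.

import numpy as np

def bsb_step(x, W, alpha=1.0, beta=0.5, I=None):
    """One parallel BSB update, Equation (7.4.1):
    x(k+1) = F[alpha*x(k) + beta*W*x(k) + I]."""
    if I is None:
        I = np.zeros_like(x)
    u = alpha * x + beta * (W @ x) + I
    # F is the piecewise-linear limiter of Equation (7.4.2): components
    # inside [-1, +1] pass through unchanged; the rest are clipped.
    return np.clip(u, -1.0, 1.0)

Iterating bsb_step from an initial key x(0) keeps the state inside the hypercube [-1, +1]n, the "box" of the model's name.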

When operated as a DAM, the BSB model typically employs an interconnection matrix W given by the correlation recording recipe to store a set of m n-dimensional bipolar binary vectors as attractors (located at corners of the hypercube [-1, +1]n). Here, one normally sets I = 0 and assumes the input to the DAM (i.e., x(0)) to be a noisy vector which may be anywhere in the hypercube [-1, +1]n. The performance of this DAM with random stored vectors, large n, and m << n has been studied through numerical simulations by Anderson (1993). These simulations particularly address the effects of the model parameters α and β on memory retrieval.

The stability of the BSB model in Equation (7.4.1) with symmetric W, I = 0, and α = 1 has been analyzed by several researchers including Golden (1986), Greenberg (1988), Hui and Żak (1992), and Anderson (1993). In this case, the model reduces to

x(k + 1) = F[x(k) + βWx(k)]          (7.4.3)

Golden (1986, 1993) analyzed the dynamics of the system in Equation (7.4.3) and found that it behaves as a gradient descent system that minimizes the energy

E(x) = −(1/2) xT W x          (7.4.4)

He also proved that the dynamics in Equation (7.4.3) always converge to a local minimum of E(x) if W is symmetric and λmin ≥ 0 (i.e., W is positive semidefinite) or β < 2/|λmin|, where λmin is the smallest eigenvalue of W. Under these conditions, the stable equilibria of this model are restricted to the surface and/or vertices of the hypercube. It is interesting to note here that when this BSB DAM employs correlation recording (with the diagonal of W preserved), it always converges to a minimum of E(x) because of the positive-semidefinite symmetric nature of the autocorrelation matrix. The following example illustrates the dynamics of a two-state zero-diagonal correlation-recorded BSB DAM.

Example 7.4.1: Consider the problem of designing a simple BSB DAM which is capable of storing the memory vector x = [+1  -1]T. One possible way of recording this DAM with x is to employ the normalized correlation recording recipe of Equation (7.1.6). This recording results in the symmetric weight matrix

W = [  0     -0.5 ]
    [ -0.5    0   ]

after forcing the diagonal elements to zero. This matrix has the two eigenvalues λmin = -0.5 and λmax = 0.5. The energy function for this DAM is given by Equation (7.4.4) and is plotted in Figure 7.4.1. The figure shows two minima of equal energy at the state [+1  -1]T and its complement state [-1  +1]T, and two maxima of equal energy at [+1  +1]T and [-1  -1]T. Simulations using the BSB dynamics of Equation (7.4.3) are shown in Figure 7.4.2 for a number of initial states x(0). Using β = 1 and β = 0.3 resulted in convergence to one of the two minima of E(x), as depicted in Figures 7.4.2a and 7.4.2b, respectively. The basins of attraction of these stable states are equal in size and are separated by the line x2 = x1. Note that the values of β used here satisfy the condition β < 2/|λmin| = 4. The effects of violating this condition on the stability of the DAM are shown in Figure 7.4.3, where β was set equal to 5. The figure depicts a limit cycle, an oscillation between the two states of maximum energy, [-1  -1]T and [+1  +1]T. This limit cycle was generated by starting from x(0) = [0.9  0.7]T. Starting from x(0) = [0.9  0.6]T leads to convergence to the desired state [+1  -1]T, as depicted by the lower state trajectory in Figure 7.4.3. It is interesting to note how this state was reached by bouncing back and forth off the boundaries of the state space [-1, +1]2.
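
The behavior described in this example is easy to reproduce numerically. The following sketch iterates Equation (7.4.3) with the W above and the limiter of Equation (7.4.2); the iteration count is an arbitrary choice.

import numpy as np

def bsb_step(x, W, beta):
    # Equation (7.4.3): x(k+1) = F[x(k) + beta*W*x(k)], with alpha = 1, I = 0
    return np.clip(x + beta * (W @ x), -1.0, 1.0)

W = np.array([[ 0.0, -0.5],
              [-0.5,  0.0]])   # zero-diagonal correlation recording of [+1, -1]^T

for beta, x0 in [(1.0, [0.9, 0.7]),   # converges to [+1, -1] (Figure 7.4.2a)
                 (0.3, [0.9, 0.7]),   # converges to [+1, -1] (Figure 7.4.2b)
                 (5.0, [0.9, 0.7]),   # limit cycle between [+1, +1] and [-1, -1]
                 (5.0, [0.9, 0.6])]:  # converges to [+1, -1] off the box walls
    x = np.array(x0)
    for _ in range(60):
        x = bsb_step(x, W, beta)
    print(f"beta = {beta}, x(0) = {x0}: x(60) = {np.round(x, 3)}")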

Figure 7.4.1. A plot of the energy function E(x) for the BSB DAM of Example 7.4.1. There are two minima with energy E = -0.5 at states [+1  -1]T and [-1  +1]T, and two maxima with energy E = 0.5 at [+1  +1]T and [-1  -1]T.

(a)

(b)

Figure 7.4.2. State space trajectories of a two-state BSB DAM, which employs a zero-diagonal autocorrelation weight matrix to store the memory vector x = [+1  -1]T. The resulting weight matrix is symmetric with λmin = -0.5 and λmax = 0.5. (a) β = 1, and (b) β = 0.3. Circles indicate state transitions. The lines are used as guides to the eye.

Figure 7.4.3. State space trajectories of the BSB DAM of Figure 7.4.2, but with β = 5. The limit cycle (top trajectory) was obtained by starting from x(0) = [0.9  0.7]T. The converging dynamics (bottom trajectory) were obtained by starting from x(0) = [0.9  0.6]T.

Greenberg (1988) proved the following interesting property of the BSB DAM: all vertices of the hypercube are attractors (asymptotically stable equilibria) if

wii > Σj≠i |wij| ,          i = 1, 2, ..., n          (7.4.5)

Equation (7.4.5) defines what is referred to as a "strongly" row diagonally dominant matrix W. As an example, note that the BSB DAM with W = I has all of its vertices as attractors. For associative memories, though, it is not desirable to have all 2^n vertices of the hypercube as attractors. Therefore, a row diagonally dominant weight matrix is to be avoided (recall that the interconnection matrix in a DAM is usually treated by forcing its diagonal to zero).

A more general result concerning the stability of the vertices of the BSB model in Equation (7.4.1), with α = 1 and a persistent bias I, was reported by Hui and Żak (1992, 1993). They showed that if β(wii − Σj≠i |wij|) > |Ii| for i = 1, 2, ..., n, then all vertices of the bipolar hypercube are asymptotically stable equilibrium points. Here, W need not be symmetric and Ii is an arbitrary constant bias input to the ith unit. Hui and Żak also showed that, if W is symmetric, a hypercube vertex x* which satisfies the condition

xi* [βWx* + I]i > 0,          i = 1, 2, ..., n          (7.4.6)

is a stable equilibrium. Here, [·]i signifies the ith component of the argument vector. Equation (7.4.6) is particularly useful in characterizing the capacity of a zero-diagonal correlation-recorded BSB DAM where m unbiased and independent random vectors xk ∈ {-1, +1}n are stored. Let xh be one of these vectors and substitute it in Equation (7.4.6). Assuming the DAM is receiving no bias (I = 0), the inequality in Equation (7.4.6) becomes

xih [βWxh]i > 0          (7.4.7)

or

xih [Wxh]i > 0          (7.4.8)

since β > 0. Substituting the zero-diagonal correlation recording recipe for W into Equation (7.4.8), the vertex xh is an attractor if

1 − m/n + (1/n) xih Σk≠h xik (xk)T xh > 0          (7.4.9)

or, equivalently

xih ( (1/n) Σk≠h xik (xk)T xh ) > m/n − 1          (7.4.10)

where the term inside the parentheses is the cross-talk term. In Section 7.2.1, it was determined that the probability that the n inequalities in Equation (7.4.10) hold for all m memories approaches 1, in the limit of large n, if m ≤ n/(4 ln n). Hence, it is concluded that the absolute capacity of the BSB DAM for storing random bipolar binary vectors is identical to that of the discrete Hopfield DAM when correlation recording is used with zero self-coupling (i.e., wii = 0 for all i). In fact, the present capacity result is stronger than the absolute capacity result of Section 7.2.1; when m is smaller than n/(4 ln n), the condition of Equation (7.4.6) is satisfied and, therefore, all xk vectors are stable equilibria.
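
This stability condition is easy to check numerically. The sketch below (the sizes n and m are arbitrary illustrative choices) builds a zero-diagonal correlation-recorded W and tests the condition of Equation (7.4.6), with I = 0, at every stored memory.

import numpy as np

rng = np.random.default_rng(0)
n, m = 256, 10                            # m below n/(4 ln n) is the safe regime
X = rng.choice([-1.0, 1.0], size=(n, m))  # columns are the stored memories x^1 ... x^m

W = (X @ X.T) / n                         # normalized correlation recording
np.fill_diagonal(W, 0.0)                  # zero self-coupling (w_ii = 0)

# Equation (7.4.6) with I = 0: vertex x^h is stable iff x_i^h * [W x^h]_i > 0 for all i
stable = [bool(np.all(X[:, h] * (W @ X[:, h]) > 0)) for h in range(m)]
print(f"{sum(stable)} of {m} stored memories satisfy the stability condition")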

7.4.2 Non-Monotonic Activations DAM

As indicated in Section 7.3, one way of improving DAM performance for a given recording recipe is by appropriately designing the DAM components. Here, the idea is to design the DAM retrieval process so that the DAM dynamics exploit certain known features of the synthesized interconnection matrix W. This section presents correlation-recorded DAMs whose performance is significantly enhanced as a result of modifying the activation functions of their units from the typical sgn- or sigmoid-type activation to more sophisticated non-monotonic activations. Two DAMs are considered: a discrete-time discrete-state parallel-updated DAM and a continuous-time continuous-state DAM.

Discrete Model

First, consider the zero-diagonal correlation-recorded discrete Hopfield DAM discussed earlier in this chapter. The retrieval dynamics of this DAM show some strange behavior. When initialized with a vector x(0) that has an overlap p(0) = (1/n)(x1)T x(0) with one of the stored random memories, say x1, the DAM state x(k) initially evolves towards x1 but does not always converge to or stay close to x1, as shown in Figure 7.4.4. It has been shown (Amari and Maginu, 1988) that the overlap p(k) = (1/n)(x1)T x(k), when started with p(0) less than some critical value pc, initially increases but soon starts to decrease and ultimately stabilizes at a value less than 1. In this case, the DAM converges to a spurious memory, as depicted schematically by the trajectory on the right in Figure 7.4.4. The value of pc increases (from zero) monotonically with the pattern ratio α = m/n, and increases sharply from about 0.5 to 1 as α becomes larger than 0.15, the DAM's relative capacity (note that pc can also be written as 1 − 2ρ, where ρ is the radius of attraction of a fundamental memory as in Section 7.2.1). This peculiar phenomenon can be explained by first noting the effects of the overlaps p(k) and qh(k) = (1/n)(xh)T x(k), h ≠ 1, on the ith unit weighted-sum ui(k) given by

ui(k) = Σj≠i wij xj(k)          (7.4.11)

or, when written in terms of p(k) and qh(k),

ui(k) = p(k) xi1 + Σh≠1 qh(k) xih − (m/n) xi(k)          (7.4.12)

Note the effects of the overlap terms qh(k) on the value of ui(k). The higher the overlaps with memories other than x1, the larger the value of the cross-talk term [the summation term in Equation (7.4.12)], which, in turn, drives |ui(k)| to large values. Morita (1993) showed, using simulations, that both the sum of squares of the overlaps with all stored memories except x1, defined as

s(k) = Σh≠1 qh2(k)          (7.4.13)

and p2(k) initially increase with k. Then, one of two scenarios occurs. In the first scenario, s(k) begins to decrease and p2(k) continues to increase until it reaches 1; i.e., x(k) stabilizes at x1. In the second scenario, s(k) continues to increase and may attain values larger than 1 while p2(k) decreases.
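
The two scenarios are easy to observe by tracking p(k) and s(k) during parallel retrieval in a correlation-recorded discrete Hopfield DAM. The following sketch does so, with illustrative sizes chosen so that the pattern ratio sits at the relative capacity.

import numpy as np

rng = np.random.default_rng(1)
n, m = 500, 75                                # pattern ratio m/n = 0.15
X = rng.choice([-1.0, 1.0], size=(n, m))
W = (X @ X.T) / n
np.fill_diagonal(W, 0.0)

x = X[:, 0].copy()                            # start near x^1 ...
flips = rng.choice(n, size=int(0.15 * n), replace=False)
x[flips] *= -1                                # ... with overlap p(0) = 0.7

for k in range(15):
    q = (X.T @ x) / n                         # overlaps with all stored memories
    p, s = q[0], float(np.sum(q[1:] ** 2))    # p(k), and s(k) of Equation (7.4.13)
    print(f"k = {k:2d}   p = {p:+.3f}   s = {s:.3f}")
    x = np.sign(W @ x)                        # parallel sgn update
    x[x == 0] = 1.0                           # arbitrary tie-break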

Figure 7.4.4. Schematic representation of converging trajectories in a correlation-recorded discrete Hopfield DAM. When the distance (overlap) between x(0) and x1 is larger (smaller) than some critical value, the DAM converges to a spurious memory (right-hand side trajectory). Otherwise, the DAM retrieves the fundamental memory x1. (From Neural Networks, 6, M. Morita, Associative Memory With Nonmonotone Dynamics, pp. 115-126, Copyright 1993, with kind permission from Pergamon Press Ltd., Headington Hill Hall, Oxford OX3 0BW, UK.)

The above phenomenon suggests a method for improving DAM performance by modifying the dynamics of the Hopfield DAM such that the state is forced to move in a direction that reduces s(k), but not p2(k). One such method is to reduce the influence of units with large |ui| values, since such units are the actual cause of the increase in s(k). The influence of a unit i with large |ui|, say |ui| > θ > 0, can be reduced by reversing the sign of xi. This method can be implemented using the "partial reverse" dynamics (Morita, 1993) given by:

x(k + 1) = F[Wx(k) − λG(Wx(k))]          (7.4.14)

where λ > 0, and F and G are activation functions which operate componentwise on their vector arguments. Here, F is the sgn activation function and G is defined by

G(u) = +1 if u > θ;  0 if −θ ≤ u ≤ θ;  −1 if u < −θ          (7.4.15)

where u is a component of the vector u = Wx(k). The values of the parameters λ and θ must be determined with care; empirically, a threshold of θ = 2.7 (with a suitably chosen λ) has been used. These parameters are chosen so that the number of units which satisfy |ui| > θ is small when x(k) is close to any of the stored memories, provided the pattern ratio α is not too large. It should be noted that Equation (7.4.14) does not always converge to a stable equilibrium. Numerical simulations show that a DAM employing this partial reverse method has several advantages over the same DAM with pure sgn activations. These advantages include a smaller critical overlap pc (i.e., wider basins of attraction for fundamental memories), faster convergence, a lower rate of convergence to spurious memories, and error correction capability at substantially higher pattern ratios.
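
A minimal sketch of one partial-reverse iteration follows; it assumes the reconstruction of Equations (7.4.14) and (7.4.15) given above, and the default value of λ is an arbitrary choice left for the reader to tune.

import numpy as np

def partial_reverse_step(x, W, lam=3.0, theta=2.7):
    """One parallel update of Equation (7.4.14):
    x(k+1) = F[W x(k) - lam * G(W x(k))], with F = sgn."""
    u = W @ x
    g = np.where(np.abs(u) > theta, np.sign(u), 0.0)  # G of Equation (7.4.15)
    x_new = np.sign(u - lam * g)   # the subtraction reverses the sign of
                                   # components with theta < |u_i| < lam
    x_new[x_new == 0] = 1.0
    return x_new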

Continuous Model

Consider the continuous-time DAM in Equation (7.1.21) with C = R = I and with zero bias. Namely,

du/dt = −u + WF(u),     x = F(u)          (7.4.16)

where W is the usual zero-diagonal normalized autocorrelation matrix W = (1/n) Σk xk (xk)T, wii = 0. Here, the partial reverse method described above cannot be applied directly due to the continuous DAM dynamics. One can still capture the essential elements of this method, though, by designing the activation function such that it reduces the influence of unit i if |ui| is very large. This can be achieved by employing the non-monotone activation function shown in Figure 7.4.5, which has the following analytical form (Morita et al., 1990 a, b):

f(u) = [(1 − e^(−cu)) / (1 + e^(−cu))] · [(1 − κ e^(c′(|u| − h))) / (1 + e^(c′(|u| − h)))]          (7.4.17)

where c, c′, κ, and h are positive constants with typical values of 50, 15, 1, and 0.5, respectively. This non-monotone activation function operates to keep the variance of |ui| from growing too large, and hence achieves an effect similar to that of the partial reverse method. Empirical results show that this DAM has an absolute capacity proportional to n, with substantial error correction capabilities. Also, this DAM almost never converges to spurious memories when retrieval of a fundamental memory is not successful; instead, the DAM state continues to wander (chaotically) without reaching any equilibrium (Morita, 1993).
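
A sketch of the activation function and a forward-Euler step of Equation (7.4.16) follows. The κ = 1 case of Equation (7.4.17) is assumed, rewritten in terms of tanh (an algebraically equivalent form that avoids numerical overflow); the step size is an arbitrary choice.

import numpy as np

def f_nonmonotone(u, c=50.0, cp=15.0, h=0.5):
    """Equation (7.4.17) with kappa = 1, using (1 - e^-z)/(1 + e^-z) = tanh(z/2):
    behaves like sgn(u) for |u| < h and reverses toward -sgn(u) for |u| > h."""
    return np.tanh(0.5 * c * u) * (-np.tanh(0.5 * cp * (np.abs(u) - h)))

def dam_euler_step(u, W, dt=0.01):
    """Forward-Euler integration of Equation (7.4.16): du/dt = -u + W f(u)."""
    return u + dt * (-u + W @ f_nonmonotone(u))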

Figure 7.4.5. Non-monotonic activation function generated from Equation (7.4.17) with c = 50, c′ = 15, κ = 1, and h = 0.5.

Figure 7.4.6 gives simulation-based capacity curves (Yoshizawa et al., 1993a) depicting plots of the critical overlap pc (i.e., 1 − 2ρ, where ρ is the radius of attraction of a fundamental memory) versus the pattern ratio α for three DAMs with the dynamics in Equation (7.4.16). Two of the DAMs (curves A and B) employ a zero-diagonal autocorrelation interconnection matrix; curve A corresponds to a sigmoidal activation function and curve B to the non-monotone activation function in Equation (7.4.17). The third DAM (curve C) employs projection recording with preserved W diagonal and a sigmoidal activation function. As expected, the DAM with sigmoid activation (curve A) loses its associative retrieval capabilities and the ability to retain the stored memories as fixed points (designated by the dashed portion of curve A) as α approaches 0.15. On the other hand, and with the same interconnection matrix, the non-monotone activation DAM exhibits good associative retrieval over a wide range of pattern ratios, even when α exceeds 0.5 (for example, curve B in Figure 7.4.6 predicts a basin of attraction radius ρ ≈ 0.22 at α = 0.5; this means that proper retrieval is possible from initial states having 22 percent or fewer random errors relative to any one of the stored memories). It is interesting to note that this performance exceeds that of the projection-recorded DAM with sigmoid activation function, represented by curve C. Note, though, that the zero-diagonal projection-recorded discrete DAM with serial update of states (refer to Section 7.2.2 and Figure 7.2.2) achieves still better retrieval, exceeding that of the non-monotone activations correlation-recorded DAM. Still, the demonstrated retrieval capabilities of the non-monotone activations DAM are impressive. The non-monotone dynamics can thus be viewed as extracting and using intrinsic information in the autocorrelation matrix which the "sigmoid dynamics" is not capable of utilizing. For a theoretical treatment of the capacity and stability of the non-monotone activations DAM, the reader is referred to Yoshizawa et al. (1993a, b). Nishimori and Opris (1993) reported a discrete-time discrete-state version of this model in which the non-monotonic activation function of Figure 7.4.5 is used with κ = 1.0 and an arbitrary threshold h > 0. They showed the maximum capacity attainable by this model and gave a complete characterization of its capacity as a function of the parameter h.

Figure 7.4.6. Simulation-generated capacity/error correction curves for the continuous-time DAM of Equation (7.4.16). Curves A and B represent the cases of a zero-diagonal correlation-recorded DAM with sigmoidal activation function and non-monotone activation function, respectively. Curve C is for a projection-recorded (preserved diagonal) DAM with sigmoidal activation function, which is given for comparison purposes. (From Neural Networks, 6, S. Yoshizawa et al., Capacity of Associative Memory Using a Nonmonotonic Neuron Model, pp. 167-176, with kind permission from Pergamon Press Ltd., Headington Hill Hall, Oxford OX3 0BW, UK.)

7.4.3 Hysteretic Activations DAM

Associative recall in DAMs can be improved by introducing hysteresis into the units' activation function. This phenomenon is described next in the context of a discrete Hopfield DAM. Here, the interconnection matrix W is the normalized zero-diagonal autocorrelation matrix, with the ith DAM state updated according to:

xi(k + 1) = fi(ui(k)),     ui(k) = Σj wij xj(k)          (7.4.18)

where the activation function fi is given by:

fi(ui(k)) = sgn(ui(k) + ηi xi(k))          (7.4.19)

The following discussion assumes a hysteretic parameter ηi = η for all units i = 1, 2, ..., n. A plot of this activation function is given in Figure 7.4.7, which shows a hysteretic property controlled by the parameter η > 0. Qualitatively speaking, the hysteresis term ηxi(k) in Equation (7.4.19) favors a unit staying in its current state xi(k); the larger the value of η, the higher the tendency of unit i to retain its current state.

Figure 7.4.7. Transfer characteristics of a unit with hysteretic activation function.

In the previous section, it was noted that when the state of a DAM is not far from a fundamental memory, degradation of the associative retrieval process is caused by units moving from the right states to the wrong ones. The motivation behind the hysteretic property is that it causes units with the proper response to preserve their current state longer, thus increasing the chance for the DAM to correct its "wrong" states and ultimately converge to the closest fundamental memory. But, simultaneously, there are units moving from the wrong states to the right ones, and hysteresis tends to prevent these transitions as well. It has been shown (Yanai and Sawada, 1990) that, for a proper choice of η, the former process is more effective than the latter and associative retrieval can be enhanced. For small η, Yanai and Sawada (1990) showed that the absolute capacity of a zero-diagonal correlation-recorded DAM with hysteretic units is given by the limit

m = (1 + η)^2 n / (4 ln n),     n → ∞          (7.4.20)

This result can be derived by first noting that the ith bit transition for the above DAM may be described by Equation (7.2.2) with xi(0) replaced by (1 + η)xi(0), and then following a derivation similar to that in Equations (7.2.3) through (7.2.6). Comparing Equations (7.4.20) and (7.2.6), we find that hysteresis leads to a substantial increase in the number of memorizable vectors as compared to using no hysteresis at all. Yanai and Sawada also showed that the relative capacity increases with η (e.g., the relative capacity increases from 0.15 at η = 0 to about 0.25 for the best choice of η). This implies that the basins of attraction of fundamental memories grow when hysteresis is employed. Empirical results suggest that a value of η slightly higher than the pattern ratio α = m/n (with α << 1) leads to the largest basin of attraction size around fundamental memories.
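
A minimal sketch of the hysteretic update of Equations (7.4.18) and (7.4.19) is given below; the tie-breaking choice (retaining the current state) is a natural one for a hysteretic unit but is our assumption.

import numpy as np

def hysteretic_step(x, W, eta):
    """Parallel update of Equations (7.4.18)-(7.4.19): the term eta*x(k)
    biases each unit toward retaining its current state."""
    u = W @ x + eta * x
    x_new = np.sign(u)
    undecided = (x_new == 0)
    x_new[undecided] = x[undecided]   # on a tie, keep the current state
    return x_new

Setting eta equal to the pattern ratio m/n makes this update coincide with the preserved-diagonal rule of Equation (7.4.21) discussed next.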

Hysteresis can also arise from allowing non-zero diagonal elements wii, i = 1, 2, ..., n. To see this, consider a discrete DAM with the normalized autocorrelation matrix W = (1/n) Σk xk (xk)T whose diagonal is preserved. Note that this matrix has diagonal elements wii = m/n = α, i = 1, 2, ..., n. Now, the update rule for the ith unit is

xi(k + 1) = sgn( Σj≠i wij xj(k) + α xi(k) )          (7.4.21)

Comparing Equation (7.4.21) to Equation (7.4.19) of the hysteretic activation DAM reveals that the two DAMs are mathematically equivalent if η is set equal to α (it is interesting to note that α is, approximately, the empirically optimal value for the hysteretic parameter η). Therefore, it is concluded that preserving the original diagonal in a correlation-recorded DAM is advantageous in terms of the quality of associative retrieval when α << 1 (see Problem 7.4.8 for further explanation).

The advantages of small positive self-connections have also been demonstrated for projection-recorded discrete DAMs. Krauth et al. (1988) demonstrated that using a small positive diagonal element with the projection-recorded discrete DAM increases the radius of attraction of fundamental memories if the DAM is substantially loaded (i.e., α greater than approximately 0.4). For example, they found numerically that for α = 0.5, using a diagonal term of about 0.075 instead of zero increases the basins of attraction of fundamental memories by about 50 percent. For the projection DAM, though, retaining the original diagonal leads to relatively large values for the self-connections if the pattern ratio is large (for m unbiased random memories, we have wii ≈ α = m/n, i = 1, 2, ..., n). This greatly reduces the basin of attraction size for fundamental memories, as shown empirically in Section 7.2.2. All this suggests that the best approach is to give all self-connections a small positive value much less than 1.

7.4.4 Exponential Capacity DAM

Up to this point, the best DAM considered has a capacity proportional to n, the number of units in the DAM. However, very high capacity DAMs (e.g., m exponential in n) can be realized if one is willing to consider more complex memory architectures than the ones considered so far. An exponential capacity DAM was proposed by Chiueh and Goodman (1988, 1991). This DAM can store up to c^n (c > 1) random vectors xh ∈ {-1, +1}n, h = 1, 2, ..., m, with substantial error correction abilities. The exponential DAM is described next.

Consider the architecture in Figure 7.4.8. This architecture describes a two-layer dynamic autoassociative memory. The dynamics are nonlinear due to the nonlinear operations G and F. The output nonlinearity F implements sgn activations, which gives the DAM a discrete-state nature. The matrix X is the matrix whose columns are the desired memory vectors xh, h = 1, 2, ..., m. This DAM may update its state x(k) in either serial or parallel mode. The parallel-updated dynamics are given in vector form as:

x(k + 1) = F[X G(XT x(k))]          (7.4.22)

and in component form as:

xi(k + 1) = sgn( Σh=1..m xih g( (xh)T x(k) ) )          (7.4.23)

where g is the scalar version of the operator G. Here, g is normally assumed to be a continuous monotone non-decreasing function over [-n, +n]. The recording of a new memory vector xh is simply done by augmenting the matrices XT and X with (xh)T and xh, respectively (this corresponds to allocating two new n-input units, one in each layer, in the network of Figure 7.4.8). The choice of the first-layer activation function g plays a critical role in determining the capacity and dynamics of this DAM. To see this, assume first a simple linear activation function g(u) = u. This assumption reduces the dynamics in Equation (7.4.22) to

x(k + 1) = F[X XT x(k)]          (7.4.24)

which is simply the dynamics of the correlation-recorded discrete DAM.


Figure 7.4.8. Architecture of a very high capacity discrete DAM.

On the other hand, if one chooses g(u) = (n + u)^q, where q is an integer greater than 1, a higher-order DAM results with polynomial capacity m ∝ n^q for large n (Psaltis and Park, 1986). Exponential capacity is also possible with a proper choice of g. Such choices include g(u) = (n − u)^(−a), a > 1 (Dembo and Zeitouni, 1988; Sayeh and Han, 1987) and g(u) = a^u, a > 1 (Chiueh and Goodman, 1988).

The choice g(u) = a^u results in an "exponential" DAM with capacity m = c^n and with error correction capability. Here, c is a function of a and of the radius of attraction ρ of fundamental memories, as depicted in Figure 7.4.9. According to this figure, the exponential DAM is capable of achieving the ultimate capacity of a binary-state DAM, namely m = 2^n, in the limit of large a. As one might determine intuitively, though, this DAM has no error correction abilities at such loading levels. For relatively small a, one may come up with an approximate linear relation between c and ρ. For example, Figure 7.4.9 gives c ≈ 1.2 − 0.4ρ for one of the smaller values of a plotted there. Hence, an exponential DAM with such a nonlinearity can store up to (1.2 − 0.4ρ)^n random memories if it is desired that all such memories have basins of attraction of radius ρ, 0 ≤ ρ < 0.5 (here, one-pass retrieval is assumed).

Figure 7.4.9. Relation between the base constant c and the fundamental memories' basin of attraction radius ρ, for various values of the nonlinearity parameter a, for an exponential DAM. This DAM has a storage capacity of c^n. (Adapted from T. D. Chiueh and R. M. Goodman, 1991, Recurrent Correlation Associative Memories, IEEE Transactions on Neural Networks, 2(2), pp. 275-284, ©1991 IEEE.)

Chiueh and Goodman (1991) showed that a sufficient condition for the dynamics in Equation (7.4.23) to be stable in both serial and parallel update modes is that the activation function g(u) be continuous and monotone non-decreasing over [-n, +n]. This condition is easily verified for all of the choices of g(u) indicated above.
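
A sketch of one parallel update of Equation (7.4.22) with the exponential nonlinearity g(u) = a^u follows; the rescaling trick is ours (for numerical safety) and leaves the output signs unchanged.

import numpy as np

def ecam_step(x, X, a=2.0):
    """Parallel update of Equation (7.4.22) with g(u) = a**u:
    x(k+1) = F[X g(X^T x(k))], F = sgn."""
    u = X.T @ x                  # correlations with the stored memories, in [-n, +n]
    w = a ** (u - np.max(u))     # rescaling by a**(-max u) preserves all signs
    x_new = np.sign(X @ w)       # while avoiding overflow for large n
    x_new[x_new == 0] = 1.0
    return x_new

Because a^u grows so quickly in u, the stored memory with the largest correlation to x(k) dominates the weighted sum, which is the intuition behind the exponential capacity of this DAM.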

7.4.5 Sequence Generator DAM

Autoassociative DAMs can be synthesized to act as sequence generators. In fact, no architectural changes are necessary for a basic DAM to behave as a sequence generator. Furthermore, a simple correlation recording recipe may still be utilized for storing sequences. Here, a simple sequence generator DAM (also called a temporal associative memory) is described whose dynamics are given by Equation (7.1.37) operating in parallel mode with Ii = 0, i = 1, 2, ..., n. Consider a sequence Si of mi distinct patterns, Si: xi^1 → xi^2 → ... → xi^mi, with xi^j ∈ {-1, +1}n, j = 1, 2, ..., mi. The length of this sequence is mi. This sequence is a cycle if xi^mi = xi^1 with mi > 2. Here, the subscript i on xi^j and mi refers to the ith sequence. An autoassociative DAM can store the sequence Si when the DAM's interconnection matrix W is defined by

W = (1/n) Σj=1..mi−1 xi^(j+1) (xi^j)T + (1/n) xi^mi (xi^mi)T          (7.4.25)

Note that the first term on the right-hand side of Equation (7.4.25) represents the normalized correlation recording recipe for the heteroassociations xi^j → xi^(j+1), j = 1, 2, ..., mi − 1, whereas the second term is an autocorrelation that attempts to terminate the recollection process at the last pattern of sequence Si, namely xi^mi. Similarly, a cycle (i.e., Si with xi^mi = xi^1 and mi > 2) can be stored using Equation (7.4.25) with the autocorrelation term removed.

This DAM is also capable of storing autoassociations by treating them as sequences of length 1 and using Equation (7.4.25) (here, the first term in Equation (7.4.25) vanishes). Finally, Equation (7.4.25) can be extended to store s distinct sequences Si by summing it over i, i = 1, 2, ..., s. Hence, this sequence generator DAM is capable of simultaneously storing sequences of different lengths and cycles of different periods. However, associative retrieval may suffer if the loading of the DAM exceeds its capacity. Also, the asymmetric nature of W in Equation (7.4.25) will generally lead to spurious cycles (oscillations) of period two or higher.
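
The recipe of Equation (7.4.25) and its replay can be sketched as follows; the sequence length and dimension are illustrative, and with such light loading the parallel sgn dynamics typically step through the stored sequence and halt on its last pattern.

import numpy as np

def sequence_weights(S):
    """Equation (7.4.25): S is n x m_i; column j holds pattern x^j of the sequence.
    Correlate each pattern with its successor, then add the autocorrelation
    of the last pattern to terminate the recollection there."""
    n, m_i = S.shape
    W = sum(np.outer(S[:, j + 1], S[:, j]) for j in range(m_i - 1))
    return (W + np.outer(S[:, -1], S[:, -1])) / n

rng = np.random.default_rng(2)
S = rng.choice([-1.0, 1.0], size=(64, 4))    # a sequence of four random patterns
W = sequence_weights(S)

x = S[:, 0]                                  # key in the first pattern
for k in range(1, 7):
    x = np.sign(W @ x)
    matches = [j for j in range(4) if np.array_equal(x, S[:, j])]
    print(f"k = {k}: state matches stored pattern(s) {matches}")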

Generally speaking, the capacity of the sequence generator DAM is similar to that of an autoassociative DAM if unbiased independent random vectors/patterns are assumed in all stored sequences. This implies that the effective number of stored vectors must be very small compared to n, in the limit of large n, for proper associative retrieval.

7.4.6 Heteroassociative DAM

A heteroassociative DAM (HDAM) is shown in the block diagram of Figure 7.4.10 (Okajima et al., 1987). It consists of two processing paths which form a closed loop. The first processing path computes a vector y ∈ {-1, +1}L from an input x ∈ {-1, +1}n according to the parallel update rule

y = F(W1 x)          (7.4.26)

or its serial (asynchronous) version where one and only one unit updates its state at a given time. Here, F is usually the sgn activation operator. Similarly, the second processing path computes a vector x according to

x = F(W2 y)          (7.4.27)

or its serial version. The vector y in Equation (7.4.27) is the same vector generated by Equation (7.4.26). Also, because of the feedback employed, x in Equation (7.4.26) is given by Equation (7.4.27).


Figure 7.4.10. Block diagram of a heteroassociative DAM.

The HDAM can be operated in either parallel or serial retrieval modes. In the parallel mode, the HDAM starts from an initial state x(0) [y(0)], computes its state y (x) according to Equation (7.4.26) [Equation (7.4.27)], and then updates state x (y) according to Equation (7.4.27) [Equation (7.4.26)]. This process is iterated until convergence; i.e., until state x (or equivalently y) ceases to change. On the other hand, in the serial update mode, only one randomly chosen component of the state x or y is updated at a given time.

Various methods have been proposed for storing a set of heteroassociations {xk, yk}, k = 1, 2, ..., m, in the HDAM. In most of these methods, the interconnection matrices W1 and W2 are computed independently by requiring that all one-pass associations xk → yk and yk → xk, respectively, are perfectly stored. Here, it is assumed that the set of associations to be stored forms a one-to-one mapping; otherwise, perfect storage becomes impossible. Examples of such HDAM recording methods include the use of projection recording (Hassoun, 1989 a, b) and Householder transformation-based recording (Leung and Cheung, 1991). These methods require the linear independence of the vectors xk (and, likewise, the yk), for which a capacity of m = min(n, L) is achievable. One drawback of these techniques, though, is that they do not guarantee the stability of the HDAM; i.e., convergence to spurious cycles is possible. Empirical results show (Hassoun, 1989b) that parallel-updated projection-recorded HDAMs exhibit significant oscillatory behavior only at memory loading levels close to the HDAM capacity.

Kosko (1987, 1988) independently proposed a heteroassociative memory with the architecture of the HDAM, but with the restriction W2 = W1T. This memory is known as the bidirectional associative memory (BAM). The interesting feature of a BAM is that it is stable for any choice of the real-valued interconnection matrix W = W1 and for both serial and parallel retrieval modes. This can be shown by starting from the BAM's bounded Liapunov (energy) function

E(x, y) = −yT W x          (7.4.28)

and showing that each serial or parallel state update decreases E (see Problem 7.4.14). One can also prove BAM stability by noting that a BAM can be converted to a discrete autoassociative DAM (discrete Hopfield DAM) with state vector x′ = [xT yT]T and interconnection matrix W′ given by

W′ = [ 0    WT ]
     [ W    0  ]          (7.4.29)

Now, since W′ is a symmetric zero-diagonal matrix, the autoassociative DAM is stable if serial update is assumed, as was discussed in Section 7.1.2 (also, see Problem 7.1.18). Therefore, the serially updated BAM is stable. One may also use this equivalence property to show the stability of the parallel-updated BAM (note that a parallel-updated BAM is not equivalent to the non-stable parallel-updated discrete Hopfield DAM; this is because either the states x or the states y, but not both, are updated in parallel at each step).

From the above, it can be concluded that the BAM always converges to a local minimum of its energy function defined in Equation (7.4.28). It can be shown (Wang et al., 1991) that these local minima include all those that correspond to associations {xk, yk} which are successfully loaded into the BAM (i.e., associations which are equilibria of the BAM dynamics).

The simplest recipe for storing the associations as BAM equilibrium points is the correlation recording recipe of Equation (7.1.2). This recipe automatically satisfies the BAM requirement that the forward-path and backward-path interconnection matrices W1 and W2 be transposes of each other, since

W1 = Σk=1..m yk (xk)T          (7.4.30)

and

W2 = Σk=1..m xk (yk)T = W1T          (7.4.31)

However, some serious drawbacks of the correlation recording recipe are low capacity and poor associative retrieval; when m random associations are stored in a correlation-recorded BAM, the condition m << min(n, L) must be satisfied if good associative performance is desired (Hassoun, 1989b; Simpson, 1990). Heuristics for improving the performance of correlation-recorded BAMs can be found in Wang et al. (1990).
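
A compact sketch of a correlation-recorded BAM follows; the sizes and the number of flipped key bits are arbitrary choices, with m kept well below min(n, L) per the condition above.

import numpy as np

rng = np.random.default_rng(3)
n, L, m = 100, 60, 5
X = rng.choice([-1.0, 1.0], size=(n, m))     # key patterns x^k
Y = rng.choice([-1.0, 1.0], size=(L, m))     # associated patterns y^k
W = Y @ X.T                                  # Equations (7.4.30)-(7.4.31): W2 = W^T

def bam_recall(x, W, n_sweeps=20):
    """Bidirectional retrieval: y = F(W x), then x = F(W^T y), repeated."""
    for _ in range(n_sweeps):
        y = np.sign(W @ x);   y[y == 0] = 1.0
        x = np.sign(W.T @ y); x[x == 0] = 1.0
    return x, y

x0 = X[:, 0].copy()
x0[rng.choice(n, size=10, replace=False)] *= -1   # corrupt 10 bits of the key
x, y = bam_recall(x0, W)
print("x recovered:", np.array_equal(x, X[:, 0]),
      "| y recovered:", np.array_equal(y, Y[:, 0]))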

Before leaving this section, it should be noted that the above models of associative memories are by no means exhaustive. A number of other interesting models have been reported in the literature (interested readers may find the volume edited by Hassoun (1993) useful in this regard). Some of these models are particularly interesting because of their connections to biological memories (e.g., Kanerva, 1988 and 1993; Alkon et al., 1993).
