7.2 DAM Capacity and Retrieval Dynamics

In this section, the capacity and retrieval characteristics of the autoassociative DAMs introduced in the previous section are analyzed. Correlation-recorded DAMs are considered first, followed by projection-recorded DAMs.

7.2.1 Correlation DAMs

DAM capacity is a measure of the ability of a DAM to store a set of m unbiased random binary patterns (that is, the vector components are independent random variables taking the values +1 or -1, each with probability 1/2) and at the same time be capable of associative recall (error correction). A commonly used capacity measure, known as "absolute" capacity, takes the form of an upper bound on the pattern ratio α = m/n, in the limit n → ∞, such that all stored memories are equilibrium points with a probability approaching one. This capacity measure, though, does not assume any error correction behavior; i.e., it does not require that the fundamental memories xk be attractors with associated basins of attraction. Another capacity measure, known as "relative" capacity, has been proposed which is an upper bound on α such that the fundamental memories or their "approximate" versions are attractors (stable equilibria).

It has been shown (Amari, 1977a; Hopfield, 1982; Amit et al., 1985) that if most of the memories in a correlation-recorded discrete Hopfield DAM, with wii = 0, are to be remembered approximately (i.e., nonperfect retrieval is allowed), then the pattern ratio α = m/n must not exceed 0.15. This value is the relative capacity of the DAM. Another result on the capacity of this DAM, for the case of error-free memory recall (in probability) by one-pass parallel convergence, is given by the absolute capacity (Weisbuch and Fogelman-Soulié, 1985; McEliece et al., 1987; Amari and Maginu, 1988; Newman, 1988), expressed as the limit

$$\alpha_{\max} = \frac{m_{\max}}{n} = \frac{1}{4 \ln n} \;\longrightarrow\; 0 \quad \text{as } n \to \infty \qquad (7.2.1)$$

Equation (7.2.1) indicates that the absolute capacity approaches zero as n approaches infinity! Thus, the correlation-recorded discrete Hopfield net is an inefficient DAM model. The absolute capacity result in Equation (7.2.1) is derived below.
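
To make the setting concrete, the following minimal NumPy sketch (the network size, the pattern ratio, and the helper names are illustrative choices, not taken from the text) builds a zero-diagonal correlation-recorded autoassociative DAM and counts how many of the stored patterns survive a single parallel update. With α well below the bound of Equation (7.2.1), essentially all patterns should come back unchanged; pushing α toward 0.15 and beyond makes one-pass errors increasingly likely.

    import numpy as np

    rng = np.random.default_rng(0)

    def correlation_dam(X):
        """Zero-diagonal correlation (outer-product) recording: W = (1/n) X X^T, w_ii = 0."""
        n = X.shape[0]
        W = (X @ X.T) / n
        np.fill_diagonal(W, 0.0)
        return W

    def one_pass(W, x):
        """One parallel (synchronous) update; ties at zero are broken toward +1."""
        return np.where(W @ x >= 0, 1, -1)

    n, m = 500, 25                               # pattern ratio alpha = m/n = 0.05
    X = rng.choice([-1, 1], size=(n, m))         # unbiased random bipolar memories (columns)
    W = correlation_dam(X)

    stable = sum(np.array_equal(one_pass(W, X[:, k]), X[:, k]) for k in range(m))
    print(f"{stable}/{m} stored patterns are one-pass fixed points at alpha = {m/n}")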

Assuming yk = xk (autoassociative case) in Equation (7.1.7) and wii = 0, i = 1, 2, ..., n, then by direct recall from initial input x(0) = xh with xh ∈ {xk}, the ith bit of the retrieved state x(1) is given by

$$x_i(1) = \operatorname{sgn}\!\left(\sum_{j=1}^{n} w_{ij}\, x_j(0)\right) = \operatorname{sgn}\!\left(\frac{n-1}{n}\, x_i^h + \Delta_i(0)\right), \qquad \Delta_i(0) = \frac{1}{n}\sum_{k \ne h}\,\sum_{j \ne i} x_i^k x_j^k x_j^h \qquad (7.2.2)$$

Consider the quantity Ci(0) = -xi(0)Δi(0). Now, if Ci(0) is negative then xi(0) and Δi(0) have the same sign and the cross-talk term Δi(0) does no harm. On the other hand, if Ci(0) is positive and larger than 1, then the one-pass retrieved bit xi(1) is in error. Next, assume that the stored memories are random, with equal probability for +1 and for -1, independently for each k and i. Hence, for large m and n, the Ci(0) term is approximately distributed according to a normal distribution N(μ, σ²) = N(0, m/n). To see this, we first note that Δi(0), and equivalently Ci(0), is 1/n times the sum of (n - 1)(m - 1) independent, uniformly distributed random bipolar binary numbers, and thus it has a binomial distribution with zero mean and variance (n - 1)(m - 1)/n² ≈ m/n. Thus, by virtue of the Central Limit Theorem, Ci(0) approaches N(0, m/n) asymptotically as m and n become large (Mosteller et al., 1970). Therefore, we may compute the probability that xi(1) is in error, Perror = Prob(Ci(0) > 1), by integrating N(0, m/n) from 1 to ∞, giving

$$P_{\text{error}} = \operatorname{Prob}\big(C_i(0) > 1\big) = \frac{1}{\sqrt{2\pi m/n}} \int_{1}^{\infty} e^{-\frac{n z^{2}}{2 m}}\, dz = \frac{1}{2}\left[1 - \operatorname{erf}\!\left(\sqrt{\frac{n}{2m}}\right)\right] \qquad (7.2.3)$$

where erf(·) is the error function. Note that Equation (7.2.3) is a special case of Equation (7.1.8) where Din in Equation (7.1.8) is set to zero. Now, using the fact that for a random variable x distributed according to N(0, σ²), Prob(x > 3σ) ≈ 0.0014, and if the ith bit xi(1) in Equation (7.2.2) is required to be retrievable with error probability less than 0.0014, then the condition 3σ < 1, or m/n < 1/9 ≈ 0.111, must be satisfied. Therefore, m < 0.111n is required for Perror ≤ 0.0014. Similarly, Equation (7.2.3) can be solved for the requirement Perror ≤ 0.005, which gives m < 0.15n.
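
The two numerical conditions above can be checked directly from Equation (7.2.3); the short sketch below (sizes are arbitrary examples) evaluates the one-pass bit-error probability with Python's math.erfc.

    import math

    def p_error(m, n):
        """One-pass single-bit error probability, Eq. (7.2.3): 0.5 * erfc(sqrt(n / (2m)))."""
        return 0.5 * math.erfc(math.sqrt(n / (2.0 * m)))

    n = 1000
    print(p_error(m=111, n=n))   # alpha ~ 0.111 (the 3-sigma condition) -> about 0.0014
    print(p_error(m=150, n=n))   # alpha ~ 0.15                          -> about 0.005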

On the other hand, if all memories xk are required to be equilibria of the DAM with a probability close to 1, say 0.99, then an upper bound on α can be derived by requiring that all bits of all memories xk be retrievable with less than a 1 percent error; i.e., (1 - Perror)^mn > 0.99. Employing the binomial expansion, this inequality may be approximated as Perror < 0.01/(mn). Noting that this stringent requirement necessitates very small Perror values, Equation (7.2.3) can be approximated using the asymptotic (x → ∞) expansion 1 - erf(x) ≈ e^(-x²)/(x√π). This approximation can then be used to write the inequality Perror < 0.01/(mn) as

$$\sqrt{\frac{m}{2\pi n}}\; e^{-\frac{n}{2m}} < \frac{0.01}{mn} \qquad (7.2.4)$$

Now, taking the natural logarithm (ln) of both sides of the inequality in Equation (7.2.4) and eliminating all constants and the ln√(m/(2πn)) factor (since they are dominated by the higher-order terms) results in the bound

$$\alpha = \frac{m}{n} < \frac{1}{2 \ln (mn)} \qquad (7.2.5)$$

By noting that ln(mn) < ln n² = 2 ln n (since n > m), a more stringent requirement on α in Equation (7.2.5) becomes

$$\alpha = \frac{m}{n} < \frac{1}{4 \ln n} \qquad (7.2.6)$$

which represents the absolute capacity of a zero-diagonal correlation-recorded DAM. The effects of a non-zero diagonal on DAM capacity are treated in Section 7.4.3.
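
A quick evaluation of the bound in Equation (7.2.6) for a few network sizes (the sizes are arbitrary examples) shows how slowly, but surely, the absolute capacity shrinks toward zero:

    import math

    # Absolute capacity bound of Eq. (7.2.6): alpha = m/n < 1 / (4 ln n).
    for n in (100, 1_000, 10_000, 100_000):
        alpha_max = 1.0 / (4.0 * math.log(n))
        print(f"n = {n:>6}   alpha_max = {alpha_max:.4f}   m_max ~ {int(alpha_max * n)}")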

Another, more useful, DAM capacity measure gives a bound on α in terms of error correction and memory size (Weisbuch and Fogelman-Soulié, 1985; McEliece et al., 1987). According to this capacity measure, a correlation-recorded discrete Hopfield DAM must have its pattern ratio, α = m/n, satisfy

$$\alpha = \frac{m}{n} < \frac{(1 - 2\rho)^{2}}{4 \ln n} \qquad (7.2.7)$$

in order that error-free one-pass retrieval of a fundamental memory (say xk) from random key patterns lying inside the Hamming hypersphere (centered at xk) of radius ρn (ρ < 1/2) is achieved with probability approaching 1. Here, ρ defines the radius of attraction of a fundamental memory. In other words, ρ is the largest normalized Hamming distance from a fundamental memory within which almost all of the initial states reach this fundamental memory in one pass. The inequality in Equation (7.2.7) can be derived by starting with an equation similar to Equation (7.2.2), with one difference: the input x(0) is not one of the stored memories. Rather, a random input x(0) is assumed that lies at a normalized Hamming distance ρ from one of the stored memories, say xh (i.e., it has an overlap of 1 - 2ρ with xh). Here, one can readily show that the ith retrieved bit xi(1) is in error if and only if Ci(0) > 1 - 2ρ. The error probability for the ith bit is then given by Equation (7.2.3) with the lower limit on the integral replaced by 1 - 2ρ. This leads to Perror = (1/2)[1 - erf((1 - 2ρ)√(n/2m))], which, by a derivation similar to the one leading to Equation (7.2.6), leads us to Equation (7.2.7).
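
The trade-off expressed by Equation (7.2.7) is easy to tabulate; the sketch below (with an illustrative n) prints the admissible pattern ratio for a few attraction radii.

    import math

    def alpha_bound(rho, n):
        """Eq. (7.2.7): largest pattern ratio allowing one-pass recovery from
        key patterns within normalized Hamming distance rho (rho < 0.5)."""
        return (1.0 - 2.0 * rho) ** 2 / (4.0 * math.log(n))

    n = 1000
    for rho in (0.0, 0.1, 0.25, 0.4):
        print(f"rho = {rho:.2f}   alpha < {alpha_bound(rho, n):.4f}")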

The capacity analysis leading to Equations (7.2.6) and (7.2.7) assumed a single parallel retrieval iteration, starting from x(0) and retrieving x(1). This same analysis cannot be applied if one starts the DAM at x(k), k = 1, 2, ...; i.e., Equations (7.2.6) and (7.2.7) are not valid for the second or higher DAM iterations. In this case, the analysis is more complicated due to the fact that x(k) becomes correlated with the stored memory vectors, and hence the statistical properties of the noise term Δi(k) in Equation (7.2.2) are more difficult to determine (in fact, such correlations depend on the whole history of x(k - T), T = 0, 1, 2, ...). Amari and Maginu (1988) (see also Amari and Yanai, 1993) analyzed the transient dynamical behavior of memory recall under the assumption of a normally distributed Δi(k) [or Ci(k)] with mean zero and variance σ²(k). This variance was calculated by taking the direct correlations, up to two steps, between the bits of the stored memories and those in x(k). Under these assumptions, the relative capacity was found to be equal to 0.16. This theoretical value is in good agreement with early simulations reported by Hopfield (1982) and with the theoretical value of 0.14 reported by Amit et al. (1985) using a method known as the replica method.
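
The relative capacity near 0.14-0.16 can also be probed empirically. The following Monte Carlo sketch (sizes, corruption level, and iteration count are illustrative choices) performs multiple-pass parallel recall from a slightly corrupted memory and reports the final overlap with the target; a single run is noisy, so the output should only be read qualitatively.

    import numpy as np

    rng = np.random.default_rng(7)

    def final_overlap(n, alpha, flip_frac=0.05, passes=50):
        """Multiple-pass parallel recall from a slightly corrupted memory;
        returns the final overlap with the target memory x1."""
        m = int(alpha * n)
        X = rng.choice([-1, 1], size=(n, m))
        W = (X @ X.T) / n
        np.fill_diagonal(W, 0.0)
        x = X[:, 0].copy()
        idx = rng.choice(n, size=int(flip_frac * n), replace=False)
        x[idx] *= -1                                  # corrupt a small fraction of the bits
        for _ in range(passes):
            x = np.where(W @ x >= 0, 1, -1)
        return (x @ X[:, 0]) / n

    for alpha in (0.05, 0.10, 0.14, 0.20):
        print(alpha, round(final_overlap(1000, alpha), 3))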

Komlós and Paturi (1988) showed that if Equation (7.2.6) is satisfied, then the DAM is capable of error correction when multiple-pass retrievals are considered. In other words, they showed that each of the fundamental memories is an attractor with a basin of attraction surrounding it. They also showed that once initialized inside one of these basins of attraction, the state converges (in probability) to the basin's attractor in order ln(ln n) parallel steps. Burshtein (1993) took these results a step further by showing that the radius of the basin of attraction for each fundamental memory approaches ρ = 1/2, independent of the retrieval mode (serial or parallel). He also showed that a relatively small number of parallel iterations, asymptotically independent of n, is required to recover a fundamental memory even when ρ is very close to 1/2 (e.g., for ρ = 0.499, at most 20 iterations are required).

When initialized with a key input x(0) lying outside the basins of attraction of the fundamental memories, the discrete Hopfield DAM converges to one of an exponential (in n) number of spurious memories. These memories are linear combinations of fundamental memories (Amit et al., 1985; Gardner, 1986; Komlós and Paturi, 1988). Thus, a large number of undesirable spurious states, which compete with the fundamental memories for basin-of-attraction "volume," are intrinsic to correlation-recorded discrete Hopfield DAMs.
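
The mixture-type spurious states mentioned above can be probed numerically. The sketch below (sizes are illustrative) forms the symmetric three-memory mixture sgn(x1 + x2 + x3) and checks how many of its bits a single update would flip; for small α the count is typically zero or very small, and the overlaps with the three constituent memories are each near 0.5.

    import numpy as np

    rng = np.random.default_rng(1)
    n, m = 1000, 10                                   # small alpha = 0.01
    X = rng.choice([-1, 1], size=(n, m))
    W = (X @ X.T) / n
    np.fill_diagonal(W, 0.0)

    # Symmetric three-memory mixture, a typical spurious state.
    mix = np.where(X[:, 0] + X[:, 1] + X[:, 2] >= 0, 1, -1)
    flips = np.sum(np.where(W @ mix >= 0, 1, -1) != mix)
    overlaps = (X.T @ mix) / n
    print("bits a single update would flip:", flips)
    print("overlaps with x1, x2, x3:", np.round(overlaps[:3], 2))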

Next, the capacity and retrieval characteristics of two analog (continuous-state) correlation-recorded DAMs (one continuous-time and the other discrete-time parallel updated) based on models introduced in the previous section are analyzed. The first analog DAM considered here will be referred to as the continuous-time DAM in the remainder of this section. It is obtained from Equation (7.1.21) by setting its coefficient matrices (including C) to the identity and the external bias to zero. Its interconnection matrix W is defined by the autocorrelation version of Equation (7.1.6) with zero diagonal. The second analog DAM is obtained from Equation (7.1.29) with the external bias set to zero. It employs the normalized correlation recording recipe for W with zero diagonal, as for the continuous-time DAM. This latter analog DAM will be referred to as the discrete-time DAM in this section. Both DAMs employ the hyperbolic tangent activation function with gain β.

The dynamics of these two DAMs have been studied in terms of the gain β and the pattern ratio α for unbiased random bipolar stored memories (Amit et al., 1985, 1987; Marcus et al., 1990; Shiino and Fukai, 1990; Kühn et al., 1991; Waugh et al., 1993). Figure 7.2.1 shows two analytically derived phase diagrams for the continuous-time and discrete-time DAMs, valid in the limit of large n (Marcus et al., 1990; Waugh et al., 1993). These diagrams indicate the type of attractors as a function of the activation function gain β and the pattern ratio α. For the continuous-time DAM, the diagram (Figure 7.2.1a) shows three regions labeled origin, spin glass, and recall. In the recall region, the DAM is capable of either exact or approximate associative retrieval of stored memories. In other words, a set of 2m attractors exists, each having a large overlap (inner product) with a stored pattern or its inverse (the stability of the inverse of a stored memory is explored in Problem 7.1.13). This region also contains attractors corresponding to spurious states that have negligible overlaps with the stored memories (or their inverses). In the spin glass region (so named because of the similarity to the dynamical behavior of simple models of magnetic materials, spin glasses, studied in statistical physics), the desired memories are no longer attractors; hence, the only attractors are spurious states. Finally, in the origin region, the DAM has the single attractor state x = 0.


Figure 7.2.1. Phase diagrams for correlation-recorded analog DAMs with hyperbolic tangent activation function for (a) continuous-time and (b) parallel discrete-time updating. (Adapted from C. M. Marcus et al., 1990, with permission of the American Physical Society.)

The boundary separating the recall and spin glass regions determines the relative capacity of the DAM. For the continuous-time DAM, and in the limit of high gain (β > 10), this boundary asymptotes at approximately α = 0.14, which is essentially the relative capacity of the correlation-recorded discrete Hopfield DAM analyzed earlier in this section. This result supports the arguments presented at the end of Section 7.1.2 on the equivalence of the macroscopic dynamics of the discrete Hopfield net and the continuous-time, high-gain Hopfield net.

The phase diagram for the discrete-time DAM is identical to the one for the continuous-time DAM except for the presence of a fourth region, marked oscillation in Figure 7.2.1(b). In this region, the stability condition in Equation (7.1.32) is violated (note that the zero-diagonal autocorrelation weight matrix can have negative eigenvalues; see Problem 7.2.1). Associative retrieval and spurious states may still exist in this region (especially if α is small), but the DAM can also become trapped in period-2 limit cycles (oscillations). In both DAMs, error correction capabilities cease to exist at activation gains close to or smaller than 1, even as α approaches zero.
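
The qualitative behavior described by the phase diagrams can be explored with a simple iteration of the parallel discrete-time DAM, assumed here to take the form x ← tanh(βWx), consistent with the description above. The sketch below (sizes, gain, and thresholds are illustrative choices) starts from a corrupted memory and crudely labels the resulting attractor; the labeling is heuristic, and a failure to converge within the iteration budget is only suggestive of a period-2 cycle.

    import numpy as np

    rng = np.random.default_rng(2)

    def run_analog_dam(W, x0, beta, iters=200, tol=1e-6):
        """Parallel discrete-time analog DAM: x <- tanh(beta * W x).
        Returns the last two states so a period-2 cycle can be spotted."""
        x_prev, x = x0.astype(float), np.tanh(beta * (W @ x0))
        for _ in range(iters):
            x_prev, x = x, np.tanh(beta * (W @ x))
            if np.linalg.norm(x - x_prev) < tol:
                break
        return x_prev, x

    n, m, beta = 500, 25, 5.0                     # alpha = 0.05, well inside the recall region
    X = rng.choice([-1, 1], size=(n, m)).astype(float)
    W = (X @ X.T) / n
    np.fill_diagonal(W, 0.0)

    x0 = X[:, 0] * np.where(rng.random(n) < 0.1, -1, 1)   # flip about 10% of the bits
    x_prev, x = run_analog_dam(W, x0, beta)
    if np.linalg.norm(x) < 1e-3:
        label = "origin"
    elif np.linalg.norm(x - x_prev) > 1e-3:
        label = "no convergence (possible period-2 oscillation)"
    elif abs(np.sign(x) @ X[:, 0]) / n > 0.9:
        label = "recall (large overlap with x1)"
    else:
        label = "spurious / spin-glass-like state"
    print(label)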

The boundary separating the recall and spin glass regions has been computed by methods that combine the Liapunov function approach with the statistical mechanics of disordered systems. The boundary between the spin glass and origin regions in Figure 7.2.1 (a) and (b) is given by the expression

$$\beta = \frac{1}{1 + 2\sqrt{\alpha}} \qquad (7.2.8)$$

This expression can be derived by performing local stability analysis about the equilibrium point x* = 0 that defines the origin region. This method is explored in Problems 7.2.2 and 7.2.3 for the continuous-time and discrete-time DAMs, respectively.
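
Since the origin's stability is governed by the largest eigenvalue of W, and a zero-diagonal correlation matrix storing random bipolar patterns has λmax ≈ 1 + 2√α for large n, the boundary of Equation (7.2.8) can be checked numerically. The sketch below (sizes are illustrative) compares the measured λmax with 1 + 2√α and prints the corresponding critical gain.

    import numpy as np

    rng = np.random.default_rng(3)
    n, m = 1000, 100                              # alpha = 0.1
    alpha = m / n
    X = rng.choice([-1, 1], size=(n, m)).astype(float)
    W = (X @ X.T) / n
    np.fill_diagonal(W, 0.0)

    lam_max = np.linalg.eigvalsh(W).max()
    print("largest eigenvalue of W      :", round(lam_max, 3))
    print("1 + 2*sqrt(alpha)  (theory)  :", round(1 + 2 * np.sqrt(alpha), 3))
    print("critical gain 1 / lambda_max :", round(1 / lam_max, 3))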

The associative retrieval capability of the above analog DAMs can vary considerably even within the recall region. It has been shown analytically and empirically that the basin of attraction of the fundamental memories increases substantially as the activation function gain β decreases at a fixed pattern ratio α, for gain values that remain inside the recall region. Waugh et al. (1991, 1993) showed that the number of local minima (and thus spurious states) in the Liapunov function of the above DAMs increases exponentially as exp(ng), where g depends on α and β and is monotonically increasing in β. Therefore, even a small decrease in β can lead to a substantial reduction in the number of local minima, especially the shallow ones which correspond to spurious memories. The reason behind the improved DAM performance as the gain decreases is that the Liapunov function becomes smoother, so that shallow local minima are eliminated. Since the fundamental memories tend to lie in wide, deep basins, essentially all of the local minima eliminated correspond to spurious memories. This phenomenon is termed deterministic annealing, and it is reminiscent of what happens as temperature increases in simulated annealing (the reader is referred to the next chapter for a discussion of annealing methods in the context of neural networks).

7.2.2 Projection DAMs

The capacity and performance of autoassociative correlation-recorded DAMs can be greatly improved if projection recording is used to store the desired memory vectors (recall Equation (7.1.12), with Y = X). Here, any set of memories can be memorized without errors as long as they are linearly independent (note that linear independence restricts m to be less than or equal to n). In particular, projection DAMs are well suited for memorizing unbiased random vectors xk ∈ {-1, +1}n, since it can be shown that the probability that m (m < n) such vectors are linearly independent approaches 1 in the limit of large n (Komlós, 1967). In the following, the retrieval properties of projection-recorded discrete-time DAMs are analyzed. More specifically, the two versions of discrete-state DAMs, serially updated and parallel updated, and the parallel-updated continuous-state DAM are discussed. For the remainder of this section, these three DAMs will be referred to as the serial binary, parallel binary, and parallel analog projection DAMs, respectively. The following analysis assumes the usual unbiased random bipolar binary memory vectors.
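
A minimal sketch of projection recording (sizes are illustrative; the pseudo-inverse is computed with numpy.linalg.pinv) verifies that, with W = XX+, every stored pattern is an exact fixed point of a single parallel update, even at a pattern ratio far beyond the capacity of correlation recording.

    import numpy as np

    rng = np.random.default_rng(4)
    n, m = 200, 120                               # alpha = 0.6, far beyond correlation capacity
    X = rng.choice([-1, 1], size=(n, m)).astype(float)

    # Projection (pseudo-inverse) recording: W = X (X^T X)^(-1) X^T = X X^+.
    W = X @ np.linalg.pinv(X)

    recalled = np.where(W @ X >= 0.0, 1.0, -1.0)  # one parallel pass applied to every memory
    print("all memories are exact fixed points:", bool(np.all(recalled == X)))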

The relation between the radius of attraction ρ of the fundamental memories and the pattern ratio α is a desirable measure of DAM retrieval/error correction characteristics. For correlation-recorded binary DAMs, such a relation has been derived analytically for single-pass retrieval and is given by Equation (7.2.7). On the other hand, deriving similar relations for multiple-pass retrievals and/or more complex recording recipes (such as projection recording) is a much more difficult problem. In such cases, numerical simulations with large n values (typically equal to several hundred) are a viable tool (e.g., see Kanter and Sompolinsky, 1987; Amari and Maginu, 1988). Figure 7.2.2, reported by Kanter and Sompolinsky (1987), shows plots of the radius of attraction ρ of fundamental memories versus the pattern ratio α generated by numerical simulation for serial binary and parallel binary projection DAMs (multiple-pass retrieval is assumed).

Figure 7.2.2. Measurements of ρ as a function of α by computer simulation for projection-recorded binary DAMs. The lines are guides to the eye. The typical size of the statistical fluctuations is indicated. Lines tagged by a refer to a zero-diagonal projection matrix W. Lines tagged by b refer to a standard projection matrix. Solid lines refer to serial update with a specific order of updating as described in the text. Dashed lines refer to parallel update. (Adapted from I. Kanter and H. Sompolinsky, 1987, with permission of the American Physical Society.)

There are two pairs of plots, labeled a and b, in this figure. The pair labeled a corresponds to the case of a zero-diagonal projection weight matrix W, whereas pair b corresponds to the case of a projection matrix W with preserved diagonal. The solid and dashed lines in Figure 7.2.2 represent serial and parallel retrievals, respectively.

According to these simulations, forcing the self-coupling terms wii in the diagonal of W to zero has a drastic effect on the size of the basins of attraction. Note that the error correction capability of the fundamental memories ceases as α approaches and then exceeds 0.5 for both serial and parallel DAMs in the nonzero-diagonal case (Problem 7.2.8 explores this phenomenon). On the other hand, the corresponding DAMs with a zero-diagonal projection matrix continue to have substantial error correction capabilities even after α exceeds 0.5, but ultimately lose these capabilities as α approaches 1. A common feature of these discrete projection DAMs is the monotonic decrease of ρ from 0.5 to 0 as α increases from 0 to 1. Empirical results show that, inside the basin of attraction of stable states, the flows to the states are fast, with a maximum of 10 to 20 parallel iterations (corresponding to starting at the edge of the basin). These results are similar to the theoretical ones reported for parallel updated correlation-recorded DAMs (Burshtein, 1993).
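
The effect of zeroing the diagonal can be reproduced, at least qualitatively, with a small Monte Carlo experiment. The sketch below (sizes, α, ρ, and the number of trials are illustrative choices) corrupts a fraction ρ of the bits of a stored memory and checks whether multiple-pass parallel retrieval recovers it, with and without the diagonal of the projection matrix zeroed; individual runs fluctuate, so the counts should be read only qualitatively.

    import numpy as np

    rng = np.random.default_rng(5)

    def projection_dam(X, zero_diag):
        W = X @ np.linalg.pinv(X)
        if zero_diag:
            np.fill_diagonal(W, 0.0)
        return W

    def recovers(W, x_true, rho, passes=30):
        """Flip a random fraction rho of the bits, then iterate in parallel."""
        n = x_true.size
        x = x_true.copy()
        flip = rng.choice(n, size=int(rho * n), replace=False)
        x[flip] *= -1
        for _ in range(passes):
            x_new = np.where(W @ x >= 0, 1, -1)
            if np.array_equal(x_new, x):
                break
            x = x_new
        return np.array_equal(x, x_true)

    n, m, rho = 400, 200, 0.05                    # alpha = 0.5
    X = rng.choice([-1, 1], size=(n, m))
    for zero_diag in (True, False):
        W = projection_dam(X, zero_diag)
        hits = sum(recovers(W, X[:, k].copy(), rho) for k in range(20))
        print(f"zero_diag = {zero_diag}: {hits}/20 memories recovered from rho = {rho}")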

For the above parallel updated zero-diagonal projection DAM, simulations show that in almost all cases where the retrieval did not result in a fundamental memory, it resulted in a limit cycle of period two as opposed to spurious memories. For the preserved-diagonal projection DAM, simulations with finite n show that no oscillations exist (Youssef and Hassoun, 1989). (The result of Problem 7.2.10, taken in the high-gain limit, can serve as an analytical proof for the nonexistence of oscillations for the case of large n.)

The zero-diagonal serial projection DAM has the best performance, depicted by the solid line a in Figure 7.2.2. In this case, an approximate linear relation between ρ and α can be deduced from this figure as

$$\rho \approx \frac{1 - \alpha}{2} \qquad (7.2.9)$$

Here, the serial update strategy employs an updating order such that the initial updates are more likely to reduce the Hamming distance (i.e., increase the overlap) between the DAM's state and the closest fundamental memory rather than increase it. In the simulations, the initial state x(0) tested had its first (1 - ρ)n bits identical to one of the stored vectors, say x1, while the remaining ρn bits were chosen randomly. Thus the last ρn positions represent the bits where errors are more likely to occur. The serial update strategy used above allowed the units corresponding to the initially random bits to update their states before the ones having the correct match with x1. However, in practical applications, this update strategy may not be applicable (unless we have partial input keys that match a segment of one of the stored memories, such as a partial image) and hence, a standard serial update strategy (e.g., updating the n states in some random or unbiased deterministic order) may be employed. Such standard serial updating leads to reduced error correction behavior compared to the particular serial update employed in the above simulations. The performance, though, would still be better than that of a parallel updated DAM. Spurious memories do exist in the above projection DAMs. At very small α values, these spurious states are mixtures of the fundamental memories (just as in the correlation-recorded discrete Hopfield DAM). Above α ≈ 0.1, mixture states disappear. Instead, most of the spurious states have very little overlap with individual fundamental memories.
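
A sketch of the biased serial update described above is given below (sizes and the value of ρ are illustrative, and the update order is the "suspect bits first" strategy); with α = 0.5 and ρ below the value suggested by Equation (7.2.9), the final overlap with x1 is typically 1.

    import numpy as np

    rng = np.random.default_rng(8)

    def serial_recall(W, x, order, passes=10):
        """Asynchronous (serial) recall, visiting the units in the given order."""
        x = x.copy()
        for _ in range(passes):
            changed = False
            for i in order:
                new_bit = 1 if W[i] @ x >= 0 else -1
                if new_bit != x[i]:
                    x[i], changed = new_bit, True
            if not changed:
                break
        return x

    n, m, rho = 400, 200, 0.2                     # alpha = 0.5
    X = rng.choice([-1, 1], size=(n, m))
    W = X @ np.linalg.pinv(X)
    np.fill_diagonal(W, 0.0)

    k = int((1 - rho) * n)                        # the first k bits are taken from x1
    key = np.concatenate([X[:k, 0], rng.choice([-1, 1], size=n - k)])
    order = list(range(k, n)) + list(range(k))    # update the suspect (random) bits first
    x = serial_recall(W, key, order)
    print("final overlap with x1:", (x @ X[:, 0]) / n)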

Lastly, consider the parallel analog projection DAM with zero-diagonal interconnection matrix W. This DAM has the phase diagram shown in Figure 7.2.3, with origin, recall, and oscillation phases, but no spin glass phase (Marcus et al., 1990). The absence of the spin glass phase does not imply that this DAM does not have spurious memories; just as for the correlation-recorded discrete Hopfield DAM, there are many spurious memories within the recall and oscillation regions which have small overlap with the fundamental memories, especially for large α. However, there is no region where only spurious memories exist. Note also that in the oscillation region, all fundamental memories exist as stable equilibrium states with basins of attraction defined around each of them. The radius of each basin decreases (monotonically) as α increases, until ultimately all such memories lose their basins of attraction.

Figure 7.2.3. Phase diagram for the parallel updated analog projection DAM (with zero-diagonal W). (Adapted from C. M. Marcus et al., 1990, with permission of the American Physical Society.)

According to Equation (7.1.32), oscillations are possible in the dynamics of the present analog DAM when βλmin < -1, where λmin is the minimal eigenvalue of the interconnection matrix W. It can be shown (Kanter and Sompolinsky, 1987) that a zero-diagonal projection matrix which stores m unbiased random memory vectors xk ∈ {-1, +1}n has the extremal eigenvalues λmin = -α and λmax = 1 - α. Therefore, the oscillation region in the phase diagram of Figure 7.2.3 is defined by α > 1/β. Also, by following an analysis similar to the one outlined in Problem 7.2.3, it can be shown that the origin point loses its stability when β(1 - α) > 1 for α ≤ 0.5 and when βα > 1 for α > 0.5. With these expressions, it can be easily seen that oscillation-free associative retrieval is possible up to α = 0.5 if the gain is equal to 2. Adding a positive diagonal element δ to W shifts the extremal eigenvalues λmin and λmax to δ - α and 1 + δ - α, respectively, and thus increases the value of α for oscillation-free associative retrieval of fundamental memories to a maximum value of δ + 0.5, which exists at β = 2. The recall regions for several values of δ are shown in Figure 7.2.4. Here, one should note that the increase in the size of the recall region does not necessarily imply increased error correction. On the contrary, a large diagonal term greatly reduces the size of the basins of attraction of the fundamental memories, as was seen earlier for the binary projection DAM. The reader is referred to Section 7.4.3 for further exploration into the effects of the diagonal term on DAM performance.
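
The extremal eigenvalues quoted above, and the resulting oscillation-free bound, can be checked numerically. The sketch below (sizes, gain, and diagonal value are illustrative) measures λmin and λmax of a zero-diagonal projection matrix and prints the bound α < δ + 1/β that follows from the analysis above.

    import numpy as np

    rng = np.random.default_rng(6)
    n, m = 1000, 400                              # alpha = 0.4
    alpha = m / n
    X = rng.choice([-1, 1], size=(n, m)).astype(float)
    W = X @ np.linalg.pinv(X)
    np.fill_diagonal(W, 0.0)

    eig = np.linalg.eigvalsh(W)
    print("measured lambda_min, lambda_max:", round(eig[0], 3), round(eig[-1], 3))
    print("theory   -alpha,     1 - alpha :", -alpha, 1 - alpha)

    beta, delta = 2.0, 0.2
    # With a diagonal element delta added, the analysis above suggests oscillation-free
    # retrieval as long as alpha < delta + 1/beta.
    print("oscillation-free retrieval expected up to alpha ~", delta + 1 / beta)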

Figure 7.2.4. The recall region of a parallel updated analog projection DAM for various values of the diagonal element δ. (Adapted from C. M. Marcus et al., 1990, with permission of the American Physical Society.)
