Problems

7.1.1 Verify Equation (7.1.7) by substituting Equation (7.1.6) into Equation (7.1.5).

7.1.2 Consider a set of m uniformly distributed random vectors x_k ∈ {-1, +1}^n. Assume large n (e.g., n = 200) and m << n (e.g., m = 10). Compute the inner products x_i^T x_j for i, j = 1, 2, ..., m. What can you say about the degree of orthogonality of this set of vectors? Repeat with m comparable to n.
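A minimal simulation sketch for this experiment (Python is used here; the (1/n)-normalization of the inner products is an assumption, since the exact expression was garbled):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 10                            # large n, m << n, as in the problem

X = rng.choice([-1.0, 1.0], size=(n, m))  # columns are the random vectors x_k
G = (X.T @ X) / n                         # normalized inner products (1/n) x_i^T x_j

off_diag = G[~np.eye(m, dtype=bool)]      # the i != j entries
print("diagonal entries (exactly 1):", np.unique(np.diag(G)))
print("max |off-diagonal|:", np.abs(off_diag).max())
print("off-diagonal std (compare 1/sqrt(n) = 0.07):", off_diag.std())
```

Rerunning with m comparable to n shows the off-diagonal fluctuations no longer being negligible relative to the number of crosstalk terms.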

7.1.3 Consider the autoassociative version of the recording recipe in Equation (7.1.11) and assume that |X| ≠ 0. Show that the LAM employing this recording recipe has no error correction capabilities. (Note that Equation (7.1.11) assumes m = n.)

7.1.4 Show that the LAM with the projection recording recipe in Equation (7.1.12) and with linearly independent vectors x_k ∈ R^n is capable of perfect storage of all associations {x_k, y_k}, k = 1, 2, ..., m, with m < n. Also, show that in the autoassociative case the matrix W = X(X^T X)^-1 X^T projects onto the m-dimensional subspace spanned by the vectors x_k, which are the columns of matrix X.

7.1.5 Consider the autoassociative LAM recording recipe given by

W = X D (X^T X)^-1 X^T

where D = diag[λ_1, λ_2, ..., λ_m] with λ_i > 0, i = 1, 2, ..., m, and X = [x_1 x_2 ... x_m]. Assume the vectors x_k, k = 1, 2, ..., m, are linearly independent. Show that the m vectors x_k are stored as eigenvectors of W with corresponding eigenvalues λ_k.

* 7.1.6 Show that Equation (7.1.12) is the solution of the following constrained optimization problem:

minimize ||W||_F^2   subject to   WX = Y

with the assumption that the columns of matrix X are linearly independent.

* 7.1.7 Consider an autoassociative correlation memory defined by Equation (7.1.5). Assume that a set of m orthogonal vectors {x_k; k = 1, 2, ..., m} has already been loaded into this memory according to Equation (7.1.6) with y_k = x_k. Show that any input vector lying within a Hamming distance of n/(2m) from a stored memory x_k leads to perfect retrieval of this memory in one step (Personnaz et al., 1986).
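One way to see where such a bound comes from (a sketch, assuming the (1/n)-normalized correlation recording W = (1/n) Σ_k x_k x_k^T): if the input x̃ differs from x_k in d bits, then

```latex
W\tilde{x}
  = \frac{1}{n}\sum_{j=1}^{m} x_j x_j^{T}\tilde{x}
  = \frac{n-2d}{n}\,x_k
    + \frac{1}{n}\sum_{j\neq k}\left(x_j^{T}\tilde{x}\right)x_j .
```

Orthogonality of the x_j gives |x_j^T x̃| ≤ 2d for j ≠ k, so componentwise sign retrieval is guaranteed whenever (n - 2d) - 2d(m - 1) > 0, i.e., whenever d < n/(2m).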

* 7.1.8 Using the notation of Section 7.1.1, show that for any LAM with interconnection matrix W,


where wij is an element of W and the expectation is over all elements of W.

* 7.1.9 For an autoassociative OLAM, show that by employing the projection recording recipe, and recording with linearly independent memories x_k ∈ R^n, k = 1, 2, ..., m, the error correction behavior is given by:


(Hint: First, show that and then substitute this result into the expression of from Problem 7.1.8).

7.1.10 Simplify Equation (7.1.14) for the case where the x_k's, k = 1, 2, ..., m, are orthonormal and the y_k are the columns of the m × m identity matrix I. Repeat for Y = cI with |c| < 1. What do these results say about the appropriateness of the measure in Equation (7.1.14) for quantifying the error correction capabilities of heteroassociative OLAMs?

* 7.1.11 A properly normalized performance measure for a heteroassociative OLAM is the signal-to-noise ratio (SNR) given by:


where the noise term is given by Equation (7.1.14), and the signal "energies" are given by:

and


where the averaging is over all elements i and all vectors k. Show that for zero-mean orthonormal key vectors x_k, k = 1, 2, ..., m, and zero-mean recollection vectors y_k, this measure gives:


Compare this result to those in Problem 7.1.10.

7.1.12 Show that the Hopfield net is a gradient system by verifying Equation (7.1.25).

7.1.13 Show that for the Hopfield DAM with the computational energy in Equation (7.1.27), the states x and -x have equal energy, E(x) = E(-x). Also, show that if x_k is a local minimum of E, then -x_k is also a local minimum of E.

7.1.14 Consider the Hopfield DAM with the computational energy function

E(x) = -(1/2) x^T W x        (1)

The idea of the energy function as something to be minimized in the stable states gives us a way to derive a learning recipe for this DAM. If we desire a memory x_k to be a local minimum of the energy, we require that the overlap between the DAM state x and the vector x_k be maximized. So we choose

E(x) = -(1/2n) (x^T x_k)^2        (2)

If it is desired that m memories be stored as local minima, then we may minimize the energy function

E(x) = -(1/2n) Σ_k (x^T x_k)^2,  k = 1, 2, ..., m        (3)

Show that the computational energy function in (3) is that of the DAM in (1) when correlation recording is used.

7.1.15 Consider the high-gain, zero-bias continuous-time Hopfield net with symmetric W. This net may be described by the gradient system

dx/dt = -∇E(x)

where

E(x) = -(1/2) x^T W x
a. Find the equilibrium states of this system.

b. Compute the Hessian and employ it to study the stability of the equilibria of part a.

* c. Show that if W is positive definite, then the only attractors are corners of the hypercube.

7.1.16 Verify inequality (7.1.35) for the activation function . [Hint: note that is the minimum curvature of G(x) in Equation (7.1.31).]

7.1.17 Consider the discrete-time discrete-state DAM operated in a serial mode (discrete Hopfield net). Suppose that only one unit x_i(k) changes its value to x_i(k+1) = x_i(k) + Δx_i according to Equation (7.1.37). Show that if w_ij = w_ji for i, j = 1, 2, ..., n, the resulting change in the net's computational energy (Equation 7.1.24) can be expressed as

ΔE = -Δx_i Σ_j w_ij x_j - (1/2) w_ii (Δx_i)^2

Next, show that ΔE ≤ 0 if w_ii ≥ 0 for i = 1, 2, ..., n, with ΔE = 0 only when an attractor is reached.

7.1.18 Consider the ten 7 × 5 pixel patterns shown in Figure P7.1.18. Generate a set of ten 35-dimensional bipolar column vectors from these patterns. Here, each vector is obtained by scanning its corresponding pattern starting from the top left corner and scanning each row from left to right (e.g., the vector corresponding to the pattern "0" would be x = [-1 -1 +1 +1 -1 -1 +1 -1 -1 +1 ...]^T). Design a static nonlinear associative memory, having the input/output relation in Equation (7.1.5), to store the above patterns. Use the autoassociative version of the correlation recording recipe in Equation (7.1.6). Repeat using the autoassociative version of the projection recording recipe in Equation (7.1.12). Is the correlation-recorded associative memory capable of storing all ten patterns simultaneously? Compare the noise correction capabilities of both designs. Here, test with noisy input vectors obtained by flipping each bit of an original pattern with probability P. Experiment with P = 0.1, 0.2, 0.3, and 0.4. From these simulations, determine the least and the most "stable" pattern/fundamental memory for each of the two designs. (A simulation sketch follows Figure P7.1.18.)





Figure P7.1.18. A set of ten memory patterns used in Problem 7.1.18.
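A simulation sketch for this problem, under stated assumptions: random ±1 placeholder memories stand in for the ten digit patterns of Figure P7.1.18 (substitute the actual scanned vectors), and sgn ties are broken toward +1:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 35, 10
# Placeholder memories; replace with the ten bipolar vectors scanned from
# the 7 x 5 patterns of Figure P7.1.18.
X = rng.choice([-1.0, 1.0], size=(n, m))

W_corr = (X @ X.T) / n                       # correlation recording, Eq. (7.1.6)
W_proj = X @ np.linalg.inv(X.T @ X) @ X.T    # projection recording, Eq. (7.1.12)

def recall(W, x):
    return np.where(W @ x >= 0.0, 1.0, -1.0) # one-pass retrieval, Eq. (7.1.5)

for P in (0.1, 0.2, 0.3, 0.4):
    flip = rng.choice([1.0, -1.0], size=n, p=[1.0 - P, P])
    noisy = X[:, 0] * flip                   # each bit flipped with probability P
    for name, W in (("correlation", W_corr), ("projection", W_proj)):
        errs = int(np.sum(recall(W, noisy) != X[:, 0]))
        print(f"P = {P}: {name} -> {errs} bit errors")
```

Averaging the bit-error counts over many noise realizations and over all ten memories identifies the least and most stable patterns for each design.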

* 7.2.1 Consider the correlation matrix in Equation (7.1.6) with vanishing diagonal and assume unbiased, independent random vectors y_k = x_k ∈ {-1, +1}^n, k = 1, 2, ..., m. Show that in the limit of large n with m < n, the extreme eigenvalues of this matrix are λ_min = -α and λ_max = 1 + 2√α, where α = m/n (Geman, 1980; Crisanti and Sompolinsky, 1987; Le Cun et al., 1991b).
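These extrema can be checked empirically; a quick sketch (the values n = 2000 and α = 0.2 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 2000, 400                     # alpha = m/n = 0.2; "large n" regime
alpha = m / n

X = rng.choice([-1.0, 1.0], size=(n, m))
W = (X @ X.T) / n                    # correlation recording, Eq. (7.1.6)
np.fill_diagonal(W, 0.0)             # vanishing diagonal

eigs = np.linalg.eigvalsh(W)
print("empirical min, max:", eigs.min(), eigs.max())
print("predicted min, max:", -alpha, 1 + 2 * np.sqrt(alpha))
```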

7.2.2 Show that the boundary between the spin glass and origin regions in the phase diagram of Figure 7.2.1 for the continuous-time zero-diagonal correlation-recorded DAM, in the limit of large n with m < n, is given by

β(1 + 2√α) = 1

where β is the activation gain and α = m/n. (Hint: Perform local stability analysis about the equilibrium x* = 0 that defines the origin region. First, linearize the dynamical equation of the DAM at x* = 0 to get

dx/dt = -x + βWx

Then, use the result of Problem 7.2.1 and the fact that x* = 0 is destabilized when at least one eigenvalue of the matrix -I + βW has a positive real part to establish the desired transition condition on the stability of the origin.)

7.2.3 Repeat Problem 7.2.2 for the discrete-time zero-diagonal correlation-recorded analog DAM. (Hint: the linearized dynamics at x* = 0 are given by

x(k+1) = βWx(k)

and the transition point between stability and instability of x* occurs when at least one eigenvalue of βW has a magnitude of 1.)

7.2.4 Show that the boundary of the oscillation region for the discrete-time zero-diagonal correlation-recorded analog DAM (refer to Figure 7.2.1 b) is given by βα = 1.

* 7.2.5 Show that if the diagonal elements of the autocorrelation interconnection matrix W in the analog DAMs (both continuous-time and discrete-time versions) of Section 7.2 are retained (i.e., w_ii = α), then the spin glass/origin regions boundary becomes

β(1 + √α)^2 = 1

Plot and compare this boundary to the one in Figure 7.2.1 (a or b).

7.2.6 Consider a correlation-recorded DAM with analog activation f(x) = tanh(βx) and pattern ratio α = 0.1. Calculate the reduction (in percent) in the number of local minima of the Liapunov function when the gain β is reduced from 100 to 10 for n = 100. Use the theoretical values of g(0.1, 100) and g(0.1, 10) given by 0.059 and 0.040, respectively. Repeat for n = 1,000.
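A sketch of the arithmetic, assuming (as the g notation suggests) that the number of local minima grows as exp(n·g(α, β)):

```python
import numpy as np

# Assumed scaling: number of local minima ~ exp(n * g(alpha, beta)).
g_100, g_10 = 0.059, 0.040          # theoretical g(0.1, 100) and g(0.1, 10)

for n in (100, 1000):
    ratio = np.exp(n * (g_10 - g_100))   # (# minima at beta=10) / (# at beta=100)
    print(f"n = {n}: reduction = {100 * (1 - ratio):.4f}%")
```

Under this scaling the reduction is about 85% for n = 100 and essentially 100% for n = 1,000.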

* 7.2.7 Consider the autoassociative version of the projection-recorded interconnection matrix in Equation (7.1.12) and assume unbiased, independent random bipolar variables for the components of the n × m matrix X. Show that in the limit as n → ∞, the expected value of a diagonal element w_ii of W approaches α = m/n, and that the fluctuations of w_ii around this value are small, of order 1/√n.

* 7.2.8 Self-coupling in a discrete-state projection-recorded DAM greatly reduces the basins of attraction of fundamental memories, especially for large α. Show that the basin of attraction of a fundamental memory vanishes when α > 1/2. (Hint: Equivalently, you may show that for α > 1/2, a key input which differs from a stored memory x_k ∈ {-1, +1}^n by only one bit does not converge to x_k. The results of Problem 7.2.7 should also be helpful.)

7.2.9 Derive the analytical expressions for the phase diagram boundaries in Figure 7.2.3 for the parallel updated analog projection DAM with zero diagonal W.

7.2.10 Show that the parallel updated projection-recorded analog DAM of Section 7.2 has the phase diagram shown in Figure P7.2.10 when the diagonal elements of the W matrix are retained. (Hint: use the results of Problem 7.2.7)


Figure P7.2.10. Phase diagram for the projection-recorded analog DAM with preserved diagonal of W.

7.3.1 Consider the Hamming autoassociative memory shown in Figure P7.3.1. This memory consists of three cascaded modules: a max selector module sandwiched between two LAMs. Matrices W1 and W2 are given by:


where x_k ∈ {-1, +1}^n, k = 1, 2, ..., m, are the stored memory patterns. The first LAM takes the input key x̃ and produces the vector z = W1 x̃, which becomes the input to the max selector module. The max selector generates an output vector y ∈ {0, 1}^m whose ith component y_i is given by:

y_i = 1 if z_i ≥ z_j for all j ≠ i; otherwise y_i = 0
Finally, the second LAM generates the output vector x = W2y. [Note: This associative memory is based on the Hamming net of Lippmann (1987). A more general dynamical associative memory where the max selector module is replaced by a nonlinear dynamical network that allows for the storage and retrieval of temporal and other complex memory trajectories is described in Baird (1990) and analyzed theoretically in Baird and Eeckman (1993)].

a. Show that if y_i = 1, then the Hamming distance between the input x̃ and the stored memory x_i must be less than or equal to that between x̃ and any stored memory x_k, k ≠ i.


Figure P7.3.1. Hamming associative memory.

b. Show that for any input, the output x is either one of the stored memories or the sum of two or more stored memories.

c. What is the maximum number of distinct vectors x_k ∈ {-1, +1}^n this memory can store? Does the memory have error correction capability when loaded with this maximum number of memories? Why?

†* d. Let the components of the X matrix assume uncorrelated random values of +1 or -1 with equal probability, and assume large n. Use simulations to generate a plot relating the number of stored memories m to the radius of attraction ρ of fundamental memories. Assume n = 500. Here, an input x̃ is within the radius of attraction ρ of a stored memory x_k if the Hamming DAM recovers x_k from x̃ in one iteration. (A simulation sketch is given after part g below.)

e. Assume that the max selector module in Figure P7.3.1 is implemented using the winner-take-all net described in Section 3.4.1. Find the total number of interconnection weights required for the implementation of the Hamming autoassociative memory as a function of m and n.

f. Based on the above results, compare and contrast the performance and implementation efficiency of the Hamming associative memory and the DAMs of Section 7.2.

g. How would you modify the Hamming associative memory so that it becomes heteroassociative?
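A one-pass retrieval sketch for the simulations of part d. The exact W1 and W2 were lost from the problem statement above, so the usual Hamming-net choices W1 = (1/n)X^T and W2 = X are assumed here; they match the z = W1 x̃ and x = W2 y structure described above:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 500, 25                  # n as in part (d); m = 25 is an arbitrary choice

X = rng.choice([-1.0, 1.0], size=(n, m))
W1 = X.T / n                    # assumed first-LAM weights (correlation with keys)
W2 = X                          # assumed second-LAM weights (columns = memories)

def hamming_recall(x_in):
    z = W1 @ x_in                         # similarity of the key to each memory
    y = (z == z.max()).astype(float)      # max selector; ties yield several 1's
    return np.where(W2 @ y >= 0.0, 1.0, -1.0)

k, d = 0, 100                   # probe memory x_0 with d of its bits flipped
noisy = X[:, k].copy()
noisy[rng.choice(n, size=d, replace=False)] *= -1.0
print("recovered x_0:", bool(np.all(hamming_recall(noisy) == X[:, k])))
```

Sweeping d and m, and recording the largest d that is always corrected, produces the requested plot of m versus the radius of attraction.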

7.4.1 Consider a correlation-recorded discrete Hopfield net loaded with m random sparse binary vectors x_k ∈ {0, 1}^n. The ith component of vector x_k, x_i^k, is set to 1 with probability a_n (or is set to zero with probability 1 - a_n). Here, the parameter a_n is equal to n^(-δ)/2, 0 ≤ δ < 1, and it controls the degree of sparsity of vector x_k (when δ = 0, each vector x_k will be random with an equal expected number of 1's and 0's, and as δ increases the vector x_k becomes increasingly sparse, with approximately n^(1-δ)/2 ones). It has been shown (Amari, 1989) that the absolute capacity of this sparsely encoded DAM is given by

Plot this capacity versus δ for n = 1000. Show that the capacity of the above DAM when loaded with non-sparse unipolar binary vectors is much worse than that of the same DAM loaded with non-sparse bipolar binary vectors [recall Equation (7.2.1)]. Also, show that for sufficiently large δ, the above sparsely encoded DAM has a substantial storage capacity, with m growing faster than linearly in n.

7.4.2 Consider a set of m unbiased and independent random vectors x_k ∈ {-1, +1}^n with large n. Find the range of α values for which the BSB DAM in Equation (7.4.3) is stable, assuming that the DAM's interconnection matrix W is formed by storing the vectors x_k according to:

a. Normalized correlation recording.

b. Normalized correlation recording with zero-diagonal W.

c. Projection recording.

d. Projection recording with zero-diagonal W.

7.4.3 Consider the discrete-time parallel updated linear dynamical system

x(k+1) = x(k) + αWx(k)

with symmetric W. Find all equilibrium states. Show that no finite equilibrium inside the hypercube is stable if either λ_max > 0, or λ_max < 0 with α > 2/|λ_min|, where λ_min and λ_max are the smallest and largest eigenvalues of W, respectively. Employ this result in a BSB DAM to show that all trajectories starting at a state inside the hypercube [-1, +1]^n, other than the origin, flow towards the surface of the hypercube (note that this analysis says nothing about the convergence of the BSB DAM to a vertex or a point on the surface of the hypercube).

7.4.4 Consider a two-state BSB DAM with the following parameters:

α = 0.3

Find all attractor states. Does this DAM have any attractors inside the hypercube [-1, +1]^2? Why? Show that the points (0.81, 0.18), (0.42, -1), and (-1, -0.22) are unstable equilibrium points. Plot the trajectories x(k) for x(0) = [0.9  0.5]^T and x(0) = [0.4  -0.25]^T. Estimate (numerically) and plot the boundaries of the basins of attraction for all attractors found.
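A sketch of the requested trajectory computation. The problem's W matrix was garbled in extraction, so a hypothetical symmetric 2 × 2 matrix is used below purely to illustrate the BSB iteration x(k+1) = clip(x(k) + αWx(k)) assumed for Equation (7.4.3):

```python
import numpy as np

# Hypothetical stand-in for the problem's (lost) 2 x 2 matrix W.
W = np.array([[ 1.0, -0.4],
              [-0.4,  1.0]])
alpha = 0.3

def bsb_step(x):
    # One common BSB form: linear feedback, then saturation onto [-1, 1]^n.
    return np.clip(x + alpha * (W @ x), -1.0, 1.0)

for x0 in ([0.9, 0.5], [0.4, -0.25]):
    x = np.array(x0)
    for _ in range(100):          # iterate until (practically) converged
        x = bsb_step(x)
    print(x0, "->", np.round(x, 4))
```

Sampling a grid of initial states inside [-1, +1]^2 and coloring each point by the attractor it reaches gives the requested basin-of-attraction plot.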

7.4.5 Consider a BSB DAM with the following parameters:

α = 0.3

Show that W satisfies the inequality


and thus the four vertices of the square [-1, +1]^2 are stable equilibria (attractors). Estimate and plot the basin of attraction boundaries for all vertices. Solve for the equilibrium state inside [-1, +1]^2 and study its stability. Use the estimated basin boundaries to estimate the four unstable equilibria which lie on the edges of the square [-1, +1]^2.

* 7.4.6 Show that if

w_ii ≥ Σ_{j≠i} |w_ij|,  i = 1, 2, ..., n

in a BSB DAM with symmetric W, α = 1, and λ = 1, then any vertex of the hypercube [-1, +1]^n is a stable equilibrium. [Hint: employ the vertex stability criterion in Equation (7.4.6).]

7.4.7 Derive the absolute capacity given in Equation (7.4.20) for the correlation-recorded hysteretic DAM. (Hint: Follow the derivation in Section 7.2.1 for the capacity of a correlation-recorded DAM. Start from Equation (7.2.2), but account for the hysteresis term x_i(0).)

7.4.8 Show that in the limit of large n, a correlation-recorded DAM with preserved diagonal is capable of correcting ρn random errors through single-pass retrieval when loaded with unbiased random bipolar binary memories, for small ρ values. Compare this result to the one in Equation (7.2.7).

* 7.4.9 Consider the dynamic version of the Hamming autoassociative memory of Problem 7.3.1. This Hamming DAM is shown in Figure P7.4.9. Note that this architecture is similar to that of the exponential DAM of Section 7.4.4 with the nonlinear transformation G taking the form of a "max selector". Here, F is the sgn nonlinearity.

a. Assume that the following three vectors are stored as memories in this Hamming DAM.


Compute the DAM state transitions which result from initializing the DAM with each of the sixteen possible vectors x(0) ∈ {-1, +1}^4.

b. Does this Hamming DAM have exponential capacity? (The answer to this question can be deduced from the answers to Problem 7.3.1, parts c and d.)

c. Study the stability of this DAM.

d. Is it possible for this DAM to have spurious memories?

e. Show that this DAM is equivalent to the exponential DAM of Section 7.4.4 if the activation function g(u) = a^u with a → ∞ is employed in the exponential DAM.


Figure P7.4.9. Hamming DAM.

7.4.10 A sufficient condition for the stability of the DAM dynamics in Equation (7.4.23) is that the activation function g(·) be continuous and monotonically nondecreasing over [-n, n]. Verify the stability of this DAM if:

a.

b.

c.

d.

7.4.11 Consider the following vectors:


Employ the recording recipe in Equation (7.4.25), with appropriate modifications, to store the following transitions in the sequence generator DAM of Section 7.4.5:


Verify that the above dynamics are realized by the DAM (assume parallel state update). Is x_3 an attractor? If yes, what is the maximum number of errors in x_3 that can be tolerated? Does the cycle have a basin of attraction?

7.4.12 Consider the sequence generator DAM discussed in Section 7.4.5. Can this DAM retrieve sequences which intersect each other at a common intermediate state? Why?

7.4.13 Express the interconnection matrices W1 and W2 of a projection-recorded HDAM in terms of the matrices X = [x_1 x_2 ... x_m] and Y = [y_1 y_2 ... y_m], where {x_k, y_k}, k = 1, 2, ..., m, is a set of associations to be stored. Assume x_k ∈ R^n and y_k ∈ R^p. What are the conditions on the vectors x_k and y_k for perfect storage?

7.4.14 Starting from the energy function E(x, y) = -x^T W y, show that a parallel updated BAM always evolves its state x (and also y) such that the energy E is monotonically nonincreasing. Combine this result with the fact that E is bounded from below to deduce the stability of the BAM. Repeat the analysis assuming a serial update mode.

7.4.15 Starting from the Liapunov function E(x) = -(1/2) x^T W x for a discrete Hopfield DAM, and using the equivalence property between this DAM and a BAM which was pointed out in Section 7.4.6, show that E(x, y) = -x^T W y is a Liapunov function for the BAM.

7.5.1 Derive the dynamical equations for the circuit in Figure 7.5.1 by applying Kirchhoff's current law at nodes u_0 and u_1 (your answer should be similar to Equation (7.5.13), but with extra decay terms). Explain the discrepancy between your answer and Equation (7.5.13). Can you suggest circuit modifications which would eliminate this discrepancy?

7.5.2 Extend the design of the two-bit A/D converter of Example 7.5.1 to an n-bit A/D converter. Show that an appropriate set of weights and inputs is given by

w_ij = -2^(i+j),  i ≠ j

and

I_i = 2^i x - 2^(2i-1)

for i, j = 0, 1, 2, ..., n - 1

(Note: Proper operation of the above Hopfield net as an A/D converter requires the reinitialization of all states to zero every time a new analog input x is to be converted (Tank and Hopfield, 1986). This makes it impractical for the on-line conversion of time-varying analog inputs x(t).)
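The weight and input formulas above are reconstructions of the Tank-Hopfield design, so treat them as assumptions; the following brute-force sketch checks that the resulting Hopfield energy is minimized by the binary code nearest to the analog input x:

```python
import numpy as np
from itertools import product

n = 2                                     # two-bit converter, as in Example 7.5.1

def energy(V, x):
    # E = -(1/2) sum_{i != j} w_ij V_i V_j - sum_i I_i V_i, with the candidate
    # weights w_ij = -2^(i+j) and inputs I_i = 2^i x - 2^(2i-1).
    E = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                E -= 0.5 * (-(2.0 ** (i + j))) * V[i] * V[j]
        E -= (2.0 ** i * x - 2.0 ** (2 * i - 1)) * V[i]
    return E

for x in (0.2, 1.1, 2.4, 2.9):
    best = min(product((0, 1), repeat=n), key=lambda V: energy(V, x))
    value = sum(b * 2 ** i for i, b in enumerate(best))
    print(f"x = {x}: V = {best} -> decoded value {value}")
```

With these choices the energy reduces to (1/2)(Σ_i 2^i V_i - x)^2 plus a constant, so the minimizing state encodes the integer nearest x.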

7.5.3 The traveling salesman problem is an example of a constrained combinatorial optimization problem. Here, you are given n cities (points) in a region of the plane, as well as the distances d_ij between them. The task is to find the minimum-length closed tour that visits each city once and returns to its starting point. Hopfield and Tank (1985) suggested the use of a continuous Hopfield net for finding solutions to this problem. This net has n × n unipolar sigmoidal activation units, where the unit at location ia has output x_ia = 1 if and only if city i is the ath stop on the tour. Figure P7.5.3 shows a possible activity pattern of the net and the tour it corresponds to, for a four-city problem. Now, consider the following objective function



Figure P7.5.3. Four-city traveling salesman problem. (a) Hopfield net activity pattern and (b) its corresponding tour.

Verify that J is an appropriate objective function for this problem (this can be done by identifying the cost term and verifying its validity, and by verifying that all the necessary constraint satisfaction terms are accounted for). Write an expression for the evolution equation of unit x_ia assuming lossless amplifiers. Write an equation for the weight w_ia,jb which connects unit x_jb to unit x_ia, i, j, a, b = 1, 2, ..., n.
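The objective function J itself was lost in extraction. For reference, the standard Hopfield-Tank (1985) form, which the problem's J presumably resembles, is

```latex
J = \frac{A}{2}\sum_{i}\sum_{a}\sum_{b\neq a} x_{ia}x_{ib}
  + \frac{B}{2}\sum_{a}\sum_{i}\sum_{j\neq i} x_{ia}x_{ja}
  + \frac{C}{2}\Bigl(\sum_{i}\sum_{a} x_{ia}-n\Bigr)^{2}
  + \frac{D}{2}\sum_{i}\sum_{j\neq i}\sum_{a} d_{ij}\,x_{ia}\bigl(x_{j,a+1}+x_{j,a-1}\bigr)
```

Here the A and B terms penalize a city occupying more than one stop and a stop holding more than one city, the C term forces exactly n active units, and the D term is the tour-length cost (stop indices taken modulo n).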

7.5.4 Use Liapunov's second method (refer to footnote 1) to show that the origin x = 0 is globally asymptotically stable for the following system:


(Hint: Assume a Liapunov function of the form V(x) = ).

7.5.5 Show that is a Liapunov function for the time-varying linear system


7.5.6 Consider the damped pendulum system

dx_1/dt = x_2
dx_2/dt = -sin x_1 - f(x_2)

where the function f(x_2) arises from a damping force depending on the angular velocity x_2. We assume that f is continuous and that the damping force satisfies x_2 f(x_2) ≥ 0. Show that the function

V(x) = (1/2) x_2^2 + (1 - cos x_1)

obtained by adding the kinetic and potential energies of the system (assuming unit mass), is a Liapunov function for this system.

7.5.7 Study the stability of the linear system

dx/dt = Ax

using Liapunov's second method. A systematic way of finding an appropriate Liapunov function in the linear case is to set V(x) = x^T Q x, where Q is the solution of the equation A^T Q + QA = -I (I is the identity matrix).
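A sketch of this recipe in SciPy; the matrix A below is a hypothetical stable example, since the problem's system matrix was lost in extraction:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Hypothetical system matrix with eigenvalues -1 and -2, so dx/dt = Ax is
# asymptotically stable.
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])

# Solve A^T Q + Q A = -I; then V(x) = x^T Q x is a Liapunov function.
Q = solve_continuous_lyapunov(A.T, -np.eye(2))
print("Q =\n", Q)
print("Q positive definite:", bool(np.all(np.linalg.eigvalsh(Q) > 0)))
```

If Q turns out positive definite, V(x) = x^T Q x decreases along trajectories (dV/dt = -x^T x), establishing asymptotic stability of the origin.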

* 7.5.8 Consider the discrete-time gradient system

x(k+1) = x(k) - μ∇E(x(k))

with positive μ. Let x* be a local minimum of E. Show that V(x) = E(x) - E(x*) is a candidate Liapunov function. Employ Liapunov's Theorem to show that the asymptotic stability of x* is guaranteed if E(x(k+1)) < E(x(k)) for x(k) ≠ x*.

* 7.5.9 Consider the dynamical system described by

τ du_i/dt = -u_i + Σ_j w_ij f(u_j),  i = 1, 2, ..., n

where τ is a positive constant and f is a sigmoid function. Show that when the weights {w_ij} are independent and identically distributed random variables with mean w̄, and n is very large, the dynamical behavior of the above system can be approximated by the system of n decoupled equations


as t → ∞. (Hint: Employ the "chaos hypothesis," in analogy to statistical mechanics (Rozonoer, 1969; Amari, 1972b; Geman, 1982), which states that when n is large, the system behaves as if the variables u_1(t), ..., u_n(t) were independent of each other and of the random variables {w_ij}.) Next, let f(u) = tanh(βu), β > 0, and establish a relation between β, w̄, and τ such that the above dynamical system is stable.