**6.1.1** Consider the RBF
network in Figure 6.1.1 and assume the Gaussian basis function
in Equation (6.1.3). Employ gradient descent-based minimization
of the instantaneous SSE criterion function [Equation (5.1.1)]
and derive update rules for the hidden unit receptive-field centers
**μ***j* and widths σ*j*,
and for the output layer weights *w**lj*.
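
For orientation, here is a sketch of the form these rules take, assuming Equation (6.1.3) is the Gaussian *z**j*(**x**) = exp(−‖**x** − **μ***j*‖²/(2σ*j*²)) and the net outputs are *y**l* = Σ*j* *w**lj* *z**j* (the normalization in the book's Equation (6.1.3) may differ):

$$\begin{aligned}
\Delta w_{lj} &= \eta\,(d_l - y_l)\,z_j,\\
\Delta \boldsymbol{\mu}_j &= \eta \sum_{l=1}^{L} (d_l - y_l)\, w_{lj}\, z_j\, \frac{\mathbf{x} - \boldsymbol{\mu}_j}{\sigma_j^{2}},\\
\Delta \sigma_j &= \eta \sum_{l=1}^{L} (d_l - y_l)\, w_{lj}\, z_j\, \frac{\lVert\mathbf{x} - \boldsymbol{\mu}_j\rVert^{2}}{\sigma_j^{3}},
\end{aligned}$$

where η is the learning rate and *d**l* is the desired output of unit *l*.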

**6.1.2** Repeat Problem 6.1.1
using the basis function form in Equation (6.1.4).

**6.1.3** Consider the Gaussian
bar network of Section 6.1.2. Derive expressions for updating
the hidden unit weights *w**ji*,
centers μ*ji*, and
widths σ*ji*, employing
gradient descent-based minimization of the instantaneous SSE criterion
function. Assume an output layer of *L* linear units whose
weights are designated as *w**lj*.

**6.1.4** Show that for large σ
the one-dimensional Gaussian basis function
exhibits polynomial behavior. Also, show that by increasing σ,
polynomial models of successively decreasing degree are obtained.
(Hint: Start by expanding *z*(*x*) in a Taylor series
about *x* = μ.) Based on these observations,
comment on the interpolation capability of an RBF network with
very large RBF widths.
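
As a concrete starting point, assuming the one-dimensional Gaussian has the form *z*(*x*) = exp(−(*x* − μ)²/σ²) (the book's Equation (6.1.3) may carry an extra factor of 2 in the denominator), the Taylor expansion about *x* = μ is

$$z(x) = 1 - \frac{(x-\mu)^{2}}{\sigma^{2}} + \frac{(x-\mu)^{4}}{2!\,\sigma^{4}} - \frac{(x-\mu)^{6}}{3!\,\sigma^{6}} + \cdots,$$

so over any bounded input range the higher-order terms decay rapidly when σ is large, leaving an effectively low-degree polynomial; the larger σ, the fewer terms survive.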

* **6.1.5** Consider the problem
of approximating an unknown "smooth" function *f*(*x*)
by a network with input *x* and output *y*(*x*),
given the training pairs {*x**i*, *f*(*x**i*)},
*i* = 1, 2, ..., *m*. We desire
the approximating network to minimize the criterion function *J*
given by

$$J = \sum_{i=1}^{m}\bigl[f(x_i) - y(x_i)\bigr]^{2} + \lambda\,\lVert Py\rVert^{2},$$
which represents the SSE criterion function with
regularization. Assume a regularization term that penalizes "nonsmooth"
approximations of *f*(*x*), given by

$$\lVert Py\rVert^{2} = \sum_{k=0}^{\infty} \frac{\sigma^{2k}}{k!\,2^{k}} \int_{-\infty}^{\infty} \left[\frac{d^{k}y(x)}{dx^{k}}\right]^{2} dx.$$

Show that the best approximating network, in the sense
of minimizing *J*, is an *m*-hidden unit Gaussian RBF
network with the transfer characteristic

$$y(x) = \sum_{i=1}^{m} w_{i} \exp\!\left(-\frac{(x - x_{i})^{2}}{2\sigma^{2}}\right).$$

**6.1.6** Find the function
*y*(*x*) which minimizes the criterion functional

with respect to {*x**i*, *f*(*x**i*)};
*i* = 1, 2, ..., *m*.

**6.1.7** Employ an RBF net
with two Gaussian hidden units, parameterized by {**μ**1 = [0 0]T,
σ1} and {**μ**2, σ2},
to solve the XOR problem. Assume a linear output unit with a
bias input of +1, and compute its three-component weight vector
using Equation (6.1.7). Generate a 3-dimensional plot for the
output of the RBF net versus the inputs *x*1
and *x*2, for *x*1, *x*2 ∈ [-1, 2].
Repeat with other values of σ1 and σ2,
and study via simulation the effect of σ
on the RBF net's output.
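
A minimal computational sketch of this setup, assuming illustrative centers **μ**1 = [0 0]T and **μ**2 = [1 1]T with a common width σ = 1 (the exact second center and widths above were lost), and taking Equation (6.1.7) to be the pseudo-inverse least-squares solution for the output weights:

```python
import numpy as np

# XOR training set and targets
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
d = np.array([0.0, 1.0, 1.0, 0.0])

# Illustrative hidden-unit parameters (assumed; not the book's values)
mu = np.array([[0.0, 0.0], [1.0, 1.0]])
sigma = 1.0

def hidden(P):
    """Gaussian hidden-unit responses plus a +1 bias column."""
    Z = np.exp(-((P[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
               / (2.0 * sigma ** 2))
    return np.hstack([Z, np.ones((len(P), 1))])

Z = hidden(X)
w = np.linalg.pinv(Z) @ d                  # three-component weight vector
print("weights:", w)
print("outputs:", Z @ w)                   # approximately [0, 1, 1, 0]
```

The 3-dimensional surface plot then follows by evaluating `hidden(...) @ w` over a grid spanning *x*1, *x*2 ∈ [-1, 2].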

† **6.1.8** Design a
Gaussian RBF net to "strictly" fit the data given in
Problem 5.2.13. (This data represents noise-free samples of the
function plotted in Figure 5.2.3.) Verify
your design by plotting the output of this net. Study, via simulation,
the effect of varying the hidden unit widths σ*j*
on the interpolation capability of the RBF net (assume σ*j* = σ
for all *j*). How does the RBF net compare, in terms of
the quality of interpolation and extrapolation, to the 12 hidden
unit feedforward neural net with backprop training whose output
is shown as the dashed line in Figure 5.2.3? Interested readers
may want to consider repeating this problem for the case of noisy
samples, as in Problem 5.2.14. In this case, "strict"
fitting should be avoided (why?).
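
A sketch of the "strict" fitting computation under stand-in data (the Problem 5.2.13 samples are not reproduced here): centering one Gaussian unit on every training sample makes the interpolation matrix square, so the weights come from a single linear solve.

```python
import numpy as np

# Placeholder samples standing in for the Problem 5.2.13 data,
# which is not reproduced here.
x = np.linspace(0.0, 1.0, 12)
d = np.sin(2.0 * np.pi * x)             # hypothetical noise-free targets
sigma = 0.1                             # common width, sigma_j = sigma

# Strict fit: one Gaussian unit per sample gives a square (and
# generically nonsingular) interpolation matrix.
Phi = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * sigma ** 2))
w = np.linalg.solve(Phi, d)

# Evaluate the net on a dense grid to inspect interpolation and
# extrapolation quality for various sigma.
xs = np.linspace(-0.2, 1.2, 200)
y = np.exp(-(xs[:, None] - x[None, :]) ** 2 / (2.0 * sigma ** 2)) @ w
```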

**6.2.1** Illustrate geometrically
the receptive-field organization for a two-input CMAC with *c* = 4,
based on the organization scheme described in Section 6.2.

**6.2.2** Give qualitative
arguments in support of the ability of Rosenblatt's perceptron
to solve the parity problem if at least one of its hidden units
is allowed to "see" all inputs.

**6.3.1** Synthesize by inspection
an RCE classifier net which solves the two-class problem in Figure
P3.1.7 with the minimal number of hidden units. Draw a figure
which superimposes the regions of influence of these hidden units
on Figure P3.1.7.

† **6.3.2** Invoke the
RCE classifier training procedure (with *r*max = 0.5)
to solve the classification problem in Figure P5.1.5. Assume
that the training set consists of 1000 uniformly randomly generated
samples from the region defined by -1 < *x*1 < +1
and -1 < *x*2 < +1.
Plot the regions of influence of all hidden units associated
with class B. Repeat for class A. Give a sketch of the resulting
decision boundary.

**6.3.3** Show that, in the
worst-case scenario, the PTC training procedure converges after
solving a finite number of linear programs and after performing
*m* clustering procedures.

† **6.3.4** Implement
the PTC network using the direct Ho-Kashyap (DHK) algorithm (refer
to Section 3.1.4) to solve for the hidden unit parameters, and using
*k*-means clustering for breaking up class clusters. Assume
that the samples from a given class are split into two clusters
at each stage, if splitting is needed. Verify your implementation
by classifying the two-class problem depicted in Figure P3.1.7.
Plot the resulting covers (hidden unit regions of influence).
Is the number of synthesized covers optimal? (Answer this question
by trying to construct an optimal set of covers by inspection.)

**6.3.5** Show that the PTC
network (also the RCE network) can be realized using a two-layer
feedforward net of LTGs preceded by a preprocessing layer which
computes higher-order terms.

**6.3.6** Derive a learning
rule for updating a candidate hidden unit's weights in a
single-output cascade-correlation net based on gradient ascent on the
criterion function given by Equation (6.3.6). Assume a sigmoidal
hidden unit with a hyperbolic tangent activation function. Also,
assume that the candidate hidden unit is the first unit to be
allocated.
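
For reference, if Equation (6.3.6) is the usual cascade-correlation covariance criterion S = |Σ*k* (*V**k* − V̄)(*E**k* − Ē)| over the training patterns *k* (Fahlman and Lebiere's form), then gradient ascent for a tanh candidate unit yields updates of the form

$$\frac{\partial S}{\partial w_i} = \sigma_o \sum_{k} (E_k - \bar{E})\,\bigl(1 - V_k^{2}\bigr)\,x_{k,i}, \qquad \Delta w_i = \eta\,\frac{\partial S}{\partial w_i},$$

where *V**k* = tanh(net*k*) is the candidate's output, *E**k* the residual output error, σ*o* the sign of the covariance, and *x**k,i* the *i*th input to the candidate for pattern *k*.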

**6.4.1** Cluster (using hand
calculations) the patterns in Figure P6.4.1 using an ART1 net
with ρ = 0.6. Use unipolar binary encoding of the patterns
(e.g., **x**1 = [1 1 0 1 0 0]T).
Repeat the clustering with higher vigilance values, including ρ = 0.9.
For each case, draw the two-dimensional 2 × 3
patterns (as in the figure) of all generated cluster prototypes.
Make a subjective decision on which vigilance value is more appropriate
for this problem.

† **6.4.2** Generate
a set of twenty-four uniform random binary vectors **x***k* ∈ {0, 1}16,
*k* = 1, 2, ..., 24 (i.e., the probability
of a '1' bit is 0.5). Employ the ART1 net with ρ = 0.7
to cluster this set. Generate a table which lists each generated
prototype vector and its associated input vectors. Use a two-dimensional
4 × 4 pattern representation for each vector in this
table. Does the number of clusters change with the permutation
of the training patterns? (Answer this question by repeating
the above simulation with the training patterns presented in reverse
order.)
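
A simplified serial-search sketch of ART1 fast learning that could drive this simulation; it keeps only the choice function, the vigilance test, and the AND-type prototype update, omitting the full F1/F2 network dynamics:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(24, 16))   # twenty-four random binary vectors

rho, beta = 0.7, 1e-6                   # vigilance; small tie-breaking constant

prototypes, members = [], []            # binary prototypes and member indices
for k, x in enumerate(X):
    # Scan categories in descending order of the choice
    # function |x AND w| / (beta + |w|)
    order = sorted(range(len(prototypes)),
                   key=lambda j: -np.sum(x & prototypes[j])
                                 / (beta + np.sum(prototypes[j])))
    for j in order:
        if np.sum(x & prototypes[j]) / np.sum(x) >= rho:   # vigilance test
            prototypes[j] &= x          # fast learning: w <- w AND x
            members[j].append(k)
            break
    else:
        prototypes.append(x.copy())     # no resonance: allocate new category
        members.append([k])

print(len(prototypes), "clusters")
```

Presenting the rows of `X` in reverse order and re-running answers the permutation question.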

† **6.4.3** Apply *k*-means
clustering to the random binary vectors generated in Problem 6.4.2.
Set *k* to the number of prototypes generated by the ART1
net in the first simulation of Problem 6.4.2. Compare the results
of *k*-means clustering to those of the ART1 net. (Note
that the cluster prototypes discovered by the *k*-means clustering
algorithm are real-valued. Also note that the
result of *k*-means clustering is sensitive to the
initial selection of the *k* cluster centers.)
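
A minimal batch *k*-means sketch usable here; as the problem notes, the centers it returns are real-valued averages, and the outcome depends on the random initial choice of centers:

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Batch k-means: alternate nearest-center assignment and mean update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# e.g., on the 24 binary vectors of Problem 6.4.2, with k set to the
# number of ART1 prototypes found there:
# centers, labels = kmeans(X, k=len(prototypes))
```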

**6.4.4** The power method
of Equation (6.4.9) diverges if the dominating eigenvector **c**max
of matrix **A** has a corresponding eigenvalue λmax > 1.
Suggest a simple way of modifying this method so that it converges
to the vector **c**max when started from a random
initial vector **c**0.
Verify your modified method with an example matrix **A** for which
λmax > 1.
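
A sketch of the standard remedy, renormalizing the iterate after every multiplication by **A** so it remains bounded; the test matrix below is illustrative only, since the book's matrix was not preserved here:

```python
import numpy as np

# Illustrative test matrix; its dominant eigenvalue
# (5 + sqrt(5))/2 ~ 3.62 exceeds 1.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

rng = np.random.default_rng(1)
c = rng.standard_normal(2)              # random initial vector c0

for _ in range(100):
    c = A @ c
    c /= np.linalg.norm(c)              # normalization prevents divergence

print("eigenvector estimate:", c)
print("eigenvalue estimate :", c @ A @ c)   # Rayleigh quotient
```
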
**6.4.5** Based on the way
you would cluster the patterns in Figure 6.4.2 into 20 and 21
categories, compare and contrast the consistency of the clustering
results produced by the ART2 net in Figure 6.4.2a and the autoassociative
clustering net in Figure 6.4.3b.

† **6.4.6** Apply the
*k*-means clustering algorithm with *k* = 20
prototypes to cluster the 50 analog patterns of Figure 6.4.2.
Figure P6.4.6 gives a numerical representation for all fifty
patterns. Compare the results from this clustering method to
those in Figure 6.4.2 (a). Repeat with *k* = 21
and compare your results to those in Figure 6.4.3 (b).

Figure P6.4.6. Numerical representation of the fifty analog patterns
shown in Figure 6.4.2. Each row represents a distinct 25-dimensional
pattern. (Courtesy of Gail Carpenter and Steven Grossberg, Boston
University.)

† **6.4.7** Repeat Problem
6.4.6 using the incremental version of *k*-means clustering
(Equation 6.1.5) with learning rate α = 0.05. Assume the same initial
setting of the prototypes (cluster centers) as in Problem 6.4.6.
Which version of the algorithm gives better results? Why?
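
A sketch of the incremental update, assuming Equation (6.1.5) is the usual winner-take-all rule that moves only the nearest prototype toward each presented pattern:

```python
import numpy as np

def incremental_kmeans(X, centers, alpha=0.05, epochs=5, seed=0):
    """Online k-means: nudge only the winning center toward each pattern."""
    rng = np.random.default_rng(seed)
    centers = centers.astype(float).copy()
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            j = ((centers - X[i]) ** 2).sum(axis=1).argmin()  # winner
            centers[j] += alpha * (X[i] - centers[j])         # winner update
    return centers
```

Starting from the same initial prototypes as the batch run of Problem 6.4.6 makes the two versions directly comparable.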

† **6.4.8** Compare the
prototypes generated by the autoassociative clustering net in Figure
6.4.3 (a) to their respective average patterns, obtained by averaging
all input patterns (vectors) belonging to a given cluster. Use
the data in Figure P6.4.6 for computing these averages.

**6.4.9** One advantage of
the ART2 net over the autoassociative clustering network is the
ability of the former to learn continuously. Can you find other
advantages? What about advantages of the autoassociative clustering
net over the ART2 net?