Problems

6.1.1 Consider the RBF network in Figure 6.1.1 and assume the Gaussian basis function in Equation (6.1.3). Employ gradient descent-based minimization of the instantaneous SSE criterion function [Equation (5.1.1)] and derive update rules for the hidden unit receptive-field centers μj and widths σj and for the output layer weights wlj.
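To fix notation, the following sketch states the assumed forms of the hidden-unit response, the network output, and the instantaneous SSE; Equations (6.1.3) and (5.1.1) in the text are authoritative. The requested update rules then follow from the chain rule applied to E with respect to wlj, μj, and σj.

```latex
% Assumed forms (a sketch; Equations (6.1.3) and (5.1.1) in the text are authoritative).
% Hidden unit j response, linear output unit l, and instantaneous SSE for pattern x^k:
z_j(\mathbf{x}^k) = \exp\!\left( -\frac{\lVert \mathbf{x}^k - \boldsymbol{\mu}_j \rVert^2}{2\sigma_j^2} \right), \qquad
y_l(\mathbf{x}^k) = \sum_j w_{lj}\, z_j(\mathbf{x}^k), \qquad
E = \frac{1}{2} \sum_l \bigl[ d_l^k - y_l(\mathbf{x}^k) \bigr]^2 .
```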

6.1.2 Repeat Problem 6.1.1 using the basis function form in Equation (6.1.4).

6.1.3 Consider the Gaussian bar network of Section 6.1.2. Derive expressions for updating the hidden unit weights wji, centers μji, and widths σji employing gradient descent-based minimization of the instantaneous SSE error function. Assume an output layer of L linear units whose weights are designated as wlj.

6.1.4 Show that for large σ the one-dimensional Gaussian basis function exhibits polynomial behavior. Also, show that by increasing σ, polynomial models of successively decreasing degree are obtained. (Hint: Start by expanding z(x) using a Taylor series expansion at x = μ.) Based on these observations, comment on the interpolation capability of an RBF network with very large RBF widths.
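As a starting point for the hint, a sketch of the expansion, assuming the one-dimensional Gaussian z(x) = exp(-(x - μ)²/(2σ²)):

```latex
% Taylor expansion of the one-dimensional Gaussian about x = \mu (a sketch):
z(x) = \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)
     = 1 - \frac{(x-\mu)^2}{2\sigma^2} + \frac{(x-\mu)^4}{8\sigma^4} - \cdots
```

For large σ the higher-order terms are suppressed by successive powers of 1/σ², which is the behavior the problem asks you to characterize.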

* 6.1.5 Consider the problem of approximating an unknown "smooth" function f(x) by a network with input x and output y(x), given the training pairs {xk, f(xk)}, k = 1, 2, ..., m. We desire the approximating network to minimize the criterion function J given by:


which represents the SSE criterion function with regularization. Assume a regularization term that penalizes "nonsmooth" approximations of f(x), given by:


Show that the best approximator network in the sense of minimizing J is an m-hidden unit Gaussian RBF network with the transfer characteristics


6.1.6 Find the function y(x) which minimizes the criterion functional


given the training pairs {xi, f(xi)}; i = 1, 2, ..., m.

6.1.7 Employ an RBF net with two Gaussian hidden units, parameterized by {μ1 = [0 0]T, σ1} and {μ2, σ2}, to solve the XOR problem. Assume a linear output unit with a bias input of +1, and compute its three-component weight vector using Equation (6.1.7). Generate a three-dimensional plot of the output of the RBF net versus the inputs x1 and x2, for x1, x2 ∈ [-1, 2]. Repeat with other choices of σ1 and σ2, and study via simulation the effect of the width σ on the RBF net's output.
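The following Python sketch illustrates one way to set this computation up. The second center μ2 = [1 1]T and the unit widths are assumptions (they are not taken from the problem statement), and Equation (6.1.7) is assumed to be the pseudo-inverse (least-squares) weight solution.

```python
import numpy as np

# A minimal sketch for Problem 6.1.7 (hedged): mu2 = [1, 1]^T and sigma = 1 are
# assumptions, and Equation (6.1.7) is assumed to be the pseudo-inverse solution.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # XOR inputs
d = np.array([0, 1, 1, 0], dtype=float)                      # XOR targets

mu = np.array([[0.0, 0.0], [1.0, 1.0]])   # receptive-field centers (mu2 assumed)
sigma = np.array([1.0, 1.0])              # widths (assumed)

# Hidden-layer outputs: Gaussian of the distance from each input to each center
Z = np.exp(-np.sum((X[:, None, :] - mu[None, :, :])**2, axis=2) / (2 * sigma**2))
Z = np.hstack([Z, np.ones((4, 1))])       # append the +1 bias input

w = np.linalg.pinv(Z) @ d                 # three-component weight vector
print("weights:", w)
print("net outputs:", Z @ w)              # should closely reproduce the XOR targets
```

The same forward pass, evaluated on a grid of (x1, x2) points in [-1, 2], gives the surface requested for the three-dimensional plot.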

6.1.8 Design a Gaussian RBF net to "strictly" fit the data given in Problem 5.2.13. (This data represents noise-free samples of the function plotted in Figure 5.2.3.) Verify your design by plotting the output of this net. Study, via simulation, the effect of varying the hidden unit widths σj on the interpolation capability of the RBF net (assume σj = σ for all j). How does the RBF net compare, in terms of the quality of interpolation and extrapolation, to the 12-hidden-unit feedforward neural net with backprop training whose output is shown as the dashed line in Figure 5.2.3? Interested readers may want to consider repeating this problem for the case of noisy samples, as in Problem 5.2.14. In this case, "strict" fitting should be avoided (why?).
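A minimal Python sketch of "strict" (exact) Gaussian RBF interpolation, assuming one hidden unit per training sample; the training arrays below are placeholders to be replaced by the Problem 5.2.13 data.

```python
import numpy as np

# Strict RBF interpolation sketch (hedged): one Gaussian hidden unit per sample,
# weights obtained by solving the m x m interpolation system exactly.
def rbf_interpolate(x_train, f_train, sigma):
    # Interpolation matrix: Phi[k, j] = exp(-(x_k - x_j)^2 / (2 sigma^2))
    diff = x_train[:, None] - x_train[None, :]
    Phi = np.exp(-diff**2 / (2 * sigma**2))
    w = np.linalg.solve(Phi, f_train)      # exact fit: Phi w = f
    def net(x):
        return np.exp(-(x[:, None] - x_train[None, :])**2 / (2 * sigma**2)) @ w
    return net

# Example usage with placeholder data (replace with the Problem 5.2.13 samples):
x_train = np.linspace(0.0, 1.0, 12)
f_train = np.sin(2 * np.pi * x_train)
net = rbf_interpolate(x_train, f_train, sigma=0.1)
x_plot = np.linspace(-0.2, 1.2, 200)
y_plot = net(x_plot)                       # plot y_plot vs. x_plot; vary sigma to study interpolation
```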

6.2.1 Illustrate geometrically the receptive-field organization for a two-input CMAC with c = 4, based on the organization scheme described in Section 6.2.

6.2.2 Give qualitative arguments in support of the ability of Rosenblatt's perceptron to solve the parity problem if at least one of its hidden units is allowed to "see" all inputs.

6.3.1 Synthesize by inspection an RCE classifier net which solves the two-class problem in Figure P3.1.7 with the minimal number of hidden units. Draw a figure which superimposes the regions of influence of these hidden units on Figure P3.1.7.

6.3.2 Invoke the RCE classifier training procedure (with rmax = 0.5) to solve the classification problem in Figure P5.1.5. Assume that the training set consists of 1000 uniformly randomly generated samples from the region defined by -1 < x1 < +1 and -1 < x2 < +1. Plot the regions of influence of all hidden units associated with class B. Repeat for class A. Give a sketch of the resulting decision boundary.
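As a simulation scaffold, the following Python sketch implements one common formulation of RCE training (allocate a prototype when no same-class unit covers a sample, and shrink any wrong-class unit that does); it is offered only as a hedged sketch, and the procedure described in Section 6.3 is authoritative.

```python
import numpy as np

# RCE-style training sketch (hedged; the procedure in Section 6.3 is authoritative).
# Each hidden unit is a prototype (center, radius, class); r_max bounds new radii.
def rce_train(X, labels, r_max=0.5, epochs=5):
    centers, radii, classes = [], [], []
    for _ in range(epochs):
        for x, c in zip(X, labels):
            if centers:
                d = np.linalg.norm(np.array(centers) - x, axis=1)
                covering = d < np.array(radii)
                # Shrink wrong-class units that cover the sample
                for j in np.where(covering & (np.array(classes) != c))[0]:
                    radii[j] = d[j]
                covered_correctly = np.any(covering & (np.array(classes) == c))
            else:
                covered_correctly = False
            if not covered_correctly:
                # Allocate a new unit at x; keep its radius inside the nearest
                # opposite-class prototype (if any), but no larger than r_max
                other = [i for i, cl in enumerate(classes) if cl != c]
                r = r_max if not other else min(
                    r_max, np.min(np.linalg.norm(np.array(centers)[other] - x, axis=1)))
                centers.append(x.copy()); radii.append(r); classes.append(c)
    return np.array(centers), np.array(radii), np.array(classes)
```

The returned centers, radii, and class labels are what is needed to plot the regions of influence and to sketch the resulting decision boundary.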

6.3.3 Show that, in the worst case scenario, the PTC training procedure converges after solving linear programs and after performing m clustering procedures.

6.3.4 Implement the PTC network using the direct Ho-Kashyap (DHK) algorithm (refer to Section 3.1.4), to solve for hidden unit parameters, and using k-means clustering for breaking up class clusters. Assume that the samples from a given class are split into two clusters at each stage, if splitting is needed. Verify your implementation by classifying the two-class problem depicted in Figure P3.1.7. Plot the resulting covers (hidden unit regions of influence). Is the number of synthesized covers optimal? (Answer this question by trying to construct an optimal set of covers by inspection.)

6.3.5 Show that the PTC network (also the RCE network) can be realized using a two-layer feedforward net of LTGs preceded by a preprocessing layer which computes higher-order terms.

6.3.6 Derive a learning rule for updating a candidate hidden unit's weights in a single-output cascade-correlation net based on gradient ascent on the criterion function given by Equation (6.3.6). Assume a sigmoidal hidden unit with a hyperbolic tangent activation function. Also, assume that the candidate hidden unit is the first unit to be allocated.
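For reference, the candidate-unit criterion of Fahlman and Lebiere's cascade-correlation has the following form for a single output unit; Equation (6.3.6) is assumed here to be of this type, and the tanh derivative is the remaining ingredient needed for the gradient.

```latex
% Assumed single-output form of the candidate-correlation criterion (a sketch;
% Equation (6.3.6) in the text is authoritative):
S = \Bigl| \sum_{k} \bigl( z^{k} - \bar{z} \bigr) \bigl( \epsilon^{k} - \bar{\epsilon} \bigr) \Bigr|,
\qquad z^{k} = \tanh\!\bigl( \mathbf{w}^{T}\mathbf{x}^{k} \bigr),
\qquad \frac{d}{du}\tanh(u) = 1 - \tanh^{2}(u),
```

where z^k is the candidate unit's output for pattern k, ε^k is the residual output error before the candidate is added, and the bars denote averages over the training set.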

6.4.1 Cluster (using hand calculations) the patterns in Figure P6.4.1 using an ART1 net with ρ = 0.6. Use unipolar binary encoding of the patterns (e.g., x1 = [1 1 0 1 0 0]T). Repeat the clustering with vigilance ρ = 0.9. For each case, draw the two-dimensional 2 × 3 patterns (as in the figure) of all generated cluster prototypes. Make a subjective decision on which vigilance value is more appropriate for this problem. (A simulation sketch for checking the hand calculations is given after Figure P6.4.1.)


Figure P6.4.1. Training patterns for the ART1 net in Problem 6.4.1.
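For checking the hand calculations (and for the simulations in Problems 6.4.1 and 6.4.2), the following Python sketch implements a simplified "fast-learning" ART1 step: categories are searched by the choice value |x ∧ w| / (β + |w|), accepted when the vigilance test |x ∧ w| / |x| ≥ ρ passes, and updated by intersection. This is a common simplification with an assumed small tie-breaking constant β, not the exact ART1 net of Section 6.4.

```python
import numpy as np

# Simplified fast-learning ART1 sketch (hedged; the ART1 net of Section 6.4 is
# authoritative). Patterns are unipolar binary integer vectors; rho is the vigilance.
def art1_cluster(patterns, rho, beta=1e-6):
    prototypes, assignment = [], []
    for x in patterns:
        # Search existing categories in order of decreasing choice value
        order = sorted(range(len(prototypes)),
                       key=lambda j: -(x & prototypes[j]).sum() / (beta + prototypes[j].sum()))
        for j in order:
            if (x & prototypes[j]).sum() / x.sum() >= rho:   # vigilance test
                prototypes[j] = x & prototypes[j]            # fast learning: intersection
                assignment.append(j)
                break
        else:                                                # no category accepted: allocate a new one
            prototypes.append(x.copy())
            assignment.append(len(prototypes) - 1)
    return prototypes, assignment

# Example call with rho = 0.6 (patterns encoded as in the problem statement):
# protos, assign = art1_cluster([np.array([1, 1, 0, 1, 0, 0]), ...], rho=0.6)
```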

6.4.2 Generate a set of twenty-four uniform random binary vectors xk ∈ {0, 1}16, k = 1, 2, ..., 24 (i.e., the probability of a '1' bit is equal to 0.5). Employ the ART1 net with ρ = 0.7 to cluster this set. Generate a table which lists each generated prototype vector and its associated input vectors. Use a two-dimensional 4 × 4 pattern representation for each vector in this table. Does the number of clusters change with the permutation of the training patterns? (Answer this question by repeating the above simulation with the training patterns presented in reverse order.)

6.4.3 Apply k-means clustering to the random binary vectors generated in Problem 6.4.2. Set k to the number of prototypes generated by the ART1 net in the first simulation of Problem 6.4.2. Compare the results of k-means clustering to those of the ART1 net. (Note that the cluster prototypes discovered by the k-means clustering algorithm are real-valued. Also note that the clustering produced by the k-means algorithm is sensitive to the initial selection of the k cluster centers.)
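A minimal batch k-means sketch (hedged) that can be applied to the binary vectors of Problem 6.4.2; the prototypes become real-valued means, and the outcome depends on the random initial centers.

```python
import numpy as np

# Batch k-means sketch (hedged): assign to the nearest center, then recompute
# each center as the mean of its assigned vectors.
def kmeans(X, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)                # nearest-center assignment
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

# Example usage on 24 random binary 16-vectors as in Problem 6.4.2; k = 5 is only
# a placeholder, and should be set to the number of ART1 prototypes you obtained.
X = np.random.default_rng(1).integers(0, 2, size=(24, 16))
centers, labels = kmeans(X, k=5)
```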

6.4.4 The power method of Equation (6.4.9) diverges if the dominant eigenvector cmax of matrix A has a corresponding eigenvalue λmax > 1. Suggest a simple way of modifying this method so that it converges to the vector cmax when started from a random initial vector c0. Verify your modified method with


6.4.5 Based on the way you would cluster the patterns in Figure 6.4.2 into 20 and 21 categories, compare and contrast the consistency of the clustering results produced by the ART2 net in Figure 6.4.2a and the autoassociative clustering net in Figure 6.4.3b.

6.4.6 Apply the k-means clustering algorithm with k = 20 prototypes to cluster the 50 analog patterns of Figure 6.4.2. Figure P6.4.6 gives a numerical representation for all fifty patterns. Compare the results from this clustering method to those in Figure 6.4.2 (a). Repeat with k = 21 and compare your results to those in Figure 6.4.3 (b).


10 80 70 60 50 40 30 20 10 20 30 40 50 60 70 80 90 80 70 60 50 40 30 20 10
10 80 74 67 59 50 40 29 17 29 40 50 59 67 74 80 85 80 74 67 59 50 40 29 17
10 80 76 70 62 52 40 26 10 26 40 50 62 70 76 80 82 80 76 70 62 50 40 26 10
10 62 85 92 75 82 74 62 64 74 89 74 63 24 35 22 22 24 39 36 25 28 23 22 23
10 50 50 50 50 50 50 50 50 50 50 50 50 10 10 10 10 10 10 10 10 10 10 10 10
10 80 75 70 65 60 55 50 45 40 35 30 25 10 10 10 10 10 10 10 10 10 10 10 10
10 10 10 10 65 60 55 50 45 40 35 30 25 10 10 10 10 10 10 10 10 10 10 10 10
10 10 90 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
10 10 10 50 90 50 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
10 20 20 20 20 20 20 20 20 20 20 20 20 10 10 10 10 10 10 10 10 10 10 10 10
10 20 30 39 47 54 60 60 54 47 39 30 20 10 10 10 10 10 10 10 10 10 10 10 10
10 60 54 47 39 30 20 20 30 39 47 54 60 10 10 10 10 10 10 10 10 10 10 10 10
10 70 60 50 40 30 20 20 30 40 50 60 10 10 10 10 10 10 10 10 10 10 10 10 10
10 70 60 50 40 30 20 20 30 40 50 60 15 8 16 18 12 8 30 60 90 60 30 4 2
10 0 0 0 0 0 0 0 0 30 30 30 30 30 30 30 30 50 50 50 50 50 50 50 50
10 0 0 0 0 0 0 0 0 50 50 50 50 50 50 50 50 90 90 90 90 90 90 90 90
10 10 5 14 18 8 16 10 5 45 60 52 46 58 59 42 49 89 95 83 92 90 87 89 95
10 30 50 30 18 8 16 10 5 45 60 52 46 58 59 42 49 89 95 83 92 90 87 89 95
10 60 60 60 60 60 60 60 60 80 80 80 80 80 80 80 80 90 90 90 90 90 90 90 90
10 65 80 80 65 60 60 60 60 80 80 80 80 80 80 80 80 90 90 90 90 90 90 90 90
10 51 55 53 48 60 90 60 90 60 54 53 51 56 55 54 55 52 51 54 55 52 55 53 53
10 10 10 10 10 80 90 80 90 80 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
10 10 10 10 80 90 80 90 80 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
10 10 10 10 80 90 30 90 80 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
10 10 30 50 30 50 70 50 70 90 70 50 70 50 30 50 30 10 13 11 14 12 13 11 10
10 10 20 30 40 50 60 70 80 90 80 70 60 50 40 30 20 10 10 10 10 10 10 10 10
10 20 20 20 50 50 50 80 80 80 80 50 50 50 20 20 20 10 10 10 10 10 10 10 10
10 10 30 50 30 50 70 50 70 90 70 50 70 50 30 50 30 10 3 15 8 2 12 2 14
10 10 24 28 32 36 40 44 48 52 48 44 40 36 32 28 24 10 10 10 10 10 10 10 10
10 10 50 90 50 18 20 22 24 26 24 22 20 18 16 14 12 10 10 10 10 10 10 10 10
10 10 50 90 50 18 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
10 20 80 95 52 8 40 28 39 36 3 22 25 39 10 3 6 2 40 36 30 20 25 4 6
10 10 30 65 85 99 85 65 30 10 5 10 12 7 6 10 12 9 10 11 30 30 30 30 30
10 10 60 35 80 95 78 69 32 5 14 18 3 28 59 20 30 6 4 60 55 30 35 30 55
10 10 30 68 90 99 85 62 30 10 10 10 12 7 6 10 12 9 10 11 35 30 33 30 30
10 10 80 68 90 99 85 62 30 5 15 20 4 13 6 10 12 9 10 11 35 30 33 30 30
10 10 80 68 90 99 85 62 40 5 15 20 4 13 6 10 12 9 10 11 60 50 30 40 40
10 10 40 68 90 99 85 62 40 10 10 10 12 10 10 10 10 10 10 10 10 10 10 10 10
10 50 60 70 60 80 60 80 60 80 40 60 40 60 40 20 40 20 40 20 40 20 10 20 10
10 40 50 60 70 80 80 80 80 80 60 40 40 40 40 20 20 20 20 20 20 10 10 10 10
10 20 25 30 35 40 40 40 40 40 30 20 20 20 20 10 10 10 10 10 10 5 5 5 5
10 30 25 32 28 45 40 46 35 42 35 15 24 25 20 15 13 9 8 2 18 10 5 2 8
10 20 20 20 20 20 20 20 20 20 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
10 20 10 35 22 24 40 33 18 23 85 96 90 88 95 99 99 82 80 86 89 93 85 88 93
10 3 3 3 3 3 3 3 3 3 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40
10 95 90 85 80 75 70 65 60 55 50 45 40 35 30 25 20 15 10 5 5 5 5 5 5
10 5 5 5 5 5 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95
10 7 5 7 8 2 10 14 12 21 38 30 30 38 48 45 54 62 67 75 75 85 82 91 95
10 8 4 10 3 5 4 13 16 21 24 30 38 41 45 53 56 59 63 70 74 82 86 91 96
10 10 3 5 4 13 16 21 24 30 38 41 45 53 56 59 63 70 74 82 86 91 96 95 98

Figure P6.4.6. Numerical representation of the fifty analog patterns shown in Figure 6.4.2. Each row represents a distinct 25-dimensional pattern. (Courtesy of Gail Carpenter and Steven Grossberg, Boston University.)

6.4.7 Repeat Problem 6.4.6 using the incremental version of k-means clustering (Equation 6.1.5) with η = 0.05. Assume the same initial setting of prototypes (cluster centers) as in Problem 6.4.6. Which version of the algorithm gives better results? Why?
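A sketch of the incremental update assumed here: for each presented pattern, only the winning (nearest) prototype moves toward it by a fraction η. Equation (6.1.5) in the text is the authoritative form.

```python
import numpy as np

# Incremental (online) k-means sketch (hedged; Equation (6.1.5) is authoritative).
# Only the winning prototype moves, by a fraction eta of its distance to the pattern.
def incremental_kmeans(X, centers, eta=0.05, epochs=1):
    centers = centers.astype(float).copy()
    for _ in range(epochs):
        for x in X:
            j = np.argmin(np.linalg.norm(centers - x, axis=1))  # winning prototype
            centers[j] += eta * (x - centers[j])                 # assumed Equation (6.1.5)-style update
    return centers

# Example usage: start from the same initial centers as in Problem 6.4.6.
# new_centers = incremental_kmeans(patterns, initial_centers, eta=0.05)
```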

6.4.8 Compare the autoassociative clustering net generated prototypes in Figure 6.4.3 (a) to their respective average patterns generated by averaging all input patterns (vectors) belonging to a unique cluster. Use the data in Figure P6.4.6 for computing these averages.

6.4.9 One advantage of the ART2 net over the autoassociative clustering network is the ability of the former to learn continuously. Can you find other advantages? What about advantages of the autoassociative clustering net over the ART2 net?