1.7 Summary

This chapter introduces the LTG as the basic computational unit in a binary state neural network, and analyzes its computational capabilities in realizing binary mappings. The LTG is then extended to the more powerful PTG. A single LTG can only realize threshold (linearly separable) functions. On the other hand, a PTG with order is capable of realizing any Boolean function of n inputs. The impressive computational power of a PTG comes at the expense of increased implementation complexity.

The notion of a separating surface is introduced and it is used to describe the operation of a threshold gate as a two-class classifier. It is found that a single LTG is characterized by a linear separating surface (hyperplane). On the other hand, a single PTG is capable of realizing a highly-flexible nonlinear (polynomial) separating surface. The flexibility of a PTG's separating surface is due to a dimensionality expanding polynomial mapping (-mapping) which is realized by a preprocessing layer intrinsic to the PTG.

A fundamental theorem, known as the Function Counting Theorem, is proved and is used to derive the following important properties of threshold gates. First, the statistical capacity of a single LTG, assuming data in general position in Rn, is equal to twice the number of its degrees of freedom (weights). This result is also true for a PTG. Second, a threshold gate with d-degrees of freedom trained to correctly classify m patterns will, on the average, respond ambiguously to a new input pattern as long as . In this context, it is found that the probability of generalization error approaches zero asymptotically as . Finally, in the limit as , we find that the average number of patterns needed for complete characterization of a given training set is only twice the number of degrees of freedom d of a threshold gate, assuming the separability of this training set by the threshold gate.

1.1.1 Verify that the patterns in Figure 1.1.5 are admissible threshold patterns.

1.1.2 Derive the formulas given in Table 1.1.2 for degrees of freedom for real and binary input QTG's.

1.1.3 Prove that a PTG(r) with n binary inputs has degrees of freedom, not including threshold.

1.1.4 Derive the closed form expression for the number of degrees of freedom for a PTG (r) given in Equation (1.1.8).

1.2.1 Show that . Also, show that .

1.2.2 a. Prove algebraically that , for 1  i  n.

b. Prove by induction, using the recursion relation in part (a), the Binomial Theorem: where and are real numbers and n is an integer.

c. Use the Binomial Theorem to prove that .

1.2.3 Verify the inequality in Equation (1.2.5) by showing that for n  2.

1.2.4 Plot the function for a = 2 and 3. Now, let x3 = x1x2. Plot g(x1x2x3) = 0. Show that the four patterns of the XNOR function of Example 1.2.1 are properly separated by g.

* 1.3.1 Prove that , and that . Also, show that , if m = 2(n + 1).

1.3.2 Assume n < m d = , is it true that m points in general position in Rn are mapped by the -mapping of a PTG(r) into points in general position in Rd1 space (i.e., -general position)? Why? Can this PTG map m arbitrary points in Rn into m points in -general position? Explain.

* 1.3.3 Given m arbitrary points in Rn space. Show that for m 3n+2 and n 2, the number of dichotomies of the m points which are realizable with a single LTG is bounded from above by . (Hint: use Equation (1.3.7) adapted for the LTG case, with d  1 = n). Note that this bound reduces to for m = 2n, which represents a tighter upper bound than that of Equation (1.2.5). Use this result to show that 1.3.4 Consider a mapping f : R  R, defined as the set of input/output pairs {xi, yi}, i = 1, 2, ..., m. Assume that xi  xj for all i  j. Show that this mapping can be exactly realized by a polynomial of order r = m  1 (this is referred to as "strict" interpolation). Hint: The determinant is the Vandermonde's Determinant, which is nonzero if zi  zj, for all i  j = 1, 2, ..., m.

* 1.3.5 Consider the mapping from R1 to {0, 1} defined by the input/output pairs (0, 0), (1, 1), (2, 0), (3, 0), and (4, 1). Use the result of Problem 1.3.4 to synthesize a polynomial g(x) of minimal order which realizes the above mapping. Plot g(x) to verify your solution. Next, assume that a PTG(r) is used to realize the same mapping. Is it possible for the order r of this PTG to be smaller than that of the polynomial g(x)? If the answer is yes, then synthesize the appropriate weights for this PTG (assume a zero threshold) and plot the weighted-sum (i.e., the PTG output without thresholding) versus x.

1.3.6 Derive Equation (1.3.4) from the recursion relation .

1.3.7 Let y(x) = a + bx, where a, b R. Find the parameters a and b which minimize:

a. b. 1.3.8 Find the polynomial y(x) = ax + bx3, which approximates the function g(x) = sin x over [0, ]. By minimizing . Compare graphically, the functions g(x), its approximation y(x), and its power series approximation g(x)  x for and .

1.4.1 Find the solution vector w in Example 1.4.1 if b = [1 1 1 1 1 3]T . Use the K-map technique to show that the Boolean function in Table 1.4.1 is not a threshold function. Does a b > 0 exist which leads to a solution vector w with four or fewer nonzero components?

1.4.2 Use the method of Section 1.4 to synthesize a PTG for realizing the Boolean function . Use a margin vector b which results in the minimal number of non-zero PTG weights.

* 1.5.1 Derive Equation (1.5.2). See Cover (1965) for hints.

1.5.2 Plot the probability of ambiguous response, A(md), given in Equation (1.5.1) versus , for d = 2, 10, and 20.

1.5.3 Derive Equation (1.5.3).

* 1.6.1 Derive Equation (1.6.3), starting from Equation (1.6.2).
Goto [
1.0][1.1] [1.2] [1.3] [1.4] [1.5] [1.6] Back to the Table of Contents Back to Main Menu