2.2 Necessary Bounds on the Size of LTG Networks

In this section, we derive necessary lower bounds on the number of gates, k, in a network of LTG's for realizing any function : Rn {0, 1} of m points arbitrarily chosen in Rn. For the case of functions defined on m points in general position, similar bounds are also derived. Again, we consider the three network architectures of Section 2.1.2.

2.2.1 Two Layer Feedforward Networks

Case 1: m arbitrary points

Consider the function : Rn {0, 1} defined on m arbitrary points in Rn. Here, we show that a two layer feedforward net with less than LTG's in its hidden layer is not sufficient to realize such an arbitrary function (in the limit of large n). Recalling Equation (2.1.1) and requiring that F1(n, m, k) be larger than or equal to the total number of possible binary functions of m points, we get (assuming m  3n + 2 and n  2) (2.2.1)

We may solve for k as a function of m and n. Taking the logarithm (base 2) of Equation (2.2.1) and rearranging terms, we get (2.2.2)

By employing Stirling's approximation , we can write where a large n is assumed. Similarly, we employ the approximation . Now, Equation (2.2.2) reduces to (2.2.3)

As a special case, we may consider an arbitrary completely specified (m = 2n) Boolean function with . Here, Equation (2.2.3) with m = 2n gives (2.2.4)

which means that as n  , an infinitely large net is required. A limiting case of Equation (2.2.3) is for , which leads to the bound (2.2.5)

Case 2: m points in general position

Recalling Equation (1.3.5) and the discussion in Section 1.3.3 on the statistical capacity of a single LTG, one can determine an upper bound, FGP, on the number of possible functions (dichotomies) on m points in general position in Rn, for a two layer feedforward net with k hidden units as (2.2.6)

for m 2n and n . The first term in the right hand side of Equation (2.2.6) represents the total number of functions realizable by the k hidden layer LTG's, where each LTG is capable of realizing 22(n+1) functions. On the other hand, the term in Equation (2.2.6) represents an upper bound on the number of functions realized by the output LTG. It assumes that the hidden layer transforms the m points in general position in Rn to points which are not necessarily in general position in the hidden space, Rk.

For the net to be able to realize any one of the above functions, its hidden layer size k must satisfy (2.2.7)

or (2.2.8)

This bound is tight for m close to 2n, it gives k = 1, which is the optimal net (recall, that any dichotomy of m points in general position in Rn has a probability approaching one to be realized by a single LTG as long as m  2n for large n). Note that when there is only a single LTG in the hidden layer, the output LTG becomes redundant.

Equation (2.2.8) agrees with the early experimental results reported by Widrow and Angell (1962). Also, it is interesting to note that in the limit as , Equation (2.2.8) gives a lower bound on k, which is equal to one half the number of hidden LTG's of the optimal net reported by Baum (1988). It is important to note that the bounds on k derived using the above approach are relatively tighter for the m points in the general position case than for the case of m points in arbitrary position. This is because, we are using the actual statistical capacity of an LTG for the general position case, as opposed to the upper bound on the number of dichotomies being used for the arbitrary position case. This observation is also valid for the bounds derived in the remainder of this section.

2.2.2 Three Layer Feedforward Networks

Case 1: m arbitrary points

Consider a two hidden layer net having (k is even) LTG's in each of its hidden layers and a single LTG in its output layer. From Equation (2.1.3) the net is capable of realizing at most arbitrary functions. Following earlier derivations, we can bound the necessary net size by (2.2.9)

for m 3n and n . Next, taking the logarithm of both sides of Equation (2.2.9) and employing Sterling's approximation gives (2.2.10)

which can be solved for k as (2.2.11)

Assuming that m >> n2, Equation (2.2.11) can be approximated as (2.2.12)

Case 2: m points in general position

Here, we start from the relation (2.2.13)

which assumes that k is large and that the two hidden layer mappings preserve the general position property of the m points. Now, in the limit of and n , Equation (2.2.13) can be solved for k to give (2.2.14)

2.2.3 Generally Interconnected Networks with no Feedback

It is left as an exercise for the reader to verify that the necessary size of an arbitrarily fully interconnected network of LTG's (with no feedback) for realizing any function : Rn {0, 1} defined on m arbitrary points is given by (see Problem 2.2.3) (2.2.15)

and for points in general position (2.2.16)

It is of interest to compare Equations (2.2.12) and (2.2.14) with Equations (2.2.15) and (2.2.16), respectively. Note that for these two seemingly different network architectures, the number of necessary LTG's for realizing arbitrary functions is of the same order. This agrees with the results of Baum (1988), who showed that these bounds are of the same order for any layered feedforward net with two or more hidden layers and the same number of units, irrespective of the number of layers used. This suggests that if one were able to compute an arbitrary function using a two hidden layer net with only units, there would not be much to gain, in terms of random or arbitrary function realization capability, by using more than two hidden layers! Also, comparing Equations (2.2.3), (2.2.12) and (2.2.15) [or Equations (2.2.8), (2.2.14), and (2.2.16)] shows that when the size of the training set is much larger than the dimension of the input exemplars, then networks with two or more hidden layers may require substantially fewer units than networks with a single hidden layer.

In practice, the actual points (patterns) that we want to discriminate between are not arbitrary or random; rather, they are likely to have natural regularities and redundancies. This may make them easier to realize with networks having substantially smaller size than these enumerational statistics would indicate. For convenience, the results of this section are summarized in Table 2.2.1.

 NETWORK ARCHITECTURE LOWER BOUNDS ON THE SIZE OF AN LTG NET ARBITRARY POINTS POINTS IN GENERAL POSITION One hidden layer feedforward net with k hidden units    Two hidden layer feedforward net with units in each layer    Generally interconnected net with k units (no feedback)    Table 2.2.1. Lower bounds on the size of a net of LTG's for realizing any function of m points, f : Rn {0, 1}, in the limit of large n.

Goto [2.0][2.1] [2.3] [2.4] [2.5] Back to the Table of Contents Back to Main Menu