2.2 Necessary Bounds on the Size of LTG Networks
In this section, we derive necessary lower bounds on the
number of gates, k, in a network of LTG's for realizing any function
f : R^n → {0, 1} of m points arbitrarily chosen in R^n.
For the case of functions defined on m points in general position,
similar bounds are also derived. Again, we consider the three network architectures
of Section 2.1.2.
2.2.1 Two Layer Feedforward Networks
Case 1: m arbitrary points
Consider the function f : R^n → {0, 1} defined on m arbitrary points in R^n.
Here, we show that a two layer feedforward net whose hidden layer contains
fewer LTG's than a certain lower bound is not sufficient to realize such an
arbitrary function (in the limit of large n). Recalling Equation (2.1.1) and
requiring that F1(n, m, k) be larger than or equal to the total number of
possible binary functions of m points, we get (assuming m ≥ 3n + 2 and n ≥ 2)

\left[2\sum_{i=0}^{n}\binom{m-1}{i}\right]^{k} 2\sum_{i=0}^{k}\binom{m-1}{i} \;\ge\; 2^{m}    (2.2.1)
We may solve for k as a function of m and n. Taking the logarithm (base 2)
of Equation (2.2.1) and rearranging terms, we get

k\,\log_2\!\left[2\sum_{i=0}^{n}\binom{m-1}{i}\right] + \log_2\!\left[2\sum_{i=0}^{k}\binom{m-1}{i}\right] \;\ge\; m    (2.2.2)
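The counting requirement in Equation (2.2.2) can be checked numerically. The sketch below assumes the standard upper bound of 2 Σ_{i=0}^{n} C(m−1, i) on the number of dichotomies of m arbitrary points realizable by a single LTG, consistent with the derivation above; the helper names (`dichotomy_bound`, `min_hidden_ltgs`) are hypothetical.

```python
from math import comb, log2

def dichotomy_bound(m, n):
    # Upper bound on the number of dichotomies of m arbitrary points
    # in R^n realizable by a single LTG: 2 * sum_{i=0}^{n} C(m-1, i).
    return 2 * sum(comb(m - 1, i) for i in range(min(n, m - 1) + 1))

def min_hidden_ltgs(m, n):
    # Smallest k satisfying the logarithmic counting requirement of
    # Equation (2.2.2): k*log2(bound(m,n)) + log2(bound(m,k)) >= m.
    k = 1
    while k * log2(dichotomy_bound(m, n)) + log2(dichotomy_bound(m, k)) < m:
        k += 1
    return k

# For m <= n + 1, a single LTG realizes all 2^m dichotomies (Cover's result):
print(dichotomy_bound(4, 3))      # 16 = 2^4
print(min_hidden_ltgs(200, 10))   # necessary hidden-layer size for m = 200
```

Note that the necessary k returned here grows with m for fixed n, as the bound derived next makes explicit.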
By employing Stirling's approximation, we can write

\log_2\!\left[2\sum_{i=0}^{n}\binom{m-1}{i}\right] \approx n\,\log_2\!\left(\frac{em}{n}\right)

where a large n is assumed. Similarly, we employ the approximation
\log_2\!\left[2\sum_{i=0}^{k}\binom{m-1}{i}\right] \approx k\,\log_2\!\left(\frac{em}{k}\right). Now, Equation
(2.2.2) reduces to

k \;\ge\; \frac{m}{a + b}    (2.2.3)

with a = n\,\log_2(em/n) and b = \log_2(em/k).
Equation (2.2.3) gives a necessary condition on the size of the net for
realizing any function of m arbitrary n-dimensional points.
It is interesting to note the usefulness of this bound by comparing it
to the sufficient net constructed by Baum (1988), which requires ⌈m/n⌉ hidden
LTG's.
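The gap between the necessary and sufficient sizes can be illustrated numerically. The sketch below uses only the dominant term m/(n log2(em/n)) of the necessary bound (the smaller output-gate term is neglected) and compares it against the ⌈m/n⌉ hidden units of Baum's (1988) sufficient construction; both function names are hypothetical.

```python
from math import ceil, e, log2

def necessary_k(m, n):
    # Dominant term of the necessary bound: k >= m / (n * log2(e*m/n)).
    # The smaller output-gate term b = log2(e*m/k) is neglected here.
    return m / (n * log2(e * m / n))

def baum_sufficient_k(m, n):
    # Size of Baum's (1988) sufficient construction: ceil(m/n) hidden LTG's.
    return ceil(m / n)

for m, n in [(10_000, 10), (100_000, 10)]:
    print(m, n, round(necessary_k(m, n), 1), baum_sufficient_k(m, n))
```

The two sizes differ by roughly a factor of log2(em/n), so the necessary bound is loose but of a usefully similar form.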
As a special case, we may consider an arbitrary completely
specified (m = 2^n) Boolean function f : {0, 1}^n → {0, 1}. Here,
Equation (2.2.3) with m = 2^n gives

k \;\ge\; \frac{2^{n}}{n^{2}}    (2.2.4)

which means that as n → ∞, an infinitely
large net is required. A limiting case of Equation (2.2.3) is for m/n → ∞,
which leads to the bound

k \;\ge\; \frac{m}{n\,\log_2(m/n)}    (2.2.5)
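To see how quickly the Boolean special case diverges, the ratio 2^n/n^2 can be tabulated directly; `boolean_lower_bound` is a hypothetical helper evaluating the reconstructed form of Equation (2.2.4).

```python
def boolean_lower_bound(n):
    # Necessary hidden-layer size for an arbitrary completely specified
    # Boolean function of n variables (m = 2^n points): roughly 2^n / n^2.
    return 2 ** n / n ** 2

for n in (10, 20, 30):
    print(n, boolean_lower_bound(n))
```

Even for modest input dimensions, the required hidden layer is astronomically large, which is the sense in which an "infinitely large net" is needed as n grows.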
Case 2: m points in general position
Recalling Equation (1.3.5) and the discussion in Section
1.3.3 on the statistical capacity of a single LTG, one can determine an
upper bound, F_GP, on the number of possible functions (dichotomies) of m
points in general position in R^n realizable by a two layer feedforward net
with k hidden units as

F_{GP} = \left[2^{2(n+1)}\right]^{k} 2\sum_{i=0}^{k}\binom{m-1}{i}    (2.2.6)

for m ≥ 2n and n → ∞. The first term on
the right hand side of Equation (2.2.6) represents the total number of
functions realizable by the k hidden layer LTG's, where each LTG
is capable of realizing 2^{2(n+1)}
functions. On the other hand, the term 2\sum_{i=0}^{k}\binom{m-1}{i} in
Equation (2.2.6) represents an upper bound on the number of functions realized
by the output LTG. It assumes that the hidden layer transforms the m
points in general position in R^n
to points which are not necessarily in general position in the hidden space,
R^k.
For the net to be able to realize any one of the above
functions, its hidden layer size k must satisfy

2(n+1)k + \log_2\!\left[2\sum_{i=0}^{k}\binom{m-1}{i}\right] \;\ge\; m    (2.2.7)

or

k \;\ge\; \frac{m}{2(n+1) + \log_2(em/k)}    (2.2.8)
This bound is tight: for m close to 2n, it
gives k = 1, which is the optimal net (recall that any dichotomy
of m points in general position in R^n
has a probability approaching one of being realized by a single LTG as long
as m ≤ 2n, for large n). Note that when there
is only a single LTG in the hidden layer, the output LTG becomes redundant.
Equation (2.2.8) agrees with the early experimental results
reported by Widrow and Angell (1962). Also, it is interesting to note that
in the limit as n → ∞, Equation (2.2.8)
gives a lower bound on k, which is equal to one half the number
of hidden LTG's of the optimal net reported by Baum (1988). It is important
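The general position requirement can also be solved numerically. The sketch below assumes the counting condition 2(n+1)k + log2[2 Σ_{i=0}^{k} C(m−1, i)] ≥ m discussed above (capacity term for the hidden layer, dichotomy bound for the output gate); both helper names are hypothetical.

```python
from math import comb, log2

def output_gate_bound(m, k):
    # Dichotomy bound for the output LTG, whose k-dimensional inputs
    # need not be in general position: 2 * sum_{i=0}^{k} C(m-1, i).
    return 2 * sum(comb(m - 1, i) for i in range(min(k, m - 1) + 1))

def min_hidden_ltgs_gp(m, n):
    # Smallest k with 2*(n+1)*k + log2(output_gate_bound(m, k)) >= m:
    # the counting requirement for m points in general position.
    k = 1
    while 2 * (n + 1) * k + log2(output_gate_bound(m, k)) < m:
        k += 1
    return k

print(min_hidden_ltgs_gp(200, 100))     # m = 2n: a single hidden LTG suffices
print(min_hidden_ltgs_gp(10_000, 100))  # close to m/(2(n+1)), about half of m/n
```

For m = 2n the solver returns k = 1, matching the tightness claim, and for m much larger than n the answer approaches m/(2(n+1)), about half of Baum's ⌈m/n⌉ sufficient size.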
to note that the bounds on k derived using the above approach are
relatively tighter for the case of m points in general position
than for the case of m points in arbitrary position. This is because
we are using the actual statistical capacity of an LTG for the general
position case, as opposed to the upper bound on the number of dichotomies
used for the arbitrary position case. This observation is also valid
for the bounds derived in the remainder of this section.
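The relative tightness of the two bounds is easy to see numerically. The sketch below compares the dominant terms of the two lower bounds derived above (assumed closed forms, neglecting the smaller output-gate terms); both function names are hypothetical.

```python
from math import e, log2

def arbitrary_bound(m, n):
    # Dominant term of the arbitrary-position necessary bound.
    return m / (n * log2(e * m / n))

def general_position_bound(m, n):
    # Dominant term of the general-position necessary bound.
    return m / (2 * (n + 1))

m, n = 100_000, 50
print(arbitrary_bound(m, n), general_position_bound(m, n))
```

The general-position bound is the larger (hence tighter) of the two whenever log2(em/n) exceeds roughly 2, which holds for any interesting m/n ratio.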
2.2.2 Three Layer Feedforward Networks
Case 1: m arbitrary points
Consider a two hidden layer net having k/2 (k even) LTG's in each of its hidden layers and a single LTG in its output layer. From Equation (2.1.3), the net is capable of realizing at most F2(n, m, k)
arbitrary functions. Following earlier derivations, we
can bound the necessary net size by

\left[2\sum_{i=0}^{n}\binom{m-1}{i}\right]^{k/2} \left[2\sum_{i=0}^{k/2}\binom{m-1}{i}\right]^{(k/2)+1} \;\ge\; 2^{m}    (2.2.9)
for m ≥ 3n and n → ∞. Next, taking the
logarithm of both sides of Equation (2.2.9) and employing Stirling's approximation
gives

\frac{k}{2}\,n\,\log_2\!\left(\frac{em}{n}\right) + \left(\frac{k}{2}+1\right)\frac{k}{2}\,\log_2\!\left(\frac{2em}{k}\right) \;\ge\; m    (2.2.10)

which can be solved for k as

k \;\ge\; \frac{-b + \sqrt{b^{2} + 4am}}{2a},\quad a = \tfrac{1}{4}\log_2\!\left(\tfrac{2em}{k}\right),\ b = \tfrac{1}{2}\left[n\,\log_2\!\left(\tfrac{em}{n}\right) + \log_2\!\left(\tfrac{2em}{k}\right)\right]    (2.2.11)

Assuming that m >> n^2,
Equation (2.2.11) can be approximated as

k \;\ge\; 2\sqrt{\frac{m}{\log_2 m}}    (2.2.12)

For the special case of arbitrary Boolean functions with
m = 2^n,
Equation (2.2.12) gives a lower bound of 2\sqrt{2^{n}/n} hidden
LTG's. An upper bound of order 2^{n/2} on
the number of hidden LTG's was reported by Muroga (1959).
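The quadratic counting requirement for the three layer net can be solved numerically and compared with the square-root approximation. The sketch below assumes the counting form discussed above (k/2 first-hidden-layer gates on n inputs; k/2 second-hidden-layer gates plus one output gate, each on k/2 inputs); the helper names are hypothetical.

```python
from math import comb, log2, sqrt

def ltg_bound(m, d):
    # 2 * sum_{i=0}^{d} C(m-1, i): dichotomy bound for an LTG with d inputs.
    return 2 * sum(comb(m - 1, i) for i in range(min(d, m - 1) + 1))

def min_k_three_layer(m, n):
    # Smallest even k with
    # (k/2)*log2(ltg_bound(m, n)) + (k/2 + 1)*log2(ltg_bound(m, k/2)) >= m,
    # an assumed form of the three layer counting requirement.
    k = 2
    while (k // 2) * log2(ltg_bound(m, n)) \
            + (k // 2 + 1) * log2(ltg_bound(m, k // 2)) < m:
        k += 2
    return k

m, n = 100_000, 10
# Exact counting solution vs. the approximation 2*sqrt(m/log2(m)):
print(min_k_three_layer(m, n), 2 * sqrt(m / log2(m)))
```

The exact solution and the closed-form approximation agree to within a modest constant factor, which is all the subsequent order-of-growth comparisons require.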
Case 2: m points in general position
Here, we start from the relation

\left[2^{2(n+1)}\right]^{k/2} \left[2^{2\left(\frac{k}{2}+1\right)}\right]^{(k/2)+1} \;\ge\; 2^{m}    (2.2.13)

which assumes that k is large and that the two
hidden layer mappings preserve the general position property of the m
points. Now, in the limit of m/n → ∞ and
n → ∞, Equation (2.2.13) can be solved for k to give

k \;\ge\; \sqrt{2m}    (2.2.14)
2.2.3 Generally Interconnected Networks with no Feedback
It is left as an exercise for the reader to verify that
the necessary size of an arbitrary fully interconnected network of LTG's
(with no feedback) for realizing any function f : R^n → {0, 1} defined on m arbitrary points is given by (see Problem 2.2.3)

k \;\ge\; \sqrt{\frac{2m}{\log_2 m}}    (2.2.15)

and, for points in general position,

k \;\ge\; \sqrt{m}    (2.2.16)
It is of interest to compare Equations (2.2.12) and (2.2.14)
with Equations (2.2.15) and (2.2.16), respectively. Note that for these
two seemingly different network architectures, the number of necessary
LTG's for realizing arbitrary functions is of the same order. This agrees
with the results of Baum (1988), who showed that these bounds are of the
same order for any layered feedforward net with two or more hidden layers
and the same number of units, irrespective of the number of layers used.
This suggests that if one were able to compute an arbitrary function using
a two hidden layer net with only on the order of √m units,
there would not be much to gain, in terms of random or arbitrary function
realization capability, by using more than two hidden layers! Also, comparing
Equations (2.2.3), (2.2.12) and (2.2.15) [or Equations (2.2.8), (2.2.14),
and (2.2.16)] shows that when the size of the training set is much larger
than the dimension of the input exemplars, then networks with two or more
hidden layers may require substantially fewer units than networks with
a single hidden layer.
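The claimed advantage of deeper nets for m >> n can be made concrete by tabulating the dominant terms of the single- and two-hidden-layer necessary bounds side by side (assumed closed forms from the derivations above; both function names are hypothetical).

```python
from math import e, log2, sqrt

def two_layer_k(m, n):
    # Single-hidden-layer necessary size (dominant term): m / (n*log2(e*m/n)).
    return m / (n * log2(e * m / n))

def three_layer_k(m):
    # Two-hidden-layer necessary size (dominant term): 2*sqrt(m/log2(m)).
    return 2 * sqrt(m / log2(m))

n = 10
for m in (10**4, 10**6, 10**8):
    print(m, round(two_layer_k(m, n)), round(three_layer_k(m)))
```

As m grows with n fixed, the single-hidden-layer requirement grows nearly linearly in m while the two-hidden-layer requirement grows only like √m, which is the substantial saving referred to above.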
In practice, the actual points (patterns) that we want
to discriminate between are not arbitrary or random; rather, they are likely
to have natural regularities and redundancies. This may make them easier
to realize with networks having substantially smaller size than these enumerational
statistics would indicate. For convenience, the results of this section
are summarized in Table 2.2.1.
NETWORK ARCHITECTURE                    | LOWER BOUNDS ON THE SIZE OF AN LTG NET
                                        | ARBITRARY POINTS                              | POINTS IN GENERAL POSITION
Two layer feedforward                   | k ≥ m/[n log2(em/n) + log2(em/k)]  (2.2.3)    | k ≥ m/[2(n+1) + log2(em/k)]  (2.2.8)
Three layer feedforward                 | k ≥ 2(m/log2 m)^(1/2)  (2.2.12)               | k ≥ (2m)^(1/2)  (2.2.14)
Generally interconnected (no feedback)  | k ≥ (2m/log2 m)^(1/2)  (2.2.15)               | k ≥ m^(1/2)  (2.2.16)
Table 2.2.1. Lower bounds on the size of a net of LTG's for realizing
any function of m points, f : R^n → {0, 1}, in the limit of large n.