In this section, we try to answer the following
fundamental question: Given *m* points in {0, 1}*n*,
how many dichotomies of these *m* points are realizable with
a single PTG(*r*), for *r* = 1, 2, ..., *n*? We
are also interested in answering the question for the more general
case of *m* points in R*n*.
But first, we present a classical theorem on the capability of
polynomials as approximators of continuous functions.

**1.3.1 Weierstrass' Approximation Theorem**

The proof of this theorem can be found in Apostol
(1957).

Theorem 1.3.1 is described by the statement: every
continuous function can be "uniformly approximated"
by a polynomial. Note that the order of the polynomial depends
on the function being approximated and the desired accuracy of
the approximation. Weierstrass' Approximation Theorem also applies
for the case of a continuous multivariate function *g* which
maps a compact set R*n*
to a compact set *Y* R (Narendra and Parthasarathy, 1990).
Thus, a single PTG of unrestricted order *r* which employs
no thresholding is a universal approximator for continuous functions
*g*:* Y*.

Theorem 1.3.1 can also be extended to non-polynomial
functions. Let {1(**x**)*,
, **d*(**x**)}
be a complete orthonormal set, then any *g* satisfying the
requirements of Theorem 1.3.1 can be approximated by a function

(1.3.2)

where the *w**j's
*are real constants.

Polynomials may also be used to approximate binary-valued
functions defined on a finite set of points. Examples of such
functions are Boolean functions and functions of the form *g*: *S* {0, 1},
where *S* is a finite set of arbitrary points in R*n*.
A PTG differs from a polynomial in that it has an intrinsic quantifier
(threshold nonlinearity) for the output state. This gives the
PTG the natural capability for realizing Boolean functions and
complex dichotomies of arbitrary points in R*n*.
The remainder of this section develops some theoretical results
needed to answer the questions raised at the beginning of this
section.

**1.3.2 Points in General Position**

**1.3.3 Function Counting Theorem**

**Theorem 1.3.2 (Function Counting Theorem;
****Cover, 1965): ** The number of linearly
separable dichotomies of m points in general position in Euclidean
d-space is

* *(1.3.4)

*The general position requirement on the m points
is a necessary and sufficient condition.
*

Proof of Theorem 1.3.2:

Consider a set of *m* points, *X* = {**x**1, **x**2, ..., **x***m*},
in general position in R*d*.
Let C(*m*, *d*) be the number of linearly separable
dichotomies {*X*+, *X}
of X*. Here, *X*+
(*X) is a subset of X* consisting of all points which lie
above (below) the separating hyperplane.

Consider a new point, **x***m*+1,
such that the set of *m* + 1 points, *X'* = {**x**1, **x**2, ..., **x***m*+1},
is in general position. Now, some of the linear dichotomies of
the set *X* can be achieved by hyperplanes which pass through
**x***m*+1.
Let the number of such dichotomies be *D*. For each of
these *D* linear dichotomies there will be two new dichotomies,
and . This is
because when the points are in general position any hyperplane
through **x***m*+1
that realizes the dichotomy {*X*+, *X}
can be shifted infinitesimally to allow arbitrary classification
of ***x***m*+1
without affecting the separation of the dichotomy {*X*+, *X}.
For the remaining C(m*, *d*) *D*
dichotomies, either or
must be separable. Therefore, there will be one new linear dichotomy
for each old one. Thus, the number of linear dichotomies of *X'*,
C(*m* + 1, *d*), is given by

Again, *D* is the number of linear dichotomies
of *X* that could have had the dividing hyperplane drawn
through **x***m*+1.
But this number is simply C(*m*, *d* 1),
because constraining the hyperplane to go through a particular
point **x***m*+1
makes the problem effectively *d* 1 dimensional.
This observation allows us to obtain the recursion relation

The repeated iteration of the above relation for
*m*, *m* 1, *m* 2, ..., 1
yields

from which the theorem follows immediately on noting
that C(1, *N*) = 2 (one point can be linearly
separated into one category or the other) and
for *m* *d*+1.

Theorem 1.3.2 may now be employed to study the ability
of a single LTG to separate *m* points in general position
in R*n*. Since
the total number of possible dichotomies in R*n*
is 2*m*, the probability
of a single *n*-input LTG to separate *m* points in
general position (assuming equal probability for the 2*m*
dichotomies) is

(1.3.5)

Equation (1.3.5) is plotted in Figure 1.3.3. Note
that if *m* < 2(*n* + 1), then as *n*
approaches infinity, *P**LS*
approaches one; i.e., the LTG almost always separates the *m*
points. At *m* = 2(*n* + 1), exactly
one half of all possible dichotomies of the *m* points are
linearly separable. We refer to *m* = 2(*n* + 1)
as the statistical capacity of the LTG. It is noted from Figure
1.3.3, that a single LTG is essentially not capable of handling
the classification of *m* points in general position when
*m *> 3(*n* + 1).

A PTG(*r*) with labeled inputs **x***
*R*n*, may be
viewed as an LTG with *d* 1 preprocessed inputs
(see Figure 1.3.4), where . We refer
to the mapping (**x**) from the input space R*n*
to the -space R*d1 *as
the -mapping. A dichotomy of *m* points in R*n*
is said to be "-separable" by a PTG(*r*), if there
exists a set of *d 1 PTG weights and a threshold
which correctly classify all m* points. That is, if there
exists a (*d* 2)-dimensional hyperplane in -space
which correctly classifies all *m* points. The inverse image
of this hyperplane in the input space R*n*
defines a polynomial separating surface of order *r*, which
will be referred to as the -surface. Note that the Function Counting
Theorem still holds true if the set of *m* points
is in general position in -space or, equivalently for the case
*d* *m*, if no *d* points lie on the
same -surface in the input space. Here, we say *X* = {**x**1, **x**2, ..., **x***m*}
is in -general position. Therefore, the probability of a single
PTG with *d* degrees of freedom (including threshold) to
separate *m* points in -general position is given by

Figure 1.3.4. Block diagram of a PTG represented
as the cascade of a fixed preprocessing layer and a single LTG.

Since for an arbitrary set of *m* points in
R*n*, the number
of -separable dichotomies, L(*m*, *d 1),
is less than or equal to the number of -separable dichotomies
of m* points in -general position, we may write:

(1.3.7)

Thus, the number of Boolean functions, B*n*(*r*),
of *n*-variables which can be realized by an *n*-input
PTG(*r*) satisfies

B*n*(*r*)
(1.3.8)

which is exactly the inequality of Equation (1.2.1).