1.3 General Position and the Function Counting Theorem

In this section, we try to answer the following fundamental question: Given m points in {0, 1}n, how many dichotomies of these m points are realizable with a single PTG(r), for r = 1, 2, ..., n? We are also interested in answering the question for the more general case of m points in Rn. But first, we present a classical theorem on the capability of polynomials as approximators of continuous functions.

1.3.1 Weierstrass' Approximation Theorem

Theorem 1.3.1 (Weierstrass' Approximation Theorem): Let g be a continuous real-valued function defined on a closed interval [a, b]. Then, given any positive, there exists a polynomial y (which may depend on ) with real coefficients such that

(1.3.1)

for every x [a, b].

The proof of this theorem can be found in Apostol (1957).

Theorem 1.3.1 is described by the statement: every continuous function can be "uniformly approximated" by a polynomial. Note that the order of the polynomial depends on the function being approximated and the desired accuracy of the approximation. Weierstrass' Approximation Theorem also applies for the case of a continuous multivariate function g which maps a compact set Rn to a compact set Y R (Narendra and Parthasarathy, 1990). Thus, a single PTG of unrestricted order r which employs no thresholding is a universal approximator for continuous functions g:  Y.

Theorem 1.3.1 can also be extended to non-polynomial functions. Let {1(x), , d(x)} be a complete orthonormal set, then any g satisfying the requirements of Theorem 1.3.1 can be approximated by a function

(1.3.2)

where the wj's are real constants.

Polynomials may also be used to approximate binary-valued functions defined on a finite set of points. Examples of such functions are Boolean functions and functions of the form gS  {0, 1}, where S is a finite set of arbitrary points in Rn. A PTG differs from a polynomial in that it has an intrinsic quantifier (threshold nonlinearity) for the output state. This gives the PTG the natural capability for realizing Boolean functions and complex dichotomies of arbitrary points in Rn. The remainder of this section develops some theoretical results needed to answer the questions raised at the beginning of this section.

1.3.2 Points in General Position

Let us calculate the number of dichotomies of m points in Rn (ways of labeling m points into two distinct categories) achieved by a linear separating surface (i.e., LTG). We call each of these dichotomies a linear dichotomy. For m n-dimensional patterns, the number of linear dichotomies is equal to twice the number of ways in which the m points can be partitioned by an (n  1)-dimensional hyperplane (for each distinct partition, there are two different classifications).

As an example, consider the case m = 4 and n = 2. Figure 1.3.1 shows four points in a two-dimensional space. The lines li, i = 1, ... , 7, give all possible linear partitions of these four points. In particular, consider l5. It could be the decision surface implementing either of the following: (1) x1 and x4 in class 1, and x2 and x3 in class 2; or (2) x1 and x4 in class 2, and x2 and x3 in class 1. Thus, one may enumerate all possible linear dichotomies to be equal to 14. If three of the points belong to the same line (Figure 1.3.2), there are only six linear partitions. For m > n, we say that a set of m points is in general position in Rn if and only if no subset of n + 1 points lies on an -dimensional hyperplane; for m n, a set of m points is in general position if no (m  2)-dimensional hyperplane contains the set. Thus the four points of Figure 1.3.1 are in general position, whereas the four points of Figure 1.3.2 are not. Equivalently, a set of m points in Rn is in general position if every subset of n or fewer points (vectors) is linearly independent, which implies that

(1.3.3)

Note that general position requires a stringent rank condition on the matrix [x1 x2 ... xm] (the matrix [x1 x2 ... xm] has maximal rank n if at least one n × n submatrix has a nonzero determinant).

It can be shown (see Section 1.3.3) that for m points in general position with m n+1, the total number of linear dichotomies is 2m. This means that a hyperplane is not constrained by the requirement of correctly classifying n + 1 or fewer points in general position. Note that as n , a set of m random points in Rn is in general position with probability approaching one.

Figure 1.3.1 Points in general position. Figure 1.3.2. Points not in general position.

1.3.3 Function Counting Theorem

The so-called Function Counting Theorem, which counts the number of linearly separable dichotomies of m points in general position in Rd, is essential for estimating the separating capacity of an LTG or a PTG and is considered next. This theorem is also useful in giving an upper bound on the number of linear dichotomies for points in arbitrary position.

Theorem 1.3.2 (Function Counting Theorem; Cover, 1965): The number of linearly separable dichotomies of m points in general position in Euclidean d-space is

(1.3.4)

The general position requirement on the m points is a necessary and sufficient condition.

Proof of Theorem 1.3.2:

Consider a set of m points, X = {x1x2, ..., xm}, in general position in Rd. Let C(md) be the number of linearly separable dichotomies {X+X} of X. Here, X+ (X) is a subset of X consisting of all points which lie above (below) the separating hyperplane.

Consider a new point, xm+1, such that the set of m + 1 points, X' = {x1x2, ..., xm+1}, is in general position. Now, some of the linear dichotomies of the set X can be achieved by hyperplanes which pass through xm+1. Let the number of such dichotomies be D. For each of these D linear dichotomies there will be two new dichotomies, and . This is because when the points are in general position any hyperplane through xm+1 that realizes the dichotomy {X+X} can be shifted infinitesimally to allow arbitrary classification of xm+1 without affecting the separation of the dichotomy {X+X}. For the remaining C(md)  D dichotomies, either or must be separable. Therefore, there will be one new linear dichotomy for each old one. Thus, the number of linear dichotomies of X', C(m + 1, d), is given by


Again, D is the number of linear dichotomies of X that could have had the dividing hyperplane drawn through xm+1. But this number is simply C(md  1), because constraining the hyperplane to go through a particular point xm+1 makes the problem effectively d  1 dimensional. This observation allows us to obtain the recursion relation


The repeated iteration of the above relation for mm  1, m  2, ..., 1 yields


from which the theorem follows immediately on noting that C(1, N) = 2 (one point can be linearly separated into one category or the other) and for m  d+1.

Theorem 1.3.2 may now be employed to study the ability of a single LTG to separate m points in general position in Rn. Since the total number of possible dichotomies in Rn is 2m, the probability of a single n-input LTG to separate m points in general position (assuming equal probability for the 2m dichotomies) is

(1.3.5)

Equation (1.3.5) is plotted in Figure 1.3.3. Note that if m < 2(n + 1), then as n approaches infinity, PLS approaches one; i.e., the LTG almost always separates the m points. At m = 2(n + 1), exactly one half of all possible dichotomies of the m points are linearly separable. We refer to m = 2(n + 1) as the statistical capacity of the LTG. It is noted from Figure 1.3.3, that a single LTG is essentially not capable of handling the classification of m points in general position when m > 3(n + 1).











Figure 1.3.3. Probability of linear separability of m points in general position in Rn.

1.3.4 Separability in Space

A PTG(r) with labeled inputs x Rn, may be viewed as an LTG with d  1 preprocessed inputs (see Figure 1.3.4), where . We refer to the mapping (x) from the input space Rn to the -space Rd1 as the -mapping. A dichotomy of m points in Rn is said to be "-separable" by a PTG(r), if there exists a set of d  1 PTG weights and a threshold which correctly classify all m points. That is, if there exists a (d  2)-dimensional hyperplane in -space which correctly classifies all m points. The inverse image of this hyperplane in the input space Rn defines a polynomial separating surface of order r, which will be referred to as the -surface. Note that the Function Counting Theorem still holds true if the set of m points is in general position in -space or, equivalently for the case d  m, if no d points lie on the same -surface in the input space. Here, we say X = {x1x2, ..., xm} is in -general position. Therefore, the probability of a single PTG with d degrees of freedom (including threshold) to separate m points in -general position is given by

(1.3.6)


Figure 1.3.4. Block diagram of a PTG represented as the cascade of a fixed preprocessing layer and a single LTG.

Since for an arbitrary set of m points in Rn, the number of -separable dichotomies, L(md  1), is less than or equal to the number of -separable dichotomies of m points in -general position, we may write:

(1.3.7)

Thus, the number of Boolean functions, Bn(r), of n-variables which can be realized by an n-input PTG(r) satisfies

Bn(r) (1.3.8)

which is exactly the inequality of Equation (1.2.1).

Goto [1.0][1.1] [1.2] [1.4] [1.5] [1.6] [1.7]

Back to the Table of Contents

Back to Main Menu