1.6 Extreme Points

For any linear dichotomy {X⁺, X⁻} of a set of points, there exists a minimal sufficient subset of extreme points such that any hyperplane that correctly separates this subset must also correctly separate the entire set. From this definition it follows that a point is an extreme point of the linear dichotomy {X⁺, X⁻} if and only if it is ambiguous with respect to the dichotomy that {X⁺, X⁻} induces on the remaining points. Now, for a set of m points in general position in R^d, each of the m points is ambiguous with respect to precisely C(m − 1, d − 1) linear dichotomies of the remaining m − 1 points. Each such dichotomy of the remaining m − 1 points is the restriction of exactly two linearly separable dichotomies of the original m points, namely the two dichotomies that differ only in the classification of the withheld point. Therefore, each of the m points is an extreme point with respect to 2C(m − 1, d − 1) linearly separable dichotomies. Since there are C(m, d) linearly separable dichotomies in total, the probability P_e that a given point is an extreme point with respect to a randomly generated dichotomy is (assuming equiprobable dichotomies)

$$P_e = \frac{2\,C(m-1,\,d-1)}{C(m,\,d)} \qquad (1.6.1)$$
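
This counting argument can be checked by brute force for small m and d. The sketch below assumes the homogeneous setting of Cover's counting function, C(m, d) = 2 ∑_{k=0}^{d-1} (m−1 choose k), in which the separating hyperplanes pass through the origin, and tests separability as a linear-programming feasibility problem using scipy; the function names are illustrative, not from the text.

```python
# Brute-force check: for m points in general position in R^d, each point
# should be extreme with respect to exactly 2*C(m-1, d-1) of the C(m, d)
# homogeneously linearly separable dichotomies.
import itertools
from math import comb

import numpy as np
from scipy.optimize import linprog

def C(m: int, d: int) -> int:
    """Cover's count of linearly separable dichotomies of m points
    in general position in R^d (hyperplanes through the origin)."""
    return 2 * sum(comb(m - 1, k) for k in range(d))

def separable(X: np.ndarray, y: np.ndarray) -> bool:
    """True if some weight vector w satisfies y_i * (w . x_i) >= 1
    for all i, i.e. the labeling y is homogeneously separable."""
    A_ub = -(y[:, None] * X)          # encodes y_i * (w . x_i) >= 1
    b_ub = -np.ones(len(y))
    res = linprog(np.zeros(X.shape[1]), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * X.shape[1], method="highs")
    return res.status == 0            # 0 = optimal, i.e. feasible

rng = np.random.default_rng(0)
m, d = 6, 3
X = rng.standard_normal((m, d))       # in general position w.p. 1

labelings = [np.array(y) for y in itertools.product((-1, 1), repeat=m)]
sep = [y for y in labelings if separable(X, y)]
print(len(sep), "separable dichotomies; C(m, d) =", C(m, d))

# A point is extreme w.r.t. a separable dichotomy iff flipping its label
# leaves the dichotomy separable (the remaining points are then ambiguous
# about its classification).
for i in range(m):
    n_extreme = 0
    for y in sep:
        y_flip = y.copy()
        y_flip[i] *= -1
        if separable(X, y_flip):
            n_extreme += 1
    print(f"point {i}: extreme in {n_extreme} dichotomies;",
          f"2C(m-1, d-1) = {2 * C(m - 1, d - 1)}")
```

For m = 6 and d = 3, Cover's formulas give C(6, 3) = 32 separable dichotomies and 2C(5, 2) = 20 extreme occurrences per point, which is what the enumeration should reproduce.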

The expected number of extreme points in a set of m points in general position in R^d, denoted E(m, d), is the sum over the m points of the probability that each point is an extreme point. Since these probabilities are all equal to P_e, the expected number of extreme points can be written as

$$E(m, d) = m P_e = \frac{2m\,C(m-1,\,d-1)}{C(m,\,d)} \qquad (1.6.2)$$
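
Equations (1.6.1) and (1.6.2) are straightforward to evaluate numerically. A minimal sketch, reusing C() from the check above:

```python
def prob_extreme(m: int, d: int) -> float:
    """Eq. (1.6.1): probability P_e that a given point is an extreme
    point of a randomly selected separable dichotomy."""
    return 2 * C(m - 1, d - 1) / C(m, d)

def expected_extreme(m: int, d: int) -> float:
    """Eq. (1.6.2): expected number of extreme points E(m, d) = m * P_e."""
    return m * prob_extreme(m, d)

print(expected_extreme(10, 5))   # ~7.27: most of the 10 points are extreme
```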

Figure 1.6.1 illustrates the effect of the size of the training set m on the normalized expected number of extreme points, E(m, d)/d.

Figure 1.6.1. Normalized expected number of extreme points E(m, d)/d for d = 5, 20, and ∞, as a function of λ = m/d.

For large d, with λ = m/d held fixed, the normalized expected number of extreme points is given by (Cover, 1965)

$$\lim_{d \to \infty} \frac{E(m, d)}{d} = \begin{cases} \lambda & \text{for } \lambda \leq 2 \\ 2 & \text{for } \lambda > 2 \end{cases} \qquad (1.6.3)$$

which agrees with the simulation results in Figure 1.6.1 as d → ∞. The above limit on the expected number of extreme points implies, roughly, that for a random separable set of m patterns, the average number of patterns (information) that must be stored for a complete characterization of the whole set is only twice the number of degrees of freedom of the class of separating surfaces being used. The implication for pattern recognition is that the essential information in an infinite training set can be expected to be loaded into a network of finite capacity (e.g., a PTG of finite order or a finite network of simple gates).
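
The convergence toward this limit can be tabulated directly from (1.6.2). The sketch below, reusing expected_extreme() from above, compares E(m, d)/d at d = 5, 20, and 100 with the limiting value min(λ, 2) of (1.6.3), mirroring Figure 1.6.1:

```python
# Normalized expected number of extreme points versus the d -> infinity
# limit min(lambda, 2), where lambda = m/d is the loading factor.
for lam in (0.5, 1.0, 2.0, 4.0, 8.0):
    vals = []
    for d in (5, 20, 100):
        m = max(int(round(lam * d)), 2)   # nearest integer number of points
        vals.append(round(expected_extreme(m, d) / d, 3))
    print(f"lambda = {lam:>3}: E/d at d = 5, 20, 100 -> {vals};",
          f"limit = {min(lam, 2.0)}")
```

The finite-d values lie at or below the limiting curve and approach it as d increases, with the saturation level 2 reflecting that, on average, about 2d patterns characterize the whole set.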
