1.0 Introduction

Artificial neural networks are parallel computational models composed of densely interconnected adaptive processing units. These networks are fine-grained parallel implementations of nonlinear static or dynamic systems. A very important feature of these networks is their adaptive nature, where "learning by example" replaces "programming" in solving problems. This feature makes such computational models very appealing in application domains where one has little or only an incomplete understanding of the problem to be solved, but where training data is available. Another key feature is the intrinsic parallel architecture, which allows for fast computation of solutions when these networks are implemented on parallel digital computers or, ultimately, when implemented in customized hardware.

Artificial neural networks are viable computational models for a wide variety of problems. These include pattern classification, speech synthesis and recognition, adaptive interfaces between humans and complex physical systems, function approximation, image compression, associative memory, clustering, forecasting and prediction, combinatorial optimization, nonlinear system modeling, and control. These networks are "neural" in the sense that they may have been inspired by neuroscience, but not necessarily because they are faithful models of biological neural or cognitive phenomena. In fact, the majority of the networks covered in this book are more closely related to traditional mathematical and/or statistical models, such as non-parametric pattern classifiers, clustering algorithms, nonlinear filters, and statistical regression models, than they are to neurobiological models.

The "artificial neuron" is the basic building block/processing unit of an artificial neural network. Understanding the computational capabilities of this processing unit is a prerequisite for understanding the function of a network of such units. The artificial neuron model considered here is closely related to an early model used in threshold logic (Winder, 1962; Brown, 1964; Cover, 1964; Dertouzos, 1965; Hu, 1965; Lewis and Coates, 1967; Sheng, 1969; Muroga, 1971). Here, an approximation to the function of a biological neuron is captured by the linear threshold gate (McCulloch and Pitts, 1943).
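The behavior of a linear threshold gate can be sketched in a few lines: the unit computes a weighted sum of its inputs and outputs 1 if that sum reaches a threshold, and 0 otherwise. The function names, weight values, and threshold below are illustrative, not taken from the text.

```python
def ltg(x, w, threshold):
    """Linear threshold gate: fire (output 1) iff the weighted
    sum of the inputs reaches the threshold."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s >= threshold else 0

# Example: the two-input AND function realized as an LTG
# with weights (1, 1) and threshold 2.
outputs = [ltg((a, b), (1, 1), 2) for a in (0, 1) for b in (0, 1)]
print(outputs)  # [0, 0, 0, 1]
```

Different choices of weights and threshold realize different switching functions; the question of exactly which functions are realizable this way is taken up in this chapter.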

This chapter investigates the computational capabilities of a linear threshold gate (LTG). Also in this chapter, the polynomial threshold gate (PTG) is developed as a generalization of the LTG, and its computational capabilities are studied. An important theorem, known as the Function Counting Theorem, is proved and is used to determine the statistical capacity of LTGs and PTGs. Then, a method for minimal parameter PTG synthesis is developed for the realization of arbitrary binary mappings (switching functions). Finally, the chapter concludes by defining the concepts of ambiguous and extreme points and applying them, respectively, to study the generalization capability of threshold gates and to determine the average amount of information necessary for characterizing large data sets by threshold gates.
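The idea behind the PTG generalization can be previewed with a small sketch: instead of thresholding a linear sum, a PTG thresholds a weighted sum of products of inputs up to some degree. The implementation below (degree 2, with illustrative weights) is an assumption of this sketch, not a construction from the text; it shows that XOR, which no LTG can realize, is realized by a degree-2 PTG.

```python
from itertools import combinations

def ptg(x, w, threshold):
    """Degree-2 polynomial threshold gate: threshold a weighted sum
    of the linear terms plus all pairwise products of the inputs."""
    terms = list(x) + [xi * xj for xi, xj in combinations(x, 2)]
    s = sum(wi * ti for wi, ti in zip(w, terms))
    return 1 if s >= threshold else 0

# XOR via the polynomial x1 + x2 - 2*x1*x2 thresholded at 0.5:
xor_outputs = [ptg((a, b), (1, 1, -2), 0.5) for a in (0, 1) for b in (0, 1)]
print(xor_outputs)  # [0, 1, 1, 0]
```

Allowing product terms enlarges the class of realizable switching functions at the cost of more parameters, which is why minimal-parameter PTG synthesis is of interest later in the chapter.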
