1.0 Introduction
Artificial neural networks are parallel computational
models composed of densely interconnected adaptive processing
units. These networks are fine-grained parallel implementations
of nonlinear static or dynamic systems. An important feature
of these networks is their adaptive nature, in which "learning
by example" replaces "programming" in solving problems.
This feature makes such computational models very appealing in
application domains where one has little or incomplete understanding
of the problem to be solved, but where training data is available.
Another key feature is the intrinsically parallel architecture, which
allows for fast computation of solutions when these networks are
implemented on parallel digital computers or, ultimately, when
implemented in customized hardware.
Artificial neural networks are viable computational
models for a wide variety of problems. These include pattern
classification, speech synthesis and recognition, adaptive interfaces
between humans and complex physical systems, function approximation,
image compression, associative memory, clustering, forecasting
and prediction, combinatorial optimization, nonlinear system modeling,
and control. These networks are "neural" in the sense
that they may have been inspired by neuroscience, but not necessarily
because they are faithful models of biological neural or cognitive
phenomena. In fact, the majority of the networks covered in this
book are more closely related to traditional mathematical and/or
statistical models such as non-parametric pattern classifiers,
clustering algorithms, nonlinear filters, and statistical regression
models than to neurobiological models.
The "artificial neuron" is the basic building
block/processing unit of an artificial neural network. It is
necessary to understand the computational capabilities of this
processing unit as a prerequisite for understanding the function
of a network of such units. The artificial neuron model considered
here is closely related to an early model used in threshold logic
(Winder, 1962; Brown, 1964; Cover, 1964; Dertouzos, 1965; Hu,
1965; Lewis and Coates, 1967; Sheng, 1969; Muroga, 1971). Here,
an approximation to the function of a biological neuron is captured
by the linear threshold gate (McCulloch and Pitts, 1943).
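As a minimal sketch of the linear threshold gate, assume the common
formulation in which the gate outputs 1 when the weighted sum of its
binary inputs reaches a threshold T, and 0 otherwise. The function name
ltg, the weight and threshold values, and the choice of Python are
illustrative assumptions and not notation taken from the text.

```python
from itertools import product


def ltg(x, w, T):
    """Linear threshold gate (assumed form): 1 if w . x >= T, else 0."""
    weighted_sum = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if weighted_sum >= T else 0


# All four binary input pairs, in the order (0,0), (0,1), (1,0), (1,1).
inputs = list(product((0, 1), repeat=2))

# Hypothetical weight/threshold choices that realize AND and OR.
and_table = [ltg(x, w=(1, 1), T=2) for x in inputs]
or_table = [ltg(x, w=(1, 1), T=1) for x in inputs]

print(and_table)  # [0, 0, 0, 1]
print(or_table)   # [0, 1, 1, 1]
```

With the same weights, changing only the threshold switches the gate
from AND to OR, which illustrates how a single unit's parameters
determine the switching function it realizes.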
This chapter investigates the computational capabilities
of a linear threshold gate (LTG). Also in this chapter, the polynomial
threshold gate (PTG) is developed as a generalization of the LTG,
and its computational capabilities are studied. An important
theorem, known as the Function Counting Theorem, is proved and
is used to determine the statistical capacity of LTGs and PTGs.
Then, a method for minimal parameter PTG synthesis is developed
for the realization of arbitrary binary mappings (switching functions).
Finally, the chapter concludes by defining the concepts of ambiguous
and extreme points and applying them, respectively, to study the
generalization capability of threshold gates and to determine the
average amount of information necessary for characterizing large data
sets by threshold gates.