Fundamentals of Artificial Neural Networks
by Mohamad H. Hassoun
(MIT Press, 1995)
My purpose in writing this book has been to give a systematic
account of major concepts and methodologies of artificial neural networks
and to present a unified framework that makes the subject more accessible to
students and practitioners. This book emphasizes fundamental theoretical
aspects of the computational capabilities and learning abilities of artificial
neural networks. It integrates important theoretical results on artificial
neural networks and uses them to explain a wide range of existing empirical
observations and commonly used heuristics.
The main audience is first-year graduate students in electrical
engineering, computer engineering, and computer science. This book may be
adapted for use as a senior undergraduate textbook by selective choice of
topics. Alternatively, it may also be used as a valuable resource for
practicing engineers, computer scientists, and others involved in research
in artificial neural networks.
This book has evolved from lecture notes of two courses on
artificial neural networks, a senior-level course and a graduate-level course,
which I have taught during the last 6 years in the Department of Electrical
and Computer Engineering at Wayne State University.
The background material needed to understand this book is
general knowledge of some basic topics in mathematics, such as probability and
statistics, differential equations and linear algebra, and something about
multivariate calculus. The reader is also assumed to have enough familiarity
with the concept of a system and the notion of "state," as well as with the
basic elements of Boolean algebra and switching theory. The required technical
maturity is that of a senior undergraduate in electrical engineering, computer
engineering, or computer science.
Artificial neural networks are viewed here as parallel
computational models, with varying degrees of complexity, comprised of
densely interconnected adaptive processing units. These networks are fine-
grained parallel implementations of nonlinear static or dynamic systems. A
very important feature of these networks is their adaptive nature, where
"learning by example" replaces traditional "programming" in solving
problems. This feature makes such computational models very appealing in
application domains where one has little or incomplete understanding of the
problem to be solved but where training data is readily available. Another key
feature is the intrinsic parallelism that allows for fast computations of
solutions when these networks are implemented on parallel digital computers
or, ultimately, when implemented in customized hardware.
Artificial neural networks are viable computational models for
a wide variey of problems, including pattern classification, speech
synthesis and recognition, adaptive interfaces between humans and complex
physical systems, function approximation, image data compression,
associative memory, clustering, forecasting and prediction, combinatorial
optimization, nonlinear system modeling, and control. These networks are
"neural" in the sense that they may have been inspired by neuroscience, but
not because they are faithful models of biologic neural or cognitive
phenomena. In fact, the majority of the network models covered in this book
are more closely related to traditional mathematical and/or statistical models
such as optimization algorithms, nonparametric pattern classifiers, clustering
algorithms, linear and nonlinear filters, and statistical regression models
than they are to neurobiologic models.
The theories and techniques of artificial neural networks outlined
here are fairly mathematical, although the level of mathematical rigor is
relatively low. In my exposition I have used mathematics to provide insight
and understanding rather than to establish rigorous mathematical foundations.
The selection and treatment of material reflect my background as
an electrical and computer engineer. The operation of artificial neural
networks is viewed as that of nonlinear systems: Static networks are viewed as
mapping or static input/output systems, and recurrent networks are viewed as
dynamical systems with evolving "state." The systems approach is also
evident when it comes to discussing the stability of learning algorithms and
recurrent network retrieval dynamics, as well as in the adopted
classifications of neural networks as discrete-state or continuous-state and
discrete-time or continuous-time. The neural network paradigms
(architectures and their associated learning rules) treated here were selected
because of their relevence, mathematical tractability, and/or practicality.
Omissions have been made for a number of reasons, including complexity,
obscurity, and space.
This book is organized into eight chapters.
Chapter 1 introduces
the reader to the most basic artificial neural net, consisting of a single
linear threshold gate (LTG). The computational capabilities of linear and
polynomial threshold gates are derived. A fundamental theorem, the function
counting theorem, is proved and is applied to study the capacity and the
generalization capability of threshold gates. The concepts covered in this
chapter are crucial because they lay the theoretical foundations for
justifying and exploring the more general artificial neural network
architectures treated in later chapters.
Chapter 2 mainly deals with theoretical foundations of
multivariate function approximation using neural networks. The function
counting theorem of chapter 1 is employed to derive upper bounds on the
capacity of various feedforward nets of LTGs. The necessary bounds on the size
of LTG-based multilayer classifiers for the cases of training data in
general position and in arbitrary position are derived. Theoretical results on
continuous function approximation capabilities of feedforward nets, with units
employing various nonlinearities, are summarized. The chapter concludes with a
discussion of the computational effectiveness of neural net architectures
and the efficiency of their hardware implementations.
Learning rules for single-unit and single -ayer nets are covered
in Chapter 3. More than 20 basic discrete-time learning rules are presented.
Supervised rules are considered first, followed by reinforcement, Hebbian,
competitive, and feature mapping rules. The presentation of these learning
rules is unified in the sense that they may all be viewed as realizing
incremental steepest-gradient-descent search on a suitable criterion function.
Examples of single-layer architectures are given to illustrate the application
of unsupervised learning rules (e.g., principal component analysis,
clustering, vector quantization, and self-organizing feature maps).
Chapter 4 is concerned with the theoretical aspects of supervised,
unsupervised, and reinforcement learning rules. The chapter starts by
developing a unifying framework for the charaterization of various learning
rules (supervised and unsupervised). Under this framework, a continuous-time
learning rule is viewed as a first-order stochastic differential equation/
dynamical system whereby the state of the system evolves so as to minimize
an associated instantaneous criterion function. Statistical approximation
techniques are employed to study the dynamics and stability, in an "average"
sense, of the stochastic system. This approximation leads to an "average
learning equation" that, in most cases, can be cast as a globally,
asymptotically stable gradient system whose stable equilibria are minimizers
of a well-defined criterion function. Formal analysis is provided for
supervised, reinforcement, Hebbian, competitive, and topology-preserving
learning. Also, the generalization properties of deterministic and
stochastic neural nets are analyzed. The chapter concludes with an
investigatinon of the complexity of learning in multilayer neural nets.
Chapter 5 deals with learning in multilayer artificial neural
nets. It extends the gradient descent-based learning to multilayer feedforward
nets, which results in the back error propagation learning rule (or backprop).
An extensive number of methods and heuristics for improving backprop's
convergence speed and solution quality are presented, and an attempt is made
to give a theoretical basis for such methods and heuristics. Several
significant applications of backprop-trained multilayer nets are described.
These applications include conversion of English text to speech, mapping of
hand gestures to speech, recognition of handwritten ZIP codes, continuous
vehicle navigation, and medical diagnosis. The chapter also extends backprop
to recurrent networks capable of temporal association, nonlinear dynamical
system modeling, and control.
Chapter 6 is concerned with other important adaptive multilayer
net architectures, such as the radial basis function (RBF) net and the
cerebeller model articulation controller (CMAC) net, and their associated
learning rules. These networks often have similar computational capabilities
to feedforward multilayer nets of sigmoidal units, but with the potential
for faster learning. Adaptive mulilayer unit-allocating nets such as
hyperspherical classifiers, restricted Coulomb energy (RCE) net, and cascade-
correlation net are discussed. The chapter also addresses the issue of
unsupervised learning in multilayer nets, and it describes two specific
networks [adaptive resonance theory (ART) net and the autoassociative
clustering net] suitable for adaptive data clustering. The clustering
capabilities of these nets are demonstrated through examples, including the
decomposition of complex electromyogram signals.
Chapter 7 discusses associative neural memories. Various models of
associative learning and retrieval are presented and analyzed, with emphasis
on recurrent models. The stability, capacity, and error-correction
capabilities of these models are analyzed. The chapter concludes by describing
the use of one particular recurrent model (the Hopfield continuous model)
for solving combinatorial optimization problems.
Global search methods for optimal learning and retrieval in
multilayer neural networks is the topic of Chapter 8. It covers the use of
simulated annealing, mean-field annealing, and genetic algorithms for
optimal learning. Simulated annealing is also discussed in the context of
local-minima-free retrievals in recurrent neural networks (Boltzmann
machines). Finally, a hybrid genetic algorithm/gradient-descent search
method that combines optimal and fast learning is described.
Each chapter concludes with a set of problems designed to allow
the reader to further explore the concepts discussed. More than 200 problems
of varying degrees of difficulty are provided. The problems can be divided
roughly into three categories. The first category consists of problems that
are relatively easy to solve. These problems are designed to directly
reinforce the topics discussed in the book. The second category of problems,
marked with an asterisk (*), is relatively more difficult. These problems
normally involve mathematical derivations and proofs and are intended to be
thought provoking. Many of these problems include reference to technical
papers in the literature that may give complete or partial solutions. This
second category of problems is intended mainly for readers interested in
exploring advanced topics for the purpose of stimulating original research
ideas. Problems marked with a dagger ( ) represent a third category of
problems that are numerical in nature and require the use of a computer.
Some of these problems are mini programming projects, which should be
especially useful for students.
This book contains enough material for a full semester course on
artificial neural networks at the first-year graduate level. I have also used
this material selectively to teach an upper-level undergraduate introductory
course. For the undergraduate course, one may choose to skip all or a subset
of the following material: Sections 1.4-1.6, 2.1-2.2, 4.3-4.8, 5.1.2, 5.4.3-
5.4.5, 6.1.2, 6.2-6.4, 6.4.2, 7.2.2, 7.4.1-7.4.4, 8.3.2, 8.4.2, and 8.6.
I hope that this book will prove useful to those students and
practicing professionals who are interested not only in understanding the
underlying theory of artificial neural networks but also in pursuing
research in this area. A list of about 700 relevent references is included
with the aim of providing guidance and direction for the readers' own search
of the research literature. Even though this reference list may seem
comprehensive, the published literature is too extensive to allow such a
list to be complete.