Fundamentals of Artificial Neural Networks
by Mohamad H. Hassoun
(MIT Press, 1995)
My purpose in writing this book has been to give a systematic
account of major concepts and methodologies of artificial neural networks
and to present a unified framework that makes the subject more accessible to
students and practitioners. This book emphasizes fundamental theoretical
aspects of the computational capabilities and learning abilities of artificial
neural networks. It integrates important theoretical results on artificial
neural networks and uses them to explain a wide range of existing empirical
observations and commonly used heuristics.
The main audience is first-year graduate students in electrical engineering, computer engineering, and computer science. This book may be adapted for use as a senior undergraduate textbook by selective choice of topics. Alternatively, it may also be used as a valuable resource for practicing engineers, computer scientists, and others involved in research in artificial neural networks.
This book has evolved from lecture notes of two courses on artificial neural networks, a senior-level course and a graduate-level course, which I have taught during the last 6 years in the Department of Electrical and Computer Engineering at Wayne State University.
The background material needed to understand this book is general knowledge of some basic topics in mathematics, such as probability and statistics, differential equations and linear algebra, and something about multivariate calculus. The reader is also assumed to have enough familiarity with the concept of a system and the notion of "state," as well as with the basic elements of Boolean algebra and switching theory. The required technical maturity is that of a senior undergraduate in electrical engineering, computer engineering, or computer science.
Artificial neural networks are viewed here as parallel computational models, with varying degrees of complexity, comprised of densely interconnected adaptive processing units. These networks are fine- grained parallel implementations of nonlinear static or dynamic systems. A very important feature of these networks is their adaptive nature, where "learning by example" replaces traditional "programming" in solving problems. This feature makes such computational models very appealing in application domains where one has little or incomplete understanding of the problem to be solved but where training data is readily available. Another key feature is the intrinsic parallelism that allows for fast computations of solutions when these networks are implemented on parallel digital computers or, ultimately, when implemented in customized hardware.
Artificial neural networks are viable computational models for a wide variey of problems, including pattern classification, speech synthesis and recognition, adaptive interfaces between humans and complex physical systems, function approximation, image data compression, associative memory, clustering, forecasting and prediction, combinatorial optimization, nonlinear system modeling, and control. These networks are "neural" in the sense that they may have been inspired by neuroscience, but not because they are faithful models of biologic neural or cognitive phenomena. In fact, the majority of the network models covered in this book are more closely related to traditional mathematical and/or statistical models such as optimization algorithms, nonparametric pattern classifiers, clustering algorithms, linear and nonlinear filters, and statistical regression models than they are to neurobiologic models.
The theories and techniques of artificial neural networks outlined here are fairly mathematical, although the level of mathematical rigor is relatively low. In my exposition I have used mathematics to provide insight and understanding rather than to establish rigorous mathematical foundations.
The selection and treatment of material reflect my background as an electrical and computer engineer. The operation of artificial neural networks is viewed as that of nonlinear systems: Static networks are viewed as mapping or static input/output systems, and recurrent networks are viewed as dynamical systems with evolving "state." The systems approach is also evident when it comes to discussing the stability of learning algorithms and recurrent network retrieval dynamics, as well as in the adopted classifications of neural networks as discrete-state or continuous-state and discrete-time or continuous-time. The neural network paradigms (architectures and their associated learning rules) treated here were selected because of their relevence, mathematical tractability, and/or practicality. Omissions have been made for a number of reasons, including complexity, obscurity, and space.
This book is organized into eight chapters.
Chapter 1 introduces the reader to the most basic artificial neural net, consisting of a single linear threshold gate (LTG). The computational capabilities of linear and polynomial threshold gates are derived. A fundamental theorem, the function counting theorem, is proved and is applied to study the capacity and the generalization capability of threshold gates. The concepts covered in this chapter are crucial because they lay the theoretical foundations for justifying and exploring the more general artificial neural network architectures treated in later chapters.
Chapter 2 mainly deals with theoretical foundations of multivariate function approximation using neural networks. The function counting theorem of chapter 1 is employed to derive upper bounds on the capacity of various feedforward nets of LTGs. The necessary bounds on the size of LTG-based multilayer classifiers for the cases of training data in general position and in arbitrary position are derived. Theoretical results on continuous function approximation capabilities of feedforward nets, with units employing various nonlinearities, are summarized. The chapter concludes with a discussion of the computational effectiveness of neural net architectures and the efficiency of their hardware implementations.
Learning rules for single-unit and single -ayer nets are covered in Chapter 3. More than 20 basic discrete-time learning rules are presented. Supervised rules are considered first, followed by reinforcement, Hebbian, competitive, and feature mapping rules. The presentation of these learning rules is unified in the sense that they may all be viewed as realizing incremental steepest-gradient-descent search on a suitable criterion function. Examples of single-layer architectures are given to illustrate the application of unsupervised learning rules (e.g., principal component analysis, clustering, vector quantization, and self-organizing feature maps).
Chapter 4 is concerned with the theoretical aspects of supervised, unsupervised, and reinforcement learning rules. The chapter starts by developing a unifying framework for the charaterization of various learning rules (supervised and unsupervised). Under this framework, a continuous-time learning rule is viewed as a first-order stochastic differential equation/ dynamical system whereby the state of the system evolves so as to minimize an associated instantaneous criterion function. Statistical approximation techniques are employed to study the dynamics and stability, in an "average" sense, of the stochastic system. This approximation leads to an "average learning equation" that, in most cases, can be cast as a globally, asymptotically stable gradient system whose stable equilibria are minimizers of a well-defined criterion function. Formal analysis is provided for supervised, reinforcement, Hebbian, competitive, and topology-preserving learning. Also, the generalization properties of deterministic and stochastic neural nets are analyzed. The chapter concludes with an investigatinon of the complexity of learning in multilayer neural nets.
Chapter 5 deals with learning in multilayer artificial neural nets. It extends the gradient descent-based learning to multilayer feedforward nets, which results in the back error propagation learning rule (or backprop). An extensive number of methods and heuristics for improving backprop's convergence speed and solution quality are presented, and an attempt is made to give a theoretical basis for such methods and heuristics. Several significant applications of backprop-trained multilayer nets are described. These applications include conversion of English text to speech, mapping of hand gestures to speech, recognition of handwritten ZIP codes, continuous vehicle navigation, and medical diagnosis. The chapter also extends backprop to recurrent networks capable of temporal association, nonlinear dynamical system modeling, and control.
Chapter 6 is concerned with other important adaptive multilayer net architectures, such as the radial basis function (RBF) net and the cerebeller model articulation controller (CMAC) net, and their associated learning rules. These networks often have similar computational capabilities to feedforward multilayer nets of sigmoidal units, but with the potential for faster learning. Adaptive mulilayer unit-allocating nets such as hyperspherical classifiers, restricted Coulomb energy (RCE) net, and cascade- correlation net are discussed. The chapter also addresses the issue of unsupervised learning in multilayer nets, and it describes two specific networks [adaptive resonance theory (ART) net and the autoassociative clustering net] suitable for adaptive data clustering. The clustering capabilities of these nets are demonstrated through examples, including the decomposition of complex electromyogram signals.
Chapter 7 discusses associative neural memories. Various models of associative learning and retrieval are presented and analyzed, with emphasis on recurrent models. The stability, capacity, and error-correction capabilities of these models are analyzed. The chapter concludes by describing the use of one particular recurrent model (the Hopfield continuous model) for solving combinatorial optimization problems.
Global search methods for optimal learning and retrieval in multilayer neural networks is the topic of Chapter 8. It covers the use of simulated annealing, mean-field annealing, and genetic algorithms for optimal learning. Simulated annealing is also discussed in the context of local-minima-free retrievals in recurrent neural networks (Boltzmann machines). Finally, a hybrid genetic algorithm/gradient-descent search method that combines optimal and fast learning is described.
Each chapter concludes with a set of problems designed to allow the reader to further explore the concepts discussed. More than 200 problems of varying degrees of difficulty are provided. The problems can be divided roughly into three categories. The first category consists of problems that are relatively easy to solve. These problems are designed to directly reinforce the topics discussed in the book. The second category of problems, marked with an asterisk (*), is relatively more difficult. These problems normally involve mathematical derivations and proofs and are intended to be thought provoking. Many of these problems include reference to technical papers in the literature that may give complete or partial solutions. This second category of problems is intended mainly for readers interested in exploring advanced topics for the purpose of stimulating original research ideas. Problems marked with a dagger ( ) represent a third category of problems that are numerical in nature and require the use of a computer. Some of these problems are mini programming projects, which should be especially useful for students.
This book contains enough material for a full semester course on artificial neural networks at the first-year graduate level. I have also used this material selectively to teach an upper-level undergraduate introductory course. For the undergraduate course, one may choose to skip all or a subset of the following material: Sections 1.4-1.6, 2.1-2.2, 4.3-4.8, 5.1.2, 5.4.3- 5.4.5, 6.1.2, 6.2-6.4, 6.4.2, 7.2.2, 7.4.1-7.4.4, 8.3.2, 8.4.2, and 8.6.
I hope that this book will prove useful to those students and practicing professionals who are interested not only in understanding the underlying theory of artificial neural networks but also in pursuing research in this area. A list of about 700 relevent references is included with the aim of providing guidance and direction for the readers' own search of the research literature. Even though this reference list may seem comprehensive, the published literature is too extensive to allow such a list to be complete.