5.0 Introduction
This chapter extends the gradient descent-based
delta rule of Chapter 3 to multilayer feedforward neural networks.
The resulting learning rule is commonly known as error back propagation
(or backprop), and it is one of the most frequently used learning
rules in many applications of artificial neural networks.
The backprop learning rule is central to much current
work on learning in artificial neural networks. In fact, the
development of backprop is one of the main reasons for the renewed
interest in artificial neural networks. Backprop provides a computationally
efficient method for changing the weights in a feedforward network
whose units have differentiable activation functions, so that the
network learns a training set of input-output examples. Backprop-trained multilayer neural
nets have been applied successfully to solve some difficult and
diverse problems such as pattern classification, function approximation,
nonlinear system modeling, time-series prediction, and image compression
and reconstruction. For these reasons, we devote most of this
chapter to the study of backprop, its variations, and its extensions.
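
As a rough illustration of the weight-update idea described above, the following minimal sketch applies gradient descent to a network with one hidden layer. The architecture (tanh hidden units, a single linear output unit), the squared-error criterion E = 0.5*(d - y)^2, the learning rate, and all function names are assumptions chosen for this example. They are not the notation or the derivation developed later in the chapter.

```python
import math

def forward(x, W1, W2):
    """Forward pass: tanh hidden layer followed by one linear output unit.

    Each row of W1 holds one hidden unit's weights, with the bias last.
    W2 holds the output unit's weights, with the bias last.
    """
    h = [math.tanh(sum(w * xi for w, xi in zip(row[:-1], x)) + row[-1])
         for row in W1]
    y = sum(w * hi for w, hi in zip(W2[:-1], h)) + W2[-1]
    return h, y

def backprop(x, d, W1, W2):
    """Gradient of E = 0.5*(d - y)^2 with respect to W1 and W2."""
    h, y = forward(x, W1, W2)
    delta_out = -(d - y)  # dE/dy for the linear output unit
    grad_W2 = [delta_out * hi for hi in h] + [delta_out]
    grad_W1 = []
    for j in range(len(W1)):
        # Propagate the output error back through hidden unit j;
        # 1 - h^2 is the derivative of tanh at that unit's net input.
        delta_h = delta_out * W2[j] * (1.0 - h[j] ** 2)
        grad_W1.append([delta_h * xi for xi in x] + [delta_h])
    return grad_W1, grad_W2

def gradient_step(x, d, W1, W2, lr=0.1):
    """Return new weights after one gradient descent step on one example."""
    grad_W1, grad_W2 = backprop(x, d, W1, W2)
    new_W1 = [[w - lr * g for w, g in zip(row, grow)]
              for row, grow in zip(W1, grad_W1)]
    new_W2 = [w - lr * g for w, g in zip(W2, grad_W2)]
    return new_W1, new_W2
```

Calling gradient_step repeatedly over the examples of a training set gives a simple per-example form of training. Differentiability of the activation function matters here because the hidden-unit error term uses its derivative.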
Backpropagation is a gradient descent search algorithm
that may converge slowly and may become trapped in local minima. In this
chapter, several methods for improving backprop's convergence
speed and its ability to avoid local minima are presented. Whenever possible,
theoretical justification is given for these methods. A version
of backprop based on an enhanced criterion function with global
search capability is described, which, when properly tuned, allows
for relatively fast convergence to good solutions. Several significant
applications of backprop-trained multilayer neural networks are
described. These applications include the conversion of English
text into speech, mapping hand gestures to speech, recognition
of handwritten zip codes, autonomous vehicle navigation, medical
diagnosis, and image compression.
The last part of this chapter deals with extensions
of backprop to more general neural network architectures. These
include multilayer feedforward nets whose inputs are generated
by a tapped delay-line circuit, as well as fully recurrent neural networks.
These adaptive networks are capable of extending the applicability
of artificial neural networks to nonlinear dynamical system modeling
and temporal pattern association.
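
To make the tapped delay-line idea concrete, the sketch below shows one way a scalar time series could be turned into input vectors for a feedforward net. The function name, the ordering of the taps, and the choice of the next sample as the target (a one-step prediction setup) are assumptions made for illustration only.

```python
def tapped_delay_inputs(series, n_taps):
    """Build (input_vector, target) pairs from a scalar time series.

    Each input vector is [x(t), x(t-1), ..., x(t-n_taps+1)], the values
    a tapped delay line with n_taps taps would present at time t.
    The target is assumed here to be the next sample, x(t+1).
    """
    pairs = []
    for t in range(n_taps - 1, len(series) - 1):
        window = [series[t - k] for k in range(n_taps)]
        pairs.append((window, series[t + 1]))
    return pairs
```

For example, with the series 1, 2, 3, 4, 5 and three taps, the first pair has input [3, 2, 1] and target 4. The delay line gives a static network a short window of the signal's recent history.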