2.4 Computational Effectiveness of Neural Networks
2.4.1 Algorithmic Complexity
A general way of looking at the efficiency of embedding
a problem in a neural network comes from a computational complexity point
of view (Abu-Mostafa, 1986a and 1986b). Solving a problem on a sequential
computer requires a certain number of steps (time complexity), memory size
(space complexity), and length of algorithm (Kolmogorov complexity). In
a neural net simulation the number of computations is a measure of time
complexity, the number of units is a measure of space complexity, and the
number of weights (degrees of freedom) where the algorithm is "stored"
is a measure of Kolmogorov complexity. In formulating neural network solutions
for practical problems, we seek to minimize, simultaneously, the resulting
time, space, and Kolmogorov complexities of the network. If a given problem
is very demanding in terms of space complexity, then the required network
size is large and thus the number of weights is large, even if the Kolmogorov
complexity of the algorithm is very modest! This spells inefficiency: neural
net solutions of problems with short algorithms but high space complexity
are very inefficient. The same is true for problems whose time complexity
is very demanding while the other complexities are not.
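The three measures above can be made concrete for a fully connected feedforward network. As a minimal sketch (the layer sizes are hypothetical, chosen only for illustration), the following counts the multiply-accumulate operations per forward pass (time complexity), the processing units (space complexity), and the weights where the "algorithm" is stored (Kolmogorov complexity):

```python
# Sketch: complexity measures of a fully connected feedforward net.
# Layer sizes are hypothetical, chosen only for illustration.
layers = [256, 128, 64, 10]  # input, two hidden layers, output

# Time complexity: multiply-accumulate operations in one forward pass.
time_ops = sum(n_in * n_out for n_in, n_out in zip(layers, layers[1:]))

# Space complexity: number of processing units (excluding inputs).
space_units = sum(layers[1:])

# Kolmogorov complexity: degrees of freedom where the "algorithm" is
# stored (weights plus one bias per unit).
weights = sum(n_in * n_out + n_out for n_in, n_out in zip(layers, layers[1:]))

print(time_ops, space_units, weights)
```

Note that all three counts are dominated by the connections rather than the units, which is why a problem demanding large space automatically demands many degrees of freedom, whatever its Kolmogorov complexity.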
The above complexity discussion leads us to identify certain
problems which best match the computational characteristics of artificial
neural networks. Problems which require a very long algorithm if run on
sequential machines make the most use of neural nets, since the capacity
of the net grows faster than the number of units. Such problems are called
random problems (Abu-Mostafa, 1986b). Examples are pattern recognition
problems in natural environments and AI problems requiring huge data bases.
These problems make the most use of the large capacity of neural nets.
It is interesting to note that humans are very good at such random problems
but not at structured ones. Another interesting property of random problems
is that we do not have explicit algorithms for solving them. A neural net
can develop one by learning from examples, as will be shown in Chapters
5 and 6.
At this point, one may recall the results of Chapter One,
based on Cover's concept of extreme patterns/inequalities, which point
to the effectiveness of threshold units in loading a large set of labeled
patterns (data/prototypes) by learning only on extreme patterns. These
results and those presented here show that neural networks can be very
effective at solving such random problems.
2.4.2 Computation Energy
For any computational device, be it biologically-based,
microprocessor-based, etc., the cost-per-computation or computational energy
of the device can be directly measured in terms of the energy required
(in units of Joules) to perform the computation. From this point of view,
we may then compare computational devices in terms of the total energy
required to solve a given problem.
As technology evolved, it always moved in the direction
of lower energy per unit computation in order to allow for more computations
per unit time in a practically sized computing machine (note the trend
from vacuum tubes, to transistors, to integrated circuits). A typical microprocessor
chip can perform about 10 million operations per second and uses about
1 watt of power. In round numbers, it costs about 10⁻⁷
joules to do one operation on such a chip. The ultimate silicon technology
we can envision today will dissipate on the order of 10⁻⁹
joules of energy for each operation at the single-chip level.
The brain, on the other hand, has about 10¹⁵
synapses. A nerve pulse arrives at each synapse on the average of 10 times
per second. So, roughly, the brain accomplishes 10¹⁶
complex operations per second. Since the power dissipation is a few watts,
each operation costs only about 10⁻¹⁶
joules! The brain is thus more efficient, by a factor of about 10 million, than the
best digital technology that we can hope to attain.
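This round-number comparison can be checked directly from the figures quoted above:

```python
# Energy per operation, using the round numbers quoted in the text.
chip_ops_per_s = 1e7                  # ~10 million operations per second
chip_power_w = 1.0                    # ~1 watt
chip_j_per_op = chip_power_w / chip_ops_per_s        # 1e-7 J per operation

best_silicon_j_per_op = 1e-9          # projected single-chip silicon limit

synapses = 1e15
pulses_per_s = 10.0                   # average nerve-pulse rate per synapse
brain_ops_per_s = synapses * pulses_per_s            # 1e16 ops per second
brain_power_w = 1.0                   # "a few watts"; take ~1 W for round numbers
brain_j_per_op = brain_power_w / brain_ops_per_s     # 1e-16 J per operation

# Brain vs. the *best* projected silicon: a factor of ten million.
advantage = best_silicon_j_per_op / brain_j_per_op
print(chip_j_per_op, brain_j_per_op, advantage)
```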
One reason for the inefficiency in computation energy is the way devices are used in a system. In a typical silicon implementation, we switch about 10⁴ transistors to do one operation. Using the physics of the device (e.g., analog computing in a single transistor) can save us these four orders of magnitude in computation energy. For example, analog addition is done for free at a node, and nonlinear activations (sigmoids) can be realized using a single MOS transistor operating in its subthreshold region. Similarly, multiplication may be performed with a small MOS transistor circuit. Therefore, one can very efficiently realize analog artificial neurons with existing technology by employing device physics (Hopfield, 1990; Mead, 1991). Carver Mead (1991) wrote: "We pay a factor of 10⁴ for taking all of the beautiful physics that is built into these transistors, mashing it down into a one or a zero, and then painfully building it back, with AND and OR gates to reinvent the multiply. We then string together those multiplications and additions to get an exponential. But we neglected a basic fact: the transistor does an exponential all by itself."
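A standard illustration of such "free" device physics is the subthreshold MOS differential pair, whose differential output current is a tanh (sigmoid) of the input voltage difference. The numeric sketch below assumes the standard subthreshold model; the bias current, thermal voltage, and gate-coupling coefficient values are hypothetical, chosen only for illustration:

```python
import math

# Subthreshold MOS differential pair: the differential output current is
# a tanh of the input voltage difference -- a sigmoid from device physics,
# rather than from thousands of switched transistors.
# All parameter values are hypothetical, for illustration only.
I_b = 1e-9      # tail bias current (A)
V_T = 0.025     # thermal voltage kT/q at room temperature (V)
kappa = 0.7     # gate-coupling coefficient (dimensionless)

def diff_pair_output(dv):
    """Differential output current I1 - I2 for input difference dv (V)."""
    return I_b * math.tanh(kappa * dv / (2.0 * V_T))

# The output saturates at +/- I_b, like a scaled sigmoidal activation.
for dv in (-0.2, -0.05, 0.0, 0.05, 0.2):
    print(dv, diff_pair_output(dv))
```

The tanh arises because each transistor's subthreshold current is exponential in its gate voltage, which is precisely Mead's point about the transistor doing "an exponential all by itself."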
Based on the above, the type of computations required
by neural networks may lend themselves nicely to efficient implementation
with current analog VLSI technology; there is a potential for effectively
and efficiently realizing very large analog neural nets on single chips.
Analog optical implementations of neural networks could also have a competitive
advantage over digital implementations. Optics has the unique advantage
that beams of light can cross each other without affecting their information
content. Thus, optical interconnections may be used in conjunction with
electronic VLSI chips to efficiently implement very large and richly interconnected
neural networks. The richness of optical devices, such as holograms,
analog spatial light modulators, optical filters, and Fourier lenses, suggests
that optical implementation of analog neural nets could be very efficient.
[For examples of very large optical neural networks, the reader is referred
to the works of Paek and Psaltis (1987), Abu-Mostafa and Psaltis (1987),
and Anderson and Erie (1987).]