2.4 Computational Effectiveness of Neural Networks

2.4.1 Algorithmic Complexity

A general way of looking at the efficiency of embedding a problem in a neural network comes from a computational complexity point of view (Abu-Mostafa, 1986a and 1986b). Solving a problem on a sequential computer requires a certain number of steps (time complexity), memory size (space complexity), and length of algorithm (Kolmogorov complexity). In a neural net simulation, the number of computations is a measure of time complexity, the number of units is a measure of space complexity, and the number of weights (degrees of freedom), where the algorithm is "stored," is a measure of Kolmogorov complexity. In formulating neural network solutions for practical problems, we seek to minimize, simultaneously, the resulting time, space, and Kolmogorov complexities of the network. If a given problem is very demanding in terms of space complexity, then the required network size is large and thus the number of weights is large, even if the Kolmogorov complexity of the algorithm is very modest! This spells inefficiency: neural net solutions of problems with short algorithms and high space complexity are very inefficient. The same is true for problems whose time complexity is very demanding while the other complexities are not.
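
As a concrete illustration of these three measures, consider a fully connected feedforward network: the number of multiply-accumulate operations per forward pass plays the role of time complexity, the number of units that of space complexity, and the number of weights (where the learned algorithm is stored) that of Kolmogorov complexity. The following sketch tallies the three quantities; the layer sizes are hypothetical and chosen only for illustration.

    # Sketch: tally the three complexity measures for a fully connected
    # feedforward network. The layer sizes are illustrative only.
    def network_complexities(layer_sizes):
        """Return (time, space, kolmogorov) analogs for a feedforward net:
        time       ~ multiply-accumulate operations in one forward pass,
        space      ~ number of units,
        kolmogorov ~ number of weights (plus biases), where the learned
                     "algorithm" is stored."""
        time = sum(n_in * n_out
                   for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
        space = sum(layer_sizes)
        kolmogorov = sum(n_in * n_out + n_out
                         for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
        return time, space, kolmogorov

    print(network_complexities([64, 32, 10]))   # -> (2368, 106, 2410)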

The above complexity discussion leads us to identify certain problems which best match the computational characteristics of artificial neural networks. Problems that would require a very long algorithm on a sequential machine make the best use of neural nets, since the capacity of the net grows faster than the number of units. Such problems are called random problems (Abu-Mostafa, 1986b). Examples are pattern recognition problems in natural environments and AI problems requiring huge databases; these problems exploit the large capacity of neural nets. It is interesting to note that humans are very good at such random problems but not at structured ones. Another interesting property of random problems is that we do not have explicit algorithms for solving them; a neural net can develop one by learning from examples, as will be shown in Chapters 5 and 6.

At this point, one may recall the results of Chapter One, based on Cover's concept of extreme patterns/inequalities, which point to the effectiveness of threshold units in loading a large set of labeled patterns (data/prototypes) by learning only on the extreme patterns. These results and those presented here show that neural networks can be very efficient classifiers.

2.4.2 Computation Energy

For any computational device, be it biologically based, microprocessor based, or otherwise, the cost per computation, or computational energy, of the device can be measured directly as the energy (in joules) required to perform the computation. From this point of view, we may compare computational devices in terms of the total energy required to solve a given problem.

As technology evolved, it always moved in the direction of lower energy per unit computation in order to allow for more computations per unit time in a practically sized computing machine (note the trend from vacuum tubes, to transistors, to integrated circuits). A typical microprocessor chip can perform about 10 million operations per second and uses about 1 watt of power. In round numbers, it costs about 10^-7 joules to do one operation on such a chip. The ultimate silicon technology we can envision today will dissipate on the order of 10^-9 joules of energy for each operation at the single-chip level.

The brain, on the other hand, has about 10^15 synapses. A nerve pulse arrives at each synapse on the average of 10 times per second. So, roughly, the brain accomplishes 10^16 complex operations per second. Since the power dissipation is a few watts, each operation costs only 10^-16 joules! The brain is more efficient, by a factor of 10 million, than the best digital technology that we can hope to attain.
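
These round-number figures follow from a simple ratio of power to throughput. The sketch below reproduces the arithmetic using the numbers quoted above ("a few watts" for the brain is rounded here to 1 watt, so the result should be read as an order of magnitude only).

    # Energy per operation as power divided by throughput, using the
    # round numbers quoted in the text.
    def joules_per_op(power_watts, ops_per_second):
        return power_watts / ops_per_second

    chip = joules_per_op(1.0, 1e7)          # microprocessor: ~10^-7 J/op
    brain = joules_per_op(1.0, 1e15 * 10)   # ~10^15 synapses at 10 pulses/s: ~10^-16 J/op
    best_silicon = 1e-9                     # projected best digital silicon, J/op

    print(chip, brain, best_silicon / brain)  # ratio of ~10^7 in the brain's favor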

One reason for this inefficiency in computation energy is the way devices are used in a system. In a typical silicon implementation, we switch about 10^4 transistors to do one operation. Using the physics of the device (e.g., analog computing in a single transistor) can save us these four orders of magnitude in computation energy. For example, analog addition is done for free at a node, and nonlinear activations (sigmoids) can be realized using a single MOS transistor operating in its subthreshold region. Similarly, multiplication may be performed with a small MOS transistor circuit. Therefore, one can very efficiently realize analog artificial neurons with existing technology by exploiting device physics (Hopfield, 1990; Mead, 1991). Carver Mead (1991) wrote: "We pay a factor of 10^4 for taking all of the beautiful physics that is built into these transistors, mashing it down into a one or a zero, and then painfully building it back, with AND and OR gates to reinvent the multiply. We then string together those multiplications and additions to get an exponential. But we neglected a basic fact: the transistor does an exponential all by itself."
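
To make Mead's point concrete, the short sketch below uses the standard subthreshold MOS model, in which drain current grows exponentially with gate voltage, together with the tanh-shaped transfer characteristic of a simple differential pair built from such devices. The parameter values (pre-exponential current, slope factor, thermal voltage) are typical illustrative numbers assumed here, not taken from the text.

    import math

    I0 = 1e-15    # pre-exponential (leakage-scale) current in amperes, assumed
    N  = 1.5      # subthreshold slope factor, assumed
    VT = 0.0258   # thermal voltage kT/q at room temperature, in volts

    def subthreshold_current(vgs):
        """Exponential I-V law: the transistor 'does an exponential by itself'."""
        return I0 * math.exp(vgs / (N * VT))

    def diff_pair_output(dv, i_bias=1e-9):
        """Differential-pair output current: a tanh (sigmoid-like) function of
        the input voltage difference, obtained directly from device physics."""
        return i_bias * math.tanh(dv / (2 * N * VT))

    for dv in (-0.2, -0.05, 0.0, 0.05, 0.2):
        print(dv, diff_pair_output(dv))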

Based on the above, the type of computations required by neural networks may lend itself nicely to efficient implementation with current analog VLSI technology; there is a potential for effectively and efficiently realizing very large analog neural nets on single chips. Analog optical implementations of neural networks could also have a competitive advantage over digital implementations. Optics has the unique advantage that beams of light can cross each other without affecting their information content. Thus, optical interconnections may be used in conjunction with electronic VLSI chips to efficiently implement very large and richly interconnected neural networks. The richness of optical device physics, embodied in holograms, analog spatial light modulators, optical filters, and Fourier lenses, suggests that optical implementations of analog neural nets could be very efficient. [For examples of very large optical neural networks, the reader is referred to the works of Paek and Psaltis (1987), Abu-Mostafa and Psaltis (1987), and Anderson and Erie (1987).]
