8.4 Mean-Field Annealing and Deterministic Boltzmann Machines

Mean field annealing ( Soukoulis et al., 1983; Bilbro et al., 1989) is a deterministic approximation to simulated annealing which is significantly more computationally efficient (faster) than simulated annealing ( Bilbro et al., 1992). Instead of directly simulating the stochastic transitions in simulated annealing, the mean (or average) behavior of these transitions is used to characterize a given stochastic system. Because computations using the mean transitions attain equilibrium faster than those using the corresponding stochastic transitions, mean field annealing relaxes to a solution at each temperature much faster than does stochastic simulated annealing. This leads to a significant decrease in computational effort. The idea of using a deterministic mean-valued approximation for a system of stochastic equations to simplify the analysis has been adopted at various instances in this book (e.g., see Section 4.3). Generally speaking, such approximations are adequate in high dimensional systems of many interacting units (states) where each state is a function of all or a large number of other states allowing the central limit theorem to be used (see Problem 7.5.9). In this section, we restrict our discussion of mean field annealing to the Boltzmann machine which was introduced in the previous section.

Consider a stochastic Hopfield net with the stochastic dynamics given in Equation (8..1). The evolution of the stochastic state xi of unit i depends on neti which involves variables xj that themselves fluctuate between -1 and +1. Let us transform the set of n stochastic equations in xi to n deterministic equations in <xi> governing the means of the stochastic variables. If we focus on a single variable xi and compute its average by assuming no fluctuations of the other xj's (this allows us to replace neti by its average <neti>), 3 we get

(8.4.1)

The system is now deterministic and is approximated by the n mean-field equations represented by Equation (8.4.1). It is important to point out that Equation (8.4.1) is meaningful only when the network is at thermal equilibrium, which means that all the quantities <xi> converge (become time-independent). Luckily, the stochastic Hopfield net is guaranteed to reach thermal equilibrium, as was discussed in Section 8.3.1.

(8.4.2)

The equilibria of Equation (8.4.2) are given by setting to zero, giving

(8.4.3)

Assuming the common choice x = f(u) = tanh(u) in Equation (8.4.3) gives

(8.4.4)

which becomes identical in form to Equation (8.4.1) after setting i = 1.

The electronic continuous-state (deterministic) Hopfield net must employ high-gain amplifiers (large ) in order to achieve a binary-valued solution as is normally generated by the original stochastic net. However, starting with a deterministic net having large may lead to a poor local minimum as is the case with a stochastic net whose "temperature" T is quenched. Since annealing a stochastic net increases the probability that the state will converge to a global minimum, we may try to reach this minimum by annealing the approximate mean-field system. This approach is known as mean-field annealing. Mean-field annealing can be realized very efficiently in electronic (analog) nets like the one in Figure (7.1.3) where dynamic amplifier gains allow for a natural implementation of continuous cooling schedules (Lee and Sheu, 1993). This is referred to as "hardware annealing."

The deterministic Boltzmann machine is applicable only to problems involving quadratic cost functions. However, the principles of mean-field annealing may still be applied to more general cost functions with substantial savings in computing time by annealing a steady-state average system as opposed to a stochastic one. In addition to being faster than simulated annealing, mean-field annealing has proved to lead to better solutions in several optimization problems (Van den Bout and Miller, 1988, 1989; Cortes and Hertz, 1989; Bilbro and Snyder, 1989).

The excessive number of calculations required by Boltzmann machine learning may be circumvented by extending the above mean-field method to adapting the wij's (Peterson and Anderson, 1987). Here, the correlations <xi xj> are approximated by sisj where si is given by the average equation:

(8.4.5)

as in Equation (8.4.1), but with Ii = 0 for convenience. Equation (8.4.5) applies for free units (hidden units and unclamped visible units). For a clamped visible unit i, si is set to 1 (the value that the unit's output is supposed to be clamped at). As required by Boltzmann learning, the correlations <xi xj> should be computed at thermal equilibrium. This means that we must use approximation terms sisj where the si's (sj's) are solutions to the n nonlinear equations represented by Equation (8.4.5). One may employ an iterative method to solve for the unclamped states si according to

(8.4.6)

combined with annealing (gradual increasing of ). Peterson and Anderson (1987) reported that this mean-field learning is 10 to 30 times faster than simulated annealing on some test problems with somewhat better results.