Mean field annealing (
Soukoulis et al., 1983;
Bilbro
et al., 1989) is a deterministic approximation to simulated annealing
which is significantly more computationally efficient (faster)
than simulated annealing (
Bilbro et al., 1992). Instead of directly
simulating the stochastic transitions in simulated annealing,
the mean (or average) behavior of these transitions is used to
characterize a given stochastic system. Because computations
using the mean transitions attain equilibrium faster than those
using the corresponding stochastic transitions, mean field annealing
relaxes to a solution at each temperature much faster than does
stochastic simulated annealing. This leads to a significant decrease
in computational effort. The idea of using a deterministic mean-valued
approximation for a system of stochastic equations to simplify
the analysis has been adopted at various instances in this book
(e.g., see Section 4.3). Generally speaking, such approximations
are adequate in high dimensional systems of many interacting units
(states) where each state is a function of all or a large number
of other states allowing the central limit theorem to be used
(see Problem 7.5.9). In this section, we restrict our discussion
of mean field annealing to the Boltzmann machine which was introduced
in the previous section.
8.4.1 Mean-Field Retrieval
Consider a stochastic Hopfield net with the stochastic
dynamics given in Equation (8..1). The evolution of the stochastic
state xi
of unit i depends on neti
which involves variables xj
that themselves fluctuate between -1
and +1. Let us transform the set of n stochastic equations
in xi to
n deterministic equations in <xi>
governing the means of the stochastic variables. If we focus
on a single variable xi
and compute its average by assuming no fluctuations of the other
xj's (this
allows us to replace neti
by its average <neti>),
3 we get
The system is now deterministic and is approximated
by the n mean-field equations represented by Equation (8.4.1).
It is important to point out that Equation (8.4.1) is meaningful
only when the network is at thermal equilibrium, which means that
all the quantities <xi>
converge (become time-independent). Luckily, the stochastic Hopfield
net is guaranteed to reach thermal equilibrium, as was discussed
in Section 8.3.1.
At thermal equilibrium, the stochastic Hopfield
net fluctuates about the constant average values in Equation (8.4.1).
The mean state <x> is thus one of the local minima
of the quadratic energy function E(x) at temperature
T =
The equilibria of Equation (8.4.2) are given by setting
Assuming the common choice x = f(u) = tanh(u)
in Equation (8.4.3) gives
which becomes identical in form to Equation (8.4.1)
after setting i = 1.
The electronic continuous-state (deterministic)
Hopfield net must employ high-gain amplifiers (large ) in order
to achieve a binary-valued solution as is normally generated by
the original stochastic net. However, starting with a deterministic
net having large may lead to a poor local minimum as is the case
with a stochastic net whose "temperature" T is
quenched. Since annealing a stochastic net increases the probability
that the state will converge to a global minimum, we may try to
reach this minimum by annealing the approximate mean-field system.
This approach is known as mean-field annealing. Mean-field annealing
can be realized very efficiently in electronic (analog) nets like
the one in Figure (7.1.3) where dynamic amplifier gains allow
for a natural implementation of continuous cooling schedules (Lee
and Sheu, 1993). This is referred to as "hardware annealing."
The deterministic Boltzmann machine is applicable
only to problems involving quadratic cost functions. However,
the principles of mean-field annealing may still be applied to
more general cost functions with substantial savings in computing
time by annealing a steady-state average system as opposed to
a stochastic one. In addition to being faster than simulated
annealing, mean-field annealing has proved to lead to better solutions
in several optimization problems (Van den Bout and Miller, 1988,
1989; Cortes and Hertz, 1989; Bilbro and Snyder, 1989).
8.4.2 Mean-Field Learning
The excessive number of calculations required by
Boltzmann machine learning may be circumvented by extending the
above mean-field method to adapting the wij's
(Peterson and Anderson, 1987). Here, the correlations <xi
xj> are
approximated by sisj
where si
is given by the average equation:
as in Equation (8.4.1), but with Ii = 0
for convenience. Equation (8.4.5) applies for free units (hidden
units and unclamped visible units). For a clamped visible unit
i, si
is set to 1 (the value that the unit's output is supposed to be
clamped at). As required by Boltzmann learning, the correlations
<xi xj>
should be computed at thermal equilibrium. This means that we
must use approximation terms sisj
where the si's
(sj's) are
solutions to the n nonlinear equations represented by Equation
(8.4.5). One may employ an iterative method to solve for the
unclamped states si
according to
combined with annealing (gradual increasing of ).
Peterson and Anderson (1987) reported that this mean-field learning
is 10 to 30 times faster than simulated annealing on some test
problems with somewhat better results.
(8.4.1)
. The location of
this minimum may then be computed by solving the set of n
nonlinear mean-field equations. An alternative way is to solve
for <x> by gradient descent on E(x)
from an initial random state. This is exactly what the deterministic
continuous Hopfield net with hyperbolic tangent activations does
[recall Equation (7.1.25)]. In fact, Equation (8.4.1) has the
same form as the equation governing the equilibrium points of
the Hopfield net (Bilbro et al., 1989). To see this, we recall
from Equation (7.1.19) the dynamics of the ith unit in
a continuous-state electronic Hopfield net, namely
(8.4.2)
to zero, giving
(8.4.3)
(8.4.4)
(8.4.5)
(8.4.6)