4.4 Simulated Annealing and Energy Function

The concept of equilibrium statistics stems from the principles of statistical physics. A basic assumption concerning many-particle systems in statistical physics is the ergodicity hypothesis with respect to ensemble averages, which determine the average of observed values in the physical system at thermal equilibrium. Examples of physical quantities that can be attributed to the physical system under such thermal equilibrium conditions are the average energy, the energy spread, and the entropy. Another consideration at thermal equilibrium is Gibbs' statement that, if the ensemble is stationary (which is the case when equilibrium is achieved), its density is a function of the energy of the system. A further feature of interest at thermal equilibrium is that, by the principle of equal probability, the probability that the system is in a state i with energy Ei is given by the Gibbs or Boltzmann distribution indicated earlier. In the annealing procedure, also detailed earlier, the probabilities of global states are determined by their energy levels. In the search for a global minimum, the stability of a network can be ensured by associating with it an energy function that culminates in a minimum value. Designating this energy function as the Lyapunov function, it can be represented in a recurrent network as follows:

E = -\frac{1}{2}\sum_{i}\sum_{j} W_{ij}\, o_i o_j - \sum_{j} x_j o_j + \sum_{j} VT_j\, o_j
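A minimal numerical sketch of this energy function, assuming the standard Hopfield form with symmetric weights and zero self-connections (the random instance below is purely illustrative), shows that asynchronous threshold updates never increase E:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
W = rng.normal(size=(n, n))
W = (W + W.T) / 2.0                 # symmetric weights, W_ij = W_ji
np.fill_diagonal(W, 0.0)            # no self-connections
x = rng.normal(size=n)              # external inputs x_j
VT = rng.normal(size=n)             # thresholds VT_j
o = rng.integers(0, 2, size=n).astype(float)   # binary outputs o_j

def energy(o):
    # E = -1/2 sum_ij W_ij o_i o_j - sum_j x_j o_j + sum_j VT_j o_j
    return -0.5 * o @ W @ o - x @ o + VT @ o

E_hist = [energy(o)]
for _ in range(100):
    j = rng.integers(n)                    # pick one neuron at random
    net = W[:, j] @ o + x[j]               # net input to neuron j
    o[j] = 1.0 if net >= VT[j] else 0.0    # threshold update rule
    E_hist.append(energy(o))               # E decreases or stays invariant
```

Each flip changes E by -(net - VTj) Δoj, which is never positive under the threshold rule, so the recorded energy sequence is non-increasing.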
where E is an artificial network energy function (Lyapunov function), Wij is the weight from the output of neuron i to the input of neuron j, oj is the output of neuron j, xj is the external input to neuron j, and VTj represents the threshold of neuron j. The corresponding change in the energy ΔE due to a change in the state of neuron j is given by:

\Delta E = -\Big[\sum_{i} W_{ij}\, o_i + x_j - VT_j\Big]\,\Delta o_j

where Δoj is the change in the output of neuron j. This relation assures that the network energy must either decrease or stay invariant as the system evolves according to its dynamic rule, regardless of whether the net value is larger or smaller than the threshold value. When the net value equals VTj, the energy remains unchanged. In other words, any change in the state of a neuron either reduces the energy or maintains its present value. The continuously decreasing trend of E should eventually allow it to settle at a minimum value, ensuring the stability of the network as discussed before.

4.5 Cooling Schedules

These refer to the set of parameters that govern the convergence of simulated annealing algorithms. The cooling schedule specifies a finite sequence of values of the control parameter (Cp), namely: an initial value of the control parameter, a decrement function for lowering it, a final value at which execution terminates, and a finite length for each homogeneous Markov chain.
A cooling schedule thus specifies a finite number of transitions at each value of the control parameter. This condition corresponds to realizing the simulated annealing algorithm by generating homogeneous Markov chains of finite length for a finite sequence of descending values of the control parameter. A general class of cooling schedules is the polynomial-time cooling schedule. It leads to a polynomial-time execution of the simulated annealing algorithm, but it does not guarantee a bound on the deviation in cost between the final solution obtained by the algorithm and the optimal cost. The Boltzmann machine follows a simple annealing schedule, with the probability of a change in its objective function decided by Equation (4.2). The corresponding schedule warrants that the rate of temperature reduction be proportional to the reciprocal of the logarithm of time in order to achieve convergence toward a global minimum. Thus, the cooling rate in a Boltzmann machine is given by [55]:

T(t) = \frac{T_0}{\log(1 + t)}

where T0 is the initial (pseudo)temperature and t is the time. The above relation implies an almost impractical cooling rate; that is, the Boltzmann machine often takes an infinitely large time to train. The Cauchy distribution is long-tailed, which corresponds to an increased probability of large step-sizes in the search for a global minimum. Hence, the Cauchy machine has a reduced training time, with a schedule given by:

T(t) = \frac{T_0}{1 + t}

The simulated annealing pertinent to a Gaussian machine has a hyperbolic schedule, namely,

T(t) = \frac{T_0}{1 + t/\tau_T}

where τT is the time-constant of the annealing schedule. The initial value of the control parameter (T0) should, in general, be large enough to allow virtually all transitions to be accepted. This is achieved by having the initial acceptance ratio χ0 (defined as the ratio of the initial number of accepted transitions to the number of proposed transitions) close to unity.
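The relative speeds of these schedules can be compared numerically. A minimal sketch, assuming the standard forms T0/log(1 + t) for the Boltzmann machine and T0/(1 + t) for the Cauchy machine, and a hyperbolic T0/(1 + t/τT) for the Gaussian machine (the values of T0 and τT below are illustrative assumptions):

```python
import math

T0 = 10.0        # initial (pseudo) temperature
tau_T = 50.0     # assumed time constant of the Gaussian-machine schedule

def boltzmann(t):
    return T0 / math.log(1.0 + t)       # logarithmic: very slow cooling

def cauchy(t):
    return T0 / (1.0 + t)               # inverse-linear: much faster

def gaussian(t):
    return T0 / (1.0 + t / tau_T)       # hyperbolic with time constant tau_T

for t in (1, 10, 100, 1000):
    print(f"t={t:5d}  Boltzmann={boltzmann(t):7.4f}  "
          f"Cauchy={cauchy(t):7.4f}  Gaussian={gaussian(t):7.4f}")
```

At t = 1000 the Boltzmann temperature has only fallen to about T0/6.9, illustrating why the logarithmic schedule is nearly impractical, while the Cauchy schedule has already cooled by roughly three orders of magnitude.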
This corresponds to starting with a small T0 and multiplying it by a constant factor greater than 1 until the value of χ0 calculated from the generated transitions approaches 1. In metallurgical annealing, this corresponds to heating the solid until all particles are randomly arranged in the liquid phase. The decrement function of the control parameter (T) is chosen so that only small changes in the control parameter result between successive chains. The final value of the control parameter (T) corresponds to termination of the execution of the algorithm, namely, when the cost of the solution obtained in the last trial remains unchanged over a number of consecutive Markov chains. The length of each Markov chain is bounded by a finite value compatible with the small decrement of the control parameter adopted. In network optimization problems, adaptively changing the reference level with time for the purpose of a better search is termed a sharpening schedule. That is, sharpening refers to altering the output gain curve by slowly decreasing the value of the reference activation level (a0) over time. Candidate sharpening schemes are commonly exponential, inverse-logarithm, or linear expressions. For Gaussian machines, a hyperbolic sharpening schedule of the type

a_0(t) = \frac{A_0}{1 + t/\tau_{a_0}}

has been suggested. Here A0 is the initial value of a0 and τa0 is the time constant of the sharpening schedule. In general, the major problem that confronts simulated annealing is convergence speed. For real applications, in order to guarantee fast convergence, Jeong and Park [56] developed lower bounds on annealing schedules for Boltzmann and Cauchy machines by describing the annealing algorithms mathematically via Markov chains. Accordingly, simulated annealing is defined as a Markov chain consisting of a transition probability matrix P(k) and an annealing schedule T(k) controlling P for each trial k.
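The heat-up procedure for choosing T0 can be sketched as follows. The proposal distribution, target ratio, and multiplication factor below are hypothetical choices; acceptance of uphill moves uses the Metropolis criterion exp(-ΔE/T):

```python
import math
import random

random.seed(1)

def propose_deltas(m=200):
    # Hypothetical cost increments of m proposed uphill transitions.
    return [random.uniform(0.0, 5.0) for _ in range(m)]

def acceptance_ratio(T, deltas):
    # Fraction of proposed uphill moves accepted at temperature T.
    accepted = sum(1 for d in deltas if random.random() < math.exp(-d / T))
    return accepted / len(deltas)

def initial_temperature(target=0.95, factor=1.5):
    # Start with a small T and multiply by a constant factor > 1
    # until the measured acceptance ratio approaches unity.
    deltas = propose_deltas()
    T = 0.1
    while acceptance_ratio(T, deltas) < target:
        T *= factor
    return T

T0 = initial_temperature()
```

This mirrors the heating phase of metallurgical annealing: the temperature is raised until virtually every proposed transition is accepted, i.e., until the system is fully "melted."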
Copyright © CRC Press LLC