

The reason for Szu and Hartley’s modification of the generating probability is that it permits a fast annealing schedule. That is, the presence of a small number of very long jumps allows faster escapes from local minima, so the relevant algorithm converges much faster. Simulations by Szu and Hartley show that the Cauchy machine is better than the Boltzmann machine at reaching, and staying in, the global minimum. Hence, they called their method the fast simulated annealing (FSA) schedule.
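To make the contrast concrete, the following Python sketch compares the two classical cooling schedules and draws a Cauchy-distributed jump. The schedule forms below are the standard ones associated with each machine in the annealing literature; the initial temperature T0, the step count, and the helper name cauchy_jump are illustrative choices, not part of the original formulation.

```python
import numpy as np

T0 = 10.0                      # illustrative initial pseudo-temperature
steps = np.arange(1, 101)

# Boltzmann machine: logarithmic cooling, T(k) = T0 / log(1 + k)
T_boltzmann = T0 / np.log(1 + steps)

# Cauchy machine (Szu and Hartley's FSA): inverse-linear cooling,
# T(k) = T0 / (1 + k) -- much faster, made admissible by the heavy
# tails of the Cauchy generating distribution, whose occasional
# very long jumps still allow escapes from local minima.
T_cauchy = T0 / (1 + steps)

def cauchy_jump(x, T, rng):
    """Illustrative Cauchy-distributed move around the current state x."""
    return x + T * rng.standard_cauchy()

rng = np.random.default_rng(0)
print(T_boltzmann[-1], T_cauchy[-1])   # FSA has cooled far further
print(cauchy_jump(0.0, T_cauchy[-1], rng))
```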

Another technique rooted in thermodynamics realizes annealing faster than the Cauchy method by adjusting the temperature-reduction rate according to a (pseudo) specific heat calculated during the training process. The metallurgical correspondence of this strategy is as follows:

During annealing, metals experience phase changes, and these phases correspond to discrete energy levels. At each phase change there is an abrupt change in the specific heat, defined as the rate of change of energy with temperature. This change in specific heat results from the system settling into one of the local energy minima. Similar to metallurgical phase changes, neural networks also pass through phase changes during training, and at the phase-transitional boundary a specific heat attributed to the network can be considered, which undergoes an abrupt change there. This pseudo specific heat refers to the average rate of change of the objective function with respect to the pseudo-temperature. At high temperatures the state changes are so violent that the average value of the objective function is virtually independent of small changes in temperature, so the specific heat is a constant. Likewise, at low temperatures the system is frozen into a minimum energy, and the specific heat is again nearly invariant. As such, rapid temperature changes at either temperature extreme may not improve the objective function to any significant level.

However, at certain critical temperatures (for example, when a ball has just enough energy to move from a local minimum Lm to the global minimum Gm, but insufficient energy to move from Gm back to Lm), the average value of the objective function changes abruptly. At these critical points, the training algorithm must alter the temperature very slowly to ensure that the system does not become trapped in a local minimum (Lm). A critical temperature is perceived by noticing an abrupt change in the specific heat, namely, in the average rate of change of the objective function with temperature. Once such a critical value is reached, the temperature region around it must be traversed slowly enough to achieve convergence towards a global minimum. At other temperatures, larger temperature reductions can be used freely in order to curtail the training time.
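A minimal sketch of this strategy follows, assuming a simple one-dimensional objective as a stand-in for the network’s energy. The pseudo specific heat is estimated here from the fluctuation of the objective at fixed temperature (the thermodynamic analogue C ≈ var(E)/T², swapped in as one plausible estimator); the function names, thresholds, and constants are all illustrative.

```python
import numpy as np

def objective(x):
    # Hypothetical multimodal objective standing in for the network's
    # energy; any function with local and global minima would do.
    return 0.1 * x**2 + np.sin(3 * x)

def anneal_adaptive(T0=5.0, T_min=0.01, rng=np.random.default_rng(1)):
    """Sketch of specific-heat-guided cooling (constants illustrative).

    Cooling is slowed where the pseudo specific heat spikes (a
    phase-change region) and sped up where it is nearly constant,
    as described in the text.
    """
    x, T = 0.0, T0
    while T > T_min:
        energies = []
        for _ in range(200):               # sample at fixed temperature
            x_new = x + T * rng.standard_cauchy()
            dE = objective(x_new) - objective(x)
            if dE < 0 or rng.random() < np.exp(-dE / T):
                x = x_new
            energies.append(objective(x))
        C = np.var(energies) / T**2        # pseudo specific heat estimate
        # Slow cooling when C spikes, faster cooling when it is flat.
        rate = 0.99 if C > 1.0 else 0.90   # thresholds are illustrative
        T *= rate
    return x

print(anneal_adaptive())
```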

4.3 Neural Network versus Machine Concepts

4.3.1 Boltzmann Machine

The Boltzmann machine has a binary output characterized by a stochastic decision and, in turn, follows instantaneous activation. Its activation value refers to the net input specified by:

neti = Σj Wij oj + θi + en

where en is the error (per unit time) on the input caused by random noise, and oj is the output value that the ith neuron receives from other neuron units through the input links, such that oj assumes a graded value over the range 0 < oj < 1. The neuron also has an input bias, θi. Unit i ∈ N has state oi ∈ {0, 1}, so that the global state-space S of this machine contains 2^|N| states. Associated with each state s ∈ S is its consensus Cs, defined as Σij Wij oi oj. The Boltzmann machine maximizes Cs within the net through the simulated annealing algorithm via a pseudo-temperature T which asymptotically reduces to zero. For any fixed value of T > 0, the Boltzmann machine behaves as an irreducible Markov chain tending towards equilibrium. This can be explained as follows.
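The stochastic firing rule can be sketched as follows. The sigmoidal probability 1/(1 + exp(−net/T)) is the standard Boltzmann-machine update; the noise term en is modelled implicitly by the stochastic decision itself, and the weight initialization and the name boltzmann_update are illustrative.

```python
import numpy as np

def boltzmann_update(W, theta, o, i, T, rng):
    """One stochastic update of unit i in a Boltzmann machine.

    net follows the net-input expression above; W is symmetric with
    zero diagonal, theta holds the biases, o is the binary state vector.
    """
    net = W[i] @ o + theta[i]
    p_fire = 1.0 / (1.0 + np.exp(-net / T))   # sigmoidal firing probability
    o[i] = 1.0 if rng.random() < p_fire else 0.0
    return o

rng = np.random.default_rng(2)
N = 4
W = rng.normal(size=(N, N))
W = (W + W.T) / 2                  # symmetric weights
np.fill_diagonal(W, 0.0)           # no self-connections
theta = rng.normal(size=N)
o = rng.integers(0, 2, size=N).astype(float)
for i in rng.permutation(N):       # update units in random order
    o = boltzmann_update(W, theta, o, i, T=1.0, rng=rng)
print(o)
```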

A finite Markov chain represents, in general, a sequence o(n) (n = ..., −1, 0, +1, ...) of probability distributions over the finite state-space S. This state-space refers to a stochastic system whose state changes in discrete epochs, and o(n) is the probability distribution of the state of the system in epoch n, such that o(n + 1) depends only on o(n) and not on previous states. The transition from state s to s′ in the Markov chain is described by a transition probability Pss′. The Markov chain is said to have attained equilibrium if the probability os(n) remains invariant, equal to πs, for all s and n. The set {πs} is referred to as the stationary distribution; the chain is irreducible if every state can be reached from every other state.
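This equilibrium behaviour is easy to demonstrate numerically. In the sketch below (the 3-state transition matrix is arbitrary), iterating o(n + 1) = o(n)P drives any initial distribution to the stationary distribution π, which is invariant under P.

```python
import numpy as np

P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])   # rows sum to 1: P[s, s'] = Pss'

o = np.array([1.0, 0.0, 0.0])     # start certain of state 0
for _ in range(100):
    o = o @ P                     # o(n + 1) depends only on o(n)

print(o)                # the stationary distribution pi
print(o @ P - o)        # ~0: invariant under further transitions
```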

Writing Pss′ = gss′ pass′, gss′ is the probability of choosing a unit i globally from N, and is taken to be uniformly 1/|N|; whereas pass′ is the probability of accepting the change once i has been chosen, and is determined locally by the weights at the critical unit at which the adjacent states s and s′ differ. Here, gss′ is the generating probability and pass′ is the acceptance probability considered earlier. That is, for a Boltzmann machine, pass′ = 1/[1 + exp(Δss′)], with Δss′ = (Cs − Cs′)/T. Hence Δs′s = −Δss′, so that pas′s = 1 − pass′.
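The complementary relation between the forward and reverse acceptance probabilities follows directly from the sign change in Δ, as the short check below illustrates (the consensus values and temperature are arbitrary).

```python
import numpy as np

def p_accept(C_s, C_sp, T):
    """Acceptance probability pass' = 1/[1 + exp(delta)] with
    delta = (C_s - C_s') / T, as given in the text."""
    delta = (C_s - C_sp) / T
    return 1.0 / (1.0 + np.exp(delta))

C_s, C_sp, T = 1.0, 2.5, 0.8
forward = p_accept(C_s, C_sp, T)    # s  -> s'
backward = p_accept(C_sp, C_s, T)   # s' -> s  (delta changes sign)
print(forward + backward)           # 1.0: pas's = 1 - pass'
```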


Copyright © CRC Press LLC
