That is, a difficulty normally encountered with Hopfield nets is the tendency of the system to stabilize at a local minimum rather than at the global minimum. This can, however, be obviated by introducing noise at the input so that the artificial neurons change their state in a statistical rather than a deterministic fashion. To illustrate this concept, consider a ball rolling over a hilly terrain. The ball may settle in a local trap (Lm) from which it cannot climb out to reach the global minimum valley (see Figure 4.1). A strategy which introduces some disturbance, however, can unsettle the ball and cause it to jump out of the local minimum. The ball resting at Lm corresponds to the weight state being set initially to the value Lm. If the random weight steps are small, all deviations from Lm increase the objective function (energy) and are rejected; this corresponds to trapping at a local minimum. If the random weight steps are very large, both the local minimum at Lm and the global minimum at Gm are "frequently revisited", and the weight changes are so drastic that the ball may never settle into the desired minimum.


Figure 4.1  Global and local minima
Gm: Global minimum; Lm: Local minimum; X: Weight state; E(X): Objective function or cost function; DT: Escape from local minima (de-trapping) corresponds to annealing

By starting with large steps and gradually reducing the size of the average random step, however, the network can escape from local minima, ensuring eventual network stabilization. This process mimics the metallurgical annealing described above. Such simulated annealing enables combinatorial optimization: finding, among a potentially very large number of candidate solutions, one with a minimal cost-function. Here, the cost-function corresponds to the free-energy on a one-to-one basis.

Annealing in a network can be accomplished as follows. A disturbance is deliberately introduced and the network is started at a random state; at each time-step a new state is generated according to a generating probability density. The new state replaces the old state if it has lower energy. If it has higher energy, it is accepted as the new state with a probability determined by an acceptance function; otherwise the old state is retained. In this way, occasional jumps to configurations of higher energy are allowed, as in the sketch below. In the search for the minimal-energy solution, other suboptimal solutions may emerge arbitrarily close to an optimum. Reaching the optimal solution therefore invariably warrants a rather extensive search with massive computational effort.
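A minimal Python sketch of this accept/reject procedure follows. It is illustrative only: the double-well cost function, the Gaussian neighbor step, and the inverse-logarithmic cooling schedule are assumptions chosen to match the forms discussed in Section 4.2 below, not the book's own program.

import math
import random

def simulated_annealing(energy, state, neighbor, t0=2.0, steps=5000):
    # energy(x): objective (cost) function to be minimized
    # neighbor(x, scale): randomly perturbed candidate state
    e = energy(state)
    for k in range(1, steps + 1):
        t = t0 / math.log(1.0 + k)        # pseudo-temperature; inverse-logarithmic cooling
        candidate = neighbor(state, t)    # generating step; perturbations shrink as t falls
        delta = energy(candidate) - e
        # Downhill moves are always kept; uphill moves are kept with Boltzmann probability.
        if delta <= 0.0 or random.random() < math.exp(-delta / t):
            state, e = candidate, e + delta
    return state, e

# Usage on a double-well cost function (hypothetical example): starting near the
# shallower well at x ~ +1.1, annealing should settle near the deeper well at x ~ -1.3.
cost = lambda x: x**4 - 3.0 * x**2 + x
step = lambda x, scale: x + random.gauss(0.0, scale)
best_x, best_e = simulated_annealing(cost, state=1.1, neighbor=step)
print(best_x, best_e)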

4.2 Machine Representation of Neural Networks

In practice, a system which incorporates a method of introducing a disturbance or random noise for the purpose of state de-trapping (as explained above) is referred to as a machine. For example, Hinton et al. [51] proposed the Boltzmann statistics of thermodynamics to describe the neural system as a machine representing "constraint satisfaction networks that learn" by the implementation of local constraints as connection strengths in stochastic networks. In these Boltzmann machines, the generating probability density is Gaussian, given by:
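One standard form of such a Gaussian generating density, given here as an assumed reconstruction (Δx denotes the random perturbation applied to the network state, and TG(t) is the cooling temperature described next), is

g(\Delta x) \;\propto\; \exp\!\left[ -\,\frac{\lVert \Delta x \rVert^{2}}{T_{G}^{2}(t)} \right]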

where the time-schedule of changing fluctuations in the machine is described in terms of an artificial cooling temperature TG(t) (also known as the pseudo-temperature), which decreases inversely with the logarithm of time; and the acceptance probability (corresponding to the chance of the ball climbing a hump) follows the Boltzmann distribution, namely:
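A common statement of this acceptance probability, again an assumed reconstruction using the ordinary Boltzmann factor, is

P_{A}(\Delta E) \;=\; \exp\!\left[ -\,\frac{\Delta E}{T_{G}(t)} \right], \qquad \Delta E > 0

with downhill transitions (ΔE ≤ 0) accepted outright; some formulations use the related logistic form 1/[1 + exp(ΔE/TG(t))] instead.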

where ΔE is the increase in energy incurred by a transition. It may be noted that both the acceptance and generating functions are decided essentially by the cooling schedule. The above probability distribution refers to the probability distribution of energy states of the annealing thermodynamics, that is, the probability of the system being in a state with energy ΔE. At high temperatures, this probability approaches a single value for all energy states, so that a high energy state is as likely as a low energy state. As the temperature is lowered, the probability of high energy states decreases as compared to the probability of low energy states. When the temperature approaches zero, it becomes very unlikely that the system will exist in a high energy state.
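The effect of the pseudo-temperature can be seen numerically. The short sketch below (illustrative figures only, assuming the exponential Boltzmann factor exp(-ΔE/T) discussed above) prints the acceptance probability for a few energy increases ΔE at several temperatures: at T = 100 the three values are nearly equal (about 0.99, 0.95, and 0.90), while at T = 0.1 even the smallest increase is accepted with probability of only about 4.5 × 10⁻⁵.

import math

# Boltzmann acceptance factor exp(-dE/T) for a few energy increases dE
# at several pseudo-temperatures T (illustrative numbers only).
for T in (100.0, 10.0, 1.0, 0.1):
    row = ", ".join("%.3g" % math.exp(-dE / T) for dE in (1.0, 5.0, 10.0))
    print("T = %6.1f  ->  %s" % (T, row))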

The Boltzmann machine is essentially a connectionist model of a neural network: It has a large number of interconnected elements (neurons) with bistable states and the interconnections have real-valued strengths to impose local constraints on the states of the neural units; and, as indicated by Aarts and Korst [52], “a consensus function gives a quantitative measure for the ‘goodness’ of a global configuration of the Boltzmann machine determined by the states of all individual units”.
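In the notation of Aarts and Korst [52], this consensus function is commonly written as (stated here as a reconstruction, with k(u) ∈ {0, 1} the state of unit u and w_{uv} the strength of the connection between units u and v)

C(k) \;=\; \sum_{\{u,v\}} w_{uv}\, k(u)\, k(v)

so that maximizing the consensus amounts to satisfying as many weighted local constraints as possible.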

The cooperative process across the interconnections supports a simple but powerful massive parallelism and a distributed progression of state transitions, and hence portrays a useful configurational model. Optimality search via Boltzmann statistics provides a substantial reduction of computational effort, since the simulated annealing algorithm supports massively parallel execution. Boltzmann machines also yield higher-order optimizations via learning strategies. Further, they can accommodate self-organization (through learning) in line with the cybernetics of the human brain.

Szu and Hartley [53], in describing neural nets, advocated the use of a Cauchy machine instead of the Boltzmann machine. The Cauchy machine uses a generating probability with the Cauchy/Lorentzian distribution given by:
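A standard form of this Cauchy/Lorentzian generating density in D state dimensions, given here as an assumed reconstruction with Δx again the random state perturbation, is

g(\Delta x) \;\propto\; \frac{T_{C}(t)}{\left[ T_{C}^{2}(t) + \lVert \Delta x \rVert^{2} \right]^{(D+1)/2}}

whose heavy tails permit occasional long jumps of the state.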

where TC(t) is the pseudothermodynamic temperature. It allows the cooling schedule to vary inversely with time, rather than with the logarithm of time. That is, Szu and Hartley used the same acceptance probability given by Equation (4.3), but with TG(t) replaced by TC(t).
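The two cooling schedules may thus be contrasted as follows (T0 denoting the initial pseudo-temperature; an assumed but conventional statement):

T_{G}(t) \;=\; \frac{T_{0}}{\log(1+t)} \quad \text{(Boltzmann machine)}, \qquad T_{C}(t) \;=\; \frac{T_{0}}{1+t} \quad \text{(Cauchy machine)}

The heavier tails of the Cauchy generating density are what permit this faster, inverse-linear cooling while still allowing the state to escape from local minima.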

