

4.10 Pseudo-Thermodynamic Perspectives of the Learning Process

Neural models represent general-purpose learning systems that begin with no initial object-oriented knowledge. In such models, learning refers to incremental changes in the probability that neurons are activated.

Boltzmann machines, as mentioned before, have two classes of learning capabilities. They can learn from observations, without supervision; that is, the machine captures the regularities in its environment and adjusts its internal representation accordingly. Alternatively, the machine can learn from examples and counterexamples of one or more concepts and induce a general description of these concepts. This is also known as supervised learning. A machine that follows unsupervised learning is useful as a content-addressable memory. The learning capabilities of Boltzmann machines are typical of connectionist network models.

Invariably, some of the units in a Boltzmann machine are clamped to a specific state as dictated by the environment. This leaves the machine to adjust the states of the remaining units so as to generate an output that corresponds to the most probable interpretation of the incoming stimuli. In this way, the network acquires the most probable environmental configuration, with some of its environmental units fixed, or clamped.

The environment manifests itself as a certain probability distribution by interacting with the Boltzmann machine via a set vu ∈ N of visible (external) units, while the remaining units hu ∈ N are hidden and purely internal. The visible units are clamped to states by the samples of the environment imposed on them. Under this condition, a learning algorithm permits the determination of appropriate connection weights so that the hidden units can change, repeated over a number of learning cycles in which the weights are adjusted. The degree of such adjustment is determined by the behavior of the machine in the clamped mode as compared to the normal (free-running) mode.
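The clamped-versus-free-running comparison described above is the heart of the standard Boltzmann learning rule, Δwij = η(⟨si sj⟩clamped − ⟨si sj⟩free). The sketch below illustrates one learning cycle; the network size, learning rate, and Gibbs-sampling schedule are illustrative assumptions, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_sample(W, s, clamped, n_sweeps=50, T=1.0):
    """Gibbs-sample unit states s in {-1,+1}; indices in `clamped` stay fixed."""
    s = s.copy()
    n = len(s)
    for _ in range(n_sweeps):
        for i in range(n):
            if i in clamped:
                continue
            h = W[i] @ s  # net input to unit i (W has zero diagonal)
            p_on = 1.0 / (1.0 + np.exp(-2.0 * h / T))
            s[i] = 1 if rng.random() < p_on else -1
    return s

def boltzmann_step(W, data, n_visible, eta=0.05):
    """One learning cycle: compare clamped and free-running correlations."""
    n = W.shape[0]
    corr_clamped = np.zeros_like(W)
    corr_free = np.zeros_like(W)
    for v in data:
        s = np.where(rng.random(n) < 0.5, 1, -1)
        s[:n_visible] = v                        # clamp the visible units
        s = gibbs_sample(W, s, set(range(n_visible)))
        corr_clamped += np.outer(s, s)
        s = gibbs_sample(W, s, set())            # free-running phase
        corr_free += np.outer(s, s)
    dW = eta * (corr_clamped - corr_free) / len(data)
    np.fill_diagonal(dW, 0.0)                    # no self-couplings
    return W + dW
```

The update pushes the free-running correlations toward the clamped ones, which is how the machine's internal representation comes to mirror the environmental distribution.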

Pertinent to the clamped mode, Livesey [60] observes that such a mode is not an intrinsic characteristic of the learning algorithm associated with the Boltzmann machine, but rather a condition stipulated by the transition probabilities of the Markov chain depicting the state-transitional stochastics of these machines. The relevant condition refers to the underlying time-reversibility under equilibrium conditions. A machine in equilibrium is time-reversible when it is not possible to tell from its state which way time is flowing; in other words, the chain and its time reversal are identical. This happens under a detailed balance condition given by:

pi Pij = pj Pji     (for all states i, j)

where pi is the equilibrium probability of state i and Pij is the probability of a transition from state i to state j.
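The detailed balance condition pi Pij = pj Pji can be checked numerically: at equilibrium, the probability flow i → j must equal the flow j → i. A minimal sketch (the two example chains are assumptions chosen for illustration):

```python
import numpy as np

def stationary(P):
    """Stationary distribution of a row-stochastic transition matrix P."""
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])  # eigenvector for eigenvalue 1
    return pi / pi.sum()

def is_time_reversible(P, tol=1e-10):
    """Detailed balance: pi_i * P_ij == pi_j * P_ji for all i, j."""
    pi = stationary(P)
    flow = pi[:, None] * P       # equilibrium probability flow i -> j
    return np.allclose(flow, flow.T, atol=tol)

# Reversible: a symmetric random walk on 3 states
P_rev = np.array([[0.50, 0.25, 0.25],
                  [0.25, 0.50, 0.25],
                  [0.25, 0.25, 0.50]])

# Irreversible: a biased cycle 0 -> 1 -> 2 -> 0 (net circulation of probability)
P_cyc = np.array([[0.1, 0.8, 0.1],
                  [0.1, 0.1, 0.8],
                  [0.8, 0.1, 0.1]])
```

The biased cycle has a uniform stationary distribution yet fails detailed balance, showing that reversibility is a condition on the transition probabilities, not just on the equilibrium state.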

The essence of machine representation of a neural network embodies a training procedure with an algorithm which compares (for a given set of network inputs) the output set with a desired (or target) set and computes the error, or the difference [61]. For a given set of synaptic couplings {Wij}, denoting the training error in terms of the energy function by ξ({Wij}), this ensemble can be specified via the equilibrium statistical mechanics concept of the Gibbs' ensemble, with the distribution function specified by exp[-ξ({Wij})/kBT], where kBT represents the (pseudo) Boltzmann energy.

Here ξ({Wij}) pertains to a subsystem which is taken as representative of the total collection of subsystems of the total neuronal ensemble. Considering the partitioning of the weights among the energy states given by ξ({Wij}), the following Gibbs' relation can be written in terms of a partition function:

pM({Wij}) = p0({Wij}) exp[-βξ({Wij})] / ZM

where p0 is the existing probability distribution imposing normalization constraints on the system parameters, pM({Wij}) is the Gibbs' distribution pertinent to the M associated trainings (or a set of M training examples), and ZM is the partition function defined as:

ZM = ∫ dW1 ... dWN p0({Wij}) exp[-βξ({Wij})]

where β = 1/kBT, and N is the total number of couplings. The average training error per example (etr) can be specified by the (pseudo) thermodynamic (Gibbs') free-energy G, defined as:

G = -(1/β) <ln ZM>En

where <...>En is the average over the ensemble of the training examples. That is:

etr = (1/M) ∂(βG)/∂β = (1/M) <<ξ({Wij})>T>En

where <...>T is the thermal average. The above relation implies that the free-energy (and hence the training error) is a function of the relative number of training examples and of the Boltzmann energy.
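These relations can be illustrated numerically. The sketch below builds a toy Gibbs ensemble over a single discretized weight (the grid, flat prior, quadratic per-example error, target values, and β are all illustrative assumptions) and checks that the training error obtained as a thermal average agrees with the derivative of βG with respect to β:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy ensemble: one weight w on a grid, quadratic per-example training error.
w = np.linspace(-3, 3, 601)
p0 = np.ones_like(w) / len(w)                 # prior distribution p0
targets = rng.normal(0.5, 0.2, size=8)        # M = 8 training examples (assumed)
M = len(targets)
xi = ((w[:, None] - targets[None, :]) ** 2).sum(axis=1)  # total error xi(w)

def log_Z(beta):
    """Numerically stable log partition function ln ZM."""
    a = -beta * xi
    m = a.max()
    return m + np.log(np.sum(p0 * np.exp(a - m)))

beta = 2.0
G = -log_Z(beta) / beta                       # G = -(1/beta) ln ZM

pM = p0 * np.exp(-beta * xi - log_Z(beta))    # Gibbs distribution pM(w)
e_thermal = (pM * xi).sum() / M               # (1/M) <xi>_T

h = 1e-5                                      # (1/M) d(beta*G)/d(beta), numerically
e_deriv = -(log_Z(beta + h) - log_Z(beta - h)) / (2 * h) / M
```

Here there is a single "example ensemble", so the average <...>En is trivial; the point of the sketch is the identity between the thermal average of ξ and the β-derivative of βG.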

From the free-energy relation, the corresponding thermodynamic entropy can be deduced via the conventional Legendre transformation, given by:

S = β²(∂G/∂β) = β(M etr - G)

This entropy function is a measure of the deviation of pM from the initial distribution p0. At the onset of training, (M/N) = 0 and therefore S = 0. As the training proceeds, S becomes negative. The entropy measure thus describes the evolution of the distribution in the system parameter space.
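The statement that S measures the deviation of pM from p0, vanishes at M = 0, and grows negative with training can be made concrete: S = -Σ pM ln(pM/p0) is minus the Kullback-Leibler divergence of pM from p0. A sketch under assumed toy settings (flat prior over one discretized weight, quadratic per-example errors):

```python
import numpy as np

rng = np.random.default_rng(2)

w = np.linspace(-3, 3, 601)
p0 = np.ones_like(w) / len(w)                 # initial (prior) distribution p0
beta = 2.0
targets = rng.normal(0.5, 0.2, size=20)       # pool of training examples (assumed)

def entropy_S(M):
    """S = -sum pM ln(pM/p0): zero at M = 0, increasingly negative with M."""
    xi = ((w[:, None] - targets[None, :M]) ** 2).sum(axis=1)  # first M examples
    g = p0 * np.exp(-beta * xi)
    pM = g / g.sum()
    return -np.sum(pM * np.log(pM / p0))
```

As M grows, pM concentrates on an ever-smaller region of the parameter space, so the divergence from p0 increases and S falls further below zero, tracing the contraction described in the text.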

Akin to the molal free-energy (or the chemical potential*) of thermodynamics, the corresponding factor associated with the relative number of training examples is given by:

μ(M) = ∂G/∂M ≅ G(M) - G(M-1) = [U(M) - U(M-1)] - (1/β)sM

where U = M etr is the average total training error,


*Chemical potential: It is the rate of change of free-energy with respect to the number of moles (in a chemical system) at constant volume and temperature.

and where sM is the one-step entropy defined as:

sM = S(M) - S(M-1)

The one-step entropy is a measure (specified by a small or a large negative number) which qualitatively describes the last learning step: a small negative value corresponds to a small contraction of the relevant subspace volume, and a large negative value to a large contraction.
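The one-step entropy can be illustrated on a toy ensemble (flat prior over one discretized weight, quadratic per-example errors; all settings are assumptions for illustration): the successive differences S(M) - S(M-1) are negative numbers whose magnitude shrinks as learning proceeds, since early examples contract the parameter subspace far more than later ones.

```python
import numpy as np

rng = np.random.default_rng(3)

w = np.linspace(-3, 3, 601)
p0 = np.ones_like(w) / len(w)
beta = 2.0
targets = rng.normal(0.5, 0.2, size=10)       # pool of training examples (assumed)

def S(M):
    """Entropy after M examples: deviation of pM from the flat prior p0."""
    xi = ((w[:, None] - targets[None, :M]) ** 2).sum(axis=1)
    g = p0 * np.exp(-beta * xi)
    pM = g / g.sum()
    return -np.sum(pM * np.log(pM / p0))

# One-step entropy: the contraction contributed by the M-th example alone.
s_step = [S(M) - S(M - 1) for M in range(1, 11)]
```

In this toy run the first step produces the largest contraction (most negative sM), and later steps produce progressively smaller ones, matching the qualitative description above.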



Copyright © CRC Press LLC
