4.12 Learning at Zero Temperature
This refers to a naive training strategy of minimizing the (training) error. It specifies a critical relative number of learning examples below which training with zero error is possible, but beyond which errors in the training process cannot be avoided. The error normally arises from external noise in the examples. Absence of such noise permits a perfect learning process, and the target rule can be represented by the reference perceptron. The criticality can therefore be quantified as a function of the overlap parameter R. Thermodynamically, an excess of learning examples (beyond the critical value) makes the system unstable.
The entities q and R represent order parameters which vary as a function of the relative number of training examples for a given generic noise input. The parameter R exhibits a nonmonotonic behavior and is the precursor of criticality. When R → 1, the training approaches the pure reference system despite the presence of noise; that is, the system self-regulates as a cybernetic complex and organizes itself so that the learning process filters out the external noise. The convergence of R towards 1 obviously depends on the amount of noise introduced: the smaller the external noise, the faster the convergence. Criticality is the limit of capacity for error-free learning in the sense that the critical number of training examples brings about a singularity in the learning process, as indicated by the behavior of the training error over the different examples. Further, the criticality marks the onset of replica symmetry breaking, implying that the space of interactions with minimal training error breaks up into disconnected subsets.
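As a rough numerical illustration of this teacher-student setting, the sketch below (in Python) draws noisy examples from a reference ("teacher") perceptron and reduces the number of training errors with a crude zero-temperature acceptance rule, then reports the training error and the overlap R. The sizes N and M, the noise level, and the update rule are illustrative assumptions, not the formulation analyzed in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

N, M = 50, 150          # illustrative sizes: N inputs, M training examples (assumed)
noise_level = 0.1       # probability of flipping the teacher's output (external noise)

# Reference ("teacher") perceptron with couplings normalized to the N-sphere.
w_teacher = rng.standard_normal(N)
w_teacher *= np.sqrt(N) / np.linalg.norm(w_teacher)

# Training examples: random binary inputs, teacher outputs corrupted by external noise.
X = rng.choice([-1.0, 1.0], size=(M, N))
y = np.sign(X @ w_teacher)
flips = rng.random(M) < noise_level
y[flips] *= -1

def training_error(w):
    """Fraction of training examples misclassified by the student couplings w."""
    return np.mean(np.sign(X @ w) != y)

# Zero-temperature learning: accept only random coupling changes that do not
# increase the training error (a crude minimization of the number of errors).
w_student = rng.standard_normal(N)
err = training_error(w_student)
for _ in range(10000):
    trial = w_student + 0.1 * rng.standard_normal(N)
    trial_err = training_error(trial)
    if trial_err <= err:
        w_student, err = trial, trial_err

# Overlap R between student and teacher (R -> 1 means the rule has been recovered).
R = (w_student @ w_teacher) / (np.linalg.norm(w_student) * np.linalg.norm(w_teacher))
print(f"training error = {err:.3f}, overlap R = {R:.3f}")
```

With few examples (small M/N) such a search typically reaches zero training error; as M/N grows past the critical value, residual training errors become unavoidable, which is the criticality described above.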
The naive learning strategy discussed earlier minimizes the training error for a given number of examples, but it also gives rise to a generalization error. That is, in characterizing the achievement of learning via R (which measures the overlap with, and hence the deviation from, the reference perceptron), one also implicitly assesses the probability that the trained perceptron mispredicts the noisy output of the reference perceptron on an example independent of the training set, namely the generalization error:

(Here, the prime refers to the new example.)
Explicit evaluation of this generalization error indicates that it decreases monotonically with R. In other words, maximizing the overlap R is equivalent to minimizing the generalization error. Hence, the algebraic convergence of R towards 1 translates into an algebraic decay of the generalization error, and such a decay is slower if the examples contain external noise. However, by including thermal noise in the learning process, the system acquires a new degree of freedom that allows the generalization error to be minimized as a function of temperature. Therefore, the naive learning method (with T → 0) is not an optimal one. In terms of the relative number of examples (M/N) and a synaptic noise parameter (ΦN), there exists a threshold curve (M/N)(ΦN) such that for M/N < (>) (M/N)(ΦN) the optimum training temperature is zero (positive).
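To make this monotone dependence concrete, the short sketch below assumes the standard spherical-perceptron relation ε(R) = arccos(R)/π between the overlap R and the generalization error; this closed form is adopted purely for illustration and is not necessarily the expression evaluated in the text.

```python
import numpy as np

def generalization_error(R):
    """Probability of disagreeing with the reference perceptron as a function of
    the overlap R, using the standard spherical-perceptron relation (assumed here
    for illustration): eps(R) = arccos(R) / pi."""
    return np.arccos(np.clip(R, -1.0, 1.0)) / np.pi

# The error decays monotonically from 0.5 (random guessing, R = 0) to 0 (R = 1).
for R in (0.0, 0.5, 0.9, 0.99, 1.0):
    print(f"R = {R:4.2f}  ->  eps = {generalization_error(R):.4f}")
```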
4.13 Concluding Remarks
In summary, the following could be considered as the set of (pseudo) thermodynamic concepts involved in neural network modeling:
- Thermodynamics of learning machines.
- Probability distributions of neural state-transition energies.
- Cooling schedules, annealing, and cooling rate.
- Boltzmann energy and Boltzmann temperature.
- Cross-entropy and reverse cross-entropy concepts.
- System (state) parameters.
- Equilibrium statistics.
- Ensemble of energy functions (Gibbs ensemble).
- Partition function concepts.
- Gibbs free-energy function.
- Entropy.
- Overlaps of replicas.
- Replica symmetry ansatz.
- Order parameters.
- Criticality parameter.
- Replica symmetry breaking.
- Concept of zero temperature.
The evolution of the aforesaid concepts of (pseudo) thermodynamics and of the principles of statistical physics as applied to neural activity can be summarized through the chronological contributions listed below, from the genesis of the topic to its present state. A descriptive portrayal of these contributions is presented in the next chapter.
- McCulloch and Pitts (1943) described the neuron as a binary, all-or-none element and showed the ability of such elements to perform logical computations [7].
- Gabor (1946) proposed a strategy of finding solutions to problems of sensory perception through quantum mechanical concepts [10].
- Wiener (1948) suggested the flexibility of describing the global properties of materials as well as rich and complicated systems via principles of statistical mechanics [9].
- Hebb (1949) developed a notion that a percept or a concept can be represented in the brain by a cell-assembly with the suggestion that the process of learning is the modification of synaptic efficacies [19].
- Cragg and Temperley (1954) indicated an analogy between the persistent activity in the neural network and the collective states of coupled magnetic dipoles [32].
- Caianiello (1961) built the neural statistical theory on the basis of statistical mechanics concepts and pondered over Hebb's learning theory [64].
- Griffith (1966) posed a criticism that the Hamiltonian of the neural assembly is totally unlike the ferromagnetic Hamiltonian [13].
- Cowan (1968) described the statistical mechanics of nervous nets [65].
- Bergstrom and Nevalinna (1972) described a neural system by its total neural energy and its entropy distribution [49].
- Little (1974) elucidated the analogy between noise and (pseudo) temperature in a neural assembly, thereby paving half the way towards thermodynamics [33].
- Amari (1974) proposed a method of statistical neurodynamics [66].
- Thompson and Gibson (1981) advocated a general definition of long-range order pertinent to the proliferation of the neuronal state-transitional process [37].
- Ingber (1982, 1983) studied the statistical mechanics of neurocortical interactions and developed dynamics of synaptic modifications in neural networks [67,68].
- Hopfield (1982, 1984) completed the linkage between thermodynamics and spin glasses in terms of models of content-addressable memory through the concept of entropy and provided an insight into the energy functional concept [31,36].
- Hinton, Sejnowski, and Ackley (1984) developed the Boltzmann machine concept representing constraint satisfaction networks that learn [51].
- Peretto (1984) searched for an extensive quantity to depict Hopfield-type networks and constructed formulations via stochastic units which depict McCulloch-Pitts weighted-sum computational neurons, but with the associated dynamics making mistakes with a certain probability analogous to the temperature in statistical mechanics [38].
- Amit, Gutfreund, and Sompolinsky (1985) developed pertinent studies yielding results on a class of stochastic network models, such as Hopfield's net, being amenable to exact treatment [69].
- Toulouse, Dehaene, and Changeux (1986) considered a spin-glass model of learning by selection in a neural network [70].
- Rumelhart, Hinton, and Williams (1986) (re)discovered the back-propagation algorithm to match the adjustment of weights connecting units in successive layers of multilayer perceptrons [71].
- Gardner (1987) explored systematically the space of couplings through the principles of statistical mechanics with a consequence of such strategies being applied exhaustively to neural networks [72].
- Szu and Hartley (1987) adopted the principles of thermodynamic annealing to achieve the energy-minimum criterion and proposed the Cauchy machine representation of neural networks [53].
- The Stat. Phys. 17 workshop (1989) collected papers considering the unifying concepts of neural networks and spin glasses [73].
- Aarts and Korst (1989) elaborated the stochastic approach to combinatorial optimization and neural computing [52].
- Akiyama, Yamashita, Kajiura, and Aiso (1990) formalized the Gaussian machine representation of neuronal activity, with a graded response like the Hopfield machine and stochastic characteristics akin to the Boltzmann machine [54].