

18.3 Learning Classifier Systems

A classifier system is a genetics-based machine learning system that learns rules, called classifiers, to govern its performance in a given environment (e.g., the pH titration system). Classifier systems are based on a model of a service economy in which money is exchanged for services, and they include a mechanism for discovering new rules. Unlike in traditional expert systems, where rules are fixed by the programmer, a rule in a classifier system must earn the right to be put into effect: the rules coexist in a service economy, and a competition is held to decide which rule acts under a specified set of conditions in the environment. The competitive nature of the economy ensures that good rules survive and bad rules die off, while the exploratory portion of the learning classifier system creates new rules. These systems operate incrementally, testing new rules while steadily improving performance in the environment.

In general, classifier systems are composed of three subsystems:

1)  Rule and message system;
2)  Credit assignment system;
3)  Rule discovery system (genetic algorithm).

These three subsystems, when combined with a mathematical model of the environment, form a computationally complete system capable of learning effective rules for interacting in an environment and controlling processes. The mathematical model is used to investigate the effectiveness of untested rules.
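As a rough illustration of how the three subsystems fit together, the following Python sketch wires them into a single interaction loop. The Classifier fields, the environment interface (state, post_message, reward), and the rate at which rule discovery is invoked are illustrative assumptions rather than details from the text; the individual subsystems are sketched in the examples that follow.

import random
from dataclasses import dataclass

@dataclass
class Classifier:
    condition: str          # e.g., "1#0"; '#' is a don't-care symbol
    action: str             # message posted to the environment
    strength: float = 10.0  # updated by the credit assignment system

def run(classifiers, env, n_steps, match_and_compete, assign_credit, discover_rules):
    """One classifier-system loop: match and compete, act, assign credit,
    and occasionally invoke rule discovery."""
    previous_winners = []
    for _ in range(n_steps):
        # 1) Rule and message system: find eligible rules and hold the auction.
        winners = match_and_compete(classifiers, env.state())
        for rule in winners:
            env.post_message(rule.action)        # winning actions change the environment
        # 2) Credit assignment system: redistribute strength based on the outcome.
        assign_credit(winners, previous_winners, classifiers, env.reward())
        # 3) Rule discovery system: periodically breed new rules (genetic algorithm).
        if random.random() < 0.25:
            discover_rules(classifiers)
        previous_winners = winners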

The rule and message system is similar to a production system. Production systems are schemes that use rules as their only means of operation. The rules are generally of the form:

When the condition exists in the environment, then the action is taken.

In learning classifier systems, the rule and message system must complete the following basic tasks. The learning classifier system evaluates the existing state of the environment; that is, it determines which conditions exist. These conditions are compared with the current rule set to determine which rules are eligible to be put into effect. A competition is held among the eligible rules and a set of winning rules is selected. These rules take their associated actions by sending messages to the problem environment, thereby causing a change in the environment. It should be noted that the first competition is strictly a random decision, since all rules initially have the same strength; however, as will be seen shortly, all future competitions are decided based upon the previous successes or failures of individual rules. Once the outcome of this competition has been determined, the rule and message system becomes temporarily inactive and the credit assignment system takes over. Before the credit assignment system is discussed, note that the rule and message system is designed to activate several classifiers in parallel, which means that a learning classifier system is a parallel production system even though it is generally implemented on sequential machines.
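A minimal sketch of this match-and-compete step is given below, assuming the common encoding in which conditions are ternary strings over {0, 1, #}, with '#' acting as a don't-care symbol. The function names, the bid fraction, and the number of winners are illustrative choices, not details from the text.

import random
from dataclasses import dataclass

@dataclass
class Classifier:
    condition: str          # ternary string over {'0', '1', '#'}
    action: str             # message posted when the rule wins
    strength: float = 10.0

def matches(condition: str, state: str) -> bool:
    """A rule is eligible when every non-'#' position agrees with the state."""
    return all(c == '#' or c == s for c, s in zip(condition, state))

def match_and_compete(classifiers, state, bid_fraction=0.1, n_winners=1):
    """Eligible rules bid a fraction of their strength; the highest bidders
    win the right to post their messages."""
    eligible = [r for r in classifiers if matches(r.condition, state)]
    # A tiny random jitter breaks ties, so the very first auction (all
    # strengths equal) is effectively a random decision.
    eligible.sort(key=lambda r: bid_fraction * r.strength + random.uniform(0.0, 1e-6),
                  reverse=True)
    return eligible[:n_winners]

rules = [Classifier("1#1", "001"), Classifier("0##", "110"), Classifier("1##", "010")]
for winner in match_and_compete(rules, "101", n_winners=2):
    print(f"{winner.condition} wins and posts message {winner.action}")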

The purpose of the credit assignment system (Minsky, 1967) is to evaluate how useful each classifier is in producing desirable effects on the environment (that is, how useful it is in solving the problem at hand). The method used most often is the bucket brigade algorithm of Holland (1962). In the bucket brigade algorithm, every classifier is assigned a strength that represents that classifier’s usefulness in solving the problem. The classifiers that are eligible to take an action at a given time step bid a portion of their associated strength for the right to take their action. Once all the eligible rules have made their bids, winners are selected based on the size of the bids; those that bid highest have the highest chance of being selected to take their action. If the environment is in a more desirable state than at the previous time step, the auction winners pay their bids to the classifiers that took action at the previous time step. Thus, the rules that caused desirable changes in the environment are rewarded with payoff. A mandatory payment is also collected from every classifier at each time step. This payment, or tax, helps weed out poor rules because, in the rule discovery system, a classifier’s continued existence depends on its associated strength. If a classifier fails to win the auction and receive a reward, its chance of survival is reduced. The bucket brigade algorithm’s redistribution of classifier strengths leads to bids that are representative of classifiers’ potential for achieving the goal; effective rules bid proportionately higher.
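The strength bookkeeping can be sketched as follows. The bid fraction and tax rate are illustrative constants, and this simplified version pays the previous step's actors unconditionally while treating any improvement in the environment as an external reward delivered to the current winners; the formulation described above makes that payment contingent on the improvement.

from dataclasses import dataclass

@dataclass
class Classifier:
    condition: str
    action: str
    strength: float = 10.0

def bucket_brigade(winners, previous_winners, classifiers,
                   reward=0.0, bid_fraction=0.1, tax_rate=0.005):
    """Redistribute classifier strengths after one time step."""
    for rule in winners:
        bid = bid_fraction * rule.strength
        rule.strength -= bid                        # the winner pays its bid...
        if previous_winners:
            share = bid / len(previous_winners)
            for prev in previous_winners:           # ...to the rules that acted last step
                prev.strength += share
    for rule in classifiers:
        rule.strength -= tax_rate * rule.strength   # mandatory tax on every classifier
    if winners and reward:
        for rule in winners:                        # external payoff for improving the
            rule.strength += reward / len(winners)  # state of the environment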

The purpose of the rule discovery system is to generate new rules with potential for more efficiently reaching the goal assigned to the learning classifier system. The genetic algorithm is the technique generally used in rule discovery systems. The genetic algorithm requires the elements of the search space (in this case the rules) to be coded as finite length strings. Effective classifiers, those with high associated strengths, are selected for combination with other highly fit classifiers. Portions of the classifiers are selected at random and combined with portions of other classifiers through the standard genetic algorithm operators of reproduction, crossover, and mutation to produce new rules. The intent is to combine advantageous qualities from two separate classifiers to form two new, more highly fit classifiers.
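One rule discovery step might be sketched as below, again assuming the ternary condition encoding. The selection scheme, crossover point, mutation rate, and replace-the-weakest policy are illustrative choices rather than details prescribed by the text; here, strength-proportionate selection plays the role of the reproduction operator.

import random
from dataclasses import dataclass

@dataclass
class Classifier:
    condition: str
    action: str
    strength: float = 10.0

def select(classifiers):
    """Strength-proportionate (roulette-wheel) selection of a parent."""
    total = sum(r.strength for r in classifiers)
    pick, running = random.uniform(0.0, total), 0.0
    for r in classifiers:
        running += r.strength
        if running >= pick:
            return r
    return classifiers[-1]

def crossover(a: str, b: str):
    """Single-point crossover of two equal-length strings."""
    point = random.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(s: str, alphabet: str, rate: float = 0.02) -> str:
    """With small probability, replace each symbol with a random one."""
    return "".join(random.choice(alphabet) if random.random() < rate else c for c in s)

def discover_rules(classifiers):
    """Breed two offspring from two strong parents and replace the two
    weakest classifiers, keeping the population size constant."""
    p1, p2 = select(classifiers), select(classifiers)
    child1, child2 = crossover(p1.condition, p2.condition)
    offspring = [Classifier(mutate(child1, "01#"), mutate(p1.action, "01")),
                 Classifier(mutate(child2, "01#"), mutate(p2.action, "01"))]
    classifiers.sort(key=lambda r: r.strength)      # weakest rules first
    classifiers[:2] = offspring                     # the weakest two die off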

The three subsystems discussed in this section form a system capable of learning and evaluating new rules for interacting in an environment. A more detailed description of learning classifier systems can be found in Goldberg (1989).



Copyright © CRC Press LLC
