Candidate Elimination Algorithm with Example - JOB SAARNEE

# Candidate Elimination Algorithm

## 1. Introduction to Candidate Elimination Algorithm

1.  Candidate elimination algorithm overcomes the problem that is imposed by the Find-S algorithm. Find-S algorithm only considers positive training examples while Candidate elimination algorithm considers positive and negative both training examples.
2. The candidate-Elimination algorithm computes the version space containing all (and only those) hypotheses from H that are consistent with an observed sequence of training examples.
3. Candidate elimination algorithm performs Bi-directional search in the hypothesis space.
4. In case of positive training examples, it moves from top to bottom (Specific hypothesis to General Hypothesis) and in the case of negative training examples it moves from bottom to top (general hypothesis to specific hypothesis).

## 2 Some important terms used in Candidate Elimination Algorithm

1) Version space: - The version space which is denoted by VS ­H,D with respect to hypothesis space H and training examples D, is the subset of hypothesis from H consistent with training examples in D.
H,D = {h Є H | consistent (h , D)}
In simple words, version space is just an intermediate state of most specific boundary and most general boundary. Most specific boundary is denoted by S­0 ={‘φ’,’φ’,’φ’} and most general boundary is denoted by G­0 = {‘?’,’?’,’?’} where number of ‘φ’ and ‘?’ is equal to the number of attributes in training set.

2) Consistent hypothesis: - A hypothesis h is consistent with a set of training examples D if and only if h(x) = c(x) for each example (x , c(x) ) in D.

Candidate Elimination Algorithm:-
1. Initialize G to the set of maximally general hypothesis in H
2. Initialize H to the set of maximally specific hypothesis in H.
3. For each training example d, do
If d is a positive example
Remove from G any hypothesis that is inconsistent with d
For each hypothesis s in S that is not consistent with d
Remove s from S.
Add to S all minimal generalizations h of s such that
h consistent with d, some member of G is more general than h
Remove from S any hypothesis that is more general than another hypothesis in S
If d is a negative training example
Remove from S any hypothesis that is inconsistent with d
For each hypothesis g in G that is not consistent with d
Remove g from G.
Add to G all minimal specializations h of g such that
h consistent with d , Some member of S is more specific than h                                                     Remove from G any hypothesis that is less general than another hypothesis in G

Example: - we understand this algorithm with the help of below training example: -

 Sky Air Temperature Humidity Wind Water Forecast Enjoy Swimming Sunny Warm Normal Strong Warm Same Yes Sunny Warm High Strong Warm Same Yes Rainy Cold High Strong Warm Change No Sunny Warm High Strong Cool Change Yes

1  1. Initialize most general hypothesis in H
G­0 = <? , ? , ? , ? , ? , ?>
2. Initialize most specific hypothesis in H
S­0 =<φ , φ , φ , φ , φ , φ>
1  3.  Consider first positive training example:

X1=<sunny , warm , normal , strong, warm, same> +
In this candidate elimination algorithm checks S boundary and find that it is overly specific and it fails to cover the positive example. Therefore, S boundary must be generalized to fit this positive training example.
S­0         < φ , φ , φ , φ , φ , φ >
1        <sunny,warm,normal,strong,warm,same>
0, 1              < ? , ? , ? , ? , ? , ? >
4.  Now we consider second positive training example:
X2 = <sunny, warm, high, strong, warm, same> +
In this case we again make changes to S and leaving G again unchanged.
S1        <sunny, warm, normal, strong, warm, same>
S2        <sunny, warm, ?, strong, warm, same>
G0, G1, G2     < ? , ? , ? , ? , ? , ? >

5. Now we consider third negative training example :
X3= <Rainy, cold, high, strong, warm, change> -
In this case, candidate elimination algorithm checks G boundary and find that it is overly generalized and it fails to cover the negative example. Therefore G boundary must be specified to fit this negative training example.
S2, S3              < sunny, warm, ?, strong, warm, same>
G3                   <sunny, ?, ?, ?, ?, ?>  <?, warm, ?, ?, ?, ?>  <?, ?, ?, ?, ?, same>
G2                   <?, ?, ?, ?, ?, ?>
1.      6. Now we consider fourth positive training example:
X4= < sunny, warm, high, strong, cool, change > +
This positive training example further generalizes the S boundary of the version space.
S3        <sunny, warm, ?, strong, warm, same>
S4        <sunny, warm, ?, strong, ?, ? >
G4       <sunny, ?, ?, ?, ?, ?> <? , warm, ?, ?, ?, ?>
G3       <sunny, ?, ?, ?, ?, ?> <?, warm, ?, ?, ?, ? > <?, ?, ?, ?, ?, same>

Here one member of G3 boundary will also removed, because this member fails to cover this new positive training example.

Thus the version space which we will obtained over here is :-