Dustin Stevens-Baier
Data Mining
10-2-06

Assignment#5


1. Consider a binary classification problem with the following set of attributes and attribute values: 

Air Conditioner = {Working, Broken}
Engine = {Good, Bad}
Mileage = {High, Medium, Low}
Rust = {Yes, No}

Suppose a rule-based classifier produces the following rule set:

Mileage = High -> Value = Low
Mileage = Low -> Value = High
Air Conditioner = Working, Engine = Good -> Value = High
Air Conditioner = Working, Engine = Bad -> Value = Low
Air Conditioner = Broken -> Value = Low

(a) Are the rules mutually exclusive?

No, the rules are not mutually exclusive. A record with high mileage, a working air conditioner, and a good engine triggers both the first rule (Value = Low) and the third rule (Value = High), so two rules with conflicting conclusions fire on the same record.

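(b) Is the rule set exhaustive?

Yes, it is exhaustive. The air conditioner is either Working or Broken. A broken air conditioner is covered by the last rule, and a working one is covered by the third or fourth rule depending on whether the engine is Good or Bad. Every record therefore triggers at least one rule, so Rust and Mileage = Medium need no rule of their own.

(c) Is ordering needed for this set of rules?

Yes, ordering is needed because a record can trigger more than one rule with conflicting conclusions, so you need to know, for example, whether the mileage rules take priority over the air conditioner rules.

(d) Do you need a default class for the rule set?

No, a default class is not needed because the rule set is exhaustive: every record triggers at least one rule. A default rule is needed only when some records are left uncovered.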
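
The properties asked about in question 1 can be checked by brute force. The following is a minimal sketch that enumerates all 24 attribute combinations and counts how many rules each one triggers. The short attribute keys and the helper name fired are illustrative choices, not part of the assignment.

```python
from itertools import product

# Attribute domains from the problem statement (short keys are illustrative).
ATTRS = {
    "ac": ["Working", "Broken"],
    "engine": ["Good", "Bad"],
    "mileage": ["High", "Medium", "Low"],
    "rust": ["Yes", "No"],
}

# Each rule is (conditions, predicted Value), in the order listed above.
RULES = [
    ({"mileage": "High"}, "Low"),
    ({"mileage": "Low"}, "High"),
    ({"ac": "Working", "engine": "Good"}, "High"),
    ({"ac": "Working", "engine": "Bad"}, "Low"),
    ({"ac": "Broken"}, "Low"),
]

def fired(record):
    """Return the indices of all rules whose conditions the record satisfies."""
    return [i for i, (cond, _) in enumerate(RULES)
            if all(record[k] == v for k, v in cond.items())]

# Every possible record: 2 * 2 * 3 * 2 = 24 combinations.
records = [dict(zip(ATTRS, vals)) for vals in product(*ATTRS.values())]
counts = [len(fired(r)) for r in records]

exhaustive = all(c >= 1 for c in counts)          # every record is covered
mutually_exclusive = all(c <= 1 for c in counts)  # no record fires two rules

# Records whose triggered rules disagree on the predicted Value.
conflicts = [r for r in records
             if len({RULES[i][1] for i in fired(r)}) > 1]

print("exhaustive:", exhaustive)
print("mutually exclusive:", mutually_exclusive)
print("records with conflicting rules:", len(conflicts))
```

A record counts as a conflict only when the rules it triggers predict different values, which is the case that makes rule ordering matter.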

2. (a) Suppose R1 is covered by 350 positive examples and 150 negative examples, while R2 is covered by 300 positive examples and 50 negative examples. Compute FOIL's information gain for the rule R2 with respect to R1.

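FOIL's information gain multiplies the number of positive examples covered by the new rule (R2) by the change in log2 of the fraction of positives, going from the original rule (R1) to the new rule:

FOIL info gain = 300 * (log2(300/(300+50)) - log2(350/(350+150))) = 300 * (log2(.857) - log2(.7)) = 300 * .2922 = 87.65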
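
As a check on the arithmetic, here is a minimal sketch of FOIL's information gain, assuming the usual definition p1 * (log2(p1/(p1+n1)) - log2(p0/(p0+n0))), where p0 and n0 are the counts for the original rule and p1 and n1 the counts for the new rule. The function and variable names are illustrative.

```python
from math import log2

def foil_gain(p0, n0, p1, n1):
    """FOIL's information gain of a new rule (p1, n1) over the original (p0, n0)."""
    return p1 * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))

# R1 is the original rule, R2 is the new rule.
gain = foil_gain(p0=350, n0=150, p1=300, n1=50)
print(round(gain, 2))
```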

(b) Consider a validation set that contains 500 positive examples and 500 negative examples. For R1, suppose the number of positive examples covered by the rule is 200 and the number of negative examples covered by the rule is 50. For R2, suppose the number of positive examples covered by the rule is 100 and the number of negative examples is 5. Compute VIREP for both rules. Which rule does IREP prefer?

R1 VIREP = (200 +(500-50)) / (500+500) = 13/20 =   .65
R2 VIREP = (100 + (500 - 5) )/ (500 + 500) = .595

IREP prefers the rule with the higher value of VIREP, so it prefers R1.

(c) Compute VRIPPER for the previous problem. Which rule does RIPPER prefer?

R1 VRIPPER = (200-50) / (200+50)   = .6
R2 VRIPPER = (100-5)/(100+5)=.905

RIPPER prefers the rule with the higher value of VRIPPER, so it prefers R2.
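
The two pruning metrics from parts (b) and (c) can be put side by side in a short sketch. It assumes the formulas used in the calculations above: VIREP = (p + (N - n))/(P + N) and VRIPPER = (p - n)/(p + n), where p and n are the positives and negatives covered by the rule and P and N are the totals in the validation set. The function names are illustrative.

```python
def v_irep(p, n, P, N):
    """IREP metric: covered positives plus uncovered negatives, over all examples."""
    return (p + (N - n)) / (P + N)

def v_ripper(p, n):
    """RIPPER metric: (positives - negatives) over all examples covered by the rule."""
    return (p - n) / (p + n)

P, N = 500, 500                              # validation set size per class
rules = {"R1": (200, 50), "R2": (100, 5)}    # (p, n) covered by each rule

irep = {name: v_irep(p, n, P, N) for name, (p, n) in rules.items()}
ripper = {name: v_ripper(p, n) for name, (p, n) in rules.items()}

print("IREP prefers", max(irep, key=irep.get), irep)
print("RIPPER prefers", max(ripper, key=ripper.get), ripper)
```

VIREP rewards R1 for covering more positives overall, while VRIPPER looks only at the examples the rule covers and so rewards R2 for being more accurate on them.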

7. Consider the data set shown in Table 5.10

(a) Estimate the conditional probabilities for P(A|+), P(B|+), P(C|+), P(A|-), P(B|-), and P(C|-).

P(A|+) =  3/5

P(B|+)= 1/5

P(C|+)= 4/5

P(A|-)= 2/5

P(B|-)= 2/5

P(C|-)= 5/5

(b) Use the estimate of conditional probabilities given in the previous question to predict the class label for a test sample (A=0, B=1, C=0) using the naive Bayes approach.


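P(A=0, B=1, C=0 | -) = P(A=0 | -) * P(B=1 | -) * P(C=0 | -) = 3/5 * 2/5 * 0/5 = 0

P(A=0, B=1, C=0 | +) = P(A=0 | +) * P(B=1 | +) * P(C=0 | +) = 2/5 * 1/5 * 1/5 = 2/125

Both classes have five training records, so the priors are equal and the class with the larger likelihood wins. The predicted class label is +.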
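
The naive Bayes calculation in part (b) can be reproduced with exact fractions. This is a minimal sketch that takes the conditional probabilities from the answer above as given. The equal priors of 1/2 are an assumption based on each class having five training records, and the dictionary layout is illustrative.

```python
from fractions import Fraction as F
from math import prod

# Conditional probabilities for the test sample (A=0, B=1, C=0), from part (b).
likelihood_terms = {
    "+": [F(2, 5), F(1, 5), F(1, 5)],
    "-": [F(3, 5), F(2, 5), F(0, 5)],
}

# Assumption: five training records per class, so the priors are equal.
prior = {"+": F(1, 2), "-": F(1, 2)}

# Naive Bayes score: prior times the product of the per-attribute conditionals.
score = {c: prior[c] * prod(terms) for c, terms in likelihood_terms.items()}
predicted = max(score, key=score.get)

print(score, "->", predicted)
```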

(c) Estimate the conditional probabilities using the m-estimate approach, with p = 1/2 and m = 4.

The m-estimate is (nc + m*p)/(n + m), where n = 5 is the number of training records in the class and nc is the number of those records with the given attribute value.

P(A=0 | -) = (3 + 4 * 1/2)/(5 + 4) = 5/9 = .556
P(B=1 | -) = (2 + 4 * 1/2)/(5 + 4) = 4/9 = .444
P(C=0 | -) = (0 + 4 * 1/2)/(5 + 4) = 2/9 = .222
P(A=0 | +) = (2 + 4 * 1/2)/(5 + 4) = 4/9 = .444
P(B=1 | +) = (1 + 4 * 1/2)/(5 + 4) = 3/9 = .333
P(C=0 | +) = (1 + 4 * 1/2)/(5 + 4) = 3/9 = .333

(d) Repeat part (b) using the conditional probabilities given in part (c).

P(A=0, B=1, C=0 | -) = P(A=0 | -) * P(B=1 | -) * P(C=0 | -) = 5/9 * 4/9 * 2/9 = 40/729 = .055

P(A=0, B=1, C=0 | +) = P(A=0 | +) * P(B=1 | +) * P(C=0 | +) = 4/9 * 3/9 * 3/9 = 36/729 = .049

With equal priors, the - class has the larger posterior, so the predicted class label is -.
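
Parts (c) and (d) can be checked with a short sketch of the m-estimate, assuming the standard form (nc + m*p)/(n + m) with n = 5 records per class. The per-class counts nc are read off the numerators in part (b), and the names m_estimate and counts are illustrative.

```python
from fractions import Fraction as F
from math import prod

def m_estimate(n_c, n, m=4, p=F(1, 2)):
    """m-estimate of a conditional probability: (n_c + m*p) / (n + m)."""
    return (n_c + m * p) / (n + m)

n = 5  # training records per class
# Records matching A=0, B=1, C=0 within each class (numerators from part (b)).
counts = {"+": [2, 1, 1], "-": [3, 2, 0]}

# Class-conditional likelihood of the test sample under the m-estimate.
likelihood = {c: prod(m_estimate(k, n) for k in ks) for c, ks in counts.items()}

# Priors are equal (five records per class), so comparing likelihoods suffices.
predicted = max(likelihood, key=likelihood.get)

for c, value in likelihood.items():
    print(c, value, round(float(value), 3))
print("predicted:", predicted)
```

Because the smoothed estimate for P(C=0 | -) is 2/9 rather than 0, the - class is no longer ruled out by a single unseen attribute value.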


(e) Compare the two methods for estimating probabilities. Which method is better and why?

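The second method (the m-estimate) is better because it gets rid of the zero probability. In most cases the true probability is not actually zero; the sample is just too small for the event to have occurred. With the first method, a single zero conditional probability forces the whole product for that class to zero regardless of the other attributes.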