Dustin Stevens-Baier
Data Mining
10-2-06
Assignment #5
1. Consider a binary classification problem with the following set of attributes and attribute values:
Air Conditioner = {Working, Broken}
Engine = {Good, Bad}
Mileage = {High, Medium, Low}
Rust = {Yes, No}
Suppose a rule-based classifier produces the following rule set:
Mileage = High -> Value = Low
Mileage = Low -> Value = High
Air Conditioner = Working, Engine = Good -> Value = High
Air Conditioner = Working, Engine = Bad -> Value = Low
Air Conditioner = Broken -> Value = Low
(a) Are the rules mutually exclusive?
No, they are not mutually exclusive, because a single record can trigger more than one rule. For example, a car with high mileage, a working air conditioner, and a good engine triggers both the first rule (Value = Low) and the third rule (Value = High), which assign conflicting values.
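(b) Is the rule set exhaustive?
Yes, it is exhaustive. Every record has an air conditioner that is either working or broken. If it is broken, the fifth rule applies; if it is working, the engine is either good or bad, so the third or fourth rule applies. No rule has to mention rust for every record to be covered.
(c) Is ordering needed for this set of rules?
Yes, ordering is needed because a record can trigger several rules that predict different values, so the classifier has to know, for example, whether the mileage rules take priority over the air conditioner rules.
(d) Do you need a default class for the rule set?
No, a default class is not needed because the rule set is exhaustive. A default rule is only required when some records are not covered by any rule.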
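The properties asked about in parts (a) and (b) can be checked by brute force. The following is a minimal Python sketch that enumerates all 24 attribute combinations and records which rules each one triggers; the names `attributes`, `rules`, and `triggered` are illustrative and not part of the assignment.

```python
from itertools import product

attributes = {
    "Air Conditioner": ["Working", "Broken"],
    "Engine": ["Good", "Bad"],
    "Mileage": ["High", "Medium", "Low"],
    "Rust": ["Yes", "No"],
}

# Each rule is (conditions, predicted Value), in the order listed in the question.
rules = [
    ({"Mileage": "High"}, "Low"),
    ({"Mileage": "Low"}, "High"),
    ({"Air Conditioner": "Working", "Engine": "Good"}, "High"),
    ({"Air Conditioner": "Working", "Engine": "Bad"}, "Low"),
    ({"Air Conditioner": "Broken"}, "Low"),
]

def triggered(record):
    """Return the indices of all rules whose conditions match the record."""
    return [i for i, (cond, _) in enumerate(rules)
            if all(record[a] == v for a, v in cond.items())]

names = list(attributes)
records = [dict(zip(names, values)) for values in product(*attributes.values())]
fired = [triggered(r) for r in records]

# Mutually exclusive: no record triggers more than one rule.
mutually_exclusive = all(len(f) <= 1 for f in fired)
# Exhaustive: every record triggers at least one rule.
exhaustive = all(len(f) >= 1 for f in fired)
# Records whose triggered rules disagree on the predicted Value.
conflicts = sum(1 for f in fired if len({rules[i][1] for i in f}) > 1)

print("records:", len(records))
print("mutually exclusive:", mutually_exclusive)
print("exhaustive:", exhaustive)
print("records with conflicting predictions:", conflicts)
```

Because Rust appears in no rule, it only doubles the number of records and does not change which rules fire.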
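2. (a) Suppose R1 is covered by 350 positive examples and 150 negative examples, while R2 is covered by 300 positive examples and 50 negative examples. Compute the FOIL's information gain for the rule R2 with respect to R1.
FOIL's information gain is p1 * (log2(p1/(p1+n1)) - log2(p0/(p0+n0))), where p0 and n0 are the counts for the original rule R1 and p1 and n1 are the counts for the new rule R2.
FOIL info gain = 300 * (log2(300/(300+50)) - log2(350/(350+150))) = 300 * (log2(.857) - log2(.7)) = 300 * .2922 = 87.65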
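As a quick check of the arithmetic, here is a minimal Python sketch of FOIL's information gain. It assumes the usual definition p1 * (log2(p1/(p1+n1)) - log2(p0/(p0+n0))), with (p0, n0) the positive and negative counts of the original rule R1 and (p1, n1) those of the new rule R2; the function name `foil_gain` is illustrative.

```python
from math import log2

def foil_gain(p0, n0, p1, n1):
    """FOIL's information gain of a new rule (p1, n1) over the original rule (p0, n0)."""
    return p1 * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))

# R1 covers 350 positive / 150 negative; R2 covers 300 positive / 50 negative.
gain = foil_gain(p0=350, n0=150, p1=300, n1=50)
print(round(gain, 2))  # 87.65
```

The gain is positive here because R2 has a higher fraction of positive examples (300/350) than R1 (350/500).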
(b) Consider a validation set that contains 500 positive examples and 500 negative examples. For R1, suppose the number of positive examples covered by the rule is 200 and the number of negative examples covered by the rule is 50. For R2, suppose the number of positive examples covered by the rule is 100 and the number of negative examples is 5. Compute VIREP for both rules. Which rule does IREP prefer?
R1 VIREP = (200 + (500 - 50)) / (500 + 500) = 13/20 = .65
R2 VIREP = (100 + (500 - 5)) / (500 + 500) = .595
IREP prefers the rule with the higher value of VIREP, so it prefers R1.
(c) Compute VRIPPER for the previous problem. Which rule does RIPPER prefer?
R1 VRIPPER = (200 - 50) / (200 + 50) = .6
R2 VRIPPER = (100 - 5) / (100 + 5) = .905
RIPPER prefers R2.
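The two pruning metrics from parts (b) and (c) can be compared side by side with a short Python sketch. The function names `v_irep` and `v_ripper` are illustrative; p and n are the positive and negative validation examples covered by the rule, and P and N are the total positive and negative examples in the validation set.

```python
def v_irep(p, n, P, N):
    """IREP metric: (p + (N - n)) / (P + N)."""
    return (p + (N - n)) / (P + N)

def v_ripper(p, n):
    """RIPPER metric: (p - n) / (p + n)."""
    return (p - n) / (p + n)

P, N = 500, 500
covered = {"R1": (200, 50), "R2": (100, 5)}  # (positives, negatives) covered

for name, (p, n) in covered.items():
    print(name, round(v_irep(p, n, P, N), 3), round(v_ripper(p, n), 3))
```

The IREP metric rewards R1 for covering more positives in absolute terms, while the RIPPER metric depends only on the covered examples and so favors the more precise rule R2.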
7. Consider the data set shown in Table 5.10.
(a) Estimate the conditional probabilities for P(A|+), P(B|+), P(C|+), P(A|-), P(B|-), and P(C|-).
Here P(A|+) means P(A=1 | +), and likewise for B and C.
P(A|+) = 3/5
P(B|+) = 1/5
P(C|+) = 4/5
P(A|-) = 2/5
P(B|-) = 2/5
P(C|-) = 5/5
(b) Use the estimate of conditional probabilities given in the previous question to predict the class label for a test sample (A=0, B=1, C=0) using the naive Bayes approach.
P(A=0, B=1, C=0 | -) = P(A=0 | -) * P(B=1 | -) * P(C=0 | -) = 3/5 * 2/5 * 0/5 = 0
P(A=0, B=1, C=0 | +) = P(A=0 | +) * P(B=1 | +) * P(C=0 | +) = 2/5 * 1/5 * 1/5 = 2/125
Each class has 5 training records, so the priors are equal and the sample is assigned to the class with the larger product, which is +.
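The naive Bayes comparison for this test sample can be reproduced with exact fractions. This is a minimal sketch that takes the per-attribute probabilities from the products above; the equal priors are an assumption based on every estimate having denominator 5 in both classes, and the names `cond`, `prior`, and `score` are illustrative.

```python
from fractions import Fraction
from math import prod

# Conditional probabilities of A=0, B=1, C=0 given each class.
cond = {
    "+": [Fraction(2, 5), Fraction(1, 5), Fraction(1, 5)],
    "-": [Fraction(3, 5), Fraction(2, 5), Fraction(0, 5)],
}
# Assumption: 5 training records per class, so the priors are equal.
prior = {"+": Fraction(1, 2), "-": Fraction(1, 2)}

# Naive Bayes score: prior times the product of the conditional probabilities.
score = {label: prior[label] * prod(cond[label]) for label in cond}
predicted = max(score, key=score.get)

print(score["+"], score["-"], predicted)
```

A single zero factor, here P(C=0 | -), drives the whole score for class - to zero no matter what the other attributes say.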
(c) Estimate the conditional probabilities using the m-estimate approach, with p = 1/2 and m = 4.
The m-estimate is (nc + m * p) / (n + m), where n = 5 is the number of training records in the class and nc is the number of those records with the given attribute value.
P(A=0 | -) = (3 + 4 * 1/2) / (5 + 4) = 5/9 = .556
P(B=1 | -) = (2 + 4 * 1/2) / (5 + 4) = 4/9 = .444
P(C=0 | -) = (0 + 4 * 1/2) / (5 + 4) = 2/9 = .222
P(A=0 | +) = (2 + 4 * 1/2) / (5 + 4) = 4/9 = .444
P(B=1 | +) = (1 + 4 * 1/2) / (5 + 4) = 3/9 = .333
P(C=0 | +) = (1 + 4 * 1/2) / (5 + 4) = 3/9 = .333
(d) Repeat part (b) using the conditional probabilities given in part (c).
P(A=0, B=1, C=0 | -) = P(A=0 | -) * P(B=1 | -) * P(C=0 | -) = 5/9 * 4/9 * 2/9 = 40/729 = .055
P(A=0, B=1, C=0 | +) = P(A=0 | +) * P(B=1 | +) * P(C=0 | +) = 4/9 * 3/9 * 3/9 = 36/729 = .049
The class priors are equal, so the sample is assigned to the class with the larger product, which is -.
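For reference, here is a minimal Python sketch of the m-estimate, assuming the common definition (nc + m * p) / (n + m), where n is the number of training records in the class and nc is the number of those records with the given attribute value. The counts are read off the fractions with denominator 5 used in part (b); the names `m_estimate`, `counts`, and `likelihood` are illustrative.

```python
from fractions import Fraction

def m_estimate(n_c, n, m, p):
    """m-estimate of a conditional probability: (n_c + m * p) / (n + m)."""
    return (n_c + m * p) / Fraction(n + m)

# Number of records per class (out of n = 5) matching A=0, B=1, C=0.
counts = {
    "-": {"A=0": 3, "B=1": 2, "C=0": 0},
    "+": {"A=0": 2, "B=1": 1, "C=0": 1},
}
n, m, p = 5, 4, Fraction(1, 2)

likelihood = {}
for label, class_counts in counts.items():
    value = Fraction(1)
    for n_c in class_counts.values():
        value *= m_estimate(n_c, n, m, p)
    likelihood[label] = value

for label, value in likelihood.items():
    print(label, value, round(float(value), 3))
```

With m = 0 the function reduces to the plain relative frequency nc / n, so the same code also reproduces the unsmoothed estimates.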
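(e) Compare the two methods for estimating probabilities. Which method is better and why?
The second method (the m-estimate) is better because it gets rid of the zero probability, which in most cases is not actually zero; the sample is just too small for the event to have occurred. With the first method, a single zero conditional probability forces the whole product for that class to zero, regardless of the other attribute values.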