Dustin Stevens-Baier
COMP 578
10-9-06
Assignment #6
12. Given the Bayesian network shown in Figure 5.48, compute the following probabilities:
(a) P(B=good, F=empty, G=empty, S=yes)
= (1 - P(B = bad)) * P(F = empty) * P(G = empty | B = good, F = empty) * (1 - P(S = no | B = good, F = empty))
= .9 * .2 * .8 * .2 = .0288
(b) P(B=bad, F=empty, G=not empty, S=no)
= P(B = bad) * P(F = empty) * (1 - P(G = empty | B = bad, F = empty)) * P(S = no | B = bad, F = empty)
= .1 * .2 * .1 * 1 = .002
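The two products in (a) and (b) can be checked with a few lines of Python. This is only a sketch: the table entries are the ones that appear in the worked answers above (the rest of Figure 5.48 is not reproduced), and the names `joint`, `P_G_EMPTY` and `P_S_NO` are made up for the illustration.

```python
# CPT entries read off the worked answers above (assumption: these match Figure 5.48)
P_B_BAD = 0.1
P_F_EMPTY = 0.2
# P(G = empty | B, F) and P(S = no | B, F), only for the (B, F) pairs used in (a) and (b)
P_G_EMPTY = {("good", "empty"): 0.8, ("bad", "empty"): 0.9}
P_S_NO = {("good", "empty"): 0.8, ("bad", "empty"): 1.0}


def joint(b, f, g, s):
    """P(B=b, F=f, G=g, S=s) = P(b) * P(f) * P(g | b, f) * P(s | b, f)."""
    p_b = P_B_BAD if b == "bad" else 1 - P_B_BAD
    p_f = P_F_EMPTY if f == "empty" else 1 - P_F_EMPTY
    p_g = P_G_EMPTY[(b, f)] if g == "empty" else 1 - P_G_EMPTY[(b, f)]
    p_s = P_S_NO[(b, f)] if s == "no" else 1 - P_S_NO[(b, f)]
    return p_b * p_f * p_g * p_s


prob_a = joint("good", "empty", "empty", "yes")
prob_b = joint("bad", "empty", "not empty", "no")
print(round(prob_a, 4), round(prob_b, 4))  # 0.0288 0.002
```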
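(c) Given that the battery is bad compute the probability that the car will start.
P(S = yes, B = bad) = sum over f of [1 - P(S = no | B = bad, F = f)] * P(B = bad) * P(F = f) = .1 * .1 * .8 + 0 * .1 * .2 = .008
This sum is the joint probability of a bad battery and a successful start. Conditioning on the bad battery divides out P(B = bad): P(S = yes | B = bad) = .008 / .1 = .08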
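The sum over the fuel state can be checked numerically. The sketch below evaluates the sum with the factor P(B = bad) included, which makes it the joint probability of a bad battery and a start, and then divides by P(B = bad) to condition on the battery. The table values are the ones used in the answer above; the variable names are made up for the illustration.

```python
P_B_BAD = 0.1
P_F = {"empty": 0.2, "not empty": 0.8}
# P(S = no | B = bad, F = f), as used in the sum above
P_S_NO_GIVEN_BAD = {"empty": 1.0, "not empty": 0.9}

# Joint: sum over f of P(S = yes | B = bad, F = f) * P(B = bad) * P(F = f)
joint_start_and_bad = sum(
    (1 - P_S_NO_GIVEN_BAD[f]) * P_B_BAD * P_F[f] for f in P_F
)

# Conditional: divide the joint by P(B = bad)
cond_start_given_bad = joint_start_and_bad / P_B_BAD

print(round(joint_start_and_bad, 4), round(cond_start_given_bad, 4))  # 0.008 0.08
```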
13. Consider the one-dimensional data set shown in Table 5.13.
(a) Classify the data point x=5.0 according to its 1-, 3-, 5-, and 9-nearest neighbors.
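The 1-nearest neighbor is y = + at x = 4.9 with D = 0.1. Class label +.
The 3-nearest neighbors are y = + at x = 4.9 with D = 0.1, y = - at x = 5.2 with D = 0.2 and y = - at x = 5.3 with D = 0.3.
Majority vote 2-1, it is class label -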
The 5-nearest neighbors are y = + at x = 4.9 with D = 0.1, y = - at x = 5.2 with D = 0.2, y = - at x = 5.3 with D = 0.3 and y = + at x = 4.6 with D = 0.4.
The fifth neighbor is a tie: randomly select either y = + at x = 4.5 with D = 0.5 or y = + at x = 5.5 with D = 0.5. Since they are both pluses, the choice does not matter.
Majority vote 3-2, it is class label +
The 9-nearest neighbors are y = + at x = 4.9 with D = 0.1, y = - at x = 5.2 with D = 0.2, y = - at x = 5.3 with D = 0.3, y = + at x = 4.6 with D = 0.4, y = + at x = 4.5 with D = 0.5, y = + at x = 5.5 with D = 0.5, y = - at x = 3.0 with D = 2.0 and y = - at x = 7.0 with D = 2.0.
The ninth neighbor is a tie: randomly select either y = - at x = 0.5 with D = 4.5 or y = - at x = 9.5 with D = 4.5. Since they are both minuses, the choice does not matter.
Majority vote 5-4, it is class label -
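The majority votes above can be reproduced with a short script. This is a sketch, not part of the assigned answer: the data list holds the ten points named in the answer, the function name `knn_majority` is made up, and distance ties are broken by list order instead of randomly (both tied candidates share a label here, so the result is the same).

```python
from collections import Counter

# The ten (x, label) points named in the answer above
DATA = [(0.5, "-"), (3.0, "-"), (4.5, "+"), (4.6, "+"), (4.9, "+"),
        (5.2, "-"), (5.3, "-"), (5.5, "+"), (7.0, "-"), (9.5, "-")]


def knn_majority(query, k, data=DATA):
    """Label held by the majority of the k points closest to query."""
    # sorted() is stable, so equal-distance points keep their listed order;
    # that stands in for the random tie choice described above
    nearest = sorted(data, key=lambda p: abs(p[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


results = {k: knn_majority(5.0, k) for k in (1, 3, 5, 9)}
print(results)  # {1: '+', 3: '-', 5: '+', 9: '-'}
```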
(b) Repeat the previous analysis using the distance-weighted voting approach described in Section 5.2.1.
Each neighbor votes with weight w = 1/D^2.
The 1-nearest neighbor is y = + at x = 4.9 with w = 100. Class label +.
The 3-nearest neighbors are y = + at x = 4.9 with w = 100, y = - at x = 5.2 with w = 25 and y = - at x = 5.3 with w = 11.11.
Distance-weighted voting results in 100 - 25 - 11.11 = 63.89 > 0, so class label +
The 5-nearest neighbors are y = + at x = 4.9 with w = 100, y = - at x = 5.2 with w = 25, y = - at x = 5.3 with w = 11.11 and y = + at x = 4.6 with w = 6.25.
Also need to randomly select either y = + at x = 4.5 with w = 4 or y = + at x = 5.5 with w = 4.
Distance-weighted voting results in 100 + 6.25 + 4 - 25 - 11.11 = 74.14 > 0, so class label +
The 9-nearest neighbors are y = + at x = 4.9 with w = 100, y = - at x = 5.2 with w = 25, y = - at x = 5.3 with w = 11.11, y = + at x = 4.6 with w = 6.25, y = + at x = 4.5 with w = 4, y = + at x = 5.5 with w = 4, y = - at x = 3.0 with w = 0.25 and y = - at x = 7.0 with w = 0.25.
Also need to randomly select either y = - at x = 0.5 with w = 0.05 or y = - at x = 9.5 with w = 0.05.
Distance-weighted voting results in 100 + 6.25 + 4 + 4 - 25 - 11.11 - 0.25 - 0.25 - 0.05 = 77.59 > 0, so class label +
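The weighted votes can be checked the same way. The sketch below assumes the weight is w = 1/D^2, which is what the listed weights imply (for example 1/0.1^2 = 100). It adds w for a + neighbor and subtracts it for a - neighbor. The name `knn_weighted` is made up, and ties in distance are again broken by list order.

```python
# The ten (x, label) points named in the answers above
DATA = [(0.5, "-"), (3.0, "-"), (4.5, "+"), (4.6, "+"), (4.9, "+"),
        (5.2, "-"), (5.3, "-"), (5.5, "+"), (7.0, "-"), (9.5, "-")]


def knn_weighted(query, k, data=DATA):
    """Return (label, signed score) using weights w = 1 / distance^2."""
    nearest = sorted(data, key=lambda p: abs(p[0] - query))[:k]
    score = 0.0
    for x, label in nearest:
        # Assumes the query is not itself a training point (distance > 0)
        w = 1.0 / (x - query) ** 2
        score += w if label == "+" else -w
    return ("+" if score > 0 else "-"), score


weighted = {k: knn_weighted(5.0, k) for k in (1, 3, 5, 9)}
for k, (label, score) in weighted.items():
    print(k, label, round(score, 2))
```

With these weights the single point at x = 4.9 dominates every vote, which is why all four values of k give class label +.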
16. (a) Demonstrate how the perceptron model can be used to represent the AND and OR functions between a pair of Boolean variables.
AND
| X1 | X2 | y |
| 0 | 0 | -1 |
| 0 | 1 | -1 |
| 1 | 0 | -1 |
| 1 | 1 | 1 |
Plotted on a graph, the AND function has four points at (0,0), (0,1), (1,0) and (1,1). The first three are -1 and the last one is 1. A line can be drawn between the two batches of points, so a single perceptron can represent AND.
OR
| X1 | X2 | y |
| 0 | 0 | -1 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 1 |
Plotted on a graph, the OR function has four points at (0,0), (0,1), (1,0) and (1,1). The first one is -1 and the last three are 1. A line can be drawn between the two batches of points, so a single perceptron can represent OR.
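One possible choice of separating lines is x1 + x2 = 1.5 for AND and x1 + x2 = 0.5 for OR. These weights are an assumption made for illustration, since any line that splits the two batches of points works equally well. The sketch below checks them against both truth tables.

```python
def perceptron(x1, x2, w1, w2, bias):
    """Sign-threshold unit: +1 if w1*x1 + w2*x2 + bias > 0, else -1."""
    return 1 if w1 * x1 + w2 * x2 + bias > 0 else -1


# Hypothetical (w1, w2, bias) choices, not taken from the assignment
AND_PARAMS = (1.0, 1.0, -1.5)
OR_PARAMS = (1.0, 1.0, -0.5)

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_out = [perceptron(x1, x2, *AND_PARAMS) for x1, x2 in inputs]
or_out = [perceptron(x1, x2, *OR_PARAMS) for x1, x2 in inputs]

print(and_out)  # [-1, -1, -1, 1]
print(or_out)   # [-1, 1, 1, 1]
```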
(b) Comment on the disadvantage of using linear functions as activation functions for multilayer neural networks.
Networks whose activation functions are linear produce output that is a linear function of their input, no matter how many layers they have, so they can only classify and separate problems that are linearly separable. Nonlinear activation functions allow these networks to model complex relationships between the input and output variables and to handle problems that are not linearly separable.
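To see why extra layers do not help when the activations are linear, the sketch below builds a small two-layer network with identity activations and shows that one layer with weights W2*W1 and bias W2*b1 + b2 gives the same output. All weights and helper names are made up for the illustration.

```python
def matvec(W, x):
    """Matrix-vector product for lists of lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def matmul(A, B):
    """Matrix-matrix product for lists of lists."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


# Hypothetical weights: 2 inputs, 2 hidden units, 1 output
W1 = [[1.0, -2.0], [0.5, 3.0]]
b1 = [0.5, -1.0]
W2 = [[2.0, -1.0]]
b2 = [0.25]


def two_layer_linear(x):
    """Two layers with identity (linear) activation functions."""
    hidden = [h + b for h, b in zip(matvec(W1, x), b1)]
    return [o + b for o, b in zip(matvec(W2, hidden), b2)]


# Collapse both layers into one: W = W2*W1, b = W2*b1 + b2
W = matmul(W2, W1)
b = [o + c for o, c in zip(matvec(W2, b1), b2)]


def single_layer(x):
    return [o + c for o, c in zip(matvec(W, x), b)]


for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, two_layer_linear(x), single_layer(x))
```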
17. You are asked to evaluate the performance of two classification models M1 and M2. The test set you have chosen contains 26 binary attributes labeled as A through Z. Table 5.14 shows the posterior probabilities obtained by applying the models to the test set. As this is a two-class problem, P(-) = 1-P(+) and P(-|A,...,Z) = 1-P(+|A,...,Z). Assume that we are mostly interested in detecting instances from the positive class.
(a) Plot the ROC curve for both M1 and M2. Which model do you think is better? Explain your reasons.
M1 is better because it has a larger area under the ROC curve than M2.
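For reference, the area under an ROC curve can be computed from posterior scores as sketched below. A larger area means the model ranks positives above negatives more often. The scores in the example are made-up placeholders, not the Table 5.14 values, and the function names are hypothetical.

```python
def roc_points(labels, scores):
    """ROC points (FPR, TPR) from sweeping the threshold down through the scores.

    Assumes no two instances share the same score.
    """
    pos = sum(1 for y in labels if y == "+")
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, y in sorted(zip(scores, labels), reverse=True):
        if y == "+":
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Trapezoidal area under ROC points ordered by increasing FPR."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up scores for illustration only; these are NOT the Table 5.14 values
toy_labels = ["+", "+", "-", "+", "-", "-"]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_auc = auc(roc_points(toy_labels, toy_scores))
print(round(toy_auc, 3))  # 0.889
```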

(b) For model M1, suppose you choose the cutoff threshold to be t=.5. In other words, any test instance whose posterior probability is greater than t will be classified as a positive example. Compute the precision, recall, and F-measure for the model at this threshold value.
TP = 3, FN = 2, FP = 1, TN = 4
Precision = 3 / (3 + 1) = .75
Recall = 3 / (3 + 2) = .6
F1 = 2*3 / (2*3 + 1 + 2) = .67
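The three measures follow directly from the confusion counts. A small helper (the name `precision_recall_f1` is made up) reproduces the numbers above and also confirms that the count form of F1 equals the harmonic mean of precision and recall.

```python
def precision_recall_f1(tp, fn, fp):
    """Precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return precision, recall, f1


p, r, f1 = precision_recall_f1(tp=3, fn=2, fp=1)
harmonic = 2 * p * r / (p + r)  # same value as f1, written as a harmonic mean
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.75 0.6 0.67
```

Calling the same helper with the counts for any other model or threshold gives its measures in the same way.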
(c) Repeat the analysis for part (b) using the same cutoff threshold on model M2. Compare the F-measure results for both models. Which model is better? Are the results consistent with what you expect from the ROC curve?
TP = 1, FN = 4, FP = 1, TN = 4
Precision = 1 / (1 + 1) = .5
Recall = 1 / (1 + 4) = .2
F1 = 2*1 / (2*1 + 1 + 4) = .29
M1 has the higher F-measure, therefore it is the better one. This is consistent with the ROC curve, where M1 also comes out ahead.
(d) Repeat part (c) for model M1 using the threshold t = .1. Which threshold do you prefer, t = .5 or t = .1? Are the results consistent with what you expect from the ROC curve?
TP = 5, FN = 0, FP = 4, TN = 1
Precision = 5 / (5 + 4) = .56
Recall = 5 / (5 + 0) = 1
F1 = 2*5 / (2*5 + 4 + 0) = .71
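I like t = .5 better. The F-measure is slightly higher at t = .1 (.71 versus .67), but t = .1 gets there by labeling almost everything positive: its false positive rate is 4 / 5 = .8, against 1 / 5 = .2 at t = .5. On the ROC plot the t = .5 point (FPR = .2, TPR = .6) sits closer to the ideal corner (0, 1) than the t = .1 point (FPR = .8, TPR = 1), so the F-measure and the ROC curve do not agree here.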
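The two thresholds can also be compared as points on the ROC plot, using the confusion counts from parts (b) and (d). Distance to the ideal corner (FPR = 0, TPR = 1) is used below as one common rule of thumb. It is an assumption of this sketch, not the only way to pick a threshold, and the function names are made up.

```python
import math


def roc_point(tp, fn, fp, tn):
    """(FPR, TPR) for one confusion matrix."""
    return fp / (fp + tn), tp / (tp + fn)


def distance_to_ideal(point):
    """Euclidean distance from an ROC point to the ideal corner (0, 1)."""
    fpr, tpr = point
    return math.hypot(fpr, tpr - 1.0)


pt_05 = roc_point(tp=3, fn=2, fp=1, tn=4)  # M1 at t = .5
pt_01 = roc_point(tp=5, fn=0, fp=4, tn=1)  # M1 at t = .1

print(pt_05, round(distance_to_ideal(pt_05), 3))  # (0.2, 0.6) 0.447
print(pt_01, round(distance_to_ideal(pt_01), 3))  # (0.8, 1.0) 0.8
```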