Information Retrieval (Mustafa's Assignment 3)
I.
- i. Cystic Fibrosis and Human
- ii. Essentially, because these two keywords appear in every single document, they are of no use: searching on them would not narrow the original selection down at all, so the search arrives at nothing conclusive and the keywords are useless.
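The point in part ii can be made concrete with inverse document frequency (IDF), a common term-weighting scheme. This is only a minimal sketch and not necessarily the weighting the assignment used: the function name `idf`, the natural logarithm, and the collection size of 38 are all assumptions for illustration.

```python
import math

def idf(num_docs: int, doc_freq: int) -> float:
    """Inverse document frequency log(N / n), using the natural log (an assumption)."""
    if doc_freq <= 0:
        raise ValueError("doc_freq must be positive")
    return math.log(num_docs / doc_freq)

N = 38  # hypothetical collection size, not taken from the assignment

print(idf(N, N))  # a term that appears in every document gets weight 0.0
print(idf(N, 2))  # a term that appears in only 2 documents gets a positive weight
```

A weight of zero means the term contributes nothing to any document's score, which matches the argument that such keywords cannot narrow the selection.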
II.
- a.
- Query 1-- #5, #15, #17, #25, #29, #32, #36
- Query 2-- #1, #4, #6, #9, #14, #19, #21, #29, #36, #37, #38
- Query 3-- #2, #8, #11, #13, #14, #19, #20, #22, #24, #27, #32, #33, #34, #35
- Query 4-- #7, #13, #27, #30, #35
- Query 5-- #8, #10, #18, #32
- b.
- Query 1-- Recall: 4/7 Precision: 4/4
- Query 2-- Recall: 11/12 Precision: 11/13
- Query 3-- Recall: 8/14 Precision: 8/13
- Query 4-- Recall: 4/5 Precision: 4/5
- Query 5-- Recall: 4/4 Precision: 4/4
- c. None
- d.
- i. Recall is undefined, since its denominator (the number of relevant documents) is zero. (Nothing can be recalled, so in practice 'Recall' is treated as 0.)
- ii. Precision is equal to 0.
- e1. D
- e2. A
- f. Method A does better overall. When performing a search for documents, the person searching needs to have SOMETHING to look at. Search C, being TOO narrowly defined and thus overly exclusive, yielded NO documents to peruse in deciding ultimate relevancy.
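The fractions in part b and the edge case in part d both follow from the set definitions of recall and precision. Below is a minimal sketch, assuming the Query 1 list in part a is the relevant set. The `retrieved_q1` set is hypothetical (the method's actual result lists are not reproduced here), and returning `None` for a 0/0 ratio is just one possible convention.

```python
from fractions import Fraction
from typing import Optional, Set, Tuple

def recall_precision(
    relevant: Set[int], retrieved: Set[int]
) -> Tuple[Optional[Fraction], Optional[Fraction]]:
    """Return (recall, precision); a ratio with a zero denominator is None (undefined)."""
    hits = len(relevant & retrieved)
    recall = Fraction(hits, len(relevant)) if relevant else None
    precision = Fraction(hits, len(retrieved)) if retrieved else None
    return recall, precision

relevant_q1 = {5, 15, 17, 25, 29, 32, 36}  # Query 1 list from part a
retrieved_q1 = {5, 17, 29, 36}             # hypothetical result set

print(recall_precision(relevant_q1, retrieved_q1))  # (Fraction(4, 7), Fraction(1, 1))

# Edge case: no relevant documents exist, but some documents are retrieved.
print(recall_precision(set(), {1, 2, 3}))           # (None, Fraction(0, 1))
```

The second call shows one situation in which recall is undefined while precision is 0.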
III.
- a.
- Query 1
- Doc 5-- [8.41/ ((256.78)^.5)] = .5248
- Doc 15--[8.41/ ((149.24)^.5)] = .6884
- Doc 17--[27.04/ ((383.09)^.5)] = 1.3815
- Doc 25--[8.41/ ((85.88)^.5)] = .9075
- Doc 29--[8.41/ ((90.34)^.5)] = .8848
- Doc 32--[8.41/ ((148.2)^.5)] = .6908
- Doc 36--[27.04/ ((307.02)^.5)] = 1.5432
- Query 2
- Doc 1--[4.41/ ((89.82)^.5)]= .4653
- Doc 4--[(4.41+4.84)/ ((59.12)^.5)]= 1.203
- Doc 6--[4.41/ ((80.13)^.5)]= .4927
- Doc 9--[(4.41+4.84)/ ((109.11)^.5)]= .8855
- Doc 14--[4.84/ ((98.43)^.5)]= .4878
- Doc 19--[(4.41+4.84)/ ((380.77)^.5)]= .4740
- Doc 21--[(4.41+4.84)/ ((76.19)^.5)]= 1.0597
- Doc 29--[4.41/ ((90.34)^.5)]= .46398
- Doc 36--[4.84/ ((307.02)^.5)]= .2762
- Doc 37--[(4.41+4.84)/ ((265.31)^.5)]= .5679
- Doc 38--[(4.41+4.84)/ ((119.85)^.5)]= .8449
- Query 3
- Doc 2--[7.29/ ((227.91)^.5)]= .4829
- Doc 8--[8.41/ ((92.53)^.5)]= .8743
- Doc 11--[7.29/ ((44.54)^.5)]= 1.0923
- Doc 13--[8.41/ ((264.27)^.5)]= .5173
- Doc 14--[13.69/ ((98.43)^.5)]= 1.37987
- Doc 19--[13.69/ ((380.77)^.5)]= .7016
- Doc 20--[7.29/ ((97.01)^.5)]= .7401
- Doc 22--[(27.04+7.29)/ ((191.91)^.5)]= 2.4781
- Doc 24--[8.41/ ((175.74)^.5)]= .6344
- Doc 27--[8.41/ ((57.8)^.5)]= 1.1062
- Doc 32--[13.69/ ((148.2)^.5)]= 1.12455
- Doc 33--[7.29/ ((100.91)^.5)]= .72571
- Doc 34--[7.29/ ((24.93)^.5)]= 1.46005
- Doc 35--[8.41/ ((51.28)^.5)]= 1.17442
- Query 4
- Doc 7--[8.41/ ((193.22)^.5)]= .6050
- Doc 13--[8.41/ ((264.27)^.5)]= .5173
- Doc 27--[8.41/ ((57.8)^.5)]= 1.1062
- Doc 30--[8.41/ ((159.46)^.5)]= .665994
- Doc 35--[8.41/ ((51.28)^.5)]= 1.17442
- Query 5
- Doc 8--[17.64/ ((92.53)^.5)]= 1.8338
- Doc 10--[13.69/ ((153.45)^.5)]= 1.10515
- Doc 18--[(13.69+17.64)/ ((337.01)^.5)]= 1.70663
- Doc 32--[13.69/ ((148.2)^.5)]= 1.12455
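Each score in part a has the same shape: a sum over the matching query terms, divided by the square root of a per-document quantity. The sketch below reproduces a few of the rows. It assumes the bracketed numerators are query-term weight products and the value under the root is the document's squared vector length, which the answer sheet does not state explicitly; the function name `score` is illustrative.

```python
import math
from typing import Iterable

def score(term_products: Iterable[float], doc_norm_squared: float) -> float:
    """Sum of matching-term weight products divided by the document vector length."""
    return sum(term_products) / math.sqrt(doc_norm_squared)

print(round(score([8.41], 256.78), 4))         # Query 1, Doc 5  -> 0.5248
print(round(score([4.41, 4.84], 59.12), 4))    # Query 2, Doc 4  -> 1.203
print(round(score([27.04, 7.29], 191.91), 4))  # Query 3, Doc 22 -> 2.4781
```

If the intent is cosine similarity, note that these values are not divided by the query's own length, which is why some exceed 1. That factor is the same for every document under one query, so it does not change the ordering within a query.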
- b.
- Query 1-- Recall: 4/4 Precision: 4/4
- Query 2-- Recall: 11/13 Precision: 11/13
- Query 3-- Recall: 8/13 Precision: 8/13
- Query 4-- Recall: 4/5 Precision: 4/5
- Query 5-- Recall: 4/4 Precision: 4/4
- Avg. Recall: .8523
- Avg. Precision: .8523
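The two averages are the plain mean of the five per-query fractions listed in part b (a macro-average). A minimal sketch that checks the arithmetic with exact fractions; the helper name `macro_average` is illustrative.

```python
from fractions import Fraction
from typing import Sequence

def macro_average(values: Sequence[Fraction]) -> Fraction:
    """Unweighted mean of the per-query values."""
    return sum(values) / len(values)

# Per-query values from part b (recall and precision are identical here).
per_query = [
    Fraction(4, 4),
    Fraction(11, 13),
    Fraction(8, 13),
    Fraction(4, 5),
    Fraction(4, 4),
]

avg = macro_average(per_query)
print(avg, round(float(avg), 4))  # 277/325 0.8523
```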
- c. Essentially, my results show that the Vector Model did better with respect to recall, while all three methods tied on precision; however, it seems that the Vector Model would have done best on precision as well. (*Mathematically, it makes sense that the Vector Model would be more precise, since it CALCULATES the relevancy of each document as part of the search.*) I believe the Vector Model would deliver the best search overall.
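What the Vector Model adds in practice is a numeric score per document, so retrieved documents can be ordered rather than returned as an unordered set. A minimal sketch using the Query 5 expressions from part a; the dictionary layout and variable names are illustrative.

```python
import math

# Query 5 from part a: doc id -> (numerator, value under the square root)
query5 = {
    8: (17.64, 92.53),
    10: (13.69, 153.45),
    18: (13.69 + 17.64, 337.01),
    32: (13.69, 148.2),
}

scores = {doc: num / math.sqrt(norm_sq) for doc, (num, norm_sq) in query5.items()}

# Highest score first.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # [8, 18, 32, 10]
```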
Extra Credit
IV.
- Query 1
- Doc 5-- [141.52/ ((256.78)^.5)] = 8.8339
- Doc 15--[78.825/ ((149.24)^.5)] = 6.4504
- Doc 17--[211.005/ ((383.09)^.5)] = 10.6192
- Doc 25--[47.145/ ((85.88)^.5)] = 5.0857
- Doc 29--[49.375/ ((90.34)^.5)] = 5.1974
- Doc 32--[85.15/ ((148.2)^.5)] = 4.8602
- Doc 36--[145.19/ ((307.02)^.5)] = 8.2271
- Query 2
- Doc 1--[47.115/ ((89.82)^.5)]= 4.9699
- Doc 4--[34.185/ ((59.12)^.5)]= 4.4512
- Doc 6--[41.665/ ((80.13)^.5)]= 4.6920
- Doc 9--[59.18/ ((109.11)^.5)]= 5.6631
- Doc 14--[51.635/ ((98.43)^.5)]= 5.2051
- Doc 19--[185.01/ ((380.77)^.5)]= 9.9954
- Doc 21--[42.72/ ((76.19)^.5)]= 4.8934
- Doc 29--[40.06/ ((90.34)^.5)]= 4.2168
- Doc 36--[167.155/ ((307.02)^.5)]= 9.5408
- Doc 37--[123.885/ ((265.31)^.5)]= 7.6057
- Doc 38--[64.55/ ((119.85)^.5)]= 5.8963
- Query 3
- Doc 2--[117.6/ ((227.91)^.5)]= 7.7898
- Doc 8--[50.47/ ((92.53)^.5)]= 5.2468
- Doc 11--[25.915/ ((44.54)^.5)]= 3.8831
- Doc 13--[136.34/ ((264.27)^.5)]= 8.3869
- Doc 14--[56.06/ ((98.43)^.5)]= 5.6505
- Doc 19--[197.23/ ((380.77)^.5)]= 10.1075
- Doc 20--[52.15/ ((97.01)^.5)]= 5.2948
- Doc 22--[113.12/ ((191.91)^.5)]= 8.1656
- Doc 24--[92.075/ ((175.74)^.5)]= 6.9455
- Doc 27--[33.105/ ((57.8)^.5)]= 4.3544
- Doc 32--[80.945/ ((148.2)^.5)]= 6.6491
- Doc 33--[54.10/ ((100.91)^.5)]= 5.3856
- Doc 34--[16.11/ ((24.93)^.5)]= 3.2265
- Doc 35--[29.845/ ((51.28)^.5)]= 4.1677
- Query 4
- Doc 7--[100.815/ ((193.22)^.5)]= 7.2527
- Doc 13--[136.34/ ((264.27)^.5)]= 8.3869
- Doc 27--[33.105/ ((57.8)^.5)]= 4.3544
- Doc 30--[83.935/ ((159.46)^.5)]= 6.6469
- Doc 35--[29.845/ ((51.28)^.5)]= 4.1677
- Query 5
- Doc 8--[55.085/ ((92.53)^.5)]= 5.7265
- Doc 10--[83.57/ ((153.45)^.5)]= 6.7463
- Doc 18--[184.17/ ((337.01)^.5)]= 10.0322
- Doc 32--[80.945/ ((148.2)^.5)]= 6.6491