Item Analysis of Raven's Progressive Matrices

Glenn Mason-Riseborough (22/4/1998)

Set II of Raven's Progressive Matrices (1947 version) contained 48 items. For the 1962 version, 12 of those 48 items were dropped because they were considered not to contribute well to the test as a whole. The aim of this report is to analyze the 48 items of the 1947 version and identify 12 items that could be dropped. The data used for this report were obtained from 179 subjects and are summarized in Figure 1; the circled numbers indicate correct responses.

This report assumes that all the items in the test are equally valid qualitatively; that is, that they test the intended domain equally well. The emphasis is therefore on analyzing the items quantitatively, in three ways. First, the point-biserial correlation of each item will be an important factor in determining whether the item is retained. Second, items will be checked for ambiguous answers by examining whether an abnormally large number of responses fall on a particular incorrect choice. Third, extremely easy and extremely difficult items will be examined, on the argument that an item that all subjects answer correctly or incorrectly does not contribute well to the test.

A point-biserial correlation (pbsr) gives us a statistical method of identifying which items of a test are the most predictive of a subject’s overall score. It takes into account the proportion of subjects who answered the item correctly (p) and the mean overall score of those subjects (Xp), the proportion of subjects who answered the item incorrectly (q) and the mean overall score of those subjects (Xf), and the overall standard deviation of the test scores (SD).
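For reference, the quantities just listed combine in the standard textbook form of the point-biserial correlation. The report does not print the formula, so the expression below is an assumption about what was computed. The symbols are those defined above, with \bar{X}_p and \bar{X}_f standing for Xp and Xf, and SD taken as the population standard deviation of the total scores (N, not N − 1, in the denominator):

```latex
r_{pb} = \frac{\bar{X}_p - \bar{X}_f}{SD}\,\sqrt{p\,q}, \qquad q = 1 - p
```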
The statistic is largest when about 50% of the subjects answer the item correctly and when those subjects have a substantially higher overall score than the subjects who answered the item incorrectly. Hence, a high point-biserial correlation indicates that the item correlates well with the total score. If this were our only consideration, the 12 items with the lowest point-biserial correlations would be the ones to drop from the test. Ranking the 48 items in ascending order of point-biserial correlation gives:

44, 46, 48, 6, 45, 12, 28, 32, 17, 42, 40, 2, 47, 19, 39, 7, 38, 43, 41, 27, 37, 1, 23, 8, 24, 30, 20, 14, 22, 16, 13, 3, 18, 4, 10, 34, 15, 5, 31, 29, 9, 36, 21, 26, 35, 25, 11, 33.

Thus, the 12 items that contribute least to the test are 2, 6, 12, 17, 28, 32, 40, 42, 44, 45, 46, and 48. Note that this includes 6 of the last 9 items. The items are intended to be ordered by difficulty, with the most difficult at the end. One explanation for the low point-biserial correlations at the end is that many subjects may simply have guessed the answers, so that these items were less informative than the others. Alternatively, these items may not actually have been more difficult; by the end of the test the subjects may have been fatigued or bored, and thus less able or motivated to answer correctly. To determine whether this is the case, it may be necessary to administer the test with the items in a different order. If this led to greater accuracy on these items, it would change the point-biserial correlations, and possibly the items to be dropped.

It must also be noted that there is very little difference between the point-biserial correlations of a number of items; for example, items 2, 47, and 19 have point-biserial correlations of 0.372, 0.373, and 0.374 respectively. Under a strict cutoff at 12 items, item 2 would be dropped and the other two would not.
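The ranking step above can be sketched in Python. This is a minimal illustration, not the procedure actually used for the report. It assumes the responses have already been scored as 0/1 in a subjects-by-items table, that each subject's total score includes the item being analyzed, and that the standard form (Xp − Xf)/SD × sqrt(pq) with a population SD is intended. The names point_biserial and lowest_items and the toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def point_biserial(item_scores, total_scores):
    """Point-biserial correlation between one 0/1 item and the total score."""
    n = len(item_scores)
    passed = [t for s, t in zip(item_scores, total_scores) if s == 1]
    failed = [t for s, t in zip(item_scores, total_scores) if s == 0]
    sd = pstdev(total_scores)  # population SD, which pairs with sqrt(p*q)
    # Undefined when everyone passes, everyone fails, or totals do not vary.
    if not passed or not failed or sd == 0:
        return None
    p = len(passed) / n
    q = 1 - p
    return (mean(passed) - mean(failed)) / sd * sqrt(p * q)


def lowest_items(scored, k):
    """Return the k item numbers (1-based) with the lowest correlations."""
    totals = [sum(row) for row in scored]
    r = {}
    for j in range(len(scored[0])):
        column = [row[j] for row in scored]
        r[j + 1] = point_biserial(column, totals)
    # Items with an undefined correlation sort first: they cannot discriminate.
    ranked = sorted(r, key=lambda item: (r[item] is not None, r[item] or 0.0))
    return ranked[:k], r


# Hypothetical toy data: 4 subjects (rows) by 4 items (columns), 1 = correct.
toy = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
]
lowest, correlations = lowest_items(toy, 2)
print(lowest)
```

Given the real 179-by-48 table of scored responses, lowest_items(scored, 12) would list the candidates for removal under the correlation criterion alone.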
Another consideration is to check each item individually for ambiguity. The test is designed so that all incorrect options are equally likely to be chosen. Thus, any item that has received a large number of responses for one particular incorrect option should be dropped. In particular:

• Item 19 has 37 incorrect responses for option 5, 122 correct responses for option 2, and minimal responses for the other options.
• Item 23 has 40 incorrect responses for option 8, 80 correct responses for option 7, and minimal responses for the other options.
• Item 30 has 34 incorrect responses for option 3, 120 correct responses for option 6, and minimal responses for the other options.
• Item 47 has 38 incorrect responses for option 1, 42 correct responses for option 3, and minimal responses for the other options.

Many other items could be considered for this list, where one or more incorrect options have received more responses than chance would seem to allow; for example items 24, 27, 31, 34, and 36. However, these do not appear to be as significant as the four items listed above.

The p-value (the proportion of subjects who answered the item correctly) is an indicator of the difficulty of an item. A high p-value indicates that many of the subjects answered the item correctly, so the item is considered easy; conversely, a low p-value indicates that the item is difficult. As stated above, the point-biserial correlation includes the p-value as part of its formula. The point-biserial correlation is able to identify extremely difficult items (low p-value) that are also often answered randomly (small difference between Xp and Xf), for example many of the items at the end of the list. However, it is unable to identify extremely easy items (p-value close to 1), as these items often have extremely large differences between Xp and Xf. For example, 177 subjects responded correctly to item 1, giving a p-value of 0.989; the 2 remaining subjects did not answer at all.
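To make the arithmetic for item 1 explicit, the worked values below assume the standard point-biserial form (the difference between Xp and Xf divided by SD, multiplied by the square root of pq), which the report does not print:

```latex
r_{pb} = \frac{\bar{X}_p - \bar{X}_f}{SD}\,\sqrt{p\,q}, \qquad
p = \frac{177}{179} \approx 0.989, \quad
q = \frac{2}{179} \approx 0.011, \quad
\sqrt{p\,q} \approx 0.105
```

The weighting term is therefore small, and the correlation for item 1 can be substantial only if the gap between Xp and Xf is large relative to SD.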
These two subjects either did not answer any item or answered every item incorrectly, giving an Xf of 0.00, an extremely large difference between Xp and Xf, and a moderately high pbsr. Hence, item 1 could be considered not to contribute well to the test as a whole: all subjects who responded did so correctly, and those who did not were identified by all the other items. Other items have high p-values and yet moderate to high point-biserial correlations, for example items 3, 5, 7, and 18. However, all of these have a few incorrect responses. It should also be noted that the subjects were all university students, who would be expected to score higher on the test; the easier items should not be dropped simply on the basis of this group’s performance.

The results obtained from the point-biserial correlations provide important initial information regarding the value of individual items to an overall test. However, other considerations are also important, such as ambiguous items and extremely easy or difficult items. It is the recommendation of this report that item 1 be dropped because almost all subjects responded correctly; that items 19, 23, 30, and 47 be dropped because of ambiguity; and that items 6, 12, 28, 44, 45, 46, and 48 be dropped because of low point-biserial correlations. Thus, the 36 items to be retained are: 2, 3, 4, 5, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 20, 21, 22, 24, 25, 26, 27, 29, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, and 43.

References:

Anastasi, A., & Urbina, S. (1997). Psychological testing (7th ed.). Prentice-Hall.