Ecological Validity of Psychological Tasks

Glenn Mason-Riseborough (18/4/1997)

This essay discusses the ecological validity of a number of tests performed in the laboratory for a stage II paper (461.220) at the University of Auckland. Ecological validity is the degree to which performance in laboratory tasks correlates with performance in everyday situations. The tests performed were the Stroop task and the Eriksen task. Both tasks can be viewed as measures of attentional focus, where attentional focus is ‘the ability to focus out distracting information’ (Lambert, 1997). Performance on these two tasks was compared against scores on 25 questions from Broadbent’s Cognitive Failure Questionnaire (CFQ), and a test was performed to find any significant correlation between laboratory and everyday performance. In addition to discussing the results of the experiment and the ecological validity of the two tasks, this essay also discusses the design features of the tasks and the CFQ. Strengths and weaknesses are pointed out, and improvements and further research are suggested.

The Stroop task (1)

The Stroop task was performed on a computer so that response times could be measured and recorded precisely. Response times were measured from when the stimulus was first displayed to when the subject responded. In this variation of the task, the subject responded by pressing a key (F5, F6, or F7) that corresponded to one of three colours (red, blue, or green) of a letter string presented on the monitor. The meaning of the letter string was either compatible, incompatible, or neutral with respect to the colour in which it was displayed. For example, a compatible condition would be the word ‘RED’ displayed in red letters, an incompatible condition would be the word ‘RED’ displayed in green letters, and a neutral condition would be the string ‘XXX’ displayed in red letters. Both response times and errors were recorded for each of the three conditions for each subject.
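The three Stroop conditions described above can be made concrete with a short sketch. It is illustrative only: the function name `stroop_condition` and the rule that any string other than a colour word counts as neutral are assumptions, not a description of the laboratory software.

```python
COLOURS = {"red", "blue", "green"}  # the three ink colours used in this version of the task


def stroop_condition(text: str, ink: str) -> str:
    """Classify a Stroop trial as 'compatible', 'incompatible', or 'neutral'."""
    word = text.strip().lower()
    ink = ink.strip().lower()
    if ink not in COLOURS:
        raise ValueError(f"unknown ink colour: {ink!r}")
    if word not in COLOURS:
        # Assumption: any string that is not a colour word (e.g. 'XXX') is neutral.
        return "neutral"
    return "compatible" if word == ink else "incompatible"


if __name__ == "__main__":
    # The three examples given in the text.
    for text, ink in [("RED", "red"), ("RED", "green"), ("XXX", "red")]:
        print(text, "in", ink, "->", stroop_condition(text, ink))
```

The classification depends only on whether the string names a colour and whether that colour matches the ink, which is why a subject who cannot read the colour words would in effect experience every trial as neutral.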
The Eriksen task (2)

This task was also performed on the computer. Again, response times and errors were recorded for the compatible, incompatible, and neutral conditions for each subject. In this task the subjects were asked to respond to a target letter whilst ignoring the distracter letters on either side of it. The target letter was always ‘A’ or ‘B’, and the distracters were ‘A’, ‘B’, or ‘X’. Thus, compatible strings were ‘A A A’ or ‘B B B’, incompatible strings were ‘A B A’ or ‘B A B’, and neutral strings were ‘X A X’ or ‘X B X’. The strings were placed at random positions along a central horizontal line on the monitor so that the subject could not know beforehand exactly where the target letter would appear.

Broadbent’s Cognitive Failure Questionnaire (CFQ)

The CFQ is a self-report questionnaire. It contains questions regarding ‘failures in perception, memory and motor function’ (Broadbent et al., 1982), for example, ‘Do you find you forget appointments?’ (Lambert, 1997). Subjects respond by marking on a scale from 0 to 4 how often they make each of these mistakes: 0 is never, 1 is very rarely, 2 is occasionally, 3 is quite often, and 4 is very often. The subject’s answers are then added together. For example, a subject who answered 1, 4, 2, 2, 0 for the first 5 questions would have a running total of 9. In this experiment the subjects answered 25 questions, giving a range of possible scores from 0 to 100 inclusive.

In a paper on the CFQ, Broadbent and colleagues (1982) discuss the accuracy of the questionnaire given that it relies on self-report. There appears to be agreement between subjects’ self-reports on the CFQ and reports made by people who know the subjects well. There is also evidence that the CFQ does, in fact, measure a general liability to failure in everyday situations.
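The scoring rule for the questionnaire is a simple sum, which the following sketch spells out. The function name `cfq_score` and its `expected_items` parameter are hypothetical, introduced only for illustration; the 0 to 4 scale and the 25 items come from the description above.

```python
from typing import Optional, Sequence

MIN_RATING, MAX_RATING = 0, 4  # 0 = never ... 4 = very often
N_QUESTIONS = 25               # number of CFQ items answered in this experiment


def cfq_score(ratings: Sequence[int], expected_items: Optional[int] = N_QUESTIONS) -> int:
    """Sum the ratings; a higher total means more self-reported failures."""
    if expected_items is not None and len(ratings) != expected_items:
        raise ValueError(f"expected {expected_items} ratings, got {len(ratings)}")
    for rating in ratings:
        if not MIN_RATING <= rating <= MAX_RATING:
            raise ValueError(f"rating out of range: {rating}")
    return sum(ratings)


if __name__ == "__main__":
    # Partial total for the first five answers used as the example in the text.
    print(cfq_score([1, 4, 2, 2, 0], expected_items=None))
    # Extremes of the full 25-item questionnaire.
    print(cfq_score([MIN_RATING] * N_QUESTIONS), cfq_score([MAX_RATING] * N_QUESTIONS))
```

Passing `expected_items=None` skips the length check, which is how the five-answer example from the text is reproduced here.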
Given this evidence, this experiment assumes that the CFQ accurately measures failures in selective attention in everyday situations, and is thus an ecologically valid questionnaire.

Results

Data from the laboratory were collated and assessed using Pearson’s r, a statistic that measures the linear correlation between two sets of data. Here we were looking for a relationship between CFQ scores and performance on our laboratory tests. There were 120 subjects tested on the Stroop task and 123 on the Eriksen task. CFQ scores were compared against both response times and errors for the Stroop task and the Eriksen task. More specifically, for each of these four measures, the CFQ was compared against the incompatible, compatible, and neutral (control) conditions, as well as the incompatible minus compatible difference and the mean across conditions. These results are summarised below in Tables 1 to 4 (Lambert, 1997). Each table gives the mean, standard deviation (S.D.), Pearson r with CFQ, and the significance level of the correlation (p); n.s. means not significant.

Table 1. Performance on the Stroop task: response times (ms)

                      Incompatible   Compatible   Control   Inc. - Com.   Mean
Mean RT (ms)          814            685          677       129           725
S.D. (ms)             233            166          148       112           176
Pearson r with CFQ    -0.0005        0.01         0.002     -0.02         0.004
p                     n.s.           n.s.         n.s.      n.s.          n.s.

Table 2. Performance on the Stroop task: errors (%)

                      Incompatible   Compatible   Control   Inc. - Com.   Mean
Error (%)             6.0            2.2          2.4       3.8           3.5
S.D. (%)              7.6            3.2          3.9       7.6           3.8
Pearson r with CFQ    0.215          0.089        0.136     0.177         0.215
p                     <0.01          n.s.         n.s.      <0.05         <0.01

Table 3. Performance on the Eriksen task: response times (ms)

                      Incompatible   Compatible   Control   Inc. - Com.   Mean
Mean RT (ms)          625            571          585       54            593
S.D. (ms)             95             91           116       47            93
Pearson r with CFQ    0.050          0.045        0.002     0.014         0.033
p                     n.s.           n.s.         n.s.      n.s.          n.s.
Table 4. Performance on the Eriksen task: errors (%)

                      Incompatible   Compatible   Control   Inc. - Com.   Mean
Error (%)             5.0            2.3          2.7       2.7           3.4
S.D. (%)              5.0            3.0          3.2       4.8           3.0
Pearson r with CFQ    0.033          0.021        0.070     0.021         0.050
p                     n.s.           n.s.         n.s.      n.s.          n.s.

Analysis of results

The tabulated results show that the CFQ is not significantly correlated with response speed (Table 3) or error rates (Table 4) on the Eriksen task. Nor is the CFQ significantly correlated with response speed on the Stroop task (Table 1). Table 2, however, shows some correlation between the CFQ and error rates on the Stroop task. There were significant positive correlations between the CFQ and errors in the incompatible condition (r = 0.215, p < 0.01), between the CFQ and the incompatible minus compatible error difference (r = 0.177, p < 0.05), and between the CFQ and the mean percentage error (r = 0.215, p < 0.01). The correlation for the error difference was thus both weaker and less significant than the other two.

Consequently, in this experiment neither response speed nor error rate on the Eriksen task was related to everyday cognitive performance as measured by the CFQ. Response speed on the Stroop task was likewise unrelated. The error rates on the Stroop task, however, warrant further discussion.

As stated above, this experiment showed significant correlations between some error rates on the Stroop task and the CFQ. This significance seems to result primarily from responses in the incompatible condition, since that condition contributes to all three measures where significance was detected. Thus we may conclude that errors in responding to incompatible stimuli in the Stroop task may relate to performance in everyday cognitive situations, although the correlations are small (r of about 0.2). We may thus regard this part of the Stroop task as having some ecological validity.
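The step from a Pearson r to a significance level can be sketched as follows. This is a minimal illustration, not the procedure used in the laboratory notes: it assumes the standard transform t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom, assumes n = 120 for the Stroop correlations, and integrates the t density numerically with Simpson's rule so that only the standard library is needed. All function names are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two samples of equal length, at least 3 values each")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r: float, n: int) -> float:
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """One-tailed p value P(T > |t|), by Simpson's rule on the t density."""
    t = abs(t)
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(x: float) -> float:
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 0.5 - total * h / 3)


if __name__ == "__main__":
    n = 120  # assumed sample size for the Stroop correlations
    for r in (0.215, 0.177):
        t = t_from_r(r, n)
        one_tailed = t_upper_tail(t, n - 2)
        print(f"r={r}: t={t:.2f}, one-tailed p={one_tailed:.4f}, two-tailed p={2 * one_tailed:.4f}")
```

With these inputs the one-tailed values fall under the thresholds reported in Table 2 (0.01 for r = 0.215 and 0.05 for r = 0.177), while the two-tailed values do not. The source does not state which kind of test was used, so this is an observation about the sketch rather than about the original analysis.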
Strengths & weaknesses of the experimental design and procedure

Consider first the strengths and weaknesses of the Stroop task. The most controversial aspect of this task was the choice of words displayed in colour. This version of the task was evidently designed on the assumption that all subjects had English as their first language and knew what the colour words meant. Any subject who did not understand the words would presumably not experience the compatible or incompatible effect and would see all words as neutral. Another point is that the neutral stimulus used was not, in fact, a word. There is some strength to the argument that the neutral stimulus should be a meaningful word that must be processed in the same way as the colour words. Suitable candidates might be words such as ‘THAT’ or ‘WHO’, which (for most people) are not associated with any particular colour. A third, minor point is that this test would not work for people with colour-blindness (or blindness in general). A possible alternative task would use aural cues instead of visual cues.

A strength of the Eriksen task is its simplicity. There are only six possible combinations of the three letters: two for each of the three conditions. Despite the simplicity and the small number of combinations, subjects appear to have different reaction times to different letter combinations. Further studies may shed light on whether, after many trials, subjects memorise all six combinations and the difference in reaction times decreases.

Since both tasks were performed on the computer, the measurement of exact reaction times was not a problem. The computer was also able to quickly calculate the average reaction times for each condition for each subject and save the results in a database.

Strengths & weaknesses of the CFQ

There were many comments in the laboratory that some of the questions did not seem to relate to attentional failures.
For example, the question ‘do you drop things?’ could equally be connected to any number of other problems, such as neurological motor problems or paralysis, rather than to failures of attention. However, Broadbent et al. (1982) state explicitly that the CFQ measures failures in motor function in addition to perception and memory. Perhaps this is because in most cases general clumsiness is attentional rather than physical. In these situations it may be a good idea to be aware of the medical background of the subjects or to make the questions more explicit.

Related to this is the problem that a graded scale of five possible answers may not give the subject enough scope to answer. A simple 0 to 4 grading is extremely limiting considering the wide range of possible scenarios that each question encompasses. On the other hand, such limited responses mean that each subject is assigned a simple numerical score, which allows large numbers of subjects to be tested and large amounts of data to be collected and compared. The results suggest that the data remain meaningful despite this simplification.

Conclusions

There is no denying that the Stroop task and the Eriksen task measure something in the laboratory. Whether they measure anything relevant to everyday attentional situations is another matter entirely. The experiment conducted and discussed in this essay showed that performance on the Eriksen task is not significantly related to scores on Broadbent’s Cognitive Failure Questionnaire (CFQ). The experiment also showed that only some of the error rates on the Stroop task correlated significantly with the CFQ; the response speed on the Stroop task showed no correlation. We have assumed that the CFQ is a good measure of everyday attentional failures. Thus it can be concluded that the Eriksen task has no ecological validity with respect to attentional failures, and that the Stroop task has slight ecological validity, but only for its error rates.
It is impossible to comment on the ecological validity of laboratory assessments of attentional performance in general, except to say that not all of these laboratory tasks have ecological validity. This essay in no way claims that experiments on attention should not be undertaken in the laboratory. However, in terms of the practical information gained for real-life situations, the value of laboratory assessment of attentional performance could be questioned.

References:

Broadbent, D. E., Cooper, P. F., FitzGerald, P., & Parkes, K. R. (1982). The Cognitive Failure Questionnaire (CFQ) and its correlates. British Journal of Clinical Psychology, 21, 1-16.

Lambert, T. (1997). Unpublished laboratory notes for University of Auckland paper 461.220.

Footnotes:

1 This task was designed by John Ridley Stroop in 1935 and has been repeated many times in various forms since (Lambert, 1997).

2 The Eriksen task was originally designed by Charles Eriksen at the University of Illinois (Lambert, 1997).