Prosodic Cues
Glenn Mason-Riseborough (26/5/1997)

Introduction

In everyday spoken language, people vary the pitch, stress, timing, amplitude and intonation of their words to emphasise different aspects of a sentence. This variation in speech is called prosody. This essay discusses whether listeners use prosodic cues to resolve ambiguity in spoken language.

Three sets of data are analysed in order to examine this question. The first is an off-line study reported by McAllister (1997), in which subjects chose the correct ending for truncated attachment (endnote 1) and closure (endnote 2) sentences. The second is a shadowing experiment, also reported by McAllister (1997), in which the number of fluent restorations was recorded for subjects shadowing sentences that contained mispronounced words. The third is a cross-modal naming experiment from a paper published by Marslen-Wilson and colleagues (1992). This is an on-line study looking at attachment sentences.

Off-line Experiment

An off-line study is an experiment in which the results do not depend on the time the subjects take to perform the task. The subjects are given the task and can take as long as they like to respond. The results are then based entirely on whether the task was performed correctly.

In this experiment 40 subjects (native speakers of New Zealand English) were given the initial part of each sentence and asked to pick the correct ending. The sentences had been recorded by four speakers (also native speakers of New Zealand English). Each speaker read 20 attachment sentences (both the MA and NMA versions of each) and 20 closure sentences (both the EC and LC versions of each), for a total of 80 sentences per speaker. Four tapes were made for each speaker, each containing five attachment and five closure sentences. Each tape was played to ten of the 40 subjects.
Thus, each group of ten subjects listened to ten sentences from each of the four speakers. The results are summarised in Table 1 (maximum is 5.0).

Table 1: Mean number of accurate responses to sentence-initial fragments

            LC     EC     MA     NMA    Mean
Speaker 1   4.2    3.9    2.1    3.0    3.3
Speaker 2   3.7    4.0    2.3    3.2    3.3
Speaker 3   3.9    4.1    2.6    2.9    3.3
Speaker 4   4.0    3.9    1.5    2.2    2.9
Mean        4.0    4.0    2.1    2.8    3.25

This data was analysed using a two-way analysis of variance (ANOVA): Speaker (4 levels: speakers 1-4) x Sentence Type (4 levels: LC, EC, MA, NMA).

For the comparison of speakers, F(3, 117) = 149.91, p < 0.0001. A t-test of the means showed that speakers 1, 2 and 3 did not differ significantly, but the scores for speaker 4 were significantly lower than those for the other three speakers. From this we can conclude that some speakers use prosody more than others. Listeners may be able to draw information from prosodic cues in some situations, but in other situations, and with other speakers, it may be more difficult.

For sentence type, F(3, 117) = 1379.96, p < 0.0001. The two closure sentence types had significantly higher scores than the two attachment sentence types. The closure types did not differ significantly from each other, but NMA sentences were correctly identified significantly more often than MA sentences. This data shows that people pick up prosodic cues better for closure sentences than for attachment sentences. LC and EC were detected about the same proportion of the time and, with a mean of 4.0 out of 5.0, both are well above the level of chance (2.5). Thus we can say that in general people can use prosody to identify closure sentences correctly. On the other hand, MA and NMA sentences were identified only at about the level of chance (means of 2.1 and 2.8 respectively). People cannot identify from prosody whether an attachment sentence will be minimal or non-minimal.
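The comparison with chance can be retraced from the figures in Table 1. The following is a minimal sketch, not part of the reported analysis: it recomputes the mean for each sentence type across the four speakers and shows its distance from the chance level of 2.5 given in the text. The names TABLE_1, CHANCE and sentence_type_means are illustrative, and the sketch makes no claim about statistical significance, which rests on the ANOVA and t-tests reported above.

```python
# Table 1 as reported: mean number of correct responses per speaker (maximum 5.0).
TABLE_1 = {
    "Speaker 1": {"LC": 4.2, "EC": 3.9, "MA": 2.1, "NMA": 3.0},
    "Speaker 2": {"LC": 3.7, "EC": 4.0, "MA": 2.3, "NMA": 3.2},
    "Speaker 3": {"LC": 3.9, "EC": 4.1, "MA": 2.6, "NMA": 2.9},
    "Speaker 4": {"LC": 4.0, "EC": 3.9, "MA": 1.5, "NMA": 2.2},
}

# Chance level as stated in the text: 2.5 correct out of a maximum of 5.0.
CHANCE = 2.5


def sentence_type_means(table):
    """Average each sentence type's score across all speakers."""
    sentence_types = ["LC", "EC", "MA", "NMA"]
    return {
        s_type: sum(row[s_type] for row in table.values()) / len(table)
        for s_type in sentence_types
    }


means = sentence_type_means(TABLE_1)
for s_type, mean in means.items():
    # Positive offsets are above chance, negative offsets below it.
    print(f"{s_type:<3} mean {mean:.3f}  ({mean - CHANCE:+.3f} relative to chance)")
```

On these figures both closure types sit roughly 1.5 points above chance, while MA falls slightly below it and NMA slightly above it.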
The significantly greater number of correct identifications for NMA than for MA sentences contradicts the theory that listeners expect MA in preference to NMA sentences.

The data was also analysed for a speaker x sentence type interaction: F(9, 351) = 335.54, p < 0.0001. For all speakers except speaker 3, the pattern was LC = EC > NMA > MA. For speaker 3 it was LC = EC > NMA = MA. This result supports the same conclusions as the two analyses above: the ability to detect sentence endings depends on the speaker, and closure sentences can be identified more accurately than attachment sentences.

This sort of (off-line) study parallels everyday situations in some ways but not in others. In real life, as in this study, we are not timed when we listen to and come to understand speech. On the other hand, there are very few situations in which we hear only half a sentence.

Further studies may shed light on whether some people are better than others at detecting prosodic cues accurately. Are some people able to detect prosodic cues for some types of sentences but not others? Are some social groups (e.g. race, culture, gender) better at predicting sentence endings than others? To answer these questions, the results would have to be analysed by sub-groupings of subjects rather than by speaker. Studies could also be conducted to find out which prosodic cues are detected. If we eliminate the pitch variation but leave in the pauses (or vice versa), can the ending still be predicted? All these questions could be tackled using a number of experimental designs, including off-line, on-line or shadowing studies.

Shadowing Experiment

In a shadowing experiment, the subjects are asked to listen to a spoken sentence and repeat it while they are still listening to it. In this experiment, the sentences contained mispronounced words which differed by a single feature from the proper pronunciation, for example “gat” instead of “cat”.
The assumption was that people correct an error if they are expecting the word (fluent restoration), but repeat the error if the word is unexpected.

In this experiment, 30 subjects (native speakers of Scottish English) were asked to shadow sentences on a tape (15 subjects shadowed sentences from one tape and the other 15 from a different tape). The tapes were compiled from 200 sentences recorded by a native Scottish English speaker. These comprised 20 pairs of closure sentences and 20 pairs of attachment sentences containing mispronunciations, and 120 correctly pronounced sentences, of which ten were closure and ten were attachment sentences. Both tapes contained the 120 correctly pronounced sentences, and each contained 40 mispronounced sentences. Neither tape contained both versions of any one sentence, and each contained ten examples of each of the four sentence types. The number of fluent restorations for each type of sentence was recorded. This is summarised in Table 2 (maximum for each type is 10.0).

Table 2: Fluent restorations in a shadowing experiment

LC     EC     MA     NMA
4.0    4.2    3.9    1.2

This data was analysed using a one-way ANOVA (4 levels: LC, EC, MA, NMA). This resulted in F(3, 87) = 201.36, p < 0.0001. A t-test revealed that fluent restorations were equally likely in LC, EC and MA sentences, but significantly less likely in NMA sentences.

This result can be explained using two assumptions. Firstly, when people hear sentences they expect LC over EC and MA over NMA. Secondly, prosodic cues allow people to differentiate between EC and LC but not between MA and NMA. The first assumption means that, because people are expecting LC and MA, they are less likely to be “surprised” by a mispronounced word in those sentences; they tend not to notice it, and so fluently restore it. Thus there will be more fluent restorations in LC and MA sentences than in EC and NMA sentences. The second assumption states that both LC and EC can be detected.
Thus, people are also not surprised by EC, and the mispronounced words are fluently restored at the same rate as in LC sentences. However, NMA sentences are not detected from prosody, so they come as a surprise; the mispronounced words are therefore noticed and there is less fluent restoration. From this theory we can conclude that prosodic cues are used to resolve ambiguity in closure sentences but not in attachment sentences.

Perhaps the major criticism of this study is that in everyday life we do not parrot back the sentences we hear. We could also question the assumption that fluent restorations indicate that prosody is being used.

Cross-Modal Naming Experiment

This experiment is reported in a paper published by Marslen-Wilson and colleagues (1992). It has an on-line and an off-line component, the first designed to investigate whether sentences are disambiguated as soon as prosodic information is presented and before other information becomes available. Unlike the experiments discussed above, this experiment included only attachment sentences and ignored closure sentences.

There were 40 subjects, tested in four groups of ten. Each group was given a different sentence list. The subjects heard a partial sentence and were then immediately shown a single word (the visual probe). They were asked to name the word as quickly as possible and then to indicate on a score sheet, on a scale of 0 to 10, whether they thought the probe word was appropriate (10) or inappropriate (0) as a continuation of the sentence. The time taken to respond was recorded for later analysis. The off-line scoring task was included to ensure that the subjects attended to the auditory prime and to measure the subjective appropriateness of the visual words as continuations.

Results are summarised in Table 3. Responses over 1000 ms were ignored, as were erroneous repetitions.
Difference scores were calculated by subtracting the mean latency for the neutral carrier phrase from the mean latency for each test condition, so that positive values indicate slower naming than the neutral baseline. NUM refers to the control condition in which there was a number violation, i.e. “were” instead of “was” or vice versa.

Table 3: Mean response latencies by condition

              Raw (ms)   Difference (ms)   Appropriateness
NMA + Comp    375        -11               9.9
NMA - Comp    378        -10               9.3
MA            397        +14               8.3
NUM           412        +24               0.5

Results were analysed using a one-way ANOVA (4 levels: NMA + Comp, NMA - Comp, MA, NUM). For the raw data F(3, 145) = 2.84, and for the difference data F(3, 177) = 3.73. Both were significant at the 5% level.

If prosody were not being used, then NMA - Comp and MA would both be interpreted in the same way, with the visual probe incompatible in each. On the other hand, if prosodic cues were used, then responses in the MA condition would be significantly slower, and only NMA - Comp would be comparable with NMA + Comp. The F values alone fail to establish this difference; however, the item analysis shows a difference between MA and NMA - Comp, and the subject analysis shows significant (p < 0.05) differences between MA and both NMA conditions. Surprisingly, when we look at the appropriateness scores, the MA score is comparable to both NMA scores and not to the NUM score. Thus, MA is similar to NMA in the off-line component, but similar to NUM in the on-line component.

These results show that prosodic cues can be used on-line to resolve ambiguity in partial attachment sentences. However, there seems to be a contradiction when we examine the off-line appropriateness task. This seems to suggest that the new conflicting information overrides the preference for MA, and that this happens without the listeners noticing. We can conclude, therefore, that prosodic cues are used, but tend to be overridden by any conflicting morphosyntactic cues.
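The relationship between the raw latencies and the difference scores in Table 3 can be checked with a little arithmetic. The sketch below is illustrative only and rests on one assumption: that each difference score is the test-condition latency minus the neutral-baseline latency, so that a positive score means slower than baseline. The baseline latencies themselves are not reported in this essay; the values computed here are merely what that assumption implies, and the names TABLE_3 and implied_baseline are hypothetical.

```python
# Table 3 as reported: condition -> (raw latency in ms, difference score in ms).
TABLE_3 = {
    "NMA + Comp": (375, -11),
    "NMA - Comp": (378, -10),
    "MA": (397, 14),
    "NUM": (412, 24),
}


def implied_baseline(raw_ms, difference_ms):
    """Baseline implied by the ASSUMED convention: difference = test - baseline."""
    return raw_ms - difference_ms


# Derived values, not reported data: one implied baseline per condition.
baselines = {
    condition: implied_baseline(raw, diff)
    for condition, (raw, diff) in TABLE_3.items()
}

for condition, (raw, diff) in TABLE_3.items():
    print(
        f"{condition:<11} raw {raw} ms, difference {diff:+d} ms, "
        f"implied baseline {baselines[condition]} ms"
    )
```

Under this assumption the implied baselines lie between 383 and 388 ms, so the ordering of the conditions is the same in the raw and the difference data: both NMA conditions are faster than baseline, while MA and NUM are slower.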
Conclusions

The three sets of data analysed in this essay all show that prosodic cues are used to resolve ambiguity in spoken language. However, they each show this in a different way, and even conflict with each other in some instances. Both the off-line experiment and the shadowing experiment reported by McAllister (1997) concluded that prosody was used to resolve ambiguity in closure sentences but not in attachment sentences. This was also backed up by the off-line appropriateness task in Marslen-Wilson and colleagues' (1992) paper. However, the on-line task reported in that paper indicated that prosody was in fact also used to resolve ambiguity in attachment sentences. The authors concluded that this contradiction could be resolved by assuming that morphosyntactic cues are dominant over prosodic cues and that, when the two conflict, prosodic cues are ignored.

As Marslen-Wilson and colleagues (1992) note, it is important to investigate problems with a wide range of experimental techniques. It may be that only by contrasting data from various sources are we able to achieve a conclusive synthesis of the results.

References:

McAllister, J. (1997). Unpublished laboratory notes for University of Auckland paper 461.220.

Marslen-Wilson, W., Tyler, L. K., Warren, P., Grenier, P., & Lee, C. S. (1992). Prosodic effects in minimal attachment. Quarterly Journal of Experimental Psychology, 45A, 73-87.

Endnotes:

1 Sentences can be either minimal attachment (MA) or non-minimal attachment (NMA). MA sentences have the fewest nodes in their syntactic tree. In other words, an NMA sentence has a complete sentence embedded in it. For example, “the seller agreed the price of the house with the agent” is MA and “the seller agreed the price of the house was too high” is NMA.

2 Sentences can have either early closure (EC) or late closure (LC). An LC sentence is one which closes as late as possible. For example, “whenever the king rides his beautiful white horse, it’s carefully groomed” is LC, while “whenever the king rides, his beautiful white horse is carefully groomed” is EC.