Discourse Particles in Female and Male Human-Computer-Interaction
Kerstin Fischer, Britta Wrede
University of Bielefeld
This paper investigates the differences between human-to-human and human-to-computer communication with respect to discourse particles. In particular, quantitative and functional analyses were carried out on the distribution of discourse particles, and on the functions they fulfill, in two large German corpora. It was found that the number of discourse particles does not simply decrease, as suggested, for instance, by Hitzenberger & Womser-Hacker (1995). Female speech in particular does not follow the predictions. Furthermore, many discourse particles were found to undergo a strong functional shift. Functional analyses are therefore indispensable for the study of the role of discourse particles in the human-computer interface.
During the last 20 years, interest has increased in the communicative behavior of speakers interacting with artificial communicators. More recently, an independent register, computer talk, has been postulated (Krause & Hitzenberger 1992, Marx 1996). The hypothesis is that an artificial communication partner consistently influences the properties of the speakers' communicative behavior. These properties concern the following aspects of human-machine communication, in contrast to human-to-human communication (Hitzenberger & Womser-Hacker 1995: 56; translation ours):
increase in differences compared to normal speech
modification of syntactic constructions
increasing number of overspecifications
increase in instances of formal code
decreasing number of framing elements in dialogues
decreasing number of politeness formulae
decreasing number of partner-oriented dialogue signals
decreasing number of particles
The register computer talk is only defined negatively in opposition to human-to-human communication. Furthermore, the authors do not consider socio-linguistic variables, especially gender, and they do not make any distinctions within the linguistic structures they describe. Moreover, their predictions mainly concern the quantitative distribution of properties of speech.
In this paper, we look at the features of computer talk that concern the use of discourse particles in human-machine interaction. In the list provided by Hitzenberger & Womser-Hacker (1995), it is not entirely clear how the last four features can be distinguished and where discourse particles are located, since framing, partner-orientation and the creation of a harmonious and polite atmosphere belong to the many functions discourse particles fulfill in spontaneous spoken language dialogues (Schiffrin 1987, Fischer & Drescher 1996). It can be taken as certain, however, that irrespective of whether discourse particles are included in the framing elements, politeness formulae, partner-oriented signals, or in the class of particles, the computer talk hypothesis asserts that the number of discourse particles decreases in human-machine communication.
Discourse particles (Schiffrin 1987), i.e. segmentation markers and interjections such as oh, yes, well, ah, now, as well as hesitation markers like er and um, fulfill a great many different functions in spontaneous spoken language dialogues. For example, they segment and connect utterances in spoken language and support the turn-taking system; they mark important information; they give the speaker time to think and the hearer time to adjust to the voice quality of the speaker. Moreover, they establish a harmonious atmosphere between the communication partners and soften possibly problematic information. They introduce new topics and help to structure the argumentation in the dialogue, as well as the construction of the dialogue itself. In human-to-human communication, discourse particles are therefore extremely multifunctional.
The first aim of this paper is to determine to what extent the predictions about the distribution of discourse particles hold for our corpora of German spontaneous spoken language dialogues. It will turn out that while many discourse particles are distributed in accordance with the predictions of the computer talk hypothesis, others are not, i.e. discourse particles cannot be considered a homogeneous class in this respect.
Secondly, we want to find out whether the register computer talk can be identified equally for female and male speakers. Again, with respect to discourse particles, considerable differences were found which cast doubt on the validity of the computer talk hypothesis as formulated by Hitzenberger & Womser-Hacker (1995).
Two corpora serve as the basis for the following analyses (Sagerer et al. 1994, Brindöpke et al. 1995); both were recorded in the same domain and are therefore comparable. In both cases, the task the participants had to fulfill was to instruct someone to build a toy-airplane. The two corpora differ, however, in that the constructor in the first setting was another human communicator, whereas in the other scenario the participants believed they were talking to an automatic speech processing system. The data therefore provide a basis for comparing human-to-human communication with verbal human-computer interaction.
The dialogues were recorded on DAT. The human-to-computer dialogues were transcribed according to Fink et al. (1995); the human-to-human dialogues are currently being retranscribed according to the same transcription conventions. The transcriptions form the basis for the investigations presented.
The Human-to-Human Scenario (Sagerer et al. 1994)
The 22 participants in this scenario had to solve two tasks: first, they were asked to construct a toy-airplane themselves, following an illustration. Then each participant had to instruct another person to construct this toy-airplane in a relaxed face-to-face situation. In most of the dialogues, the communication partners could not see each other. Sometimes they were only prevented from seeing the other's construction, sometimes they could not see their communication partners at all. However, no systematic variation with respect to sight could be found concerning the distribution of discourse particles. Consequently, the dialogues are treated as one corpus in the following.
Furthermore, to ensure comparability with the dialogues in the human-to-machine scenario, where the constructors' utterances consisted of prefabricated units, only the instructors' utterances are considered in the following.
The 22 dialogues consist of altogether 25914 words, with a mean length of 1178 words per dialogue. There was an equal number of male and female participants who were all university students.
The (Simulated) Human-to-Computer Scenario (Brindöpke et al. 1995)
The tasks the 40 participants had to solve in this experiment were almost the same as in the previous one, with the difference that they had to instruct an artificial system via microphone to build the toy-airplane. In fact, the behaviour of the artificially intelligent system was simulated by two people (wizards) in another room. One person built the toy-airplane according to the participants' instructions, the other selected verbal messages to simulate the speech processing system's output.
After every instruction, a snapshot of the resulting state of construction was taken by a camera which was controlled by the second person. The picture was then transferred to the screen in front of the participant. In the meantime, the other person could send a message to a text-to-speech synthesizer which the participant could hear over headphones.
In order to make the simulated behaviour of the artificially intelligent system more convincing, the wizards had to reduce their cooperativeness according to the following restrictions:
reject instructions that contain words that one would assume an artificial system would not understand;
randomly reject a certain number of instructions to simulate recognition errors;
reject instructions that require memory;
ignore instructions concerning objects that are not precisely specified by the instructor;
reject instructions which are too global or underspecified.
In addition, the participants were asked to fill out a questionnaire concerning their opinion about the characteristics of the artificial system. Only three of the forty participants answered that they had doubted the existence of such an intelligent system during the recording. The remaining 37 participants believed that they had indeed communicated with an artificial system.
The corpus consists of 40268 words. The mean length is 1007 words per dialogue. Although the number of male and female participants is not exactly equal, the corpora are big enough to ensure reliable results. The speakers were all enrolled as students at the University of Bielefeld.
Preceding the quantitative analyses of the corpora, the normalized rate of occurrence of each discourse particle was computed for each dialogue:
normalized rate of occurrence = absolute number of occurrences * 100 / total number of words
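The normalization above can be sketched in a few lines of Python. This is a minimal illustration, not the tooling actually used for the corpora: the function name `normalized_rates`, the constant `PARTICLES`, and the toy token list are assumptions introduced here, and the particle inventory is simply the eleven German particles investigated in this paper.

```python
from collections import Counter

# Assumed inventory: the eleven German discourse particles investigated in the paper.
PARTICLES = ("ach", "äh", "ähm", "also", "gut", "hm", "ja", "nee", "nein", "oh", "okay")


def normalized_rates(tokens):
    """Return occurrences per 100 words for each particle in one dialogue.

    `tokens` is the list of word tokens of a single dialogue (hypothetical input format).
    """
    total = len(tokens)
    if total == 0:
        # An empty dialogue has no meaningful rate; report zero for every particle.
        return {particle: 0.0 for particle in PARTICLES}
    counts = Counter(token.lower() for token in tokens)
    # absolute number of occurrences * 100 / total number of words
    return {particle: counts[particle] * 100 / total for particle in PARTICLES}


# Invented toy dialogue, for illustration only (7 tokens, "ja" occurs twice).
tokens = "ja also die Schraube ähm ja gut".split()
rates = normalized_rates(tokens)
```

Because the rate is computed per dialogue, dialogues of different lengths become directly comparable before they are aggregated by speaker group.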
The normalized data provide the basis for the quantitative investigations. These analyses were carried out semi-automatically by means of tools which, for instance, counted the word frequencies in each dialogue and computed the distributions for each group of speakers. Considering the total number of occurrences of discourse particles in the two corpora, it turns out that, although male and female speakers differ in the total amount of discourse particles, the total number decreased considerably in the simulated human-machine scenario for both groups (FIG 1). The analysis of variance yields a significant effect of scenario with p<.01.
FIG 1: discourse particles per 100 words
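Per-group values of the kind summarized in FIG 1 could be aggregated as in the following sketch. It assumes that each dialogue has already been reduced to a tuple of speaker sex, scenario, and normalized rate; this tuple layout, the function name `group_means`, and the numbers are hypothetical and do not reproduce the corpus data.

```python
from collections import defaultdict
from statistics import mean


def group_means(dialogues):
    """Average the normalized rates for every (sex, scenario) group.

    `dialogues` is an iterable of (sex, scenario, rate) tuples (assumed layout).
    """
    grouped = defaultdict(list)
    for sex, scenario, rate in dialogues:
        grouped[(sex, scenario)].append(rate)
    return {group: mean(rates) for group, rates in grouped.items()}


# Invented values, for illustration only.
example = [
    ("female", "human-human", 10.0),
    ("female", "human-human", 12.0),
    ("male", "human-computer", 4.0),
]
means = group_means(example)
```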
However, the data for individual discourse particles reveal that only a small portion of the eleven German discourse particles investigated (ach, äh, ähm, also, gut, hm, ja, nee, nein, oh, okay) behaved completely according to the computer talk hypothesis (for instance, ja, also, okay); again the analyses of variance show a significant effect of the variable scenario for each of these discourse particles with p<.01:
FIG 2: ja, also, okay
In human-to-human conversation, German ja, also, and okay typically fulfill functions with respect to the dialogue structure and, in addition, signal a positive speaker attitude (Fischer 1996). In example (1) below, ja and okay display positive feedback and can furthermore be interpreted as turn-holding and turn-yielding signals. Moreover, second-position ja relates the current utterance to the previous one, connecting the utterances in the dialogue.
(1) ah ja, gut, die grüne Schraube, wenn Du es schon drangeschraubt hast <-> okay <->
`oh yes, good, the green screw, if you have fixed it already <-> okay <->´
The structuring properties of ja and also are likewise apparent in example (2), in which they connect the two utterances. More commonly, however, they occur initially and relate the respective utterance to the one the communication partner has just uttered, signalling contact, perception, and understanding, and indicating that the speaker is going to say something relevant and is therefore taking the turn, as for instance in example (3).
(2) <-> von <-> äh rechts nach links, ja, also ganz re/ ganz rechts ist eins
`<-> from <-> uh right to left, yes, so very far ri/ right far right is one´
(3) So ja und jetzt habe ich noch die eine rote Schraube mit der Kerbe
Ja und den Fünferstab
`well yes and now I still have the one red screw with the notch´
`yes and the five-hole bar´
Ja is often used in this function when the content of the utterance may constitute a potentially offensive speech act, for instance in questions like (4):
(4) <-> ja was passiert mit dem zweiten Teil des Flügels?
`<-> yes what happens with the second part of the wing?´
Ja is necessary here to establish an atmosphere of basic agreement between the speakers to smooth the force of the utterance. Furthermore, in this example, ja is also used to introduce a new topic. The main functions of ja, also and okay are therefore located in the interactive as well as the discourse structuring domain.
Since the number of instances of ja, also, and okay decreases so much, it can be concluded that speakers either assume that computers do not need positive feedback and signals of positive speaker attitude, or do not expect computers to recognize the discourse structuring functions of these particles, or both. Further analyses show that it is especially the interactive functions of discourse particles that are reduced in human-computer interaction.
The quantitative results so far, as well as the functional considerations, are in full accordance with the predictions by Hitzenberger & Womser-Hacker (1995).
In contrast, as can be seen in FIG 3, the discourse particle hm shows a completely different tendency. The use of hm increases considerably in human-computer interaction (significance of the effect for the different settings: p<.01).
FIG 3: hm
Although this result contradicts the predictions of the computer talk hypothesis, it is not very surprising given that hm signals incipient divergence (we distinguish here between hm, which signals incipient divergence, and mhm, which is used as a positive feedback signal, usually with fall-rise intonation). In human-human interaction it is rare, since politeness constraints do not normally allow speakers to express their dissatisfaction so openly (cf. Lindenfeld 1996).
The distribution of instances of gut does not provide supportive data for the computer talk hypothesis for the female speakers: for women, there is no statistically significant difference between the different scenarios. However, the two-way interaction of the variables sex and scenario is significant with p<.01. Here, functional analyses have to be carried out to explain the results.
FIG 4: gut
That the number of instances of gut does not decrease for the female speakers in human-computer interaction can be explained by two facts. Firstly, almost half of the occurrences of gut appear in combinations such as na gut, schon gut, or nun gut (6), which limit the positive speaker attitude that gut conveys by default in human-human interaction (5). Secondly, another half occur in deliberations, i.e. they are uttered to oneself and are not oriented towards the communication partner (6). The remaining occurrences are used in framing functions, marking the end of a construction phase (7). This latter function can also be found in the human-to-human dialogues.
(5) Auf beiden Seiten festgeschraubt, gut.
`fixed on both sides, fine´
(6) <attrib> ach so soll das sein, na gut </attrib: leise>
`<attrib> oh that's how it is supposed to be, oh well </attrib: quiet>´
(7) gut <-> <hum: atmen> mein Gott. Sie nehmen <-> die gelbe Schraube
`okay <-> <hum: breathing> my god. You take <-> the yellow screw´
So while gut is used in human-to-human communication to provide positive feedback to the communication partner or to conclude a topic, it is used in human-computer interaction mainly without partner relation, or in connection with a signal of resignation. Consequently, the functional analysis can support the predictions of the computer talk hypothesis concerning the decreasing interactive functions, while the quantitative analysis cannot explain the distribution in the corpora, at least not for the female speakers.
Concerning the discourse particle nein, the quantitative analysis shows that while male and female speakers already start off from different numbers in human-to-human communication, the number of occurrences of nein does not decrease in the speech of the women in human-computer interaction. The Mann-Whitney U test shows no significant difference between the female human-human group and the female human-machine group, while a difference between the male human-human group and the male human-machine group was found with p<.05.
FIG 5: nein
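For readers unfamiliar with the Mann-Whitney U test used for these group comparisons, the following sketch computes the U statistic directly from its pairwise definition. It is only an illustration under stated assumptions: the function name `mann_whitney_u` and the per-dialogue rates are invented here, and the step from U to a significance level (via critical-value tables or a statistics package) is deliberately left out.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) for two independent samples.

    U_a counts the pairs (a, b) in which the value from sample_a is larger;
    tied pairs contribute 0.5, which is equivalent to using average ranks.
    """
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    # Every pair is counted for exactly one side (or split), so the two sum to n_a * n_b.
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b


# Invented normalized rates of nein per dialogue, for illustration only.
female_human_human = [0.9, 1.2, 0.7, 1.0]
female_human_machine = [1.1, 0.8, 1.3, 0.6]
u_hh, u_hm = mann_whitney_u(female_human_human, female_human_machine)
```

The smaller of the two U values is the one usually compared against the critical value; values of similar size, as in this toy example, indicate that neither group tends to rank above the other.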
In human-to-human communication, nein typically functions as an answer signal either supporting a negatively formulated proposal or as an answer to a clarifying question, for instance:
(8) ja nein <-> im Moment zeigt gar nichts zu mir irgendwie
`yes no <-> at the moment nothing is pointing to me somehow´
In the human-computer dialogues, however, nein fulfills very different functions. In 38.9% of the occurrences, nein is used as a repair marker, marking problems in the formulation process:
(9) drehe die rechte Schraube nein <-> zurück
`turn the right screw no <-> undo´
In 61.1% of the cases, however, nein is uttered quietly to the speakers themselves, without displaying any interactive functions, for example:
(10) gegenüber <hum: atmen> <-> der fünfbohrigen Basisplatte <attrib> ah nein nein nein </attrib: leise>
`in front of <hum: breathing> <-> the bar with the five holes <attrib> oh no no no </attrib: quiet>´
Consequently, in the case of nein, not only does the quantitative distribution fail to correspond to the predictions of the register computer talk (the analysis of variance shows no effect of the scenario); the functions also change considerably. However, it is precisely this functional shift which supports the hypotheses of Hitzenberger & Womser-Hacker (1995).
FIG 6: oh
In human-to-human communication, oh usually serves as a turn-taking signal with additional emotional content that softens possibly face-threatening acts, such as requests to undo something or to combine parts, as in examples (11) and (12). In human-to-computer communication, by contrast, two-thirds of the instances of oh occur in deliberations with expressive function and without partner relation (13). A further 25% function as repair markers (14).
(11) bitte? oh dann mußt Du es ja nochmal abmachen
`pardon me? oh then you'll have to undo it again´
(12) oh es ist echt schwierig
`oh it is really difficult´
(13) <attrib> oh das ist ja (ei)ne Siebenerleiste </attrib: leise>
`<attrib> oh this is a bar with seven holes </attrib: quiet>´
(14) das ganze ist jetzt <-> am gelben Kl/ <--> oh
`the whole thing is now <-> at the yellow cu/ <--> oh´
The functional shift in the occurrences of oh in the two corpora can explain why the predictions that the number of discourse particles decreases in human-to-computer communication do not seem to be true of the distribution of oh for all speakers.
In spontaneous spoken language dialogues among human communicators, the functions of hesitation markers such as äh and ähm are the following:
They mark important words, especially nouns. In communication with human partners, these hesitation markers are typically employed in proposals, instructions, or requests to soften the force of the utterance. For instance, by signalling uncertainty about which term to use for a part when requesting that something be combined in the toy-airplane dialogues, the speakers leave the request open to negotiation. The uncertainty displayed about the terms cannot be due to a lack of knowledge, since it also occurs when the subjects have already referred to that item many times in the dialogue, for instance:
(15) und in die Mitte setzt Du die ähm Schraube mit der Kerbe.
`and in the middle you put the um screw with the notch´
They can be employed as turn-taking signals. Todt (1981) shows that initial hesitation markers occur after the same intervals as the first syllable would if the utterance began without hesitation. So initial discourse particles are turn-taking signals, but they also provide time for speech management if the utterance is not sufficiently planned after the usual interval between utterances.
(16) ähm jetzt hast das Teil was Du eben angeschraubt hast
`um now you have the part you have just connected´
They function as repair markers, for example:
(17) und so zwar so daß sich ähm daß sie sich <-> in auf der gleichen Linie befinden
`and in a way so that they um that they <-> are on the same line´
They mark boundaries between utterances, for instance:
(18) jetzt wird es ein bißchen schwieriger <-> ähm <-> darauf legst du jetzt mal <->
`now it'll be a little harder <-> um <-> on this you now put <->´
They provide time for speech planning at any place in the utterance, for example:
(19) und jetzt ähm <->
`and now um <->´
Concerning ähm, in human-to-human communication these functions are almost evenly distributed for both male and female speakers, except for the role of repair marker, which accounts for only 10.4% of the cases for female and 6.2% for male participants.
In the interaction with the speech processing system, the share of instances of ähm which fulfill turn-taking functions increases to 52.9% for the men and to 42.0% for the women, while all other functions decrease; for women, for instance, the share of repair markers is reduced to 5.7%. For men, the share of fillers drops to 5.8%, whereas it stays almost constant for the women.
It can be concluded that there are only small differences in the functions ähm fulfills in the speech of male and female participants. The quantitative distributions of ähm, however, differ considerably: although the general trend in the distribution of functions remains comparable between male and female speakers, in human-to-computer communication the total number of occurrences of ähm is almost four times higher for the women than for the men. The Mann-Whitney U test shows a difference between the male human-machine group and the female human-machine group that almost reaches significance at p<.01.
FIG 7: ähm
The analysis of variance for äh yielded a significant effect of the variable scenario (p<.05). Looking at the functions äh fulfills in the two corpora, it turns out that in human-to-human communication men use äh in all of the functions almost equally often (12.5%-16.6%), except for the occurrences in which äh marks important words in order to soften potentially offensive utterances (42.7%). For female speakers the distribution is similar, except that fillers account for only 6.9% of the cases and repair markers for 21.8%.
When the male participants talk to an artificial communicator, the functions in which äh is employed change completely: 49.2% are initial (turn-taking) occurrences and a further 37.3% are repair markers. There are no instances in which äh segments utterances and only 1.4% in which it serves as a filler. The remaining 11.9% are markers of important information.
For women, the changes are much less dramatic: 38.5% of their instances of äh occur in connection with nouns. The share of fillers almost doubles to 12.8%.
That the number of markers of important information does not drop for female participants in human-to-computer communication is puzzling. In the human-to-human situation, these kinds of markers demonstrate which word has the highest informational content and therefore contribute to the argument structure of the dialogues. Moreover, they soften the force of an utterance. However, throughout the paper it has become apparent that the interactive functions discourse particles fulfill are reduced in human-to-computer communication, and argument structuring and softening have to be regarded as partner-oriented. So either the women use äh in the interactive domain even when they are talking to a computer, in which case äh would constitute an exception to the tendency the other discourse particles display in the corpora, or their use of äh in this construction is motivated differently; for instance, female speakers may really display uncertainty before choosing a term. The high number of repair markers reported for nein, äh and oh provides further evidence that speakers are much more concerned with speech management than they are when talking to human communication partners.
FIG 8: äh
To sum up, it can be concluded that men and women behave differently in their use of discourse particles in human-to-computer communication. Generally, the quantitative distributions of discourse particles uttered by male speakers are in accordance with the postulation of a register computer talk, while for female participants the quantitative distributions often contradict the predictions for the domain. It is therefore necessary to consider gender as an important variable in the investigation of "computer talk". Functional analyses have shed some light on the different distributions of the discourse particles in the two corpora. For several of these discourse particles, strong functional shifts were found. These may serve as a basis for the further investigation of differences between male and female speech in human-computer dialogues.
In distributional analyses, it was found that although the total number of discourse particles decreases in the simulated human-to-computer scenario, only a restricted number of discourse particles is distributed according to the predictions of the computer talk hypothesis. Most discourse particles display gender-related differences: for many of them, the number of instances in female speech even increases rather than decreases in the interaction with an artificial communicator. Further functional analyses show that, besides the distributional differences for men and women, a considerable functional shift can be observed for most discourse particles, away from the interactive and towards the speech-management domain. Consequently, gender has to be regarded as an important socio-linguistic variable in the investigation of human-computer interaction.
It can therefore be concluded that the computer talk hypothesis has to be redefined; in particular, quantitative predictions have to be replaced by qualitative, functional assertions. It is an oversimplification to postulate that the number of discourse particles simply decreases in human-to-computer communication. It has to be taken into account that discourse particles change their functions considerably: while in human-to-human communication they display functions with respect to discourse and argument structure, the turn-taking system, the speech management domain, and, most importantly, the interactive relationship between the communication partners, in human-to-computer communication they are mainly used to organize one's speech or with an expressive function in deliberations. In the corpora, female and male speakers seem to use discourse particles partly for different tasks (as in the case of äh), and partly in differing numbers (as in the case of ähm). Therefore, functional analyses with respect to female and male speech are necessary for assertions about human-to-computer communication.
Brindöpke, C., Johanntokrax, M., Pahde, A. & Wrede, B. (1995): Darf ich Dich Marvin nennen? Instruktionsdialoge in einem Wizard-of-Oz Szenario: Materialband. Report 1995/7, SFB 360 "Situierte künstliche Kommunikatoren", University of Bielefeld.
Fink, G.A., Johanntokrax, M. & Schaffraniez, B. (1995): A Flexible Formal Language for the Orthographic Transcription of Spontaneous Spoken Dialogues. In: Proceedings of the fourth European Conference on Speech Communication and Technology, pp. 871-874.
Fischer, K. & Johanntokrax, M. (1995): Ein linguistisches Merkmalsmodell für die Lexikalisierung von diskurssteuernden Partikeln. SFB 360 "Situierte künstliche Kommunikatoren", Report 18. University of Bielefeld.
Fischer, K. & Drescher, M. (1996): Methods for the Description of Discourse Particles: Contrastive Analysis. Language Sciences 18, 3-4, pp. 853-861.
Fischer, K. (1996): Validating Analyses of Semantic Features in Interjections. LAUD-Paper No. 276, University of Duisburg.
Hitzenberger, L. & Womser-Hacker, C. (1995): Experimentelle Untersuchungen zu multimodalen natürlichsprachigen Dialogen in der Mensch-Computer-Interaktion. Sprache und Datenverarbeitung 19, 1.
Krause, J. & Hitzenberger, L. (eds.) (1992): Computertalk. Sprache und Computer 12. Hildesheim et al.
Lindenfeld, J. (1996): Cognitive Aspects of Verbal Interaction. In: Casad, Eugene H. (ed.): Cognitive Linguistics in the Redwoods. The Expansion of a New Paradigm in Linguistics. Mouton De Gruyter.
Marx, J. (1996): Die Computer-Talk-These in der Sprachgenerierung. Hinweise zur Gestaltung natürlichsprachlicher Zustandsanzeigen in multimodalen Informationssystemen. In: Gibbon, D. (ed.): Natural Language Processing and Speech Technology. Results from the third Konvens Conference, Bielefeld, October 1996. Mouton De Gruyter 1996.
Sagerer, G., Eikmeyer, H.-J. & Rickheit, G. (1994): Wir bauen jetzt ein Flugzeug. Konstruieren im Dialog. Arbeitsmaterialien. Tech. Report, SFB 360 "Situierte künstliche Kommunikatoren", University of Bielefeld.
Schiffrin, D. (1987): Discourse markers. Cambridge: Cambridge University Press.
Todt, D. (1981): Zum Auftreten von Füllauten in spontan gesprochenen Berichten. Nova Acta Leopoldina N.F. 54, 245, 597-611.
<attrib> marks the beginning of a feature
</attrib: quiet> marks the end of a feature, here: quiet
<hum> marks the beginning of human noise
</hum: breathing> marks the end of human noise, here: breathing
<hum: breathing> marks an isolated breathing event
<noise> marks the beginning of noise
</noise: micro> marks the end of some noise from the microphone
<noise: micro> marks an isolated instance of noise from the microphone
<-> marks a short pause
<--> marks a longer pause
<sil: 2> marks a pause of two seconds
() mark parts of a word that are not realized
ri/ breaking off in the middle of a word
<par></par> mark parallel speech
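The conventions listed above suggest how word tokens might be separated from annotation markup before word frequencies are counted. The following is a minimal sketch under stated assumptions, not the tool used for the corpora: the names `MARKUP` and `words` are introduced here, all angle-bracket annotations are assumed to carry no countable words, unrealized parts in parentheses are restored to the full word, and broken-off words such as ri/ are kept as tokens with their slash.

```python
import re

# Any annotation in angle brackets: pauses (<->, <-->, <sil: 2>), features
# (<attrib>, </attrib: quiet>), human noise, technical noise, parallel speech.
MARKUP = re.compile(r"<[^<>]*>")


def words(line):
    """Return the word tokens of one transcribed line (assumed one utterance per line)."""
    text = MARKUP.sub(" ", line)
    # "(ei)ne" marks an unrealized part; the sketch counts the full word "eine".
    text = text.replace("(", "").replace(")", "")
    tokens = []
    for raw in text.split():
        # Strip sentence punctuation but keep the "/" of broken-off words.
        token = raw.strip(".,?!:;")
        if token:
            tokens.append(token)
    return tokens


# Examples (13) and (14) from the paper.
quiet = words("<attrib> oh das ist ja (ei)ne Siebenerleiste </attrib: leise>")
broken = words("das ganze ist jetzt <-> am gelben Kl/ <--> oh")
```

Whether broken-off words and restored forms should count towards the total number of words is a design decision; a filter such as `[t for t in tokens if not t.endswith("/")]` would exclude the break-offs.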