ISLE

The Validation Report Tools

Project: LE4-8353

Deliverable: D5.1

Version: 5

Date: 03.07.2000

ISLE Deliverable

Project Number: LE4-8353

Project Title: Interactive Spoken Language Education [ISLE]

Deliverable Type: Tool, Report

Distribution: Restricted

Deliverable ID: D5.1

Expected Delivery Date:

Actual Delivery Date: 6 June 2000

Title of Deliverable: The Validation Report Tools

Authors: U Leeds [Howarth], U Milan [Pezzotta, Galbiati, Bisiani]

 

 

Deliverable type codes: OT = Other, RE = Report, SP = Specification, PR = Prototype, TO = Tool

Distribution codes: C = Consortium, P = Public, R = Restricted

 

 

 

 

 

Revision History

Version  Date        Status         Author(s)
1        03-09-1999  Draft          U Milan [Pezzotta, Galbiati, Bisiani]
2        06-04-2000  Final Part I   U Milan [Pezzotta, Galbiati, Bisiani]
3        15-05-2000  Draft Part II  U Leeds [Howarth]
4        06-06-2000  Final          edited by Menzel
5        01-07-2000  Final          R. Bisiani

 

 

 

Part I: Executive summary

Part II: The on-line evaluation

1. Trialling

2. Procedure

3. Data collection

4. Data analysis

4.1. Native English-speaking teachers' questionnaires

4.1.1 Evaluator's reports

4.2 German-speaking teachers' questionnaires

4.3. Users' questionnaires: Italian learners

4.3.1 Evaluator's reports

4.4. Users' questionnaires: German learners

4.4.1 Evaluator's report

Part III: The validation tool

1. Annotation Table

1.1. Definition

1.2. Example

2. Diagnose Table

2.1. Definition

2.2. Example

3. Compare Table

3.1. Definition

3.2. Example

4. The output of the validation tool

4.1. Phone error analysis

4.2. Stress error analysis

5. Experiments with the recognition threshold

5.1 Results

6. Experiments with the localization threshold

6.1 Results

Appendix 1: On-line Evaluation: Instructions for the evaluator

Appendix 2: Introductory information

Appendix 3: Evaluator's record sheet

Appendix 4: Sessions analyzed during the off-line evaluation

 

 


 

 

 

Figure 1: The Structure

Figure 2: Output scheme for phone error analysis

Figure 3: PhCorrGlobal.xls

Figure 4: PhCorrPhone.xls

Figure 5: PhCorrPhoneType.xls

Figure 6: PhErrGlobal.xls

Figure 7: PhErrPhone.xls

Figure 8: PhErrPhoneType.xls

Figure 9: PhGenGlobal.xls

Figure 10: PhGenPhone.xls

Figure 11: PhGenPhoneType.xls

Figure 12: Output scheme for stress error analysis

Figure 13: StCorrGlobal.xls

Figure 14: StCorrVowels.xls

Figure 15: StErrGlobal.xls

Figure 16: StErrVowels.xls

Figure 17: StGenGlobal.xls

Figure 18: StGenVowels.xls

Figure 19: Results "Word Level" Stress for German speakers

Figure 20: Results "Word Level" Stress for Italian speakers

Figure 21: Cumulative graph

Figure 22: Frequencies graph

Figure 23: Cumulative percentage graph

Figure 24: Frequencies percentage graph

Figure 25: Word level localization threshold

Figure 26: Phone Level localization threshold

 

 

Table 1: The definition of the MIL file

Table 2: Annotation Table’s Keys

Table 3: Values of TE variable

Table 4: An example of Annotation Table

Table 5: Diagnose Table’s Keys

Table 6: An example of Diagnose Table

Table 7: Compare Table’s Keys

Table 8: Values of HMP variable

Table 9: Values of HMS variable for "Phone Level"

Table 10: An example of Compare Table

Table 11: The output of the phone error analysis

Table 12: The output of the stress error analysis

Table 13: Phone types

Table 14: Values of HMS variable for "Word Level"

Table 15: An example stress table on the word level

Table 16: Rate formulas

Table 17: IHAPI Alignment

Table 18: Word confidence annotation

Table 19: The sessions

 


 

 

Part I: Executive summary

The ISLE project aims to build a tool that uses speech recognition technology to help adult intermediate learners of English improve their pronunciation.

This report describes:

Specifications for the data collection are provided in ISLE report D31.

Detailed performance results can be found in:

In particular, this report details:

The distribution of this report is restricted to ISLE project partners, managers and reviewers.

 

 


 

 

 

Part II: The on-line evaluation

 

1. Trialling

 

To test the effectiveness of the ISLE demonstrator, the system was trialled with adult non-native speakers of English from Italy and Germany, non-native teachers from Germany and native-speaker teachers in the UK, a total of 28 subjects in four groups:

 

University of Milan, Bicocca: 6 Italian-speaking learners

Klett Verlag, Stuttgart: 9 German-speaking learners and 8 German-speaking teachers

University of Leeds: 5 English-speaking teachers

 

2. Procedure

 

The demonstrator was installed from CD at each location and tested by the project partners. At each location an evaluator was appointed to supervise the trialling sessions. Instructions were distributed to the evaluators (see Appendix 1) and an introduction to the ISLE project was given to each volunteer (see Appendix 2).

 

3. Data collection

 

Two sources of information were used for data collection:

 

4. Data analysis

 

The collated data from the various sources follow.

 

4.1. Native English-speaking teachers' questionnaires

 

Where comments overlap considerably, not every individual comment is recorded here.

 

1. feedback

1.1 Is the feedback easy for a learner to understand?

v.easy easy neither difficult v.difficult

4 1

Comments:

"Explanation and the chance to listen again to the native speaker as often as the student wishes is good"

"Easy but often inaccurate or vague"

"Clear but limited in scope. No suggestions are given"

 

 

1.2 Do you feel the feedback would be accurate in identifying their errors?

v.accurate accurate neither inaccurate v. inaccurate

1 3 1

Comments:

"One feedback comment confused the computer 'model' pronunciation with the speaker's pronunciation"

"I don't think it's very clear in identifying whether the error is one of stress or of sound production"

 

1.3 Did you mispronounce sounds that the program didn't identify? yes 4 no 1

Comments/examples:

"It doesn't recognise other variants"

"bratwurst"

"It picked up few consonant errors (eg d and p)"

 

1.4 Did the program falsely identify errors? yes 5 no

Comments/examples:

"Deliberate mispronunciations of /EY/ were almost never picked up"

"As a native speaker, the program constantly corrected my pronunciation, which, I suppose, is RP! A little worrying."

"This is rather difficult to prove, but the speaker's pronunciation of /EH/ in one word was understood as /AE/ by the computer."

"eg in 'wonderful' /UH/ for /AH/"

 

1.5 Would the feedback help learners to improve their pronunciation?

v.well well neither badly v. badly

1 1 2 1

Comments:

"I feel it depends entirely on a particular example. The speaker's pronunciation of 'glass' was very similar to the computer's, but was seen as a 'problem'. When repeated 5 times in 5 different ways the screen comment was 'good try'"

"For some 'problem' words there was no concrete model to listen to, only advice to keep practising. Not very helpful."

"It seems to be useful for specific sentences and sounds"

"Identifying the error, explaining it, isolating the sound, repeating it has to be useful"

"Often but not always a mispronounced word was correctly identified, but the diagnosis was either vague, or focused on the wrong syllable, or on a vowel instead of a consonant."

 

2. material

2.1 Is the language users have to speak realistic?

v. realistic realistic neither unrealistic v. unrealistic

3 1 1

Comments:

"The language itself is realistic but the delivery is unconvincing in the dialogues. The speakers sound bored."

"Going on a barge trip isn't very common."

"It seemed relatively realistic, although I'm not sure I would say 'I'll have a pizza and a soda to drink'"

 

2.2 Are the instructions clear?

v. clear clear neither unclear v. unclear

1 2 1 1

Comments:

"They vary. Many of the buttons are hard to find, in odd places or confusingly named ('microphone' can be either the microphone or the head in profile)"

"Initially the first interface is very confusing. The other screens are a bit confusing"

"It does depend on how well the listener understands the symbols, which are not always clear eg the side panel"

"Extremely difficult to follow. Not obvious."

"Many things missing. The initial 'Start Program' button was above the general blurb, which you had to read first. Needs much more careful thought"

 

3. design

3.1 Are the exercises/activities interesting?

v.interesting interesting neither uninteresting v.uninteresting

1 2 1 1

Comments:

"Repetitive but I suppose this is hard to avoid"

"They are neutral- standard and bland- but clear enough."

"I don't think the interest level is very high as the situational dialogues are very conventional."

"How 'interesting' can you expect an exercise to be?"

"Limited exercise types. Not particularly stimulating sentences"

 

3.2 Is the program visually attractive?

v.attractive attractive neither unattractive v.unattractive

4 1

Comments:

"The colours are repetitive"

"Quite easy on the eye and friendly if a little esoteric."

"I can't see the rationale behind the opening web page design"

"Stylish to look at, but isn't always easy to find the buttons to click"

"Attractive but I found it hard to follow logically. I prefer a linear design, rather than a globular flowing design. The colours are a bit insipid"

 

3.3 Is the language varied enough? yes 1 no 2 don't know 2

Comments:

"There seems to be a model and variants away from this may be considered incorrect"

"If it's only dialogues, then obviously it isn't varied enough for pronunciation purposes"

 

3.4 Would you recommend your learners to use the program (again / more than once)? yes 2 no 3

Comments:

"Yes, because repetition plus clear exemplification is important with pronunciation. Students can go over each item as often as they like"

"Yes, if they were having problems with individual sounds"

"No, nowhere near accurate enough"

"Not yet, needs much more work"

"No, it is too unreliable for students to work with on their own"

 

4. learning

4.1 Does the program cover the most important pronunciation features? yes 3 no don't know 2

Comments:

"Yes, stress, minimal pairs"

 

4.2 What is missing?

Comments/examples:

"Links between words"

"weak forms versus full forms"

"Rhythm patterns, linking"

"Intonation patterns"

 

4.3 Is the target pronunciation appropriate? yes 5 no

Comments:

"It's the one most students want"

"I heard 2 different accents. How does the computer feedback differentiate between 'errors' and variants? What is the standard?"

"Yes, however scope for variations would be useful"

 

4.4 Is the practice at the right level for intermediate learners? yes 4 not sure 1

Comments:

"A bit easy?"

 

4.5 Would this material contribute to the development of your students' spoken English? yes 3 no 2

Comments:

"Yes, if it were seen as a resource with certain limitations"

"No, too many variables and unsolved problems"

 

4.6 What additional features would improve the program?

Comments:

"A link between 'teacher's' demo, pronunciation of a phrase and the diagnosis of a problem would be essential- occasionally the 'teacher' seemed to be making the same 'mistake' as the student e.g. using the weak form of 'than'"

"Using the phonetic alphabet instead of highlighting a letter in the word e.g. PROBLEM: an Italian speaker might think it should be pronounced with /EH/ not /AX/."

"Ability for students to enter his/her own speech (language examples) and have feedback"

"Clearer symbols/buttons- perhaps fewer"

"Easier initial stage: reading individual sentences for the computer to adjust to the speaker's accent is tedious and the sentences are disjointed when separated by time gaps"

 

4.1.1 Evaluator's reports

 

1. Problems experienced by the user:

"Constant problems with the volume level"

"The position of the start button at the top of the opening page is confusing"

"What to do now? Frequent pauses to find out what to do next"

"No warning that sound is going to start"

"One user didn't discover the text of the dialogue"

"Took a long time to discover the exercises"

"Not easy to move smoothly between the dialogues and the exercises"

 

2. Reactions of the user (visual or verbal):

"The adaptation is tedious. Why so long?"

"Which is the settings button? What is the difference between the settings? Not clear what the arrow buttons do"

"The colour of the buttons means they don't stand out"

"Tried using an Italian accent in the adaptation phase"

"Why are text exercises included?"

"Slow response caused multiple clicking"

"A lot of clicking at the wrong time"

"Odd design"

"Liked the layout and background; user-friendly appearance"

"not clear what to do after listening to the dialogue"

 

3. Components of the program used:

All users used all available components.

 

4. Other comments:

"It was an exercise in discovery. Without the evaluator's help it takes a very long time to find out how to use the program"

"A high level of strictness results in a lot of non-errors"

"Very hard to hear difference between teacher and student versions"

"Does 'than' in the system's dictionary have a weak variant?"

"Very unreliable/inaccurate performance doesn't inspire confidence"

"'In the Office' dialogue: not possible to have text and listen to dialogue at the same time"

"The length of the adaptation phase is disproportionate to the amount of practice material available"

"Necessary to go back to the text (not easy) to answer the questions"

 

4.2 German-speaking teachers' questionnaires

 

1. feedback

1.1 Is the feedback easy for a learner to understand?

v.easy easy neither difficult v.difficult

4 3 1

 

1.2 Do you feel the feedback would be accurate in identifying their errors?

v.accurate accurate neither inaccurate v. inaccurate

5 3

 

1.3 Did you mispronounce sounds that the program didn't identify? yes 4 no 3

 

1.4 Did the program falsely identify errors? yes 5 no 3

comments/examples:

 

 

1.5 Would the feedback help learners to improve their pronunciation?

v.well well neither badly v. badly

8

 

2. material

2.1 Is the language users have to speak realistic?

v. realistic realistic neither unrealistic v. unrealistic

6 2

 

2.2 Are the instructions clear?

v. clear clear neither unclear v. unclear

2 6

 

3. design

3.1 Are the exercises/activities interesting?

v.interesting interesting neither uninteresting v.uninteresting

1 6 1

 

3.2 Is the program visually attractive?

v.attractive attractive neither unattractive v.unattractive

5 2 1

 

3.3 Is the language varied enough? yes 6 no 1 don't know 1

 

3.4 Would you recommend your learners to use the program (again / more than once)? yes 8 no

 

4. learning

4.1 Does the program cover the most important pronunciation features? yes 5 no don't know 3

 

4.3 Is the target pronunciation appropriate? yes 5 don't know 3

 

4.4 Is the practice at the right level for intermediate learners? yes 7 not sure 1

 

4.5 Would this material contribute to the development of your students' spoken English? yes 5 don't know 3

 

4.6 What additional features would improve the program?

Comments:

"More female voices"

"Analysis of free spoken text production"

 

4.3. Users' questionnaires: Italian learners

 

1. feedback

1.1 Is the feedback easy to understand?

v.easy easy neither difficult v.difficult

2 3 1

 

1.2 Do you feel the feedback is accurate in identifying your errors?

v.accurate accurate neither inaccurate v. inaccurate

3 3

Comments:

"I like very much the feedback in which I can understand the way I have to pronounce a phone, reading another word with the same phone"

 

1.3 Did you make errors that the program didn't identify? yes no 6

 

1.4 Did the program identify errors you thought were correct? yes 3 no 3

Examples:

'business'

'photographer'

 

1.5 Does the feedback help you to improve your pronunciation?

v.well well neither badly v. badly

2 4

Comments:

"To me it is very useful the fact that it is possible to see the errors I made and to repeat similar words (IMPROVE window) to learn the pronunciation of a single phone"

"I think that 'IMPROVE' and 'Practice the phones you have the most problems with' are really great ideas"

 

2. material

2.1 Is the language you have to speak realistic?

v. realistic realistic neither unrealistic v. unrealistic

6

Comments:

"In the demo I saw Lesson 5 and I think that it describes a realistic situation"

 

2.2 Are the instructions clear?

v. clear clear neither unclear v. unclear

6

Comments:

"I have only a consideration to do: In 'Oral Exercise' there is an instruction that tells: 'Please click on the microphone and read the sentence in the box' but on the window there are two 'Microphone' buttons, and this can do muddle."

"In Standard exercises there is not legend that tell me the meaning of the GREEN/RED feedback."

 

 

3. design

3.1 Are the exercises/activities interesting?

v.interesting interesting neither uninteresting v.uninteresting

1 4 1

Comments/examples:

"In particular I like very much the fact that I can run different kinds of exercises... this make the demo not boring."

"To me it is very boring the Standard Exercise TRUE/FALSE."

"The exercise I prefer is 'Listen and Repeat'."

 

3.2 Is the program visually attractive?

v.attractive attractive neither unattractive v.unattractive

4 2

Comments:

"I like very much the menu to choose the exercise"

"The interface is very beautiful"

"I like very much the window 'Arrival to Manchester', in which I can select the lessons."

"I like very much the colours defined for the windows (the background colours): they are very relaxing."

 

3.3 Is the language varied enough? yes 6 no

Comments:

"I think that there will be no tool enough detailed so that it can be considered enough to learn English perfectly. But, to me it is great the function associated with the 'ABC' button, through which I can pronounce a lot of words associated with a particular phone."

 

3.4 Would you use the program again / more than once? yes 6 no

Comments:

"For my English it would be a good thing to use this demo again. In particular I like FREE CHOICE Oral exercise."

"For my English it would be a good thing to use this demo again."

"It is a good idea the fact that I can see and read the dialogue. Besides it is great the way the lesson is introduced."

"I like very much this demo, in particular the fact that it is able to correct my pronunciation at phone level."

"I like it very much: I saw other tools and I think this is the best!!!"

 

4. learning

4.1 Does the program cover the most important pronunciation features? yes 4 no

 

[no answers were given to 4.2 and 4.3]

 

4.4 Is the practice at the right level for you? yes 5 no 1

Comments:

"No, because my English is really bad, and so the level is really high to me."

"My English is not so good and this test confirms it. So I think the level is right"

 

 

4.5 What additional features would improve the program?

Comments/examples:

"ToolTips for the buttons would be very useful."

"Change the Navigational arrows to make them more intuitive: for example the arrows to change the exercise (the external ones) can be vertical. The background of the windows is too homogeneous: the windows would be more attractive putting more colours on it (for example changing the colour of the windows’ frame)."

"Put a legend to the STANDARD exercises to tell the user the meaning of the GREEN/RED feedback. Define a button 'Give me the Correct Translation' in the Translate Exercises."

"The buttons on the windows seem image and not buttons: it would be better that when the user pushes on a button, it changes its appearance."

"Add a progress-bar in READ-REPEAT exercise to tell me the velocity through which I have to pronounce the phrase. Define a new type of exercise as mix of 'READ-REPEAT' and 'LISTEN-REPEAT', in which I can hear the utterance but I can have the text of the phrase to help to repeat."

 

 

4.3.1 Evaluator's reports

 

1. Problems experienced by the user:

"There are some problems about the voice-feedback for the diagnose errors. For example in the 'Read and Repeat' exercise with the phrase 'They asked if I wanted to come along on the barge trip' the speaker make an error on the word 'ASKED' and when I pushed the 'teacher' option on the pop-menu the demo 'said' 'THEY ASKED' and not only 'ASKED'."

"In some occasions the 'Teacher' button doesn’t run."

"In the dialogue the Mr. Rossetti’s voice is low and so difficult to understand. To do ORAL exercises why does the user must listen for the dialogue? Buttons aren’t intuitive."

"In a occasion in the 'Fill in the blank' exercise the inserted word covers the fixed text. 'Fill in the blank' option in the menu doesn’t work: the user access this exercise only through the navigational arrows."

"In 'Standard Exercise': the exercise 'Translate' doesn’t run; the external navigation arrow doesn’t run. In 'TRUE/FALSE' exercise there is written: 'After listening the dialogue, please answer these questions with YES or NO', but if the user hasn’t listened for the dialogue before, there is no way to listen for it."

"In some occasions the student had difficult to understand how to come to the previous window and, in general, to capture the meaning of the buttons."

 

2. Reactions of the user (visual or verbal):

"My impression was that the speaker seemed to be very enthusiastic… he told me that the demo’s interface are beautiful, and in particular he liked very much the diagnose-feedback of the error (possibility to listen for the correct pronunciation and to practice on a wrong word [IMPROVE window])"

"She told me that the demo is very interesting, but she was very perplexed, because the meaning of the buttons often it is not clear."

"The speakers told to me that the ORAL exercises are great, but he was very boring to do the STANDARD exercises."

"The user seemed to be enthusiastic about the demo’s interface and, in particular, she liked the diagnose-feedback of the error."

"He had some trouble with the Oral Feedback in 'Improve': he told me that it was often too fast to understand."

"She told me that the demo is very beautiful, but sometimes she was in difficulty to understand the meaning of the buttons. Her precise words: 'The buttons are intuitive for nothing'."

 

3. Components of the program used:

All Oral and standard exercises were used by all subjects.

 

 

4. Other comments:

"The speaker was very worried to test the demo, because he said that his english was not so good.

So he made a lot of errors only because he spoke on the microphone very slow to try to pronounce correctly the phrases….but the demo often didn’t wait for him."

"This student speaks English very well: so she can be considered a good tester to understand if the demo really finds the right pronunciation errors."

 

 

4.4. Users' questionnaires: German learners

 

1. feedback

1.1 Is the feedback easy to understand?

v.easy easy neither difficult v.difficult

1 7 1

comments/examples:

" The feedback omitted the same pronunciations that were slightly different"

 

1.2 Do you feel the feedback is accurate in identifying your errors?

v.accurate accurate neither inaccurate v. inaccurate

1 3 5

comments/examples:

" Often the comment is inspecific, the problem is not explained"

" System failed to identify utterances which were quite different from each other"

"Sometimes surprising!"

" The program should stress the main mistakes"

 

1.3 Did you make errors that the program didn't identify? yes 5 no 4

comments/examples:

" If you are reading the words of the word list (Improve) the program only identifies the error you are improving in that exercise, but neglects any other errors you make."

 

1.4 Did the program identify errors you thought were correct? yes 5 no 4

comments/examples:

"pan, bag"

"Of course!"

" I couldn't detect any difference between the "teacher" and "student""

 

1.5 Does the feedback help you to improve your pronunciation?

v.well well neither badly v. badly

9

comments/examples:

"The possibility of listening to the examples in the diagnosis is helpful"

"It is a problem that I'm not quite fixed to either British or American English, so the pronunciation might be correct in another context, but is wrong here"

"Asked!"

"asked, wanted, address"

 

 

2. material

2.1 Is the language you have to speak realistic?

v. realistic realistic neither unrealistic v. unrealistic

1 8

 

2.2 Are the instructions clear?

v. clear clear neither unclear v. unclear

2 4 2 1

comments/examples:

"How the user is guided by the program could be better (e.g. If you click on a button, an explanation could be displayed). Clear buttons (without explanation are not useful!)"

"The spoken instructions are clear, the written ones are sometimes not (e.g. lesson 2, "Click on the microphone", which microphone? There are two of them"

"After accommodation to buttons and instructions, the handling was unclear"

 

 

3. design

3.1 Are the exercises/activities interesting?

v.interesting interesting neither uninteresting v.uninteresting

2 5 2

comments/examples:

"Because it is realistic"

 

3.2 Is the program visually attractive?

v.attractive attractive neither unattractive v.unattractive

7 1 1

comments/examples:

"Buttons do not use clearly understandable metaphors, different metaphors / symbols are used in different contexts of the program"

"I prefer clear lines, not this bubble-gum outfit"

"A simple interface that points out the main functions"

 

3.3 Is the language varied enough? yes 6 no 1 don't know 2

 

3.4 Would you use the program again / more than once? yes 7 no 1

comments/examples:

"for an advanced speaker / learner the program is too detailed. I would prefer to play in an English speaking country and adapt to what I hear."

"It would be better if there were more dialogues per unit offered, as often they are repeated"

 

4. learning

4.1 Does the program cover the most important pronunciation features? yes 7 don't know 2

 

 

4.2 What is missing?

comments/examples:

"One word is often repeated one after another, therefore there is a lacking in the possibility to listening again"

 

4.3 Is the target pronunciation appropriate? yes 7 don't know 2

 

4.4 Is the practice at the right level for you? yes 7 no 2

comments/examples:

"Too difficult (in the middle sector)"

"I had not learnt some of the vocabulary"

"Sometimes too hard"

"The vocabulary could be harder as it is too easy"

"Could be a useful help to correct the pronunciation"

 

4.5 What additional features would improve the program?

comments/examples:

"Different speakers and speeds of the teacher"

"Translations / Dictionary in the background"

"The possibility of taking out short sequences from the whole sentence and then it works!!"

"more pictures, more interface features using the pc, the actual version is similar to a tape exercise"

 

 

4.4.1 Evaluator's report

 

1. Problems experienced by the user:

"Open the feedback was "I wouldn't understand""

"The texts for adaptation are too small - difficult to read"

"Difficulty to speak an unknown word"

"Sometimes it couldn't be defined where the problem is (no specific diagnose given)"

"Listen and repeat: If the sentence couldn't be understood it is difficult to repeat and got a qualified reaction!"

"User tried to adapt to the way of speaking of teacher (faster), but this is not accepted"

"Examples for wrong and correct pronunciation in diagnose stage often hard to understand"

"Adjusting the microphone not found"

"Once the introduction text didn't end, it took some trials to jump to the dialogue"

" Sometimes forgets to press the speak button, or presses it and doesn't speak."

" The pop-up menus are sometimes too small, it is difficult to hit them accurately with the mouse."

"Dialogue overall is not easy to understand"

"Lesson 2, "Click the microphone", could be understood as to click the oral exercises"

"To pronounce unknown words"

"Click-Speak co-ordination"

"Dialogue interface, not clear"

"Exit from sub-chapters"

"Directions for user have been ignored/overlooked"

 

2. Reactions of the user (visual or verbal):

"likes clicking"

"repeats the exercise many times to improve outout / result"

"uses seldom the diagnose function"

"Experienced that a click on a blue word (improve stage) will get the teacher to speak the word"

"Impression that the user speaks with less melody than the user and then it causes unspecified problems."

"Long sentences cause problems: If the user concentrates on specific problems and is to improve, a problem may arise at another place. Would it be possible to practice parts of a sentence?"

"Laughs when own voice is heard"

"Happy with success"

" Was happy with good results"

"After some experience, more often repeated the sentences than look for diagnose and do improvement"

"Suprised, how differentiated the reaction of the system is."

"Immediately changed the cursor of "How strict should I judge?", to a lower position"

 

3. Components of the program used:

All subjects did Lessons 1 and 2.

 

 

4. Other comments:

"Impatient clicking confuses the system and takes a long time to decide what should be the next step. But never crashes."

"Stopping the introduction and changing to the dialogue is not explained to the user. Also not good: if the button (read and listen) is pressed, the dialogue stops."

"Sometimes the words spoken as examples are to short and difficult too analyse"

"Now and then frustrating when many mistakes are in one sentence"

"Minimal pairs ( ae/ and /e/ - /ae/ seems to ask for an /a/ sound."

" Very often the diagnose couldn't give special hints"

" Problems with the system:

a). Free choice: sound of the sentence components became inactive

b). The teacher's voice was not active"

" Sometimes (apart from clicking open to "teacher") the spoken word was different to the one displayed as wrong"

""Listen and Repeat" exercises are difficult, because often the student doesn't understand correctly the content of the sentence."

"If the German speaker mimicks the teacher, and speaks as fast as he, the system doesn't understand. But the student doesn't get a hint what's the reason for this problem."

"Minimal pairs, it would be more comfortable for the user, if already the first word would appear in blue, so that the student knows which word to read and speak."

"The program was not quite correct: student said, "eightieth" instead of "eighteenth" and the feedback was the incorrect "th""

"Problem: do the exercise "Free Choice", if the student crosses one of the words, with the mouse, the text dissapprears (greys)"

"Difficulties because questions and answers don't fit correctly to the texts, (causes frustration to the user)"

"Part of a sentence which was spoken completely was analysed!"

""Build the sentence", in case the answer is not the correct one this should be mentioned, but nevertheless the pronunciation should be scored and corrected"

"Help was necessary, because interface is not clear (controls had been explained)"

"Speak-Click interaction needs practice/adjustment??

 

Part III: The validation tool

The validation tool is written in Visual Basic 6.0 and stores all data in an Access 97 database for further analysis. The database also makes the data accessible outside the tool.

The tool’s functions are:

 

Figure 1: The Structure

Where:

There are two types of comparison: phone comparison and stress comparison.

1. Annotation Table

1.1. Definition

The annotation table is filled from the .REF and .LAB files produced by the University of Leeds, which are merged into a text file with the .MIL extension containing these values:

 

Key                 Description
ONSET               Onset (msec) of the phone [.LAB file].
OFFSET              Offset (msec) of the phone by annotator [.LAB file].
WORD                Text of the word [.LAB file].
CANONICAL PHONE     Original (expected or "correct") phone in UK phone set [.REF file].
ANNOTATOR PHONE     Perceived phone in UK set [.LAB file].
CANONICAL STRESS    [.REF file]:
                      • . : Symbol to indicate the NO-stress for consonants;
                      • 1 : Primary Stress (Vowels);
                      • U : Unstressed Vowel.
ANNOTATOR STRESS    [.LAB file]:
                      • . : Symbol to indicate the NO-stress for consonants;
                      • 1 : Stressed Vowel;
                      • U : Unstressed Vowel.
.LAB FILE           Original (expected or "correct") phone perceived by annotators.

Table 1: The definition of the MIL file

 

This is the structure of the annotation table:

Key      Type      Description
PHRASE   String    Session name and file name of the phrase.
WD       Integer   Index of the word in a phrase. Wd ∈ [1, #words], where #words is the maximum number of words in an utterance.
WORD     String    Text of the word.
PH       Integer   Index of a phone within a word. Ph ∈ [1, #phones], where #phones is the maximum number of phones in a word.
OP       String    Original (expected or "correct") phone in UK phone set [.REF file].
CMA      String    Closest match of perceived phone in UK set [.LAB file].
CS       Byte      Canonical Stress [.REF file]. 0: NO-stress for consonants; 1: Primary Stress (Vowels); 99: Unstressed Vowel.
PSA      Byte      Perceived Stress [.LAB file]. 0: NO-stress for consonants; 1: Primary Stress (Vowels); 99: Unstressed Vowel.
TE       String    Phone-type error: values defined in Table 3.

Table 2: Keys of the annotation table

Key   Description    Example
SUB   Substitution   A substitution of /AE/ with /EH/
INS   Insertion      A schwa inserted at the end of "DARK"
DEL   Deletion       A deletion of the /T/ at the end of "SUIT"

Table 3: Values of the TE variable

 

1.2. Example

This is a .MIL file example (Session: 0132, file: BLOCKD02_60.txt):

 

I 000000000 003000000 # . . . . .

I 003000000 004400000 WHAT'S W W . . W

I 004400000 005600000 . OH OH P P OH

I 005600000 005900000 . T __ . . __ (1)

I 005900000 006600000 . S S . . S

I 006600000 006600000 # . . . . .

I 006600000 007100000 IN IH IH P P IH

I 007100000 007400000 . N N . . N

I 007400000 007400000 # . . . . .

I 007400000 007800000 THE DH DH . . DH

I 007800000 008400000 . IY IH P P IH (2)

I 008400000 008400000 # . . . . .

I 008400000 009100000 PICTURE P P . . P

I 009100000 010300000 . IH IY P P IY (2)

I 010300000 011000000 . K K . . K

I 011000000 011800000 . CH CH . . CH

I 011800000 014600000 . ER ER-R U U ER-=R (3)

I 014600000 017800000 # . . . . .

I 017800000 019100000 A AX HH-AX P P HH-AX (3)

I 019100000 019100000 # . . . . .

I 019100000 019600000 MOUTH M M . . M

I 019600000 021900000 . AW AW P P AW

I 021900000 024300000 . TH T . . T (2)

I 024300000 046700000 # . . . . .

The MIL file is inserted like this:

 

Phrase                 Wd  Word     Ph  OP  CMA  CS  PSA  TE
SESS0132_BLOCKD02_60   1   WHAT'S   1   W   W    0   0
SESS0132_BLOCKD02_60   1   WHAT'S   2   OH  OH   1   1
SESS0132_BLOCKD02_60   1   WHAT'S   3   T   __   0   0    DEL (1)
SESS0132_BLOCKD02_60   1   WHAT'S   4   S   S    0   0
SESS0132_BLOCKD02_60   2   IN       1   IH  IH   1   1
SESS0132_BLOCKD02_60   2   IN       2   N   N    0   0
SESS0132_BLOCKD02_60   3   THE      1   DH  DH   0   0
SESS0132_BLOCKD02_60   3   THE      2   IY  IH   1   1    SUB (2)
SESS0132_BLOCKD02_60   4   PICTURE  1   P   P    0   0
SESS0132_BLOCKD02_60   4   PICTURE  2   IH  IY   1   1    SUB (2)
SESS0132_BLOCKD02_60   4   PICTURE  3   K   K    0   0
SESS0132_BLOCKD02_60   4   PICTURE  4   CH  CH   0   0
SESS0132_BLOCKD02_60   4   PICTURE  5   ER  ER   99  99
SESS0132_BLOCKD02_60   4   PICTURE  6   __  R    99  99   INS (3)
SESS0132_BLOCKD02_60   5   A        1   __  HH   1   1    INS (3)
SESS0132_BLOCKD02_60   5   A        2   AX  AX   1   1
SESS0132_BLOCKD02_60   6   MOUTH    1   M   M    0   0
SESS0132_BLOCKD02_60   6   MOUTH    2   AW  AW   1   1
SESS0132_BLOCKD02_60   6   MOUTH    3   TH  T    0   0    SUB (2)

Table 4: An example of the annotation table
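The mapping from the MIL lines to the rows of Table 4 can be sketched as follows. This is only an illustration inferred from the example above, not the Visual Basic code of the tool: the function name `mil_to_rows`, the treatment of "P" as a stressed vowel, and the rule that a hyphenated annotation such as "ER-R" is split into the canonical phone plus inserted phones are all assumptions.

```python
from typing import List, Tuple

# Stress symbols of the MIL file mapped to the codes stored in the table.
# "P" is what the example file uses for stressed vowels; treating it like
# "1" is an assumption of this sketch.
STRESS_CODES = {".": 0, "1": 1, "P": 1, "U": 99}

# (Phrase, Wd, Word, Ph, OP, CMA, CS, PSA, TE)
Row = Tuple[str, int, str, int, str, str, int, int, str]


def mil_to_rows(phrase: str, mil_lines: List[str]) -> List[Row]:
    """Turn MIL lines into annotation-table rows (hypothetical helper)."""
    rows: List[Row] = []
    wd, ph, word = 0, 0, ""
    for line in mil_lines:
        fields = line.split()
        # Skip malformed lines and silence / word-boundary markers ("#").
        if len(fields) < 9 or fields[3] == "#":
            continue
        token, op, cma = fields[3], fields[4], fields[5]
        cs, psa = STRESS_CODES[fields[6]], STRESS_CODES[fields[7]]
        if token != ".":          # a new word starts on this line
            wd, ph, word = wd + 1, 0, token
        if cma == "__":           # canonical phone was not produced
            entries = [(op, "__", "DEL")]
        elif "-" in cma:          # extra phones around the canonical one
            entries = [(op, part, "") if part == op else ("__", part, "INS")
                       for part in cma.split("-")]
        elif cma != op:
            entries = [(op, cma, "SUB")]
        else:
            entries = [(op, cma, "")]
        for e_op, e_cma, te in entries:
            ph += 1
            rows.append((phrase, wd, word, ph, e_op, e_cma, cs, psa, te))
    return rows
```

Applied to the lines for "WHAT'S" and "A" in the example, the sketch yields the DEL row for /T/ and the INS row for /HH/ shown in Table 4.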

2. Diagnose Table

 

2.1. Definition

 

 

Key       Type      Description
Phrase    String    Session name and file name of the phrase.
Wd        Integer   Index of the word in a phrase. Wd ∈ [1, #words], where #words is the maximum number of words in an utterance.
Word      String    Text of the word.
Ph        Integer   Index of a phone within a word. Ph ∈ [1, #phones], where #phones is the maximum number of phones in a word.
OP        String    Original (expected or "correct") phone (UK phone set) [generated by IHAPI].
CMD       String    Closest match of perceived phone in UK set [generated by ISLE DLLs].
PSD       Byte      Perceived Stress [generated by ISLE DLLs]. 0: Don’t Care or Unstressed; 1: Primary Stress.
RecCon    Single    Confidence value generated by the Recognition phase [IHAPI].
LocCon    Single    Confidence value generated by the Localization phase [IHAPI].
DiagCon   Single    Confidence value generated by the ISLE DLLs.

Table 5: Keys of the diagnose table

 

NOTES:

TlocalizeWordErrors [Reference: ISLE report D4.4.: Integrated diagnosis component]

 

Again, when filling the diagnose table, some words and/or phones can be skipped due to:

ANNOTATE TABLE

Phrase Wd Word Ph OP CMA CS PSA TE

2 15 FOR 1 F *F* 0 0

2 15 FOR 2 AO *AO* 1 1

2 15 FOR 3 R *R* 0 0

DIAGNOSE TABLE

Phrase Wd Word Ph OP CMD PSD RecConf LocConf DiagConf

2 15 FOR 1 f f 0 0.651815

2 15 FOR 2 er er 0 0.649284

 

In this step, the tool generates two tables containing the phonetic and the stress errors found by the diagnose DLL.

2.2. Example

 

Phrase Wd Word Ph OP CMD PSD RecConf LocConf DiagConf

SESS0132_BLOCKD02_60 1 WHAT'S 1 W W 0.998 0.530631

SESS0132_BLOCKD02_60 1 WHAT'S 2 OH OH 0.998 0.952386

SESS0132_BLOCKD02_60 1 WHAT'S 3 T T 0.998 0.855423

SESS0132_BLOCKD02_60 1 WHAT'S 4 S S 0.998 0.917665

SESS0132_BLOCKD02_60 2 IN 1 IH IH 0.998 0.9048

SESS0132_BLOCKD02_60 2 IN 2 N N 0.998 0.712996

SESS0132_BLOCKD02_60 3 THE 1 DH DH 0.998 0.932807

SESS0132_BLOCKD02_60 3 THE 2 AX AX 0.998 0.576472

SESS0132_BLOCKD02_60 4 PICTURE 1 P P 0.998 0.830555

SESS0132_BLOCKD02_60 4 PICTURE 2 IH IY 0 0.998 0.982329 0.984924

SESS0132_BLOCKD02_60 4 PICTURE 3 K K 0.998 0.998

SESS0132_BLOCKD02_60 4 PICTURE 4 CH CH 0.998 0.966422

SESS0132_BLOCKD02_60 4 PICTURE 5 ER ER 1 0.998 0.929955

SESS0132_BLOCKD02_60 5 A 1 AX AX 0.998 0.909092

SESS0132_BLOCKD02_60 6 MOUTH 1 M M 0.998 0.940577

SESS0132_BLOCKD02_60 6 MOUTH 2 AW AW 0.998 0.998

SESS0132_BLOCKD02_60 6 MOUTH 3 TH T 0.996949 0.998 0.825416

Table 6: An example of the diagnose table

3. Compare Table

3.1. Definition

 

 

Key       Description
Phrase    Index of the phrase in a session. Phrase ∈ [1, #Phrase], where #Phrase is the maximum number of phrases in a session.
Wd        Index of the word in a phrase. Wd ∈ [1, #words], where #words is the maximum number of words in an utterance.
Word      Text of the word.
Ph        Index of a phone within a word. Ph ∈ [1, #phones], where #phones is the maximum number of phones in a word.
OP        Original (expected or "correct") phone (UK phone set) [.REF file].
CMA       Closest match of perceived phone in UK set [generated by Annotation].
CMD       Closest match of perceived phone in UK set [generated by Diagnose].
HMP       Phone Comparison: values defined in Table 8.
CS        Canonical Stress. 0: Don’t Care or Unstressed; 1: Primary Stress.
PSA       Perceived Stress by Annotators. 0: Don’t Care or Unstressed; 1: Primary Stress.
PSD       Perceived Stress by Diagnose DLLs. 0: Don’t Care or Unstressed; 1: Primary Stress.
HMS       Stress Comparison: values defined in Table 9.
RecCon    Confidence value generated by the Recognition phase [IHAPI] (type Single).
LocCon    Confidence value generated by the Localization phase [IHAPI] (type Single).
DiagCon   Confidence value generated by the ISLE DLLs (type Single).

Table 7: Keys of the compare table

Match Type: Values of the variables HMP and HMS

 

KEY           OP   CMA   CMD   HMP
HITS          X    Y     Y     HITS
NEAR HITS     X    Y     Z     NH
MISS          X    Y     X     MISS
FALSE ALARM   X    X     Y     FA
CORRECT       X    X     X

Table 8: Values of the HMP variable

where X, Y, Z are phones.

 

KEY           CS   PSA   PSD   HMS
HITS          X    Y     Y     HITS
MISS          X    Y     X     MISS
FALSE ALARM   X    X     Y     FA
CORRECT       X    X     X

Table 9: Values of the HMS variable for "Phone Level"

where X and Y are the stress values of a phone, which can be 0 or 1.
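The decision rule behind Tables 8 and 9 can be written down compactly. The following is a minimal sketch, assuming that the three values are compared for plain equality; the function name `classify` is hypothetical and is not taken from the tool. With only two possible stress values the NH branch can never be reached, which is consistent with Table 9 not defining it.

```python
def classify(canonical, annotated, diagnosed):
    """Return the HMP/HMS value for one row, or "" for a CORRECT row.

    canonical: OP (or CS), annotated: CMA (or PSA), diagnosed: CMD (or PSD).
    """
    if annotated == canonical:
        # The annotator heard no error: the diagnosis is either in
        # agreement (CORRECT, left empty) or a false alarm.
        return "" if diagnosed == canonical else "FA"
    if diagnosed == annotated:
        return "HITS"   # same error found by annotator and diagnosis
    if diagnosed == canonical:
        return "MISS"   # error heard by the annotator but not diagnosed
    return "NH"         # both report an error, but a different one


# Rows taken from the phone columns of the compare table example.
print(classify("T", "__", "T"))     # MISS
print(classify("IH", "IY", "IY"))   # HITS
```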

The compare table summarizes all the information for the analysis. The table columns are filled in this way:

 

Figure 2: Relations between the tables

We need to ‘align’ the data from the annotation and compare tables, because some words and/or phrases cannot be aligned, for several reasons:

3.2. Example

 

 

Phrase Wd Word Ph OP CMA CMD HMP CS PSA PSD HMS RecCon LocCon DiagConf

SESS0132_BLOCKD02_60 1 WHAT'S 1 W W W 0 0 0 0.998 0.530631

SESS0132_BLOCKD02_60 1 WHAT'S 2 OH OH OH 1 1 1 0.998 0.952386

SESS0132_BLOCKD02_60 1 WHAT'S 3 T __ T MISS 0 0 0 0.998 0.855423

SESS0132_BLOCKD02_60 1 WHAT'S 4 S S S 0 0 0 0.998 0.917665

SESS0132_BLOCKD02_60 2 IN 1 IH IH IH 1 1 1 0.998 0.9048

SESS0132_BLOCKD02_60 2 IN 2 N N N 0 0 0 0.998 0.712996

SESS0132_BLOCKD02_60 3 PICTURE 1 P P P 0 0 0 0.998 0.830555

SESS0132_BLOCKD02_60 3 PICTURE 2 IH IY IY HITS 1 1 0 0.998 0.982329 0.9849

SESS0132_BLOCKD02_60 3 PICTURE 3 K K K 0 0 0 0.998 0.998

SESS0132_BLOCKD02_60 3 PICTURE 4 CH CH CH 0 0 0 0.998 0.966422

SESS0132_BLOCKD02_60 3 PICTURE 5 ER ER ER 99 99 1 FA 0.998 0.929955

SESS0132_BLOCKD02_60 3 PICTURE 6 __ R __ MISS 99 99 0 0

SESS0132_BLOCKD02_60 4 MOUTH 1 M M M 0 0 0 0.998 0.940577

SESS0132_BLOCKD02_60 4 MOUTH 2 AW AW AW 1 1 1 0.998 0.998

SESS0132_BLOCKD02_60 4 MOUTH 3 TH T T HITS 0 0 0 0.9969 0.998 0.8254

Table 10: An example of the compare table

 

4. ISLE OCX Function-Structures for the Validation Process

 

 

 

Figure 3: The structure of the system

 

 

 

4.1. Error types returned by ISLE OCX

 

ErrorType Number   Tag                   Example error         Example feedback
0                  ErrNoErr              None                  none
1                  ErrPhoneDel           /t/ → /_/             "NO*T*"
2                  ErrPhoneSub           /ih/ → /iy/           "L*I*VE"
3                  ErrPhoneInsSil        /_/ → /p/             "CU*P*BOARD"
4                  ErrPhoneLeftIns       /ow/ → /hh ow/        "*O*VER"
5                  ErrPhoneRightIns      /g/ → /g ax/          "DO*G*"
6                  ErrPhoneSub2For1      /th/ → /t hh/         "BA*TH*ING"
7                  ErrPhoneSwap2Phones   /s t/ → /t s/         "CA*ST*"
8                  ErrPhoneSubNForN      /a b/ → /x y/         "A*BC*DE"
                                         /m/ → /m b ax/        THU*m b ax*B
9                  ErrStressGeneric      DessERT → DESSert     "D*E*SS#E#RT"
10                 ErrStressNV           CONflict → conFLICT   "C#O#NFL*I*CT"

ErrPhoneSubNForN is the most general (least specific) error class: when an error cannot be classified as one of the types 1 to 7, its error type is 8.

 

4.2 Validation Process Functions

All the OCX functions for the validation tool begin with the letter "V".

 

 

4.2.1. VLogLevel

 

long CIsleOCXCtrl::VLogLevel(long level)

Parameters:

level   Log verbosity of IHAPI’s debugging output.

LOG-LEVEL   DESCRIPTION
ihNONE      Minimum debugging info.
ihFAIL      A bad thing happened, probably fatal.
ihWARN      A warning, like "failed to diagnose word %d".
ihNOTE      Messages like "FUNCTIONNAME: entering" or ": returning".
ihALL       Maximum debugging info: a really common message, very non-important.

Returns:

Kok   Always

Purpose:

Calls the idSetAlertLevel function to set the log verbosity.

 

 

4.2.2. VGetRecRecord

 

long CIsleOCXCtrl::VGetRecRecord(long Append, LPCTSTR diagnoseFileName)

Parameters:

Append             The mode in which the diagnose.txt file is filled.
diagnoseFileName   The name of the file produced by the ISLE DLLs.

Append value   DESCRIPTION
-1             Close file
0              Nothing
1              Open file in Append mode
2              Open file in Write mode

Returns:

Kok   Always

Purpose:

This function writes the data defined in the RecRecord structure to the file diagnoseFileName, which the Verification tool uses to fill the diagnose table.

Defining the diagnoseFileName variable makes it possible to manage concurrent processing.

 

 

4.2.3. VGetOCXStatus

 

long CIsleOCXCtrl::VGetOCXStatus(LPCTSTR Name)

Parameters:

Name   Name of an element of the OCXStatus structure.

Returns:

0 or 1   On success (value of the element)
KERR     On failure

Purpose:

This function exposes to the validation tool (and to the top level as well) the values of the elements of the OCXStatus structure, which is defined as a property in the OCX.

 

 

4.2.4. VEnableValidation

 

long CIsleOCXCtrl::VEnableValidation(LPCTSTR fullPath)

Parameters:

fullPath   When running in the Validator, contains the path of the session to analyse. When running in the Demonstrator, fullPath = "0".

Returns:

0   Always

Purpose:

This function sets the variable InValidation, defined in the OCXStatus structure.

 

 

4.2.5. VGetErrorDetail

 

 

long CIsleOCXCtrl::VGetErrorDetail(long wordIndex, long errorIndex)

Parameters:

wordIndex    wordIndex ∈ [1; #words], where #words is the number of words in the utterance.
errorIndex   errorIndex ∈ [1; #errors], where #errors is the number of errors reported in the word indexed by wordIndex.

Returns:

>0     Class of error found.
KERR   On failure.

Purpose:

To find out what type of error occurred.
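The class of error returned here is one of the numbers listed in Paragraph 4.1. For code that post-processes these values outside the OCX, the numbering could be mirrored as an enumeration. The sketch below is only an illustration: the tags are those of Paragraph 4.1, while the class name `ErrorType` and the two helper functions are hypothetical and do not exist in the OCX.

```python
from enum import IntEnum


class ErrorType(IntEnum):
    """Error types returned by the ISLE OCX (numbers from Paragraph 4.1)."""
    ErrNoErr = 0
    ErrPhoneDel = 1
    ErrPhoneSub = 2
    ErrPhoneInsSil = 3
    ErrPhoneLeftIns = 4
    ErrPhoneRightIns = 5
    ErrPhoneSub2For1 = 6
    ErrPhoneSwap2Phones = 7
    ErrPhoneSubNForN = 8   # most general phone error class
    ErrStressGeneric = 9
    ErrStressNV = 10


def is_phone_error(code: int) -> bool:
    """True for error types 1 to 8."""
    return ErrorType.ErrPhoneDel <= code <= ErrorType.ErrPhoneSubNForN


def is_stress_error(code: int) -> bool:
    """True for error types 9 and 10."""
    return code in (ErrorType.ErrStressGeneric, ErrorType.ErrStressNV)


print(ErrorType(5).name)   # ErrPhoneRightIns
```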

Modifies:

The variable VerrInfo for phone errors (error types 1 to 8: refer to Paragraph 4.1).

VerrInfo is a tab-separated list of the following items.

Items that do not apply are set to 0 (zero) for numeric variables, or to "" for strings.

Type      Tag               Describes
Long      ErrorType         The type of error, 1-(#errTypes).
Integer   Word-index        The index of the word [1-#words] in the utterance.
String    Word              The word (series of chars), 1-#words.
Long      OrigPhoneOnset    First wrong phone
Long      OrigPhoneOffset   Last wrong phone
String    CorrectPhone      Correct phone (or phones)
String    WrongPhone        Incorrect phone (or phones)
Float     Confidence        Value of confidence (0.0 --> 1.0)

Example:

ErrorType   Word index   Word   OrigPhoneOnset   OrigPhoneOffset   CorrectPhone   WrongPhone   Conf
2           2            DOG    3                3                 g              g ax         0.85

Modifies:

The variable VerrInfoStress for stress errors (error types 9 and 10: refer to Paragraph 4.1).

VerrInfoStress is a tab-separated list of the following items.

Items that do not apply are set to 0 (zero) for numeric variables, or to "" for strings.

Type      Tag             Describes
Long      ErrorType       The type of error (9 or 10).
Integer   Word-index      The index of the word [1-#words] in the utterance.
String    Word            The word (series of chars), 1-#phones.
Long      CorrectStress   The correct stress phone in the word
Long      ErrorStress     The stress phone found by the DLLs

Example:

ErrorType   Word-index   Word       CorrectStress   ErrorStress
9           4            DRINKING   2               5
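A consumer of these two variables only has to split the string on tabs and convert the numeric fields. The following sketch shows one way to do this; the class names `PhoneErrorInfo` and `StressErrorInfo` and the two parsing functions are hypothetical, and the field order is assumed to be exactly the one given in the two lists above.

```python
from dataclasses import dataclass


@dataclass
class PhoneErrorInfo:
    error_type: int
    word_index: int
    word: str
    orig_phone_onset: int
    orig_phone_offset: int
    correct_phone: str
    wrong_phone: str
    confidence: float


@dataclass
class StressErrorInfo:
    error_type: int
    word_index: int
    word: str
    correct_stress: int
    error_stress: int


def parse_verr_info(text: str) -> PhoneErrorInfo:
    """Parse a tab-separated VerrInfo string (phone errors)."""
    f = text.rstrip("\r\n").split("\t")
    return PhoneErrorInfo(int(f[0]), int(f[1]), f[2], int(f[3]),
                          int(f[4]), f[5], f[6], float(f[7]))


def parse_verr_info_stress(text: str) -> StressErrorInfo:
    """Parse a tab-separated VerrInfoStress string (stress errors)."""
    f = text.rstrip("\r\n").split("\t")
    return StressErrorInfo(int(f[0]), int(f[1]), f[2], int(f[3]), int(f[4]))


# The two examples given above, written as tab-separated strings.
phone = parse_verr_info("2\t2\tDOG\t3\t3\tg\tg ax\t0.85")
stress = parse_verr_info_stress("9\t4\tDRINKING\t2\t5")
print(phone.wrong_phone, stress.word)
```

Splitting on tabs rather than on arbitrary whitespace matters here, because the phone fields may themselves contain spaces (for example "g ax").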

Notes:

VGetErrorDetail acts like TgetErrorDetail, but it also generates the VerrInfo and VerrInfoStress variables, which are needed to fill the diagnose table of the validation tool.

 

 

 

 

 

5. The output of the validation tool

The validation tool can produce graphs for phone and stress error analysis.

5.1. Phone error analysis

The graphs for the phone analysis are built from the values of the HMP variable (see Table 8).

Figure 4: Output scheme for phone error analysis

Figure 4 describes how we built the graphs for the phone analysis.

In particular, we consider three groups of graphs:

ISLE DLLs can:

ISLE DLLs:

These graphs are generated with all the values of the HMP variable (see Table 8).

Table 11 reports all the information about the graphs.

 

 

 

 

 

 

 

 

 

 

 

LEVEL OF ANALYSIS   TABLE NAME         NAME OF OUTPUT FILE   NAME OF MASTER FILE   HMP VALUES USED TO BUILD THE GRAPH

GENERAL ANALYSIS
Global              RptGeneralPhone    PhGenGlobal.xls       PhGeneralMaster.xls   All HMP values.
Phone type          RptPhoneType       PhGenPhoneType.xls    PhGeneralMaster.xls   All HMP values for each phone type.
Phones              RptPhone           PhGenPhone.xls        PhGeneralMaster.xls   All HMP values for each phone.

ANALYSIS OF CORRECT PHONES
Global              RptGenCorrectPH    PhCorrGlobal.xls      PhCorrectMaster.xls   CORRECT and FA.
Phone type          RptPHCorrectType   PhCorrPhoneType.xls   PhCorrectMaster.xls   CORRECT and FA for each phone type.
Phones              RptCorrectPhone    PhCorrPhone.xls       PhCorrectMaster.xls   CORRECT and FA for each phone.

ANALYSIS OF ERROR PHONES
Global              RptGenErrorPH      PhErrGlobal.xls       PhErrMaster.xls       HITS, NEAR HITS and MISS.
Phone type          RptPHErrorType     PhErrPhoneType.xls    PhErrMaster.xls       HITS, NEAR HITS and MISS for each phone type.
Phones              RptPhoneError      PhErrPhone.xls        PhErrMaster.xls       HITS, NEAR HITS and MISS for each phone.

Table 11: The output for phone error analysis

 

 

5.1.1. Results

INTERVAL   %
CORRECT    92.7
FA         7.29

Number of instances
TOTAL      76690
CORRECT    71096
FA         5594

Figure 5: PhCorrGlobal.xls

 

INTERVAL   %
CORRECT    93.24
FA         6.75

Number of instances
TOTAL      711
CORRECT    663
FA         48

Figure 6: PhCorrPhone.xls

 

 

INTERVAL   %
CORRECT    88.77
FA         11.22

Number of instances
TOTAL      28628
CORRECT    25415
FA         3213

Figure 7: PhCorrPhoneType.xls

 

 

INTERVAL    %
MISS        69.08
HITS        20.06
NEAR HITS   10.84

Number of instances
TOTAL       3319
MISS        2293
HITS        666
NEAR HITS   360

Figure 8: PhErrGlobal.xls

 

 

INTERVAL    %
MISS        26.31
HITS        29.82
NEAR HITS   43.85

Number of instances
TOTAL       57
MISS        15
HITS        17
NEAR HITS   25

Figure 9: PhErrPhone.xls

 

 

INTERVAL    %
MISS        78.67
HITS        12.05
NEAR HITS   9.27

Number of instances
TOTAL       647
MISS        509
HITS        78
NEAR HITS   60

Figure 10: PhErrPhoneType.xls

 

INTERVAL    %
MISS        2.86
FA          6.99
HITS        0.83
CORRECT     88.86
NEAR HITS   0.44

Number of instances
TOTAL       80009
MISS        2293
FA          5594
HITS        666
CORRECT     71096
NEAR HITS   360

Figure 11: PhGenGlobal.xls
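The percentages in these result tables follow directly from the instance counts. As a minimal sketch, the global figures of PhGenGlobal.xls can be recomputed as below. The function name `percentages` is hypothetical, and the choice to cut the value after two decimals instead of rounding it is an assumption, made because it reproduces the reported figures (for example 0.44 rather than 0.45 for NEAR HITS).

```python
def percentages(counts):
    """Share of each HMP value in percent, cut off after two decimals."""
    total = sum(counts.values())
    # Integer arithmetic avoids rounding up: 360 of 80009 is 0.4499...%,
    # which is reported as 0.44.
    return {key: (value * 10000 // total) / 100
            for key, value in counts.items()}


# Instance counts of the global phone analysis (PhGenGlobal.xls).
counts = {"MISS": 2293, "FA": 5594, "HITS": 666,
          "CORRECT": 71096, "NEAR HITS": 360}
print(sum(counts.values()))   # 80009
print(percentages(counts))
```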

 

INTERVAL    %
MISS        1.95
FA          6.25
HITS        2.21
CORRECT     86.32
NEAR HITS   3.25

Number of instances
TOTAL       768
MISS        15
FA          48
HITS        17
CORRECT     663
NEAR HITS   25

Figure 12: PhGenPhone.xls

 

 

INTERVAL    %
MISS        4.88
FA          3.9
HITS        0.74
CORRECT     89.89
NEAR HITS   0.57

Number of instances
TOTAL       10427
MISS        509
FA          407
HITS        78
CORRECT     9373
NEAR HITS   60

Figure 13: PhGenPhoneType.xls

 

 

5.2. Stress error analysis

5.2.1. Phone level

The graphs for the stress analysis are built from the values of the HMS variable (see Table 9).

Figure 14: Output scheme for stress error analysis

Following the same reasoning used to define the phone error analysis (Paragraph 5.1), we generated the graphs for stress errors.

It is important to observe that for the phone-level stress analysis the NH value of the HMS variable is not defined (see Table 9).

 

 

 

 

 

LEVEL OF ANALYSIS   TABLE NAME            NAME OF OUTPUT FILE   NAME OF MASTER FILE   HMS VALUES USED TO BUILD THE GRAPH

GENERAL ANALYSIS
Global              RptGeneralStress      StGenGlobal.xls       StGeneralMaster.xls   All HMS values.
English vowels      RptPhoneStress        StGenVowels.xls       StGeneralMaster.xls   All HMS values for English vowels.

ANALYSIS OF CORRECTS
Global              RptGenCorrectStress   StCorrGlobal.xls      StCorrectMaster.xls   CORRECT and FA.
English vowels      RptCorrectStress      StCorrVowels.xls      StCorrectMaster.xls   CORRECT and FA for English vowels.

ANALYSIS OF ERRORS
Global              RptGenErrorStress     StErrGlobal.xls       StErrorMaster.xls     HITS and MISS.
English vowels      RptErrorStress        StErrVowels.xls       StErrorMaster.xls     HITS and MISS for English vowels.

Table 12: The output for stress error analysis

 

 

PHONE TYPES

TYPE              PHONES
VOWELS            aa ae ah ao aw ax ay eh er ey ih iy oh ow oy uh uw
STOP CONSONANTS   p b d t f v g k
FRICATIVES        dh th s z sh ch jh zh
LIQUIDS           r l m n ng
SEMI-VOWELS       y w hh

Table 13: Phone types
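The grouping of Table 13 is what the "phone type" level of analysis aggregates over. A minimal sketch of such an aggregation is given below; the grouping is copied from the table as it stands, while the names `PHONE_TYPES`, `PHONE_TO_TYPE` and `count_by_type`, and the choice of keying each row on its canonical phone (OP), are assumptions made for illustration.

```python
from collections import Counter

# Phone types exactly as listed in Table 13.
PHONE_TYPES = {
    "VOWELS": "aa ae ah ao aw ax ay eh er ey ih iy oh ow oy uh uw".split(),
    "STOP CONSONANTS": "p b d t f v g k".split(),
    "FRICATIVES": "dh th s z sh ch jh zh".split(),
    "LIQUIDS": "r l m n ng".split(),
    "SEMI-VOWELS": "y w hh".split(),
}

# Reverse lookup: phone -> type.
PHONE_TO_TYPE = {phone: ptype
                 for ptype, phones in PHONE_TYPES.items()
                 for phone in phones}


def count_by_type(rows):
    """Count HMP values per phone type.

    rows: iterable of (canonical phone, HMP value); an empty HMP value
    stands for a CORRECT row.
    """
    counts = Counter()
    for phone, hmp in rows:
        counts[(PHONE_TO_TYPE[phone.lower()], hmp or "CORRECT")] += 1
    return counts


rows = [("TH", "HITS"), ("IH", "HITS"), ("T", "MISS"), ("W", "")]
print(count_by_type(rows))
```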

 

 

5.2.1.1. Results

INTERVAL   %
CORRECT    98.67
FA         1.32

Number of instances
TOTAL      30087
CORRECT    29689
FA         398

Figure 15: StCorrGlobal.xls

 

 

 

INTERVAL   %
CORRECT    99.21
FA         0.78

Number of instances
TOTAL      760
CORRECT    754
FA         6

Figure 16: StCorrVowels.xls

 

INTERVAL   %
MISS       69.51
HITS       30.48

Number of instances
TOTAL      410
MISS       285
HITS       125

Figure 17: StErrGlobal.xls

 

INTERVAL   %
MISS       75
HITS       25

Number of instances
TOTAL      8
MISS       6
HITS       2

Figure 18: StErrVowels.xls

 

 

INTERVAL   %
MISS       0.35
FA         0.49
HITS       0.15
CORRECT    98.98

Number of instances
TOTAL      79845
MISS       285
FA         398
HITS       125
CORRECT    79037

Figure 19: StGenGlobal.xls

 

 

INTERVAL   %
MISS       0.78
FA         0.78
HITS       0.26
CORRECT    98.17

Number of instances
TOTAL      768
MISS       6
FA         6
HITS       2
CORRECT    754

Figure 20: StGenVowels.xls

 

 

 

 

5.2.2. Word level

To generate these results we use an external tool (the stress generator tool) that extracts and processes the data from the compare table.

 

KEY           CS   PSA   PSD   HMS
HITS          X    Y     Y     HITS
NEAR HITS     X    Y     Z     NH
MISS          X    Y     X     MISS
FALSE ALARM   X    X     Y     FA
CORRECT       X    X     X

Table 14: Values of the HMS variable for "Word Level"

where X, Y, Z are the stress positions (value of the OP variable in the compare table, see Table 10) in a word.

 

Example of word level stress analysis:

 

Phrase               Wd   CS   PSA   PSD   HMS
SESS0003_BLOCKE_01   1    6    6     6
SESS0003_BLOCKE_01   2    2    2     2
SESS0003_BLOCKE_01   4    4    4     4
SESS0003_BLOCKE_01   5    1    4     1     MISS
SESS0003_BLOCKE_01   6    1    1     1
SESS0003_BLOCKE_02   2    4    4     4
SESS0003_BLOCKE_02   3    2    2     4     FA
SESS0003_BLOCKE_02   4    4    4     4
SESS0003_BLOCKE_02   8    2    2     2
SESS0003_BLOCKE_03   4    3    3     3
SESS0003_BLOCKE_03   6    2    2     2
SESS0003_BLOCKE_04   2    2    2     2
SESS0003_BLOCKE_04   4    5    5     5
SESS0003_BLOCKE_04   5    3    3     3
SESS0003_BLOCKE_04   6    4    4     4
SESS0003_BLOCKE_05   5    7    7     7
SESS0003_BLOCKE_05   6    4    4     4
SESS0003_BLOCKE_07   2    2    5     2     MISS
SESS0003_BLOCKE_07   3    6    6     6
SESS0003_BLOCKE_07   4    1    1     4     FA
SESS0003_BLOCKE_07   7    2    2     2
SESS0003_BLOCKE_08   3    5    5     5
SESS0003_BLOCKE_08   5    2    2     2
SESS0003_BLOCKE_09   3    2    5     5     HITS
SESS0003_BLOCKE_11   1    2    2     2
SESS0003_BLOCKE_11   2    5    5     5
SESS0003_BLOCKE_11   4    5    5     5
SESS0003_BLOCKE_11   5    2    2     2
SESS0003_BLOCKE_12   2    2    2     2
SESS0003_BLOCKE_12   4    1    1     6     FA
SESS0003_BLOCKE_12   6    3    3     3
SESS0003_BLOCKE_12   8    1    1     1
SESS0003_BLOCKE_13   1    4    6     2     NH
SESS0003_BLOCKE_13   3    5    5     5
SESS0003_BLOCKE_13   6    9    9     9
SESS0003_BLOCKE_13   7    1    1     1
SESS0003_BLOCKE_13   9    4    4     4
SESS0003_BLOCKE_14   1    1    1     3     FA
SESS0003_BLOCKE_14   8    5    5     5
SESS0003_BLOCKE_15   7    3    3     3
SESS0003_BLOCKE_34   6    4    4     1     FA

Table 15: An example stress table on the word level

 

In the following figures the rates are calculated as follows (# means "number of"):

| FA RATE | HITS RATE | NH RATE |
|---|---|---|
| #FA / ( #FA + #CR ) | #HITS / ( #HITS + #NH + #MISS ) | ( #HITS + #NH ) / ( #HITS + #NH + #MISS ) |

Table 16: Rate formulas
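As a worked check of the rate formulas, the short sketch below applies them to the HMS counts reported in the results of this section for German and Italian speakers. The function name `stress_rates` and the dictionary layout are illustrative assumptions, not part of the validation tool.

```python
def stress_rates(counts):
    """Compute (FA rate, HITS rate, NH rate) from a dict of HMS counts."""
    fa_rate = counts["FA"] / (counts["FA"] + counts["CR"])
    # Denominator shared by the HITS and NH rates: #HITS + #NH + #MISS
    stress_errors = counts["HITS"] + counts["NH"] + counts["MISS"]
    hits_rate = counts["HITS"] / stress_errors
    nh_rate = (counts["HITS"] + counts["NH"]) / stress_errors
    return fa_rate, hits_rate, nh_rate


# Counts copied from the word level result tables of this report
german = {"CR": 5233, "FA": 398, "HITS": 125, "MISS": 282, "NH": 2}
italian = {"CR": 4277, "FA": 531, "HITS": 100, "MISS": 297, "NH": 5}

for name, counts in (("German", german), ("Italian", italian)):
    fa, hits, nh = stress_rates(counts)
    print(f"{name}: FA {fa:.2%}, HITS {hits:.2%}, NH {nh:.2%}")
# German: FA 7.07%, HITS 30.56%, NH 31.05%
# Italian: FA 11.04%, HITS 24.88%, NH 26.12%
```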

 

4.2.2.1. Results

| HMS | Total | CountOfHMS |
|---|---|---|
| CR | 6040 | 5233 |
| FA | 6040 | 398 |
| HITS | 6040 | 125 |
| MISS | 6040 | 282 |
| NH | 6040 | 2 |

| FA RATE | HITS RATE | NH RATE |
|---|---|---|
| 7.07% | 30.56% | 31.05% |

Figure 19: Results "Word Level" Stress for German speakers

 

| HMS | Total | CountOfHMS |
|---|---|---|
| CR | 5210 | 4277 |
| FA | 5210 | 531 |
| HITS | 5210 | 100 |
| MISS | 5210 | 297 |
| NH | 5210 | 5 |

| FA RATE | HITS RATE | NH RATE |
|---|---|---|
| 11.04% | 24.88% | 26.12% |

Figure 20: Results "Word Level" Stress for Italian speakers

 

5. Experiments with the recognition threshold

Recognition with IHAPI can have one of two results:

  1. It recognizes the utterance and aligns the speech signal with some path through the syntax;
  2. It fails to recognize.

The input can also be:

  1. GOOD: it matches what IHAPI has been told to recognize;
  2. BAD: the words in the utterance are different, or the input is nonsense noise.

 

So there are the following possibilities:

| | Utterance matches prompt | Utterance is bad or pure noise |
|---|---|---|
| IHAPI aligns utterance to prompt | HIT | FALSE ACCEPT |
| IHAPI fails to align utterance to prompt | MISS | CORRECT REJECT |

Table 17: IHAPI alignment
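The four outcomes of Table 17 can be tallied over a set of trials as sketched below. The function name, the boolean encoding of each trial and the sample trials are assumptions made only for illustration.

```python
from collections import Counter


def alignment_outcome(utterance_matches_prompt: bool, aligned: bool) -> str:
    """Map one recognition trial onto the four cells of Table 17."""
    if aligned:
        return "HIT" if utterance_matches_prompt else "FALSE ACCEPT"
    return "MISS" if utterance_matches_prompt else "CORRECT REJECT"


# Hypothetical trials: (utterance matches prompt, IHAPI aligned it)
trials = [(True, True), (True, False), (False, True), (False, False), (True, True)]
tally = Counter(alignment_outcome(matches, aligned) for matches, aligned in trials)
print(dict(tally))
```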

 

In order to get as many HITS as possible, the ISLE demonstrator uses an adaptation process, so that the recognizer is better able to handle the differences between the student’s speech and the trained UK models.

The problem is to avoid FALSE ACCEPTS and MISSES: clearly, if the recognizer is made very strict, it will reject almost every utterance, giving very few FALSE ACCEPTS but also very many MISSES. Thus it is necessary to tune the available parameters, so that we strike a reasonable balance between the two.

The parameters that we have to adjust are:

 

  1. Recognition parameters of the HMM, which affect

 

  2. Confidence measures: the confidence measures available in IHAPI

 

To do this, the system computes the "average word confidence" across the sentence after recognition; if this value is below some threshold, we pretend (to the top level) that recognition failed.

Thus, even if the recognizer successfully aligns ‘fuffa fuffa’ with the prompt "they asked if I wanted to come along on the barge trip", we should be able to reject the utterance.
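The rejection rule described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, word confidences are assumed to lie between 0 and 1, the threshold of 0.5 is an arbitrary example value, and a failed alignment is represented by `None` and scored as zero.

```python
from typing import Optional, Sequence


def average_word_confidence(word_confidences: Optional[Sequence[float]]) -> float:
    """Average the per-word confidences of one sentence.

    A failed alignment (None or an empty list) is scored as zero.
    """
    if not word_confidences:
        return 0.0
    return sum(word_confidences) / len(word_confidences)


def accept_utterance(word_confidences: Optional[Sequence[float]], threshold: float) -> bool:
    """Report success to the top level only if the average reaches the threshold."""
    return average_word_confidence(word_confidences) >= threshold


THRESHOLD = 0.5  # example value only; the real threshold has to be tuned

print(accept_utterance([0.9, 0.8, 0.7], THRESHOLD))  # True
print(accept_utterance([0.2, 0.1], THRESHOLD))       # False
print(accept_utterance(None, THRESHOLD))             # False
```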

Table 18: Correct-Wrong prompts percentages

 

For each sentence we will then get two average confidence values:

 

Of course, sometimes the recognizer will actually fail to align a prompt to an utterance: in this case we set the word-confidence to zero.

 

These confidence values are available after the recognition stage by calling the OCX function idGetRecResults.

If recognition fails, output a zero for that trial.

 

We generate four types of ‘incorrect’ prompts:

  1. Type 1: Nonsense phrase as syntax

     Example:

     Syntax: ANYTHING COFFEE MANY WAY TRADITIONAL

     Wav file: I SAID THROUGH NOT THOUGH

  2. Type 2: Two words exchanged with one another

     Example:

     Syntax: HE HAS HIS OWN STUDIO PHOTOGRAPHIC

     Wav file: HE HAS HIS OWN PHOTOGRAPHIC STUDIO

  3. Type 3: One word repeated twice

     Example:

     Syntax: A STUDENT VISA PERMITS PERMITS THEM TO STAY LONGER

     Wav file: A STUDENT VISA PERMITS THEM TO STAY LONGER

  4. BAD SESSIONS: these ‘incorrect’ prompts are realistic errors that people make when reading sentences.

Our Italian/German speakers were asked to record hundreds of prompts. In most cases, they read the prompt word-for-word as expected.

Sometimes, though, they inserted, deleted, or repeated words, or in other ways mangled the sentences.

Thus, these ‘incorrect’ prompts are subsets of our corpus, for which the original (expected) and the actual (corrected) prompts are different.

 

Example 1:

Syntax: SAID THROUGH NOT THOUGH

Wav file: I SAID THROUGH NOT THOUGH

 

Example 2:

Syntax: SINGERS LEARN HOW TO PROJECT THEIR VOICES

Wav file: SINGERS LEARN HOW TO PROTECT THEIR VOICES
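The three synthetic prompt types (Types 1 to 3) could be derived from a correct prompt along the following lines. This is a sketch only: the report does not say how the incorrect syntaxes were produced, and the function names, word indices, vocabulary and random seed are illustrative assumptions.

```python
import random


def swap_words(prompt: str, i: int, j: int) -> str:
    """Type 2: exchange the words at positions i and j."""
    words = prompt.split()
    words[i], words[j] = words[j], words[i]
    return " ".join(words)


def repeat_word(prompt: str, i: int) -> str:
    """Type 3: repeat the word at position i once."""
    words = prompt.split()
    return " ".join(words[: i + 1] + [words[i]] + words[i + 1 :])


def nonsense_prompt(vocabulary, length: int, rng: random.Random) -> str:
    """Type 1: draw unrelated words at random from a vocabulary."""
    return " ".join(rng.choice(vocabulary) for _ in range(length))


print(swap_words("HE HAS HIS OWN PHOTOGRAPHIC STUDIO", 4, 5))
# HE HAS HIS OWN STUDIO PHOTOGRAPHIC
print(repeat_word("A STUDENT VISA PERMITS THEM TO STAY LONGER", 3))
# A STUDENT VISA PERMITS PERMITS THEM TO STAY LONGER
vocabulary = ["ANYTHING", "COFFEE", "MANY", "WAY", "TRADITIONAL"]
print(nonsense_prompt(vocabulary, 5, random.Random(0)))
```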

 

 

5.1 Results

Figure 21: Cumulative graph

Figure 22: Frequencies graph

Figure 23: Cumulative percentage graph

Figure 24: Frequencies percentage graph

 

 

 

6. Experiments with the localization threshold

The question we want to ask is how well the system is able to find words/phones with errors.

Errors are defined as words/phones that the annotators scored as incorrect (see Table 3).

Because in the current demonstrator we only highlight entire words and not single phones, we limit these tests to trying to find a threshold that lets us automatically find as many of the words with real errors as possible, while mis-localizing as few as possible of the ‘good’ words.

 

In the validation tool the localization process is carried out in a second pass, after the recognition.

In the recognition stage (unless we fail to recognize the utterance), we will determine the sequence of words spoken by the student.

We will then re-recognize the same audio file (.wav file), allowing only that sequence of words as the prompt (i.e., in a multiple-choice exercise, recognition decides between the various answers, but localization focuses only on the one spoken by the student).

 

Localization will also use adapted models, but not the phone-level adapted models used in recognition.

In recognition we don’t care about how the student spoke, only about what she spoke.

 

In localization we want to know how well she spoke the words, and thus we do not want to use models that ‘make it easier’ on the student by allowing for differences between her pronunciation and the target UK accent. Nevertheless, we would like to eliminate the variability due to microphone, room conditions, and general properties of her vocal apparatus. Thus we use the so-called ‘globally adapted’ models that are created simultaneously with the ‘fully adapted’ models used in recognition.

In the real system, localization will mean computing a confidence score for each word and comparing it to a threshold; the list of those words with confidences below the threshold is then returned as the ‘bad’ words to the top-level.
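The word-level localization step just described amounts to a simple filter, sketched here. The function name, the example words, the confidence values and the threshold are hypothetical; confidences are assumed to be directly comparable numbers.

```python
def localize_bad_words(words, confidences, threshold):
    """Return the words whose confidence falls below the threshold."""
    if len(words) != len(confidences):
        raise ValueError("need exactly one confidence score per word")
    return [word for word, conf in zip(words, confidences) if conf < threshold]


words = ["SINGERS", "LEARN", "HOW", "TO", "PROJECT", "THEIR", "VOICES"]
confidences = [0.91, 0.85, 0.88, 0.93, 0.32, 0.80, 0.47]  # made-up scores
print(localize_bad_words(words, confidences, threshold=0.5))
# ['PROJECT', 'VOICES']
```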

 

From the validation point of view, all we want to know is:

What threshold best distinguishes between those words that were somehow wrong according to the human annotators and those that were OK?

 

So, we should recognize a large number of sentences, using only the correct prompt this time, and extract the localization confidence scores for words and phones.

In the case of word confidences, we generate a table like the one in Table 19.

For phones, you could have a similar table, but the "OK"/"BAD" decision could refer either to the phone or to the entire word (the latter is simpler to analyze and more immediately useful to us, but less interesting).

The aim of all this is to find the best threshold, i.e., the one that is greater than the confidence of most "BAD" words and smaller than the confidence of most "OK" words.

 

Table 19: Word confidence annotation
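One way to search for such a threshold is sketched below. It assumes a list of (confidence, annotation) pairs, flags a word as bad when its confidence is strictly below the threshold, and picks the threshold with the fewest total errors (missed "BAD" words plus mis-localized "OK" words). This selection criterion, the function name and the sample data are assumptions for illustration; the report does not state which criterion was used.

```python
import math


def best_threshold(scored_words):
    """Return (threshold, errors) minimizing the number of misclassified words.

    scored_words: list of (confidence, label) pairs, label being "OK" or "BAD".
    A word is flagged as bad when its confidence is strictly below the threshold.
    """
    # Every distinct confidence, plus infinity, covers all possible splits.
    candidates = sorted({conf for conf, _ in scored_words}) + [math.inf]
    best = None
    for threshold in candidates:
        missed_bad = sum(1 for c, lab in scored_words if lab == "BAD" and c >= threshold)
        flagged_ok = sum(1 for c, lab in scored_words if lab == "OK" and c < threshold)
        errors = missed_bad + flagged_ok
        if best is None or errors < best[1]:
            best = (threshold, errors)  # ties keep the lowest threshold
    return best


# Made-up confidences with their human annotation
data = [(0.2, "BAD"), (0.3, "BAD"), (0.45, "OK"), (0.5, "BAD"), (0.7, "OK"), (0.9, "OK")]
print(best_threshold(data))  # (0.45, 1)
```

With overlapping score distributions, as in the sample data, no threshold separates the two classes perfectly; the function reports the remaining error count alongside the threshold.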

6.1 Results

Comparing the confidence scores for "OK" and "BAD" words (see Table 19), we generate the following figures for the word and the phone level.

 

Figure 25: Word level localization threshold

The phone-level threshold could even be specific to particular phones or classes of phones, although this should not, in theory, make much difference using the new Gaussian classifiers.

Figure 26: Phone level localization threshold

 

Appendices

 

 

Appendix 1: On-line Evaluation: Instructions for the evaluator

 

1. try to ensure low background noise and distractions

2. allow the user to open the program from closed

3. observe and note down any problems experienced by the user

4. note down any reactions of the user (visual or verbal)

5. allow users to proceed through the program as they wish

6. allow users to stop and exit when they want

7. note down the components of the program used

8. record the total time spent

9. complete the questionnaire with the user after the session

 

Appendix 2: Introductory information

 

ISLE is a 2-year project funded by the EU (ending in March 2000), which aims to develop computer-based training for language learners wishing to improve their pronunciation. The main features are:

 

The version that is being evaluated at the end of the project is for demonstration purposes only. It is not intended for sale and is far from being a marketable piece of software. The recording of the dialogues, for example, was not made under professional conditions. The design was mainly determined by technical, not pedagogical, considerations. The aim of the on-line evaluation procedure is to get the reaction of learners and teachers towards the above features, rather than to assess the program's value as a finished product.

 

For each evaluation session an evaluator will be present to

 

Each session should take a minimum of ¾ hour to use the program and ¼ hour for the questionnaire.

 

Appendix 3: Evaluator's record sheet

 

Location: __________________

 

Name of user: __________________________ Name of evaluator: _____________

 

Date of session: _____________ Time started: ________ Time ended: _____________

 

Problems experienced by the user:

 

_______________________________________________________________________________

 

_______________________________________________________________________________

 

_______________________________________________________________________________

 

 

Reactions of the user (visual or verbal):

 

_______________________________________________________________________________

 

_______________________________________________________________________________

 

_______________________________________________________________________________

 

 

Components of the program used:

 

_______________________________________________________________________________

 

_______________________________________________________________________________

 

_______________________________________________________________________________

 

_______________________________________________________________________________

 

 

 

Other comments:

 

_______________________________________________________________________________

 

_______________________________________________________________________________

 

_______________________________________________________________________________

 

A. Sessions analyzed during the off-line evaluation

A.1. German Sessions

| SESSION NAME | SPEAKER SEX | PARTNER |
|---|---|---|
| SESS0006 | Female | ULeeds |
| SESS0011 | Male | ULeeds |
| SESS0012 | Male | ULeeds |
| SESS0015 | Male | ULeeds |
| SESS0020 | Male | ULeeds |
| SESS0021 | Female | ULeeds |
| SESS0161 | Male | UHam |
| SESS0162 | Male | UHam |
| SESS0163 | Female | UHam |
| SESS0164 | Male | UHam |
| SESS0181 | Female | Klett |
| SESS0182 | Male | Klett |
| SESS0183 | Female | Klett |
| SESS0184 | Female | Klett |
| SESS0185 | Male | Klett |
| SESS0186 | Male | Klett |
| SESS0187 | Male | Klett |
| SESS0188 | Male | Klett |
| SESS0189 | Male | Klett |
| SESS0190 | Female | Klett |
| SESS0191 | Female | Klett |
| SESS0192 | Female | Klett |
| SESS0193 | Male | Klett |

Table 20: German sessions

 

A.2. Italian Sessions

 

| SESSION NAME | SPEAKER SEX | PARTNER |
|---|---|---|
| SESS0003 | Male | Dida El. |
| SESS0040 | Male | Dida El. |
| SESS0041 | Female | Dida El. |
| SESS0121 | Male | UMilan |
| SESS0122 | Female | UMilan |
| SESS0123 | Male | UMilan |
| SESS0124 | Male | UMilan |
| SESS0125 | Male | UMilan |
| SESS0126 | Male | UMilan |
| SESS0127 | Male | UMilan |
| SESS0128 | Female | UMilan |
| SESS0129 | Female | UMilan |
| SESS0131 | Male | UMilan |
| SESS0130 | Male | UMilan |
| SESS0132 | Male | UMilan |
| SESS0133 | Male | UMilan |
| SESS0134 | Male | UMilan |
| SESS0135 | Male | UMilan |
| SESS0136 | Male | UMilan |
| SESS0137 | Male | UMilan |
| SESS0138 | Male | UMilan |
| SESS0139 | Male | UMilan |
| SESS0140 | Male | UMilan |

Table 21: Italian sessions

 

B. Graphs

 

All the graphs are available as files.