
The Validation Report Tools
Project: LE4-8353
Deliverable: D5.1
Version | 5
Date | 03.07.2000
ISLE Deliverable
Project Number | LE4-8353
Project Title | Interactive Spoken Language Education [ISLE]
Deliverable Type | Tool, Report
Distribution | Restricted
Deliverable ID | D5.1
Expected Delivery Date |
Actual Delivery Date | 6 June 2000
Title of Deliverable | The Validation Report Tools
Authors | U Leeds [Howarth], U Milan [Pezzotta, Galbiati, Bisiani]
Deliverable Type codes | OT = Other, RE = Report, SP = Specification, PR = Prototype, TO = Tool
Distribution codes | C = Consortium, P = Public, R = Restricted
Revision History
Version | Date | Status | Author(s)
1 | 03-09-1999 | Draft | U Milan [Pezzotta, Galbiati, Bisiani]
2 | 06-04-2000 | Final Part I | U Milan [Pezzotta, Galbiati, Bisiani]
3 | 15-05-2000 | Draft Part II | U Leeds [Howarth]
4 | 06-06-2000 | Final | edited by Menzel
5 | 01-07-2000 | Final | R. Bisiani
Part I: Executive summary
Part II: The on-line evaluation
1. Trialling
2. Procedure
3. Data collection
4. Data analysis
4.1. Native English-speaking teachers' questionnaires
4.2. German-speaking teachers' questionnaires
4.3. Users' questionnaires: Italian learners
4.4. Users' questionnaires: German learners
Part III: The validation tool
1. Annotation Table
1.1. Definition
1.2. Example
2. Diagnose Table
2.1. Definition
2.2. Example
3. Compare Table
3.1. Definition
3.2. Example
4. ISLE OCX Function-Structures for the Validation Process
4.1. Error types returned by ISLE OCX
4.2. Validation Process Functions
5. The output of the validation tool
5.1. Phone error analysis
5.2. Stress error analysis
6. Experiments with the recognition threshold
6.1. Results
7. Experiments with the localization threshold
7.2. Results
Appendix 1: On-line Evaluation: Instructions for the evaluator
Appendix 2: Introductory information
Appendix 3: Evaluator's record sheet
Appendix 4: Sessions analyzed during the off-line evaluation

Figure 1: The Structure
Figure 2: Output’s scheme for phone error analysis
Figure 3: PhCorrGlobal.xls
Figure 4: PhCorrPhone.xls
Figure 5: PhCorrPhoneType.xls
Figure 6: PhErrGlobal.xls
Figure 7: PhErrPhone.xls
Figure 8: PhErrPhoneType.xls
Figure 9: PhGenGlobal.xls
Figure 10: PhGenPhone.xls
Figure 11: PhGenPhoneType.xls
Figure 12: Output scheme for stress error analysis
Figure 13: StCorrGlobal.xls
Figure 14: StCorrVowels.xls
Figure 15: StErrGlobal.xls
Figure 16: StErrVowels.xls
Figure 17: StGenGlobal.xls
Figure 18: StGenVowels.xls
Figure 19: Results "Word Level" Stress for German speakers
Figure 20: Results "Word Level" Stress for Italian speakers
Figure 21: Cumulative graph
Figure 22: Frequencies graph
Figure 23: Cumulative percentage graph
Figure 24: Frequencies percentage graph
Figure 25: Word level localization threshold
Figure 26: Phone Level localization threshold

Table 1: The definition of the MIL file
Table 2: Annotation Table’s Keys
Table 3: Values of TE variable
Table 4: An example of Annotation Table
Table 5: Diagnose Table’s Keys
Table 6: An example of Diagnose Table
Table 7: Compare Table’s Keys
Table 8: Values of HMP variable
Table 9: Values of HMS variable for "Phone Level"
Table 10: An example of Compare Table
Table 11: The output of the phone error analysis
Table 12: The output of the stress error analysis
Table 13: Phone types
Table 14: Values of HMS variable for "Word Level"
Table 15: An example stress table on the word level
Table 16: Rate formulas
Table 17: IHAPI Alignment
Table 18: Word confidence annotation
Table 19: The sessions
Part I: Executive summary
The goal of the ISLE project is to build a tool that helps adult intermediate learners of English improve their pronunciation using speech recognition technology.
This report describes:
Specifications for the data collection are provided in ISLE report D31.
Detailed performance results can be found in:
In particular, this report details:
The distribution of this report is restricted to ISLE project partners, managers and reviewers.
Part II: The on-line evaluation
1. Trialling
To test the effectiveness of the ISLE demonstrator, the system was trialled with groups of adult non-native speakers of English from Italy and Germany, non-native teachers from Germany and native-speaker teachers in the UK, a total of 28 subjects:
University of Milan, Bicocca | 6 Italian-speaking learners
Klett Verlag, Stuttgart | 9 German-speaking learners
 | 8 German-speaking teachers
University of Leeds | 5 English-speaking teachers
2. Procedure
The demonstrator was installed from CD at each location and tested by the project partners. At each site an evaluator was appointed to supervise the trialling sessions. Instructions were distributed to the evaluators (see Appendix 1) and an introduction to the ISLE project was given to each volunteer (see Appendix 2).
3. Data collection
Two sources of information were used for data collection:
4. Data analysis
The collated data from the various sources follow.
4.1. Native English-speaking teachers' questionnaires
Where comments overlap considerably, not every individual comment is recorded here.
1. feedback
1.1 Is the feedback easy for a learner to understand?
v.easy easy neither difficult v.difficult
4 1
Comments:
"Explanation and the chance to listen again to the native speaker as often as the student wishes is good"
"Easy but often inaccurate or vague"
"Clear but limited in scope. No suggestions are given"
1.2 Do you feel the feedback would be accurate in identifying their errors?
v.accurate accurate neither inaccurate v. inaccurate
1 3 1
Comments:
"One feedback comment confused the computer 'model' pronunciation with the speaker's pronunciation"
"I don't think it's very clear in identifying whether the error is one of stress or of sound production"
1.3 Did you mispronounce sounds that the program didn't identify? yes 4 no 1
Comments/examples:
"It doesn't recognise other variants"
"bratwurst"
"It picked up few consonant errors (eg d and p)"
1.4 Did the program falsely identify errors? yes 5 no
comments/examples:
"Deliberate mispronunciations of /EY/ were almost never picked up"
"As a native speaker, the program constantly corrected my pronunciation, which, I suppose, is RP! A little worrying."
"This is rather difficult to prove, but the speaker's pronunciation of /EH/ in one word was understood as /AE/ by the computer."
"eg in 'wonderful' /UH/ for /AH/"
1.5 Would the feedback help learners to improve their pronunciation?
v.well well neither badly v. badly
1 1 2 1
Comments:
"I feel it depends entirely on a particular example. The speaker's pronunciation of 'glass' was very similar to the computer's, but was seen as a 'problem'. When repeated 5 times in 5 different ways the screen comment was 'good try'"
"For some 'problem' words there was no concrete model to listen to, only advice to keep practising. Not very helpful."
"It seems to be useful for specific sentences and sounds"
"Identifying the error, explaining it, isolating the sound, repeating it has to be useful"
"Often but not always a mispronounced word was correctly identified, but the diagnosis was either vague, or focused on the wrong syllable, or on a vowel instead of a consonant."
2. material
2.1 Is the language users have to speak realistic?
v. realistic realistic neither unrealistic v. unrealistic
3 1 1
Comments:
"The language itself is realistic but the delivery is unconvincing in the dialogues. The speakers sound bored."
"Going on a barge trip isn't very common".
"It seemed relatively realistic, although I'm not sure I would say 'I'll have a pizza and a soda to drink'"
2.2 Are the instructions clear?
v. clear clear neither unclear v. unclear
1 2 1 1
Comments:
"They vary. Many of the buttons are hard to find, in odd places or confusingly named ('microphone' can be either the microphone or the head in profile)"
"Initially the first interface is very confusing. The other screens are a bit confusing"
"It does depend on how well the listener understands the symbols, which are not always clear eg the side panel"
"Extremely difficult to follow. Not obvious."
"Many things missing. The initial 'Start Program' button was above the general blurb, which you had to read first. Needs much more careful thought"
3. design
3.1 Are the exercises/activities interesting?
v.interesting interesting neither uninteresting v.uninteresting
1 2 1 1
Comments:
"Repetitive but I suppose this is hard to avoid"
"They are neutral- standard and bland- but clear enough."
"I don't think the interest level is very high as the situational dialogues are very conventional."
"How 'interesting' can you expect an exercise to be?"
"Limited exercise types. Not particularly stimulating sentences"
3.2 Is the program visually attractive?
v.attractive attractive neither unattractive v.unattractive
4 1
Comments:
"The colours are repetitive"
"Quite easy on the eye and friendly if a little esoteric."
"I can't see the rationale behind the opening web page design"
"Stylish to look at, but isn't always easy to find the buttons to click"
"Attractive but I found it hard to follow logically. I prefer a linear design, rather than a globular flowing design. The colours are a bit insipid"
3.3 Is the language varied enough? yes 1 no 2 don't know 2
Comments:
"There seems to be a model and variants away from this may be considered incorrect"
"If it's only dialogues, then obviously it isn't varied enough for pronunciation purposes"
3.4 Would you recommend your learners to use the program (again / more than once)? yes 2 no 3
Comments:
"Yes, because repetition plus clear exemplification is important with pronunciation. Students can go over each item as often as they like"
"Yes, if they were having problems with individual sounds"
"No, nowhere near accurate enough"
"Not yet, needs much more work"
"No, it is too unreliable for students to work with on their own"
4. learning
4.1 Does the program cover the most important pronunciation features? yes 3 no don't know 2
Comments:
"Yes, stress, minimal pairs"
4.2 What is missing?
Comments/examples:
"Links between words"
"weak forms versus full forms"
"Rhythm patterns, linking"
"Intonation patterns"
4.3 Is the target pronunciation appropriate? yes 5 no
Comments:
"It's the one most students want"
"I heard 2 different accents. How does the computer feedback differentiate between 'errors' and variants? What is the standard"
"Yes, however scope for variations would be useful"
4.4 Is the practice at the right level for intermediate learners? yes 4 not sure 1
Comments:
"A bit easy?"
4.5 Would this material contribute to the development of your students' spoken English? yes 3 no 2
Comments:
"Yes, if it were seen as a resource with certain limitations"
"No, too many variables and unsolved problems"
4.6 What additional features would improve the program?
Comments:
"A link between 'teacher's' demo, pronunciation of a phrase and the diagnosis of a problem would be essential- occasionally the 'teacher' seemed to be making the same 'mistake' as the student e.g. using the weak form of 'than'"
"Using the phonetic alphabet instead of highlighting a letter in the word e.g. PROBLEM: an Italian speaker might think it should be pronounced with /EH/ not /AX/."
"Ability for students to enter his/her own speech (language examples) and have feedback"
"Clearer symbols/buttons- perhaps fewer"
"Easier initial stage: reading individual sentences for the computer to adjust to the speaker's accent is tedious and the sentences are disjointed when separated by time gaps"
1. Problems experienced by the user:
"Constant problems with the volume level"
"The position of the start button at the top of the opening page is confusing"
"What to do now? Frequent pauses to find out what to do next"
"No warning that sound is going to start"
"One user didn't discover the text of the dialogue"
"Took a long time to discover the exercises"
"Not easy to move smoothly between the dialogues and the exercises"
2. Reactions of the user (visual or verbal):
"The adaptation is tedious. Why so long?"
"Which is the settings button? What is the difference between the settings? Not clear what the arrow buttons do"
"The colour of the buttons means they don't stand out"
"Tried using an Italian accent in the adaptation phase"
"Why are text exercises included?"
"Slow response caused multiple clicking"
"A lot of clicking at the wrong time"
"Odd design"
"Liked the layout and background; user-friendly appearance"
"not clear what to do after listening to the dialogue"
3. Components of the program used:
All users used all available components.
4. Other comments:
"It was an exercise in discovery. Without the evaluator's help it takes a very long time to find out how to use the program"
"A high level of strictness results in a lot of non-errors"
"Very hard to hear difference between teacher and student versions"
"Does 'than' in the system's dictionary have a weak variant?"
"Very unreliable/inaccurate performance doesn't inspire confidence"
"'In the Office' dialogue: not possible to have text and listen to dialogue at the same time"
"The length of the adaptation phase is disproportionate to the amount of practice material available"
"Necessary to go back to the text (not easy) to answer the questions"
4.2. German-speaking teachers' questionnaires
1. feedback
1.1 Is the feedback easy for a learner to understand?
v.easy easy neither difficult v.difficult
4 3 1
1.2 Do you feel the feedback would be accurate in identifying their errors?
v.accurate accurate neither inaccurate v. inaccurate
5 3
1.3 Did you mispronounce sounds that the program didn't identify? yes 4 no 3
1.4 Did the program falsely identify errors? yes 5 no 3
comments/examples:
1.5 Would the feedback help learners to improve their pronunciation?
v.well well neither badly v. badly
8
2. material
2.1 Is the language users have to speak realistic?
v. realistic realistic neither unrealistic v. unrealistic
6 2
2.2 Are the instructions clear?
v. clear clear neither unclear v. unclear
2 6
3. design
3.1 Are the exercises/activities interesting?
v.interesting interesting neither uninteresting v.uninteresting
1 6 1
3.2 Is the program visually attractive?
v.attractive attractive neither unattractive v.unattractive
5 2 1
3.3 Is the language varied enough? yes 6 no 1 don't know 1
3.4 Would you recommend your learners to use the program (again / more than once)? yes 8 no
4. learning
4.1 Does the program cover the most important pronunciation features? yes 5 no don't know 3
4.3 Is the target pronunciation appropriate? yes 5 don't know 3
4.4 Is the practice at the right level for intermediate learners? yes 7 not sure 1
4.5 Would this material contribute to the development of your students' spoken English? yes 5 don't know 3
4.6 What additional features would improve the program?
Comments:
"More female voices"
"Analysis of free spoken text production"
4.3. Users' questionnaires: Italian learners
1. feedback
1.1 Is the feedback easy to understand?
v.easy easy neither difficult v.difficult
2 3 1
1.2 Do you feel the feedback is accurate in identifying your errors?
v.accurate accurate neither inaccurate v. inaccurate
3 3
Comments:
"I like very much the feedback in which I can understand the way I have to pronounce a phone, reading another word with the same phone"
1.3 Did you make errors that the program didn't identify? yes no 6
1.4 Did the program identify errors you thought were correct? yes 3 no 3
Examples:
'business'
'photographer'
1.5 Does the feedback help you to improve your pronunciation?
v.well well neither badly v. badly
2 4
Comments:
"To me it is very useful the fact that it is possible to see the errors I made and to repeat similar words (IMPROVE window) to learn the pronunciation of a single phone"
"I think that 'IMPROVE' and 'Practice the phones you have the most problems with' are really great ideas"
2. material
2.1 Is the language you have to speak realistic?
v. realistic realistic neither unrealistic v. unrealistic
6
Comments:
"In the demo I saw Lesson 5 and I think that it describes a realistic situation"
2.2 Are the instructions clear?
v. clear clear neither unclear v. unclear
6
Comments:
"I have only a consideration to do: In 'Oral Exercise' there is an instruction that tells: 'Please click on the microphone and read the sentence in the box' but on the window there are two 'Microphone' buttons, and this can do muddle."
"In Standard exercises there is not legend that tell me the meaning of the GREEN/RED feedback."
3. design
3.1 Are the exercises/activities interesting?
v.interesting interesting neither uninteresting v.uninteresting
1 4 1
Comments/examples:
"In particular I like very much the fact that I can run different kinds of exercises... this make the demo not boring."
"To me it is very boring the Standard Exercise TRUE/FALSE."
"The exercise I prefer is 'Listen and Repeat'."
3.2 Is the program visually attractive?
v.attractive attractive neither unattractive v.unattractive
4 2
Comments:
"I like very much the menu to choose the exercise"
"The interface is very beautiful"
"I like very much the window 'Arrival to Manchester', in which I can select the lessons."
"I like very much the colours defined for the windows (the background colours): they are very relaxing."
3.3 Is the language varied enough? yes 6 no
Comments:
"I think that there will be no tool enough detailed so that it can be considered enough to learn English perfectly. But, to me it is great the function associated with the 'ABC' button, through which I can pronounce a lot of words associated with a particular phone."
3.4 Would you use the program again / more than once? yes 6 no
Comments:
"For my English it would be a good thing to use this demo again. In particular I like FREE CHOICE Oral exercise."
"For my English it would be a good thing to use this demo again."
"It is a good idea the fact that I can see and read the dialogue. Besides it is great the way the lesson is introduced."
"I like very much this demo, in particular the fact that it is able to correct my pronunciation at phone level."
"I like it very much: I saw other tools and I think this is the best!!!"
4. learning
4.1 Does the program cover the most important pronunciation features? yes 4 no
[no answers were given to 4.2 and 4.3]
4.4 Is the practice at the right level for you? yes 5 no 1
Comments:
"No, because my English is really bad, and so the level is really high to me."
"My English is not so good and this test confirms it. So I think the level is right"
4.5 What additional features would improve the program?
Comments/examples:
"ToolTips for the buttons would be very useful."
"Change the Navigational arrows to make them more intuitive: for example the arrows to change the exercise (the external ones) can be vertical. The background of the windows is too homogeneous: the windows would be more attractive putting more colours on it (for example changing the colour of the windows’ frame)."
"Put a legend to the STANDARD exercises to tell the user the meaning of the GREEN/RED feedback.
Define a button 'Give me the Correct Translation' in the Translate Exercises."
"The buttons on the windows seem image and not buttons: it would be better that when the user pushes on a button, it changes its appearance."
"Add a progress-bar in READ-REPEAT exercise to tell me the velocity through which I have to pronounce the phrase.
Define a new type of exercise as mix of 'READ-REPEAT' and 'LISTEN-REPEAT', in which I can hear the utterance but I can have the text of the phrase to help to repeat."
1. Problems experienced by the user:
"There are some problems about the voice-feedback for the diagnose errors. For example in the 'Read and Repeat' exercise with the phrase 'They asked if I wanted to come along on the barge trip' the speaker make an error on the word 'ASKED' and when I pushed the 'teacher' option on the pop-menu the demo 'said' 'THEY ASKED' and not only 'ASKED'."
"In some occasions the 'Teacher' button doesn’t run."
"In the dialogue the Mr. Rossetti’s voice is low and so difficult to understand.
To do ORAL exercises why does the user must listen for the dialogue?
Buttons aren’t intuitive."
"In a occasion in the 'Fill in the blank' exercise the inserted word covers the fixed text.
'Fill in the blank' option in the menu doesn’t work: the user access this exercise only through the navigational arrows."
"In 'Standard Exercise'
The exercise 'Translate' doesn’t run;
The external navigation arrow doesn’t run
In 'TRUE/FALSE' exercise there is written:
'After listening the dialogue, please answer these questions with YES or NO'
But if the user hasn’t listened for the dialogue before, there is no way to listen for it."
"In some occasions the student had difficult to understand how to come to the previous window and, in general, to capture the meaning of the buttons."
2. Reactions of the user (visual or verbal):
"My impression was that the speaker seemed to be very enthusiastic…he told me that the demo’s interface are beautiful, and in particular he liked very much the diagnose-feedback of the error (possibility to listen for the correct pronunciation and to practice on a wrong word [IMPROVE window])"
"She told me that the demo is very interesting, but she was very perplexed, because the meaning of the buttons often it is not clear."
"The speakers told to me that the ORAL exercises are great, but he was very boring to do the STANDARD exercises."
"The user seemed to be enthusiastic about the demo’s interface and, in particular, she liked the diagnose-feedback of the error."
"He had some trouble with the Oral Feedback in 'Improve': he told me that it was often too fast to understand."
"She told me that the demo is very beautiful, but sometimes she was in difficulty to understand the meaning of the buttons.
Her precise words: 'The buttons are intuitive for nothing'."
3. Components of the program used:
All Oral and standard exercises were used by all subjects.
4. Other comments:
"The speaker was very worried to test the demo, because he said that his English was not so good.
So he made a lot of errors only because he spoke on the microphone very slow to try to pronounce correctly the phrases... but the demo often didn’t wait for him."
"This student speaks English very well: so she can be considered a good tester to understand if the demo really finds the right pronunciation errors."
4.4. Users' questionnaires: German learners
1. feedback
1.1 Is the feedback easy to understand?
v.easy easy neither difficult v.difficult
1 7 1
comments/examples:
"The feedback omitted the same pronunciations that were slightly different"
1.2 Do you feel the feedback is accurate in identifying your errors?
v.accurate accurate neither inaccurate v. inaccurate
1 3 5
comments/examples:
"Often the comment is inspecific, the problem is not explained"
"System failed to identify utterances which were quite different from each other"
"Sometimes surprising!"
"The program should stress the main mistakes"
1.3 Did you make errors that the program didn't identify? yes 5 no 4
comments/examples:
"If you are reading the words of the word list (Improve) the program only identifies the error you are improving in that exercise, but neglects any other errors you make."
1.4 Did the program identify errors you thought were correct? yes 5 no 4
comments/examples:
"pan, bag"
"Of course!"
"I couldn't detect any difference between the 'teacher' and 'student'"
1.5 Does the feedback help you to improve your pronunciation?
v.well well neither badly v. badly
9
comments/examples:
"The possibility of listening to the examples in the diagnosis is helpful"
"It is a problem that I'm not quite fixed to either British or American English, so the pronunciation might be correct in another context, but is wrong here"
"Asked!"
"asked, wanted, address"
2. material
2.1 Is the language you have to speak realistic?
v. realistic realistic neither unrealistic v. unrealistic
1 8
2.2 Are the instructions clear?
v. clear clear neither unclear v. unclear
2 4 2 1
comments/examples:
"How the user is guided by the program could be better (e.g. If you click on a button, an explanation could be displayed). Clear buttons (without explanation) are not useful!"
"The spoken instructions are clear, the written ones are sometimes not (e.g. lesson 2, 'Click on the microphone', which microphone? There are two of them)"
"After accommodation to buttons and instructions, the handling was unclear"
3. design
3.1 Are the exercises/activities interesting?
v.interesting interesting neither uninteresting v.uninteresting
2 5 2
comments/examples:
"Because it is realistic"
3.2 Is the program visually attractive?
v.attractive attractive neither unattractive v.unattractive
7 1 1
comments/examples:
"Buttons do not use clearly understandable metaphors, different metaphors / symbols are used in different contexts of the program"
"I prefer clear lines, not this bubble-gum outfit"
"A simple interface that points out the main functions"
3.3 Is the language varied enough? yes 6 no 1 don't know 2
3.4 Would you use the program again / more than once? yes 7 no 1
comments/examples:
"for an advanced speaker / learner the program is too detailed. I would prefer to play in an English speaking country and adapt to what I hear."
"It would be better if there were more dialogues per unit offered, as often they are repeated"
4. learning
4.1 Does the program cover the most important pronunciation features? yes 7 don't know 2
4.2 What is missing?
comments/examples:
"One word is often repeated one after another, therefore there is a lacking in the possibility to listening again"
4.3 Is the target pronunciation appropriate? yes 7 don't know 2
4.4 Is the practice at the right level for you? yes 7 no 2
comments/examples:
"Too difficult (in the middle sector)"
"I had not learnt some of the vocabulary"
"Sometimes too hard"
"The vocabulary could be harder as it is too easy"
"Could be a useful help to correct the pronunciation"
4.5 What additional features would improve the program?
comments/examples:
"Different speakers and speeds of the teacher"
"Translations / Dictionary in the background"
"The possibility of taking out short sequences from the whole sentence and then it works!!"
"more pictures, more interface features using the pc, the actual version is similar to a tape exercise"
1. Problems experienced by the user:
"Open the feedback was "I wouldn't understand""
"The texts for adaptation are too small - difficult to read"
"Difficulty to speak an unknown word"
"Sometimes it couldn't be defined where the problem is (no specific diagnose given)"
"Listen and repeat: If the sentence couldn't be understood it is difficult to repeat and got a qualified reaction!"
"User tried to adapt to the way of speaking of teacher (faster), but this is not accepted"
"Examples for wrong and correct pronunciation in diagnose stage often hard to understand"
"Adjusting the microphone not found"
"Once the introduction text didn't end, it took some trials to jump to the dialogue"
" Sometimes forgets to press the speak button, or presses it and doesn't speak."
" The pop-up menus are sometimes too small, it is difficult to hit them accurately with the mouse."
"Dialogue overall is not easy to understand"
"Lesson 2, "Click the microphone", could be understood as to click the oral exercises"
"To pronounce unknown words"
"Click-Speak co-ordination"
"Dialogue interface, not clear"
"Exit from sub-chapters"
"Directions for user have been ignored/overlooked"
2. Reactions of the user (visual or verbal):
"likes clicking"
"repeats the exercise many times to improve output / result"
"seldom uses the diagnose function"
"Experienced that a click on a blue word (improve stage) will get the teacher to speak the word"
"Impression that the user speaks with less melody than the teacher and then it causes unspecified problems."
"Long sentences cause problems: If the user concentrates on specific problems and is to improve, a problem may arise at another place. Would it be possible to practice parts of a sentence?"
"Laughs when own voice is heard"
"Happy with success"
"Was happy with good results"
"After some experience, more often repeated the sentences than look for diagnose and do improvement"
"Surprised how differentiated the reaction of the system is."
"Immediately changed the cursor of "How strict should I judge?", to a lower position"
3. Components of the program used:
All subjects did Lessons 1 and 2.
4. Other comments:
"Impatient clicking confuses the system and takes a long time to decide what should be the next step. But never crashes."
"Stopping the introduction and changing to the dialogue is not explained to the user. Also not good: if the button (read and listen) is pressed, the dialogue stops."
"Sometimes the words spoken as examples are too short and difficult to analyse"
"Now and then frustrating when many mistakes are in one sentence"
"Minimal pairs (/ae/ and /e/): /ae/ seems to ask for an /a/ sound."
"Very often the diagnose couldn't give special hints"
"Problems with the system:
a). Free choice: sound of the sentence components became inactive
b). The teacher's voice was not active"
"Sometimes (apart from clicking open to "teacher") the spoken word was different to the one displayed as wrong"
""Listen and Repeat" exercises are difficult, because often the student doesn't understand correctly the content of the sentence."
"If the German speaker mimics the teacher, and speaks as fast as he, the system doesn't understand. But the student doesn't get a hint what's the reason for this problem."
"Minimal pairs, it would be more comfortable for the user, if already the first word would appear in blue, so that the student knows which word to read and speak."
"The program was not quite correct: student said, "eightieth" instead of "eighteenth" and the feedback was the incorrect "th""
"Problem: do the exercise "Free Choice", if the student crosses one of the words, with the mouse, the text disappears (greys)"
"Difficulties because questions and answers don't fit correctly to the texts, (causes frustration to the user)"
"Part of a sentence which was spoken completely was analysed!"
""Build the sentence", in case the answer is not the correct one this should be mentioned, but nevertheless the pronunciation should be scored and corrected"
"Help was necessary, because interface is not clear (controls had been explained)"
"Speak-Click interaction needs practice/adjustment??"
Part III: The Validation Tool
The Validation Tool is written in Visual Basic 6.0 and stores all data in an Access 97 database for further analysis. The database allows the data to be accessed even outside the tool.
The tool's functions are:

Figure 1: The Structure
Where:
There are two types of comparison: phone comparison and stress comparison;
The Annotation Table is filled from the .REF and .LAB files produced by the University of Leeds, which are merged into a text file with the .MIL extension containing the following values:
| Key | Description |
| ONSET | Onset (msec) of the phone [.LAB file]. |
| OFFSET | Offset (msec) of the phone by annotator [.LAB file]. |
| WORD | Text of the word [.LAB file]. |
| CANONICAL PHONE | Original (expected or "correct") phone in UK phone set [.REF file]. |
| ANNOTATOR PHONE | Perceived phone in UK set [.LAB file]. |
| CANONICAL STRESS | [.REF file] |
| ANNOTATOR STRESS | [.LAB file] |
| .LAB FILE | Original (expected or "correct") phone perceived by annotators. |
Table 1: The definition of the MIL file
This is the structure of the Annotation Table:
| Key | Type | Description |
| PHRASE | String | Session name and file name of the phrase. |
| WD | Integer | Index of the word in a phrase. Wd ∈ [1, #words], where #words is the maximum number of words in an utterance. |
| WORD | String | Text of the word. |
| PH | Integer | Index of a phone within a word. Ph ∈ [1, #phones], where #phones is the maximum number of phones in a word. |
| OP | String | Original (expected or "correct") phone in UK phone set [.REF file]. |
| CMA | String | Closest match of perceived phone in UK set [.LAB file]. |
| CS | Byte | Canonical stress [.REF file]. 0: no stress (consonants); 1: primary stress (vowels); 99: unstressed vowel. |
| PSA | Byte | Perceived stress [.LAB file]. 0: no stress (consonants); 1: primary stress (vowels); 99: unstressed vowel. |
| TE | String | Phone-type error: values defined in Table 3. |
Table 2: Annotation Table's keys
| Key | Description | Example |
| SUB | Substitution | A substitution of /AE/ with /EH/ |
| INS | Insertion | A schwa inserted at the end of "DARK" |
| DEL | Deletion | A deletion of the /T/ at the end of "SUIT" |
Table 3: Values of TE variable
This is a .MIL file example (Session: 0132, file: BLOCKD02_60.txt):
I 000000000 003000000 # . . . . .
I 003000000 004400000 WHAT'S W W . . W
I 004400000 005600000 . OH OH P P OH
I 005600000 005900000 . T __ . . __ (1)
I 005900000 006600000 . S S . . S
I 006600000 006600000 # . . . . .
I 006600000 007100000 IN IH IH P P IH
I 007100000 007400000 . N N . . N
I 007400000 007400000 # . . . . .
I 007400000 007800000 THE DH DH . . DH
I 007800000 008400000 . IY IH P P IH (2)
I 008400000 008400000 # . . . . .
I 008400000 009100000 PICTURE P P . . P
I 009100000 010300000 . IH IY P P IY (2)
I 010300000 011000000 . K K . . K
I 011000000 011800000 . CH CH . . CH
I 011800000 014600000 . ER ER-R U U ER-=R (3)
I 014600000 017800000 # . . . . .
I 017800000 019100000 A AX HH-AX P P HH-AX (3)
I 019100000 019100000 # . . . . .
I 019100000 019600000 MOUTH M M . . M
I 019600000 021900000 . AW AW P P AW
I 021900000 024300000 . TH T . . T (2)
I 024300000 046700000 # . . . . .
The MIL file is inserted into the Annotation Table as follows:
| Phrase | Wd | Word | Ph | OP | CMA | CS | PSA | TE | |
| SESS0132_BLOCKD02_60 | 1 | WHAT'S | 1 | W | W | 0 | 0 | | |
| SESS0132_BLOCKD02_60 | 1 | WHAT'S | 2 | OH | OH | 1 | 1 | | |
| SESS0132_BLOCKD02_60 | 1 | WHAT'S | 3 | T | __ | 0 | 0 | DEL | (1) |
| SESS0132_BLOCKD02_60 | 1 | WHAT'S | 4 | S | S | 0 | 0 | | |
| SESS0132_BLOCKD02_60 | 2 | IN | 1 | IH | IH | 1 | 1 | | |
| SESS0132_BLOCKD02_60 | 2 | IN | 2 | N | N | 0 | 0 | | |
| SESS0132_BLOCKD02_60 | 3 | THE | 1 | DH | DH | 0 | 0 | | |
| SESS0132_BLOCKD02_60 | 3 | THE | 2 | IY | IH | 1 | 1 | SUB | (2) |
| SESS0132_BLOCKD02_60 | 4 | PICTURE | 1 | P | P | 0 | 0 | | |
| SESS0132_BLOCKD02_60 | 4 | PICTURE | 2 | IH | IY | 1 | 1 | SUB | (2) |
| SESS0132_BLOCKD02_60 | 4 | PICTURE | 3 | K | K | 0 | 0 | | |
| SESS0132_BLOCKD02_60 | 4 | PICTURE | 4 | CH | CH | 0 | 0 | | |
| SESS0132_BLOCKD02_60 | 4 | PICTURE | 5 | ER | ER | 99 | 99 | | |
| SESS0132_BLOCKD02_60 | 4 | PICTURE | 6 | __ | R | 99 | 99 | INS | (3) |
| SESS0132_BLOCKD02_60 | 5 | A | 1 | __ | HH | 1 | 1 | INS | (3) |
| SESS0132_BLOCKD02_60 | 5 | A | 2 | AX | AX | 1 | 1 | | |
| SESS0132_BLOCKD02_60 | 6 | MOUTH | 1 | M | M | 0 | 0 | | |
| SESS0132_BLOCKD02_60 | 6 | MOUTH | 2 | AW | AW | 1 | 1 | | |
| SESS0132_BLOCKD02_60 | 6 | MOUTH | 3 | TH | T | 0 | 0 | SUB | (2) |
Table 4: An example of Annotation Table
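The mapping from MIL lines to Annotation Table rows can be illustrated with a short Python sketch. It is not the Visual Basic code of the tool. The function names (`split_phone`, `mil_to_rows`) are hypothetical, and several rules are assumptions read off the example above rather than stated in the specification: stress symbols `.`, `P` and `U` map to 0, 1 and 99; `__` marks a deletion; a compound annotation such as `HH-AX` contains the canonical phone plus inserted phones; inserted phones inherit the stress values of their line; and lines whose word field is `#` are silence and produce no row.

```python
# Assumed mapping, inferred from the example: "." -> 0, "P" -> 1, "U" -> 99.
STRESS_CODES = {".": 0, "P": 1, "U": 99}


def split_phone(canonical, annotated):
    """Return (OP, CMA, TE) triples for the phone fields of one MIL line."""
    if annotated == "__":
        return [(canonical, "__", "DEL")]
    parts = annotated.split("-")
    # The part equal to the canonical phone is the "host"; if none matches,
    # the first part is treated as a substitution (an assumption).
    host = parts.index(canonical) if canonical in parts else 0
    triples = []
    for i, part in enumerate(parts):
        if i != host:
            triples.append(("__", part, "INS"))
        elif part == canonical:
            triples.append((canonical, part, ""))
        else:
            triples.append((canonical, part, "SUB"))
    return triples


def mil_to_rows(lines, phrase):
    """Convert MIL lines into (Phrase, Wd, Word, Ph, OP, CMA, CS, PSA, TE) rows."""
    rows = []
    wd = ph = 0
    word = ""
    for line in lines:
        fields = line.split()
        if len(fields) < 9:
            continue  # not a complete MIL line
        word_field, canonical, annotated, cs, psa = fields[3:8]
        if word_field == "#":
            continue  # silence: no row in the Annotation Table
        if word_field != ".":
            # A new word starts; "." means "same word as the previous line".
            wd, ph, word = wd + 1, 0, word_field
        for op, cma, te in split_phone(canonical, annotated):
            ph += 1
            rows.append((phrase, wd, word, ph, op, cma,
                         STRESS_CODES[cs], STRESS_CODES[psa], te))
    return rows


SAMPLE = [
    "I 000000000 003000000 # . . . . .",
    "I 003000000 004400000 WHAT'S W W . . W",
    "I 004400000 005600000 . OH OH P P OH",
    "I 005600000 005900000 . T __ . . __",
    "I 005900000 006600000 . S S . . S",
    "I 017800000 019100000 A AX HH-AX P P HH-AX",
]

for row in mil_to_rows(SAMPLE, "SESS0132_BLOCKD02_60"):
    print(row)
```

Because the sample omits the words between "WHAT'S" and "A", the word index of "A" is 2 here rather than 5 as in Table 4; the phone-level fields are otherwise the same as in the table.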
| Key | Type | Description |
| Phrase | String | Session name and file name of the phrase. |
| Wd | Integer | Index of the word in a phrase. Wd ∈ [1, #words], where #words is the maximum number of words in an utterance. |
| Word | String | Text of the word. |
| Ph | Integer | Index of a phone within a word. Ph ∈ [1, #phones], where #phones is the maximum number of phones in a word. |
| OP | String | Original (expected or "correct") phone (UK phone set) [generated by IHAPI]. |
| CMD | String | Closest match of perceived phone in UK set [generated by ISLE DLLs]. |
| PSD | Byte | Perceived stress [generated by ISLE DLLs]. 0: don't care or unstressed; 1: primary stress. |
| RecCon | Single | Confidence value generated by the Recognition phase [IHAPI]. |
| LocCon | Single | Confidence value generated by the Localization phase [IHAPI]. |
| DiagCon | Single | Confidence value generated by the ISLE DLLs. |
Table 5: Diagnose Table's keys
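As a rough illustration of how the two kinds of comparison (phone and stress) could use the keys above, the following Python sketch matches annotator rows and system rows on (Phrase, Wd, Ph). It is only a sketch under stated assumptions, not the tool's implementation: the class and function names are hypothetical, only the fields needed for the comparison are modelled, and the stress comparison assumes that the annotator value 99 (unstressed vowel) and the value 0 both count as "not primary", since PSD only distinguishes 0 and 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotationRow:
    phrase: str
    wd: int
    ph: int
    cma: str  # phone perceived by the annotator
    psa: int  # 0, 1 or 99


@dataclass(frozen=True)
class DiagnoseRow:
    phrase: str
    wd: int
    ph: int
    cmd: str  # phone reported by the ISLE DLLs
    psd: int  # 0 or 1


def is_primary(stress):
    """Assumption: only the value 1 denotes primary stress."""
    return stress == 1


def compare(annotation, diagnose):
    """Return (key, phones_agree, stress_agrees) for rows present in both tables."""
    by_key = {(d.phrase, d.wd, d.ph): d for d in diagnose}
    results = []
    for a in annotation:
        key = (a.phrase, a.wd, a.ph)
        d = by_key.get(key)
        if d is None:
            continue  # word or phone skipped on the system side
        results.append((key,
                        a.cma == d.cmd,
                        is_primary(a.psa) == is_primary(d.psd)))
    return results


annotation = [
    AnnotationRow("P", 3, 2, "IH", 1),
    AnnotationRow("P", 4, 5, "ER", 99),
    AnnotationRow("P", 6, 3, "T", 0),
]
diagnose = [
    DiagnoseRow("P", 3, 2, "IY", 1),
    DiagnoseRow("P", 4, 5, "ER", 0),
]
for result in compare(annotation, diagnose):
    print(result)
```

Matching on the phone index assumes that both tables number phones identically; inserted phones shift the index on the annotator side, so a real comparison would need an alignment step before rows are paired.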
TlocalizeWordErrors [Reference: ISLE report D4.4.: Integrated diagnosis component]
Again, when filling the Diagnose table, some words and/or phones can be skipped due to:
ANNOTATE TABLE
Phrase Wd Word Ph OP CMA CS PSA TE
2 15 FOR 1 F *F* 0 0
2 15 FOR 2 AO *AO* 1 1
2 15 FOR 3 R *R* 0 0