Training environments
for spoken language learning systems
Project: LE4-8353
Deliverable: D 2.4
Version: Draft
Date: 24/9/99
ISLE Deliverable
Project Number: LE4-8353
Project Title: Interactive Spoken Language Education [ISLE]
Deliverable Type: RE
Distribution: P
Deliverable ID: D 2.4
Expected Delivery Date: T13
Actual Delivery Date: T18
Title of Deliverable: Training environments for spoken language learning systems
Author(s): Dida*El (F. Daneluzzi, A. Fabiano, A. Petitti)
Deliverable Type codes: OT = Other, RE = Report, SP = Specification, PR = Prototype, TO = Tool
Distribution codes: C = Consortium, P = Public, R = Restricted
Version | Date | Status | Author(s)
1 | 25/6/1999 | First draft | FD
2 | 15/9/1999 | Second draft | AP
3 | 24/9/1999 | Final draft | DH
1 Report Summary
2 Introduction
2.1 Techniques for a Multimedia language learning courseware
2.2 Educational Strategies
2.3 Navigation styles
3 The ISLE demonstrator
3.1 Presentation
3.2 Phases of Testbed
3.3 Composition of MENU: Units and Lessons
3.4 Interface of the tutorial pages
3.5 Buttons
3.6 Text / Dialogue
3.7 The Exercises
3.8 Feedback connected to the diagnosis functions
3.9 Notes associated to feedback
3.10 Training
The purpose of this report is to describe the functionalities of a training environment for spoken language learning systems. We describe the set of speech-driven exercises that have been designed and how they will be integrated into a complete courseware aimed at developing the students’ ability to communicate autonomously in the four fundamental linguistic skills: listening, reading, writing and speaking.
The main purpose of this document is to describe the features that a training environment for spoken language learning should have in order to train or improve students’ spoken skills.
It is our opinion that the learning process should aim at the development of global communication capabilities. A multimedia foreign language course should gradually lead the student to communicate autonomously in the four fundamental linguistic skills: listening, reading, writing and speaking. Therefore, to be effective, a training environment intended to develop oral abilities has to provide a comprehensive learning environment that also deals with the other fundamental linguistic skills. In addition, it should offer some understanding of the culture and civilisation of the country concerned.
Up to now, existing foreign language courseware has been rather ineffective as far as the development of oral capabilities is concerned. This is mainly due to two factors: on the one hand, the limited diffusion of computers with enough computational power and satisfactory multimedia capabilities; on the other, the unsatisfactory performance of speech recognition systems, especially with foreign speakers.
During the development of the first prototype, the work of the consortium focused on two major points: a) demonstrating that the available speech recognition technology has the potential to support a learning environment with good teaching performance, and b) solving the technical issues concerning the integration of the speech-driven exercises within existing foreign language learning courses.
In this document we will therefore describe all the features that a multimedia foreign language course should have in order to successfully integrate oral training activities.
In the second chapter we will describe the features that traditional multimedia foreign language courseware should have, while in the third chapter we will describe the voice-driven exercises we are developing to support oral activities. Particular attention will be given to the feedback provided to the users and to the "recovery" actions that the system will take based on the measured users’ performance.
In the last chapter, a description of the final version of the ISLE prototype is given, including the structure of the course and the graphical layout.
Techniques and Educational Strategies for a Multimedia language learning courseware
Given these objectives, we think it useful to outline the techniques we are going to use and to show their most distinctive aspects, without any claim to a complete classification.
By direct presentation we mean a technique for communicating information that reproduces in software the typical slide show: short texts, drawings, graphics. Its main function is to support the tutor in his/her explanation by putting at his/her disposal some visual material to comment on. There is neither educational dialogue nor exercise nor learning test. Menus, buttons and active areas allow the student to personalise the learning path.
The main function of motion, and its added value in comparison with still images, is the time dimension. The user perceives directly how events take place and which elements are involved, without having to follow a textual description and look each time for the objects the description refers to.
We consider hypertext a way of organising information that cuts across all the above-mentioned types. The term "hypertext" was coined by Ted Nelson around 1965 for a collection of documents (or "nodes") containing cross-references or "links" which, with the aid of an interactive browser program, allow the reader to move easily from one document to another. The extension of hypertext to include graphics, sound, video and other kinds of data is called "hypermedia".
Of course these techniques have to be mixed in a way that is coherent with a specific educational strategy, which is defined by the kind of relationship that we manage to set up with the learner. From a practical point of view, this is implemented by identifying evaluation and self-evaluation techniques that together determine a specific sequence of learning units.
As far as the target course is concerned, we think it better to resort to active strategies. These consider the learner as an agent able to influence the educational process and, to a certain extent, to decide its procedures.
A language learning course should be based on expositive strategies as well, and these should be used for those parts which are linked to secondary objectives. Even if the software implements expositive strategies, the tutor remains free to integrate this basic strategy with any other strategy and any interaction with students. A typical method is to ask learners to formulate definitions and to predict what will happen before the correct version is explained. Instead of receiving the piece of information passively, the learner will tend to compare what he/she thought before with what the course is telling him/her now.
In order to be effective, a language learning course should also support discovery activities. The basic assumption in this case, which dates back to the Socratic model of dialogue, is that the learner already knows something or already has his/her own experience of the topic or activity involved; this knowledge may be weak and may need to be brought to the surface, rationalised and widened. Discovery learning may be free or guided. In guided discovery the learner follows a scheme according to logical or temporal steps set by the program in advance, along a sequence that reproduces the questions that he/she might ask him/herself about grammar rules and vocabulary.
In free discovery the sequence is not set in advance. The learner follows the scheme freely according to his/her curiosity, judging which points he/she knows best or least.
In the end, the choice of an educational technique must take educational strategies into account. The educational techniques in turn make it possible to implement some strategies that otherwise could not be delivered via computer. However, there is no fixed and coded relationship between a specific strategy and a technology: the same technology may in fact implement different strategies.
To navigate in a hypermedia means "finding one’s way around": the style and type of this navigation depend on the hypertext structure and on the design principles that guide the development of the multimedia application. The user may be led along a set path or may be freer to explore. A brief summary of different hypertext and multimedia structures is provided here. The choice of architecture for a hypermedia teaching application matters in relation to the pedagogical needs the instrument is developed for: it is useful to let the user explore an encyclopædia freely, but what if he/she has to improve his/her knowledge of a working procedure? Providing a set path with verification tests might be more useful in such a case.
According to the linear model, the contents of the course are divided into units or lessons, ordered from the easiest topics to the most difficult.
Intermediate assessment and evaluation of the learner’s achievement are provided through the execution of tests such as simple multiple choice, multiple choice with several answers, true/false.

Figure 1 The linear model
This model has a hierarchical structure: one or more main topics give access to further information on different pages, but each page is separate from the others. The user must go back to one of the main topics to change the direction of his/her navigation. Such a structure requires a very rigid definition of the main topics and of the second-level pages.

Figure 2 The hierarchical model
The modality model combines a linear and a hypertextual structure based on two or more main "modalities". This structure, widely used in Dida*El courses, implies that different types of pages run in parallel, dealing with the same topic from different points of view. For example, there are courses with four modalities, which may be described as "Descriptive" (what is it, what does it look like), "Operative" (How do I…), "Simulating" (Try to do it) and "Testing" (Answer these questions). This structure may be useful in a language course, where there may be "grammar", "listening", "reading", "exercises", etc. The user can therefore:
Navigate by topic, choosing for each setting or lesson ("In the Office", "Shopping", "Holidays") to start from grammar and finish with exercises;
Navigate by modality, for example reading only the "grammar" pages from one lesson to another;
Skip from one page to another (pages have to be very "well-knit" together).
Giving the user full navigational freedom in this kind of structure may prove rather complicated, because different buttons and functions have to be provided: a good interface design is required.

Figure 3 The modality model
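The three navigation styles of the modality model can be illustrated with a minimal sketch. The lesson and modality names are the examples quoted in the text; the `Page` type and the function names are hypothetical and are not taken from any Dida*El courseware.

```python
from dataclasses import dataclass

# Example names quoted in the text; a real course would load its own lists.
LESSONS = ["In the Office", "Shopping", "Holidays"]
MODALITIES = ["grammar", "listening", "reading", "exercises"]


@dataclass(frozen=True)
class Page:
    lesson: str
    modality: str


def pages_by_topic(lesson: str) -> list[Page]:
    """All pages of one lesson, from grammar through to exercises."""
    return [Page(lesson, m) for m in MODALITIES]


def pages_by_modality(modality: str) -> list[Page]:
    """The same kind of page across every lesson, in course order."""
    return [Page(name, modality) for name in LESSONS]


def jump(lesson: str, modality: str) -> Page:
    """Free skip to any page; rejects pages that do not exist."""
    if lesson not in LESSONS or modality not in MODALITIES:
        raise KeyError(f"no such page: {lesson!r} / {modality!r}")
    return Page(lesson, modality)
```

Treating the course as a lesson-by-modality grid keeps the three navigation styles independent of one another, which is one way to limit the interface complexity mentioned above.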
According to the needs of each multimedia course, different mixed solutions may be adopted at different hierarchical levels. For example, the user may be forced to go through a lesson before passing to the next one (linear), but may be free to choose the listening section before grammar (hypertextual and modal), or may be forced to go back to the "dialogue" page to choose the "reading" section (hierarchical, at a lower level).

Figure 4 The mixed hypertextual structure
In this section we describe how we intend to exploit the experience gained with the first ISLE prototype by producing a full English course embedding spoken language learning activities. As already stated in the previous sections, we think that in order to achieve satisfactory "learning performance" it is necessary to train the users in all four fundamental linguistic skills (listening, reading, writing and speaking).
The spoken language learning activities will therefore be included in an English course we are currently developing. The course is currently structured in 10 units. Each unit is in turn made up of 5 lessons. The lesson is therefore a sort of atomic learning entity, characterised by a dialogue around which all the learning activities have to be organised. The contents are organised following a practical approach and are centred around the "adventures" of Paolo during a business trip around Britain.
It will not be necessary, for the purpose of the project (i.e. validation with the users), to set up exercises for all 50 lessons; 5 lessons will be enough to support the validation activities.
Presentation
The SCENARIOS of the course will be:
The course has 5 MAIN UNITS. Each one comprises 7/8 lessons. Each lesson is made up of:
General functions are always available, among which:
Each unit of the course is associated to an image and a colour.
Each page of the course has one "content" frame; the colour of this frame depends on the selected unit.
For the whole course, "active text" is mouse sensitive. A coloured textbox pops up when the mouse is over it.
The feedback to exercises is expressed with the following colour code:
The course is intended for adults (not for children) whose mother tongue is Italian and who have a good knowledge of the PC.
When the application is launched, the learner has to choose to follow the course in
Once the mode of execution is chosen, the learner has TO RECORD his or her own data (NAME, LAST NAME, USERID, PASSWORD, MOTHER TONGUE) if he or she is not yet registered.
If the course is resumed after a previous registration, the learner is only asked for his or her USER ID and PASSWORD.
The administrative system of the application records all the information relating to each learner, and selects and initialises the diagnosis rule set accordingly. Depending on the mother tongue of the learner, a different rule set is used to diagnose his or her errors.
This registration step also sets up the files and folders needed for the execution of the application.
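The selection of a diagnosis rule set from the learner’s mother tongue could be sketched as follows. The field names follow the registration data listed above; the rule-set identifiers and the "german" entry are hypothetical placeholders, since the report does not name the actual rule sets or list the supported mother tongues beyond Italian.

```python
from dataclasses import dataclass, field

# Hypothetical rule-set identifiers keyed by mother tongue.
RULE_SETS = {
    "italian": "rules_italian",
    "german": "rules_german",  # assumed second entry, for illustration only
}


@dataclass
class LearnerProfile:
    name: str
    last_name: str
    user_id: str
    password: str  # a real system would store a hash, not the plain password
    mother_tongue: str
    rule_set: str = field(init=False)

    def __post_init__(self) -> None:
        # Pick the diagnosis rule set once, at registration time.
        key = self.mother_tongue.strip().lower()
        if key not in RULE_SETS:
            raise ValueError(f"no diagnosis rule set for {self.mother_tongue!r}")
        self.rule_set = RULE_SETS[key]
```

For example, `LearnerProfile("Anna", "Rossi", "arossi", "secret", "Italian").rule_set` evaluates to `"rules_italian"` under these assumptions.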
After the registration step, it is necessary TO CALIBRATE the SYSTEM with the help of the microphone, to optimise the quality of the recordings used in the oral exercises, especially with respect to background noise. The system asks the learner to push the "Speak" button and to say one sentence into the microphone: depending on how the sentence was recorded, the system gives feedback to ease the calibration, for instance
Once the calibration is completed, the system gives access to the MAIN MENU of the course. The learner can then choose one of the 5 main units, and the DIRECTORY (MENU) of the LESSONS of that unit (approximately 6/7 for each unit) is presented.
From there, the learner can truly start the course.
Composition of MENU: Units and Lessons
The MAIN MENU is a collage of images that reproduce the scenarios developed in each unit. For instance, the first scenario is about the arrival of the leading character of the story in Manchester; the graphical elements associated with this scenario are an image of an aeroplane in flight and the colour grey.
Each image of the MAIN MENU is mouse sensitive: when the cursor passes over the image, a short description of the contents of the unit pops up.
Once the learner has clicked on a scenario, he sees the MENU of LESSONS. The background of the menu (and of all the pages of the unit) is the image that identifies the unit, magnified and rendered opaque. The lesson entries are listed beside a textbox. When the mouse passes over an entry, the content of the lesson is displayed in the textbox.
Once a lesson is selected, the learner has at hand a standard interface for reading the texts, listening to the conversations and doing the exercises. Every PAGE is composed of:
Also present will be buttons for:
Buttons with functions related to lessons
The learner can resume listening to the dialogue after having seen the text with a simple click on the "Dialogue" button.

The exercises are divided into two main categories:
The standard exercises are text-based. The learner does the exercises using the keyboard and the mouse. These exercises are intended to improve the grammatical abilities involved in reading and writing. They are currently classified in the following categories:
The standard exercises do not contain any speech recognition component.
The oral exercises are planned to improve the pronunciation and understanding ability of the learner. They include the following kinds of exercise:
A single lesson may not include all five types of exercise. The kind of exercise is instead selected on the basis of the text presented in the dialogue. The pronunciation exercises use speech recognition.
Interface and functionality of the exercises
General structure and exercises menu
From the standard interface, the learner can always go to the exercises using the two buttons "Visual Exercises" and "Oral Exercises". The interface is similar for both types of exercises:
A pop-up menu is connected to the button. This menu displays the available kinds of exercises.
A smaller version of the image of the lesson is moved up to the right over the MENU of available EXERCISES.
On the left appears a sentence referring to the current dialogue and inviting the learner to carry out the exercises.

The learner can navigate in the exercises menu with the following buttons: "Next Exercise", "Previous Exercise", "Fast Forward", "Fast Rewind". He can thus carry out the exercises of a category in sequence, or go to the next/previous category.
The items of the menu may be in one of the following states:
Once the learner has selected a category of exercises, images referring to the exercises are displayed to the left of the menu. These images also describe the specific functions needed to carry out the exercise.
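The navigation through the exercises menu can be pictured as a cursor over exercises grouped by category. The sketch below rests on one assumption that the text does not state explicitly: "Fast Forward" and "Fast Rewind" jump to the first exercise of the next or previous category, while "Next Exercise" and "Previous Exercise" step through single exercises. The class and method names are hypothetical.

```python
class ExerciseMenu:
    """Cursor over exercises grouped by category (illustrative sketch)."""

    def __init__(self, categories: dict[str, list[str]]) -> None:
        if not categories or any(not ex for ex in categories.values()):
            raise ValueError("every category needs at least one exercise")
        self.names = list(categories)
        self.exercises = [categories[n] for n in self.names]
        self.cat = 0  # index of the current category
        self.pos = 0  # index of the current exercise inside it

    def current(self) -> tuple[str, str]:
        return self.names[self.cat], self.exercises[self.cat][self.pos]

    def next_exercise(self) -> tuple[str, str]:
        if self.pos + 1 < len(self.exercises[self.cat]):
            self.pos += 1
        elif self.cat + 1 < len(self.names):
            self.cat, self.pos = self.cat + 1, 0
        return self.current()  # stays put at the very last exercise

    def previous_exercise(self) -> tuple[str, str]:
        if self.pos > 0:
            self.pos -= 1
        elif self.cat > 0:
            self.cat -= 1
            self.pos = len(self.exercises[self.cat]) - 1
        return self.current()  # stays put at the very first exercise

    def fast_forward(self) -> tuple[str, str]:
        if self.cat + 1 < len(self.names):
            self.cat, self.pos = self.cat + 1, 0
        return self.current()

    def fast_rewind(self) -> tuple[str, str]:
        if self.cat > 0:
            self.cat, self.pos = self.cat - 1, 0
        return self.current()
```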
The pronunciation exercises, by contrast, require speech recognition and diagnosis functions. To carry them out, the learner has two buttons at hand:
These exercises are divided into two main categories:
The pronunciation exercises are:
LISTEN / READ and REPEAT
QUESTION and ANSWER
The "Question and Answer" exercises put a question to the learner, in audio or text form, about a displayed image. The learner first pushes the "Listen" button to hear the question. He must then push the "Speak" button and answer orally, choosing the answer he thinks is correct from a limited set of answers displayed on screen as text.
For example, the image shows a woman who is drinking tea. To the question "What is she drinking?", the system proposes four different answers: "A cup of tea", "a glass of wine", "a cup of coffee", "a glass of water".
In the "Building Sentence" exercises, a written question appears on the screen. The learner must compose a sentence using suggested text elements, which are grouped by part of the sentence. To answer, the learner pushes the "Speak" button and says the sentence to the speech recognition system.
FREE CHOICE
To do "Role Play" exercises, the learner plays the role of one of the characters appearing in the displayed image and says that character’s part of the dialogue. The learner may use two buttons: "Listen" and "Answer". The "Listen" button plays a question to the learner. The text of the question is not displayed. To hear the question again, the learner may push "Listen" again. When ready, the learner records his answer by pushing the "Answer" button.
The answer must be chosen from a limited number of answers available in text format. After the first answer, the learner pushes the "Listen" button again and repeats the sequence until the dialogue is completed. A dialogue has at most three questions and three answers. At the end, the questions and the answers are shown in text format as feedback.
Feedback for the pronunciation exercises
Feedback is provided for every exercise; this is easy to achieve since the types of error are similar for each category of exercises. The feedback differs according to the category to which the exercise belongs, either FREE ANSWER EXERCISES or OBLIGED ANSWER EXERCISES. The feedback associated with FREE ANSWER EXERCISES is divided into two branches of preliminary feedback:
The feedback associated with OBLIGED ANSWER EXERCISES is divided into three branches of preliminary feedback:
GENERAL FEEDBACK is given on the screen by writing one of the following messages:
Activation of the diagnosis functions
For both groups of exercises (FREE ANSWER EXERCISES or OBLIGED ANSWER EXERCISES), the DIAGNOSIS FUNCTIONS are triggered only when the answer has been recorded correctly and there is only one possible answer. If the answer is wrong or the recogniser does not understand what was said, the DIAGNOSIS FUNCTIONS are not triggered.
When the diagnosis functions are triggered, the feedback given depends on the type of errors diagnosed: PHONETIC ERRORS (Phone Analysis) or ACCENT ERRORS (Stress Analysis). These two kinds of errors are always taken into account in the feedback given. The learner cannot choose to have information only about the phonetic errors or about the accent errors.
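The conditions under which the diagnosis functions are triggered can be summarised in a short sketch. It reads "only one possible answer" as "the exercise accepts exactly one answer", which is an interpretation rather than something the report states; the `Attempt` type and the empty result lists are hypothetical stand-ins for the real recogniser output and the phone and stress analysers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attempt:
    recorded_ok: bool          # the recording itself succeeded
    recognised: Optional[str]  # recogniser output, None if not understood
    expected: list[str]        # acceptable answers for this exercise


def should_diagnose(a: Attempt) -> bool:
    """True only when every condition stated in the text holds."""
    return (
        a.recorded_ok
        and a.recognised is not None
        and len(a.expected) == 1           # exactly one possible answer
        and a.recognised == a.expected[0]  # and the learner gave it
    )


def diagnose(a: Attempt) -> Optional[dict]:
    if not should_diagnose(a):
        return None
    # Phone and stress analysis are always reported together; the empty
    # lists stand in for the real analysers, which are not described here.
    return {"phone_errors": [], "stress_errors": []}
```

Returning both keys in a single result mirrors the rule that the learner cannot ask for phonetic or accent feedback alone.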
After the learner has made a recording, if the sentence is recognised, the application writes it in a text box.
If the answer is correct, the wrongly pronounced words and the wrongly accentuated phonemes are displayed in red and are mouse sensitive. A message is also displayed to encourage the learner to check his or her errors:
"I think you had trouble with the words in red; click on them for more information"
Clicking on a red word triggers a pop-up box where the wrong phonemes and words are highlighted in red.

Together with the DISPLAY of the error message, there is also a NOTE and, when available, an access to SPECIFIC FEEDBACK with STRESS ANALYSIS

When the learner clicks on a red word, a pop-up box is displayed with the erroneous word, highlighting the mistake. The syllable or letter on which the learner has put the wrong accent is highlighted in red, and the correctly accentuated syllable or letter is highlighted in green.
Together with the DISPLAY of the error message, there is also a NOTE and, when available, an access to SPECIFIC REINFORCEMENT EXERCISES.
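The red/green highlighting of the stress feedback amounts to assigning a colour to each syllable of the word. The following is a minimal sketch; the function name, the 0-based indices and the "plain" colour label are assumptions made for illustration, and the syllable split has to be supplied by the caller.

```python
def stress_colours(
    syllables: list[str], produced: int, expected: int
) -> list[tuple[str, str]]:
    """Pair each syllable with a highlight colour.

    produced / expected are 0-based indices of the syllable the learner
    stressed and of the syllable that should carry the stress.
    """
    for idx in (produced, expected):
        if not 0 <= idx < len(syllables):
            raise IndexError("stress index outside the word")
    out = []
    for i, syl in enumerate(syllables):
        if i == expected:
            colour = "green"  # where the stress belongs
        elif i == produced:
            colour = "red"    # where the learner wrongly put it
        else:
            colour = "plain"
        out.append((syl, colour))
    return out
```

With an assumed split of "minutes" into `["min", "utes"]`, a learner who stresses the second syllable gets the first syllable in green and the second in red.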
The notes associated to feedback are displayed in text format beside the pop-up box that points out the error.
NOTES related to PHONE ANALYSIS
With respect to the analysis of the phonemes, the notes contain:

NOTE related to STRESS ANALYSIS
With respect to the analysis of the accent, the note will indicate that "You put the stress on the wrong syllable; instead of minutes, you should say minutes".

The TRAINING suggested in the feedback for both kinds of diagnosis provides access to a specific section based on a personalised interface, whose structure is the same for every chapter. From this interface, the learner can only return to the calling page. Multimodal navigation is not planned: this page is only called from the feedback pages. When the learner clicks on the "Close" button, he returns to the exercise he was carrying out.
Training to improve the abilities related to phoneme pronunciation.
In order to learn the correct pronunciation of phonemes, the learner may use the section dealing with reinforcement exercises. There are two possibilities: MINIMAL PAIRS exercises or "FIND A PHONE"-like exercises. Both are displayed on the interface.

MINIMAL PAIRS exercises present pairs of terms with similar phonemes. If, for example, the learner pronounces the phoneme / iy / instead of / ih /, then the following list of pairs is displayed
The first pair of the list is always the one to be pronounced correctly; the others are generated starting from the pronunciation error.
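One possible way to build such a list is a lookup keyed by the confusion that was diagnosed, with the learner’s own pair placed first. This is only a sketch under stated assumptions: the table, its four English word pairs and the function name are illustrative and do not come from the ISLE system, and the report does not say how the additional pairs are actually generated.

```python
# Illustrative data only: a tiny table of English minimal pairs for one
# confusion (expected /ih/, produced /iy/), using the report's notation.
MINIMAL_PAIRS = {
    ("ih", "iy"): [
        ("ship", "sheep"),
        ("bit", "beat"),
        ("sit", "seat"),
        ("fill", "feel"),
    ],
}


def minimal_pair_list(expected, produced, first_pair, limit=4):
    """Return the practice list, with the learner's own pair in first place."""
    pool = MINIMAL_PAIRS.get((expected, produced), [])
    # Drop the learner's pair from the pool so it is not listed twice.
    rest = [p for p in pool if p != first_pair]
    return [first_pair] + rest[: limit - 1]
```

If the confusion is not in the table, the list falls back to the learner’s own pair alone.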
The learner has no control over the actions to perform. He is asked to listen to and repeat each pair, with the same features as in the Listen/Read and Repeat exercise. The feedback to these pronunciation exercises is only a text of the form
The FIND A PHONE-like exercises present a list of words including terms containing the badly pronounced phoneme. The learner, through check boxes, has to identify the words that contain that phoneme. The feedback is given immediately through a button called "Verification". Wrong answers are highlighted in red. The words containing the phoneme (whether selected by the learner or not) are highlighted in green. To cancel a choice, the learner clicks again on the checked box.
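The colouring applied when the learner presses "Verification" can be sketched as a single function. It interprets "bad answers" as words that were ticked but do not contain the phoneme, which is an assumption; the function name and the "plain" label are hypothetical.

```python
def verify_selection(words: dict[str, bool], selected: set[str]) -> dict[str, str]:
    """Colour each listed word after the learner presses "Verification".

    words maps each listed word to True if it contains the target phoneme;
    selected holds the words whose check boxes are ticked.
    """
    unknown = selected - set(words)
    if unknown:
        raise KeyError(f"selected words not in the list: {sorted(unknown)}")
    colours = {}
    for word, has_phone in words.items():
        if has_phone:
            colours[word] = "green"  # contains the phoneme, ticked or not
        elif word in selected:
            colours[word] = "red"    # ticked but does not contain it
        else:
            colours[word] = "plain"
    return colours
```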

To correct accent errors, the exercises consist only of repeating the wrong word. The interface has the same modes of interaction as the Listen/Read and Repeat exercises. Alongside the exercises, visual feedback indicates the quality of pronunciation.