Wolfgang recalls that similar arguments were made in Saarbrücken in the late 1980s (pre-Verbmobil), albeit not in the realm of SDSs but in information presentation systems. The term used by Wolfgang Wahlster at that time was ‘anticipation feedback loops’.
The question arises at which level(s) of an SDS tighter integration is wanted. Sharing is part of the PTT dialogue theory (‘Poesio–Traum Theory’), but only at the level of semantics.
Dynamic syntax (developed by Ruth Kempson and colleagues at KCL) is a syntactic/semantic theory that uses the same formalism and mechanism for parsing and generation. It is claimed that it can account for the phenomena of surface level alignment and shared utterances.
Srini’s work on taking the interlocutor’s level of expertise (estimated from his/her utterances) into account (ENLG, 2009); Hendrik’s work on an alignment-capable microplanner for NLG (ENLG, 2009).
Real world SDSs
Surface behaviour in commercial systems exhibits lexical alignment by design, based on the intuition of the designers.
SDSs are usually built upon rules: what is the system supposed to say, and what might the user say?
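The rule-based core described above can be sketched as follows. This is a minimal illustration, not a real deployed system: the rules, keywords, and responses are all made up, standing in for the designer-authored tables a commercial SDS would use.

```python
# Hypothetical sketch of the rule-based core of a simple SDS:
# designers enumerate what the user might say (keywords) and
# what the system should answer. All rules are illustrative.

RULES = [
    # (keyword expected in the user utterance, canned system response)
    ("timetable", "Which connection would you like to look up?"),
    ("depart",    "When would you like to depart?"),
    ("thanks",    "You're welcome. Goodbye!"),
]

FALLBACK = "Sorry, I did not understand. Could you rephrase?"

def respond(user_utterance: str) -> str:
    """Return the first matching canned response, else a fallback."""
    text = user_utterance.lower()
    for keyword, response in RULES:
        if keyword in text:
            return response
    return FALLBACK
```

The fallback branch is where such systems typically break down, which is part of the motivation for tighter I/O coupling.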
Current systems: natural language generation is usually much more simplistic than parsing/understanding.
Simplistic ‘horizontal’ coupling does not necessarily improve system behaviour (example: meaningless backchannel behaviour)
System capabilities and expectations: don't produce smart utterances that you could not understand yourself.
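The principle above can be sketched as a filter: candidate utterances from the generator are only emitted if the system’s own understanding component could parse them. The `can_understand` check below is a toy vocabulary test standing in for a real parser/NLU call; all names are illustrative assumptions.

```python
# Hedged sketch of the 'don't say what you can't understand' principle:
# generation candidates are checked against the system's own
# understanding capabilities before being uttered.

def can_understand(utterance: str, known_vocabulary: set) -> bool:
    """Toy NLU stand-in: every word must be in the system's vocabulary."""
    return all(word in known_vocabulary for word in utterance.lower().split())

def filter_candidates(candidates: list, vocabulary: set) -> list:
    """Keep only those candidate utterances the system could itself parse."""
    return [c for c in candidates if can_understand(c, vocabulary)]
```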
Questions
How many SDSs actually have a full pipeline architecture? How many systems realise the tight integration at least partially?
Is such a tighter coupling really needed for SDSs? Or is it just interesting for researchers in the field of AI/cognitive modelling?
Wings of an airplane don't have feathers and so on, but they also don't need to be designed so that other birds like them.
Is there a general conceptual framework (going beyond modules) for I/O coupling?
What are we gaining if we reduce the system’s capabilities to do I/O in parallel?
Misc
Predicting what your interlocutor says, using a distinct simulator and not just your own processes.
Conclusions
There is a general consensus that tighter I/O coupling is desirable for conversationally more competent SDSs. The open questions are to what extent a coupling is needed, and how it could be achieved architecturally and technically.
Low-hanging fruit: inform your input processors using your output processors; at the least, this enables timing predictions and lexical alignment.
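Lexical alignment in particular is cheap to approximate: among synonymous realisations, prefer the variant the user has already used. The sketch below assumes designer-provided synonym sets; class and method names are made up for illustration.

```python
# Illustrative sketch of lexical alignment as a low-hanging fruit:
# track the user's word choices and, when generating, pick the
# synonymous variant the user has used most often.

from collections import Counter

class LexicalAligner:
    def __init__(self, synonym_sets):
        self.synonym_sets = synonym_sets   # e.g. [{"ticket", "fare"}]
        self.user_counts = Counter()       # observed user word choices

    def observe_user(self, utterance: str) -> None:
        """Update counts from a user utterance (naive whitespace tokenisation)."""
        self.user_counts.update(utterance.lower().split())

    def choose(self, candidates: set) -> str:
        """Pick the candidate the user has used most often (alphabetical tie-break)."""
        return max(sorted(candidates), key=lambda w: self.user_counts[w])
```

A real system would hook `observe_user` into the ASR/NLU results and call `choose` from the surface realiser.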
Natural language generation and parsing/understanding are distinct and difficult problems, studied in separate fields; there should be more interaction between these fields. Sidenote: Hamburg’s constraint dependency grammar parser (JWCDG) seems to use mechanisms similar to those of the SPUD microplanner (Stone et al., 2003, Computational Intelligence). The shared method could provide an interface that might be worth exploring.