Problem: In natural language generation or dialog research, there is often no single correct system output, and it is not obvious to the system developer what kind of behavior the system should model.
Discussion: Is this problem specific to dialog research? Are there any NLP domains in which human developers know with certainty what the right output is? Many problems are inherently ambiguous even when gold-standard annotations exist (e.g., opinion annotation).
Solutions: Arbitrarily large amounts of data from which the system can learn? Crowd-sourced annotations? More computing power? Principal component analysis?
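The ambiguity point in the discussion above can be made concrete by measuring how much two human annotators actually agree. A minimal sketch of Cohen's kappa (chance-corrected agreement) on invented opinion-polarity labels; the annotator data below is purely illustrative, not from any real corpus:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    assert len(a) == len(b)
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[lab] * cb[lab] for lab in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Two hypothetical annotators labelling opinion polarity;
# they disagree exactly on the ambiguous items.
ann1 = ["pos", "pos", "neg", "neu", "neg", "pos"]
ann2 = ["pos", "neu", "neg", "neu", "pos", "pos"]
print(round(cohens_kappa(ann1, ann2), 3))  # → 0.478
```

A kappa well below 1 even with a published gold standard is one way to quantify the claim that there is often no single acceptable output.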
Generalizing and operating in a different domain
Problem: It is not obvious how to develop a system that can work in different domains
Discussion: Caroline Sporleder has been looking into the domain adaptation problem
Solutions: Can a dialog system learn to generalize by asking the user explicitly what they want to do (e.g. restaurant vs. ice-cream parlor)? Some related projects may be Tom Mitchell's NELL (Never-Ending Language Learning) or CYC
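One concrete, lightweight technique from the domain-adaptation literature (not necessarily the one discussed here) is Daumé III's "frustratingly easy" feature augmentation: every feature is copied into a shared version and a domain-specific version, so a learner can keep what generalizes and separate what is domain-bound. A sketch, using the restaurant vs. ice-cream-parlor example; feature names are invented:

```python
def augment(features, domain):
    """Daume III (2007) feature augmentation: duplicate each feature
    into a shared copy and a domain-specific copy."""
    out = {}
    for name, value in features.items():
        out[f"shared:{name}"] = value      # weight shared across domains
        out[f"{domain}:{name}"] = value    # weight specific to this domain
    return out

src = augment({"word=menu": 1.0}, "restaurant")
tgt = augment({"word=menu": 1.0}, "icecream")
# Both examples share 'shared:word=menu', so evidence transfers across
# domains, while the domain-prefixed copies stay separate.
```

The augmented feature dictionaries can then be fed to any standard linear classifier without changing the learning algorithm itself.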
Finding a user group for testing that is representative of your actual target group
Problem: The people you can recruit for lab- or internet-based testing often represent quite different user populations from the ones your deployed system will actually serve
Discussion: It can be hard to find the right user group. How to generalize from the results you get with a particular user group?
Related problem: How to design experiments that control for hard-to-predict external factors?
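One way to at least detect a mismatch between a recruited sample and the target population is a chi-square goodness-of-fit statistic over some observable attribute. A minimal sketch with invented age-group numbers (both the sample counts and the target proportions are hypothetical):

```python
def chi_square_stat(observed_counts, expected_props):
    """Chi-square goodness-of-fit statistic: how far a recruited sample's
    category distribution deviates from the target population's proportions."""
    n = sum(observed_counts.values())
    stat = 0.0
    for cat, p in expected_props.items():
        exp = n * p
        obs = observed_counts.get(cat, 0)
        stat += (obs - exp) ** 2 / exp
    return stat

# Hypothetical: lab participants skew young relative to the target users.
sample = {"18-29": 40, "30-49": 8, "50+": 2}
target = {"18-29": 0.3, "30-49": 0.4, "50+": 0.3}
print(chi_square_stat(sample, target))  # ≈ 60.13, a large deviation
```

A large statistic does not fix the recruitment problem, but it flags that results may not generalize to the actual user group.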
Approaching other people that work on the same problems as you from different perspectives
Problem: Different disciplines use different tools/terminologies/models. It can be hard to overcome these barriers in communication.
Discussion: How to apply methodologies to a different area of research? Verena Rieser has had such an experience.
Solution: Can people be motivated to come out of their comfort zone and try something different from what they know?