Session: InproTK

Participants: Okko, Nina, Kilian, Miroslav, Timo, some of the time: Jana, David; anybody else?

Notes taken by Timo

  • InproTK's data handling:
    • information storage in the IU network: same-level-links (SLL), grounded-in-links (GRIN);
    • distributed data storage, mostly normalized data (example: getStartTime())
    • single-memory architecture
    • bridging processors would be possible (though they would limit access to SLL/GRIN information)
    • =IU=s may be added, revoked, and finally committed;
    • the IU network is not necessarily bound to 1-best-processing (but see below)
    • unified handling of input data (that mostly relates to the past) and output data (that relates to the future or the past)
  • InproTK's processing methods:
    • conceptually: IU processors have a left buffer and a right buffer; a processor receives update messages about changes to its left buffer, then notifies the next processor (via its right buffer) about its own changes
    • in fact this is currently limited to pure left-to-right processing; top-down/right-to-left processing is not yet implemented
    • (some ad-hoc top-down stuff using =Signal=s)
    • processing modules currently do not support n-best-processing (at least not for input processing)
    • threading issues: InproTK has improved a lot but you may still face threading issues; it's possible to have all modules in separate threads (though this is currently not done/not needed for most processors)
    • potential processing with "active" =IU=s (example: a =WordIU= may actively determine prosodic stress by querying syllables→phonemes→pitch-track)
    • =UpdateListener=s may register with =IU=s to be notified about certain changes that happen to an =IU= (e.g. gradual change from UPCOMING via ONGOING to COMPLETED in synthesis)
  • InproTK's incremental speech recognition module
    • based on Sphinx, 1-best, no confidence measure but stability measure, highly effective filters for reducing incremental jitter of hypotheses
    • final result is always as good as non-incremental result
  • InproTK's incremental speech synthesis module
    • works in real-time
    • enables previously unseen system behaviours
  • taking decisions
    • is a serious problem (better wait or better decide?)
    • Okko's work on dealing with revokes in dialogue management (public information can't be revoked but has to be undone instead)
    • alternative: top-down commit (which InproTK's ASR doesn't yet support) to avoid revokes.
  • how to build your incremental module
    • Okko proposes to sit down and to think about the possible edits that may happen;
    • do not forget revoke messages while thinking about edits (and their consequences)
    • if your code is a "module", then integration will be easy; if you have a partial implementation of a "system", then integrating with InproTK is harder (especially input, easier for incremental output)
    • think about your data types (you will likely want to sub-class IU for your data)
  • discussion of Nina's use-case (integrate an incremental NLG component):
    • InproTK works well for English and German
    • any task needs its proper statistical language model; build one and use the -lm switch
    • a relatively dumb incremental DM (e.g. keyword/keyconcept spotting) is easy to achieve
    • integrating incremental NLG has been achieved by Hendrik & Timo (SigDial 2012)
  • looked at many demos
  • InproTK is available as open source at http://inprotk.sourceforge.net; more information on the project at http://inpro.tk.
  • alternatives to InproTK:
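
To illustrate the data-handling points in the notes above (SLL and GRIN links, normalized timing as in getStartTime(), and the add/revoke/commit life cycle), here is a minimal sketch in Python. InproTK itself is written in Java; the class `IU` below and all of its fields and methods are hypothetical names chosen for this sketch, not InproTK's API. The rule that a committed unit can no longer be revoked is an assumption read off "finally committed".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class IU:
    """Hypothetical incremental unit (illustration only, not InproTK's class)."""
    payload: str
    same_level_link: Optional["IU"] = None                 # SLL: predecessor on the same level
    grounded_in: List["IU"] = field(default_factory=list)  # GRIN: units this one is built on
    own_start: Optional[float] = None  # only bottom-level units store a time themselves
    revoked: bool = False
    committed: bool = False

    def start_time(self) -> float:
        """Normalized storage: a higher-level unit asks its first grounding unit."""
        if self.own_start is not None:
            return self.own_start
        if not self.grounded_in:
            raise ValueError(f"{self.payload!r} has no timing information")
        return self.grounded_in[0].start_time()

    def revoke(self) -> None:
        # Assumption: commitment is final, so a committed unit cannot be revoked.
        if self.committed:
            raise RuntimeError("a committed IU can no longer be revoked")
        self.revoked = True

    def commit(self) -> None:
        if self.revoked:
            raise RuntimeError("a revoked IU cannot be committed")
        self.committed = True


# Two segment-level units linked by SLL, and one word grounded in both.
seg_h = IU("h", own_start=0.12)
seg_i = IU("i", same_level_link=seg_h, own_start=0.20)
word = IU("hi", grounded_in=[seg_h, seg_i])

print(word.start_time())  # 0.12, looked up through the grounded-in link
```

Because the word stores no time of its own, a later correction to the segment timing is visible at the word level without any copying; that is the point of keeping the data mostly normalized.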

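The advice on building a module (think about the possible edits, and do not forget revokes) can be sketched with a toy left-to-right processor in the spirit of the keyword-spotting DM mentioned above. This is an assumption-laden Python illustration, not InproTK code: `EditType`, `Edit`, `KeywordSpotter`, and `left_buffer_update` are hypothetical names, and plain strings stand in for full IUs.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, List


class EditType(Enum):
    ADD = "add"
    REVOKE = "revoke"
    COMMIT = "commit"


@dataclass(frozen=True)
class Edit:
    type: EditType
    unit: str  # a plain string stands in for a full IU in this sketch


class KeywordSpotter:
    """Toy processor: consumes word edits on its left buffer and
    passes on the edits that concern keywords via its right buffer."""

    def __init__(self, keywords: Iterable[str],
                 listener: Callable[[List[Edit]], None]):
        self.keywords = set(keywords)
        self.listener = listener           # the next processor's update hook
        self.right_buffer: List[str] = []  # keywords currently hypothesized

    def left_buffer_update(self, edits: Iterable[Edit]) -> None:
        out: List[Edit] = []
        for edit in edits:
            if edit.unit not in self.keywords:
                continue  # non-keywords never reach the right buffer
            if edit.type is EditType.ADD:
                self.right_buffer.append(edit.unit)
            elif edit.type is EditType.REVOKE:
                if edit.unit not in self.right_buffer:
                    continue  # nothing to take back
                # Remove the most recent occurrence of the revoked keyword.
                last = len(self.right_buffer) - 1 - self.right_buffer[::-1].index(edit.unit)
                del self.right_buffer[last]
            # COMMIT edits change nothing in the buffer and are simply passed on.
            out.append(edit)
        if out:
            self.listener(out)  # notify the next processor of our own changes


received: List[Edit] = []
spotter = KeywordSpotter({"left", "right"}, received.extend)

# The recognizer first hypothesizes "turn left", then revises it to "turn right".
spotter.left_buffer_update([Edit(EditType.ADD, "turn"), Edit(EditType.ADD, "left")])
spotter.left_buffer_update([Edit(EditType.REVOKE, "left"), Edit(EditType.ADD, "right")])

print(spotter.right_buffer)  # ['right']
```

Without the revoke branch, the spotter would still report "left" after the recognizer had changed its mind, which is the kind of consequence the notes suggest thinking through before writing a module.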
-- TimoBaumann -- 06 Oct 2012
 