USC-SIPI REPORT #391

Technical Report USC-SIPI-391

“User Modeling for Human-Machine Spoken Interaction and Mediation Systems”

by JongHo Shin

May 2008

One of the most fundamental challenges in building speech-enabled systems is "knowing the users." This information about users is captured in what is usually called a "user model." This study investigates user models for speech-enabled systems, which include both human-machine spoken interaction systems and machine-mediated human-human interaction systems. Because the statistical processing of human speech is intrinsically error-prone, errors are inevitable during interactions with or through speech-enabled systems. Accordingly, this dissertation studies four user models under the uncertain error conditions of spoken dialog systems and spoken mediation systems. The user models were derived from data on mixed-initiative spoken dialogs and on multimodal (speech and visual) interactions with a spoken mediation system. The user models in this dissertation aim to help accelerate the optimization of dialog management in speech-enabled systems.

The user models address: (1) user behaviors under error conditions of a spoken dialog system; (2) multimodal user behaviors under uncertainty in two-person communication using a speech-to-speech translation system; (3) changes in user behavior over time in uncertain communication when using a multimodal interface of a speech-to-speech translation system; and (4) the user's level of error tolerance, implemented with a dynamic Bayesian network, and possible speech accommodation between two interlocutors. The dynamic Bayesian network model was validated offline with multimodal interaction data from a speech-to-speech translation system, and online with agent feedback used in a multimodal interface of a speech-to-speech translation system.

The report is available for download in PDF format: USC-SIPI-391.pdf (1.2Mb)