“Enhancing Speech to Speech Translation Through Exploitation of Bilingual Resources and Paralinguistic”

by Andreas Tsiartas

May 2014

This thesis focuses on developing a speech-to-speech (S2S) translation system that uses paralinguistic acoustic cues to achieve successful cross-lingual interaction. Reaching that goal requires research both in foundational speech and language processing and in applying and validating the extracted rich information in translation. I have developed techniques that make signal acquisition more robust through voice activity detection (VAD) and cross-talk detection, which can enable hands-free S2S communication; the benefits are shown on multiple datasets. To support rapid technology translation to new language pairs, I have developed novel techniques for extracting parallel audio and text from commonly available bilingual resources such as movies. I have also developed a method for aligning subtitles, and I show performance benefits for the translation of spoken utterances when the timing information of the subtitles is exploited to extract high-quality bilingual pairs. Paralinguistic cues are a major part of spoken communication. To investigate their importance, I have developed a method that extracts bilingual audio pairs from dubbed movies by exploiting the parallel nature of the audio signals, and I show its performance on English movies dubbed in French. Using these and acted data, I show through perceptual experiments that the transfer of paralinguistic acoustic cues from a source language to a target language is correlated with the quality of the spoken translation in a case study of the English-Spanish pair. In addition, a method to represent bilingual paralinguistic acoustic codes is presented.

