The USC Andrew and Erna Viterbi School of Engineering
USC Signal and Image Processing Institute
USC Ming Hsieh Department of Electrical and Computer Engineering
University of Southern California

Technical Report USC-SIPI-428

“Modeling Expert Assessment of Empathy through Multimodal Signal Cues”

by Bo Xiao

May 2016

Empathy is an important psychological process that facilitates human interaction through emotional simulation, perspective taking, and emotion regulation. A higher level of care-provider empathy is associated with better outcomes in interactions such as psychotherapy and medical care. However, traditional manual assessment of empathy does not scale in practice, leaving the quality of services largely unknown. Computational modeling of empathy is a novel approach that can provide useful information to aid human decision making.

Empathy is a latent process that is difficult to measure directly. Human experts assess empathy by observing interactive behaviors. Taking addiction counseling as an example scenario, this dissertation analyzes therapist empathy computationally from observed behavioral signals. Specifically, it proposes a fully automatic system that predicts expert assessment of empathy by modeling therapist language cues. The system integrates Voice Activity Detection, Diarization, Automatic Speech Recognition, and speaker role matching modules to obtain machine-generated transcripts of therapist language. It then applies Natural Language Processing methods, including a Maximum Entropy model, a Maximum Likelihood model, and decoding lattice rescoring, to estimate empathy. Finally, it predicts the expert assessment by integrating the outputs of these methods.
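To make the idea of a maximum-likelihood language cue concrete, the following is a minimal sketch only, not the system described in the report. It assumes a simple add-one-smoothed unigram model trained separately on therapist utterances from high-empathy and low-empathy sessions, and scores new text by the log-likelihood ratio. All function names and the toy utterances are hypothetical; the abstract does not specify the model order, smoothing, or training data.

```python
import math
from collections import Counter

def train_unigram(utterances):
    """Count words over a list of utterances; returns (counts, total_tokens)."""
    counts = Counter(w for u in utterances for w in u.lower().split())
    return counts, sum(counts.values())

def log_prob(text, model, vocab_size):
    """Add-one-smoothed unigram log-probability of text (an assumed smoothing choice)."""
    counts, total = model
    return sum(
        math.log((counts[w] + 1) / (total + vocab_size))
        for w in text.lower().split()
    )

def empathy_llr(text, high_model, low_model, vocab_size):
    """Log-likelihood ratio: positive values favor the high-empathy model."""
    return log_prob(text, high_model, vocab_size) - log_prob(text, low_model, vocab_size)

# Hypothetical toy training data, for illustration only.
high = ["it sounds like you feel worried", "you feel this is hard"]
low = ["you need to stop drinking", "you must stop now"]

high_model = train_unigram(high)
low_model = train_unigram(low)
vocab_size = len(set(high_model[0]) | set(low_model[0]))

reflective = empathy_llr("it sounds like this is hard", high_model, low_model, vocab_size)
directive = empathy_llr("you must stop", high_model, low_model, vocab_size)
```

In this toy setting, `reflective` comes out positive and `directive` negative. A real system would operate on noisy ASR transcripts and combine such a score with the other language models mentioned above.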

This dissertation also proposes modeling empathy through prosodic, speech rate entrainment, and turn-taking cues. These cues, which correlate with expert assessment of empathy, include the session-level joint distribution of a group of prosodic features; behavioral entrainment cues based on the averaged turn-by-turn similarity of speech rates; and turn-taking cues based on the ratio of therapist and client speech.
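As an illustration of how the entrainment and turn-taking cues could be computed from a diarized session, here is a minimal sketch under stated assumptions. It assumes each turn is a `(speaker, n_words, duration_s)` tuple, measures similarity as the negative absolute difference in words per second between adjacent turns by different speakers, and defines the speech ratio as the therapist's share of total speaking time. These definitions and names are hypothetical; the report may define the features differently.

```python
from statistics import mean

def speech_rate(turn):
    """Words per second for one turn given as (speaker, n_words, duration_s)."""
    _, n_words, duration_s = turn
    return n_words / duration_s

def entrainment_score(turns):
    """Mean negative absolute speech-rate difference over adjacent
    cross-speaker turn pairs; values closer to 0 mean more similar rates."""
    diffs = [
        -abs(speech_rate(prev) - speech_rate(curr))
        for prev, curr in zip(turns, turns[1:])
        if prev[0] != curr[0]  # compare only turns by different speakers
    ]
    return mean(diffs) if diffs else 0.0

def therapist_speech_ratio(turns):
    """Therapist share of total speaking time (an assumed definition)."""
    total = sum(t[2] for t in turns)
    therapist = sum(t[2] for t in turns if t[0] == "therapist")
    return therapist / total if total else 0.0

# Hypothetical toy session, for illustration only.
turns = [
    ("therapist", 20, 10.0),  # 2.0 words/s
    ("client", 30, 10.0),     # 3.0 words/s
    ("therapist", 25, 10.0),  # 2.5 words/s
    ("client", 30, 10.0),     # 3.0 words/s
]

entrainment = entrainment_score(turns)
ratio = therapist_speech_ratio(turns)
```

Both values are single session-level numbers, so they could be passed as features to a classifier alongside the language-based scores.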

Experiments on predicting empathy assessments are conducted on audio recordings of real addiction counseling sessions that use a treatment approach called Motivational Interviewing. The results demonstrate that the proposed automatic system and the multimodal cues can predict expert assessments of empathy in a machine-learning framework. Fusing these cues improves prediction accuracy. These findings suggest that quantifying empathy via automated behavioral analysis is feasible, and they may offer new insights into empathy in human interactions.

To download the report in PDF format click here: USC-SIPI-428.pdf (0.9Mb)