Standard

Continuous affect recognition with weakly supervised learning. / Pei, Ercheng; Jiang, Dongmei; Alioscha-Perez, Mitchel; Sahli, Hichem.

In: Multimedia Tools and Applications, Vol. 78, No. 14, 30.07.2019, p. 19387-19412.

Research output: Contribution to journal › Article


Author

Pei, Ercheng; Jiang, Dongmei; Alioscha-Perez, Mitchel; Sahli, Hichem. / Continuous affect recognition with weakly supervised learning. In: Multimedia Tools and Applications. 2019; Vol. 78, No. 14. pp. 19387-19412.

BibTeX

@article{5836d0581adb495e931bcf06d4ac18c5,
title = "Continuous affect recognition with weakly supervised learning",
abstract = "Recognizing a person’s affective state from audio-visual signals is an essential capability for intelligent interaction. Insufficient training data and the unreliable labels of affective dimensions (e.g., valence and arousal) are two major challenges in continuous affect recognition. In this paper, we propose a weakly supervised learning approach based on hybrid deep neural network and bidirectional long short-term memory recurrent neural network (DNN-BLSTM). It firstly maps the audio/visual features into a more discriminative space via the powerful modelling capacities of DNN, then models the temporal dynamics of affect via BLSTM. To reduce the negative impact of the unreliable labels, we utilize a temporal label (TL) along with a robust loss function (RL) for incorporating weak supervision into the learning process of the DNN-BLSTM model. Therefore, the proposed method not only has a simpler structure than the deep BLSTM model in He et al. (24) which requires more training data, but also is robust to noisy and unreliable labels. Single modal and multimodal affect recognition experiments have been carried out on the RECOLA dataset. Single modal recognition results show that the proposed method with TL and RL obtains remarkable improvements on both arousal and valence in terms of concordance correlation coefficient (CCC), while multimodal recognition results show that with less feature streams, our proposed approach obtains better or comparable results with the state-of-the-art methods.",
keywords = "Continuous affect recognition, DNN-BLSTM, Weak supervision",
author = "Ercheng Pei and Dongmei Jiang and Mitchel Alioscha-Perez and Hichem Sahli",
year = "2019",
month = "7",
day = "30",
doi = "https://doi.org/10.1007/s11042-019-7313-1",
language = "English",
volume = "78",
pages = "19387--19412",
journal = "Multimedia Tools & Applications",
issn = "1380-7501",
publisher = "Springer Netherlands",
number = "14",

}
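The abstract reports results in terms of the concordance correlation coefficient (CCC), the standard measure for continuous affect prediction: CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2). As a quick reference, below is a minimal Python/NumPy sketch of that measure; the function name and example values are illustrative, not taken from the paper.

import numpy as np

def ccc(x, y):
    """Concordance correlation coefficient between two 1-D sequences."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()            # population variances
    cov = ((x - mx) * (y - my)).mean()   # population covariance
    return 2.0 * cov / (vx + vy + (mx - my) ** 2)

# Perfect agreement gives 1.0; a constant offset lowers the score even
# though the Pearson correlation stays 1.
print(ccc([0.1, 0.4, 0.6], [0.1, 0.4, 0.6]))   # 1.0
print(ccc([0.1, 0.4, 0.6], [0.3, 0.6, 0.8]))   # < 1.0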

RIS

TY - JOUR

T1 - Continuous affect recognition with weakly supervised learning

AU - Pei, Ercheng

AU - Jiang, Dongmei

AU - Alioscha-Perez, Mitchel

AU - Sahli, Hichem

PY - 2019/7/30

Y1 - 2019/7/30

N2 - Recognizing a person’s affective state from audio-visual signals is an essential capability for intelligent interaction. Insufficient training data and the unreliable labels of affective dimensions (e.g., valence and arousal) are two major challenges in continuous affect recognition. In this paper, we propose a weakly supervised learning approach based on a hybrid deep neural network and bidirectional long short-term memory recurrent neural network (DNN-BLSTM). It first maps the audio/visual features into a more discriminative space via the powerful modelling capacity of the DNN, then models the temporal dynamics of affect via the BLSTM. To reduce the negative impact of the unreliable labels, we utilize a temporal label (TL) along with a robust loss function (RL) to incorporate weak supervision into the learning process of the DNN-BLSTM model. Therefore, the proposed method not only has a simpler structure than the deep BLSTM model in He et al. (24), which requires more training data, but is also robust to noisy and unreliable labels. Single-modal and multimodal affect recognition experiments have been carried out on the RECOLA dataset. Single-modal recognition results show that the proposed method with TL and RL obtains remarkable improvements on both arousal and valence in terms of the concordance correlation coefficient (CCC), while multimodal recognition results show that, with fewer feature streams, our approach obtains results better than or comparable to state-of-the-art methods.

AB - Recognizing a person’s affective state from audio-visual signals is an essential capability for intelligent interaction. Insufficient training data and the unreliable labels of affective dimensions (e.g., valence and arousal) are two major challenges in continuous affect recognition. In this paper, we propose a weakly supervised learning approach based on a hybrid deep neural network and bidirectional long short-term memory recurrent neural network (DNN-BLSTM). It first maps the audio/visual features into a more discriminative space via the powerful modelling capacity of the DNN, then models the temporal dynamics of affect via the BLSTM. To reduce the negative impact of the unreliable labels, we utilize a temporal label (TL) along with a robust loss function (RL) to incorporate weak supervision into the learning process of the DNN-BLSTM model. Therefore, the proposed method not only has a simpler structure than the deep BLSTM model in He et al. (24), which requires more training data, but is also robust to noisy and unreliable labels. Single-modal and multimodal affect recognition experiments have been carried out on the RECOLA dataset. Single-modal recognition results show that the proposed method with TL and RL obtains remarkable improvements on both arousal and valence in terms of the concordance correlation coefficient (CCC), while multimodal recognition results show that, with fewer feature streams, our approach obtains results better than or comparable to state-of-the-art methods.

KW - Continuous affect recognition

KW - DNN-BLSTM

KW - Weak supervision

UR - http://www.scopus.com/inward/record.url?scp=85061404479&partnerID=8YFLogxK

U2 - 10.1007/s11042-019-7313-1

DO - 10.1007/s11042-019-7313-1

M3 - Article

VL - 78

SP - 19387

EP - 19412

JO - Multimedia Tools and Applications

JF - Multimedia Tools and Applications

SN - 1380-7501

IS - 14

ER -
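For readers who want a concrete picture of the hybrid DNN-BLSTM named in the abstract, below is a minimal PyTorch sketch of such a regressor: a feed-forward DNN maps each frame's features into a more discriminative space, then a bidirectional LSTM models the temporal dynamics of one affect dimension (arousal or valence). This is a generic illustration, not the authors' exact architecture: the 88-dimensional frame features and all layer sizes are hypothetical, and the paper's temporal label (TL) and robust loss (RL) components are not shown.

import torch
import torch.nn as nn

class DNNBLSTM(nn.Module):
    def __init__(self, feat_dim=88, hidden_dim=64, lstm_dim=64):
        super().__init__()
        # DNN front end: frame-wise non-linear feature mapping
        self.dnn = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # BLSTM back end: bidirectional temporal modelling
        self.blstm = nn.LSTM(hidden_dim, lstm_dim, batch_first=True,
                             bidirectional=True)
        # Regression head: one continuous affect value per frame
        self.head = nn.Linear(2 * lstm_dim, 1)

    def forward(self, x):                # x: (batch, time, feat_dim)
        h = self.dnn(x)                  # (batch, time, hidden_dim)
        h, _ = self.blstm(h)             # (batch, time, 2 * lstm_dim)
        return self.head(h).squeeze(-1)  # (batch, time)

# Example: 2 sequences of 100 frames with 88-dim acoustic features
model = DNNBLSTM()
pred = model(torch.randn(2, 100, 88))    # -> shape (2, 100)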
