Note

Knowledge enhancement for speech emotion recognition via multi-level acoustic feature

Article: 2312103 | Received 17 Oct 2023, Accepted 26 Jan 2024, Published online: 01 Feb 2024
 

Abstract

Speech emotion recognition (SER) has become an increasingly attractive machine learning task for domain applications. It aims to improve the discriminative capacity of speech emotion models utilising either a single feature type (e.g. MFCCs, spectrograms, Wav2vec2) or a combination of multiple types. However, existing approaches that rely solely on a single feature type, or on a basic combination of several, frequently overlook the potential of acoustic-related deep features. To address this challenge, a multi-level acoustic feature cross-fusion approach is proposed, aiming to compensate for information missing from any individual feature. The cross-fusion mechanism enhances SER performance by integrating different types of knowledge. Moreover, multi-task learning is utilised to share useful information through an auxiliary gender-recognition task, which also yields multiple common representations in a fine-grained space. Experimental results show that the fusion approach captures the inner connections between multi-level acoustic features to refine the knowledge, and state-of-the-art results were obtained under the same experimental conditions.
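The abstract does not spell out the exact architecture, but a common way to realise "cross-fusion" between two feature streams is bidirectional cross-attention followed by shared multi-task heads. The sketch below is a minimal, hypothetical illustration of that idea; the feature dimensions, class counts, and random projection weights are all assumptions, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(q_feats, kv_feats):
    """Let one feature stream attend over another (scaled dot-product)."""
    d = q_feats.shape[-1]
    scores = q_feats @ kv_feats.T / np.sqrt(d)   # (T_q, T_kv) similarity
    weights = softmax(scores, axis=-1)           # each query row sums to 1
    return weights @ kv_feats                    # (T_q, d) fused features

# Two hypothetical frame-level feature streams for one utterance,
# projected to a common dimension (here 64) for simplicity.
mfcc = rng.normal(size=(50, 64))   # e.g. MFCC-derived embeddings
w2v  = rng.normal(size=(50, 64))   # e.g. Wav2vec2 embeddings

# Bidirectional cross-fusion: each stream queries the other, then concatenate.
fused = np.concatenate(
    [cross_attend(mfcc, w2v), cross_attend(w2v, mfcc)], axis=-1)  # (50, 128)

# Pool frames into a shared utterance-level representation.
utt = fused.mean(axis=0)                                          # (128,)

# Multi-task heads on the shared representation (random weights as stand-ins):
# a main emotion head plus an auxiliary gender head.
W_emo = rng.normal(size=(128, 4))  # 4 emotion classes (hypothetical)
W_gen = rng.normal(size=(128, 2))  # binary gender task
emotion_logits = utt @ W_emo       # (4,)
gender_logits  = utt @ W_gen       # (2,)
```

In training, the two heads would contribute a weighted joint loss, so the gender task regularises the shared representation used for emotion classification.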

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was supported by the National Natural Science Foundation of China under Grant 62076092.