This work is licensed under a Creative Commons Attribution 4.0 International License.
MULTIMODAL DEPRESSION DETECTION FROM FACIAL LANDMARK FEATURES USING LSTM MODEL
D. SYLVIA SHARON, J. ANGEL OZNI, S. SOMALAKSHMI
DOI: 10.17148/IJARCCE.2023.125123
Abstract:
The massive and growing burden that depression imposes on modern society has motivated investigations into early detection through automated, scalable, and non-invasive methods, including those based on speech. However, speech-based methods that capture articulatory information effectively across different recording devices and in naturalistic environments are still needed. This article presents a novel multi-level attention-based network for multimodal depression prediction that fuses features from the audio, video, and text modalities while learning intra- and inter-modality relevance. Multi-level attention reinforces overall learning by selecting the most influential features within each modality for decision-making. We perform exhaustive experimentation to create separate regression models for the audio, video, and text modalities. Evaluations of both landmark duration features and landmark n-gram features on the DAIC-WOZ and SH2 datasets show that they are highly effective, either alone or fused, relative to existing approaches.
Keywords:
Depression classification, landmark n-grams, speech articulation, smartphone speech, naturalistic environments
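The abstract only summarizes the architecture, so the following is a minimal illustrative sketch, not the authors' implementation: an LSTM over per-frame facial-landmark feature vectors with a simple temporal attention layer that pools the sequence before classification. All layer sizes, feature dimensions, and names (e.g., LandmarkLSTM, the 68-landmark/136-feature input) are assumptions for illustration.

```python
# Sketch (assumed, not the paper's code): LSTM over facial-landmark
# sequences with temporal attention pooling for depression classification.
import torch
import torch.nn as nn

class LandmarkLSTM(nn.Module):
    def __init__(self, n_feats=136, hidden=128):  # 68 (x, y) landmarks -> 136 features (assumed)
        super().__init__()
        self.lstm = nn.LSTM(n_feats, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)   # scores each time step
        self.head = nn.Linear(hidden, 1)   # binary depression logit

    def forward(self, x):                        # x: (batch, frames, n_feats)
        h, _ = self.lstm(x)                      # h: (batch, frames, hidden)
        w = torch.softmax(self.attn(h), dim=1)   # attention weights over frames
        ctx = (w * h).sum(dim=1)                 # attention-pooled clip summary
        return self.head(ctx).squeeze(-1)        # one logit per clip

# Usage: a batch of 4 clips, 150 frames each
model = LandmarkLSTM()
probs = torch.sigmoid(model(torch.randn(4, 150, 136)))  # depression probability per clip
```

In a multimodal setting like the one described, a per-modality module of this kind could feed a second, inter-modality attention stage that weights the audio, video, and text summaries before the final prediction.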
How to Cite:
[1] D. SYLVIA SHARON, J. ANGEL OZNI, S. SOMALAKSHMI, "MULTIMODAL DEPRESSION DETECTION FROM FACIAL LANDMARK FEATURES USING LSTM MODEL," International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), vol. 12, no. 5, May 2023. DOI: 10.17148/IJARCCE.2023.125123
