Speech Emotion Mapping Using Deep Learning
Authors: G Anitha, Aiswarya Lakshmi MK, and Geethika P
Abstract:
Speech Emotion Recognition (SER) is crucial for improving human-machine interaction, enabling machines to understand and respond to human emotions. This study examines deep learning-driven SER approaches, employing feature extraction methods such as Mel Frequency Cepstral Coefficients (MFCC), Mel Spectrograms, and Chroma features. A machine learning framework is established using classifiers including Support Vector Machines (SVM), Random Forest (RF), Multi-Layer Perceptron (MLP), k-Nearest Neighbors (KNN), and Naïve Bayes (NB). The Toronto Emotional Speech Set (TESS) is used to train and validate these models, covering a broad range of emotional variation. The results show that the proposed models effectively identify emotions, including happiness, sadness, anger, neutrality, and fear, demonstrating potential for applications such as AI-driven virtual assistants and mental health assessment tools. The core functionality is real-time voice recording, in which audio is captured and processed for feature extraction.
Comparative analysis highlights the strengths and limitations of each model in terms of classification accuracy. The system is deployed as a Flask-based web application that performs real-time emotion prediction from voice input. It features a dark-themed user interface with navigation options (Home, About, Analyze, and Realtime Prediction) to enhance usability. Once a voice sample is recorded, the extracted features are passed to the pre-trained emotion classification model hosted in the Flask environment; the backend processes the audio, applies the trained model, and returns the detected emotion to the web interface in real time. The paper aims to provide a robust and scalable solution for emotion detection, with applications in mental health monitoring, customer service, and AI-driven personal assistants. By combining real-time speech processing, efficient machine learning algorithms, and a user-friendly web interface, this system contributes to advances in speech-based affective computing and human-computer interaction.
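A minimal sketch of the Flask backend described above follows; the route name, JSON payload shape, and the stub classifier are all assumptions, since the abstract does not publish the actual endpoint or model interface. A real deployment would load the pre-trained model (e.g. with `joblib.load`) instead of the placeholder function:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

# Emotion labels taken from the set the paper reports recognizing.
EMOTIONS = ["happy", "sad", "angry", "neutral", "fearful"]


def classify(features):
    """Stub standing in for the pre-trained model's predict() call;
    the real system would apply the trained classifier here."""
    return EMOTIONS[int(sum(features)) % len(EMOTIONS)]


@app.route("/predict", methods=["POST"])
def predict():
    # The web interface posts the extracted feature vector as JSON;
    # the backend returns the detected emotion in real time.
    payload = request.get_json(force=True)
    emotion = classify(payload["features"])
    return jsonify({"emotion": emotion})
```

The app would be started with `flask run` (or an equivalent WSGI server), with the browser-side recorder posting to `/predict` after feature extraction.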
Keywords:
Speech Emotion Recognition, Machine Learning, SVM, RF, MLP, KNN, NB, MFCC, Mel Spectrogram, Flask, Human-Computer Interaction.