Analysis of statistical estimators and neural network approaches for speech enhancement

Kandagatla, Ravi Kumar; Naidu, V. Jayachandra; Reddy, P.S. Sreenivasa; M., Gayathri; A., Jahnavi; K., Rajeswari

VOLUME 17 (Supplement)

SciEnggJ 17 (Supplement) 017-027
available online: February 13, 2024
DOI: https://doi.org/10.54645/202417SupXBB-31

*Corresponding author
Email Address: 2k6ravi@gmail.com
Date received: May 6, 2023
Date revised: December 21, 2023
Date accepted: January 11, 2024

ARTICLE

Ravi Kumar Kandagatla*¹, V. Jayachandra Naidu², P.S. Sreenivasa Reddy³, Gayathri M.¹, Jahnavi A.¹, and Rajeswari K.¹

¹Department of Electronics and Communication Engineering, Lakireddy Bali
     Reddy College of Engineering (Autonomous), Mylavaram-521230,
     Andhra Pradesh, India
²Department of Electronics and Communication Engineering,
     Sri Venkateswara College of Engineering & Technology (Autonomous),
     Chittoor, India
³Department of Electronics and Communication Engineering,
     Nalla Narasimha Reddy Education Society’s group of Institutions,
     Telangana, India

KEYWORDS: Speech Enhancement, Neural Networks, Statistical Estimators

Environmental noise degrades communicated speech, so the speech must be processed to reduce noise and improve understanding. Speech enhancement, or noise reduction, makes listening more comfortable for both humans and machines. Traditional algorithms reduce noise and improve speech quality, but because noise is non-stationary and speech is quasi-stationary, they have proven inadequate for achieving high-quality speech. Statistical estimators based on Gaussian and super-Gaussian Probability Density Function (PDF) assumptions later improved enhancement performance further. Even so, non-stationary noise still introduces artifacts in the processed signal and degrades performance. Neural network approaches and the factorization approach are observed to perform better even under non-stationary noise, given proper training and a large database. The choice of features causes output performance to vary under unseen noise and speaker conditions. It is therefore important to understand the relative advantages of traditional methods, statistical estimators, and neural network approaches. Selecting a suitable method for a required application means considering the trade-off between quality and distortion. This work discusses the importance of speech enhancement methods and presents the performance measures used to evaluate them: Signal-to-Noise Ratio (SNR), Segmental SNR, Log-Likelihood Ratio (LLR), Weighted Spectral Slope (WSS), Perceptual Evaluation of Speech Quality (PESQ), Short-Time Objective Intelligibility (STOI), Signal-to-Distortion Ratio (SDR), and Mean Opinion Score (MOS). Important results are highlighted to help identify the better speech enhancement method for a required application, and performance is compared using both objective and subjective measures. Simulation results show superior performance when a neural network is employed in statistical estimators.
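
As a rough illustration of the first two objective measures named in the abstract, the sketch below computes a global SNR and a segmental SNR between a clean reference and an enhanced signal. It is not taken from the article: the function names, the frame length of 256 samples, and the per-frame clamping range of -10 dB to 35 dB are assumptions chosen for illustration, following a commonly used convention for segmental SNR.

```python
import math


def snr_db(clean, enhanced):
    """Global SNR in dB: clean-signal energy over residual-error energy."""
    signal_energy = sum(s * s for s in clean)
    error_energy = sum((s - e) ** 2 for s, e in zip(clean, enhanced))
    if error_energy == 0.0:
        return math.inf   # perfect reconstruction
    if signal_energy == 0.0:
        return -math.inf  # silent reference with a non-zero error
    return 10.0 * math.log10(signal_energy / error_energy)


def segmental_snr_db(clean, enhanced, frame_len=256, lo=-10.0, hi=35.0):
    """Mean of per-frame SNRs, each clamped to [lo, hi] dB.

    frame_len, lo and hi are illustrative assumptions, not values
    reported in the article. Frames do not overlap, and a trailing
    partial frame is ignored.
    """
    frame_snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        frame_snrs.append(min(hi, max(lo, snr_db(c, e))))
    if not frame_snrs:
        raise ValueError("signal is shorter than one frame")
    return sum(frame_snrs) / len(frame_snrs)


if __name__ == "__main__":
    # Toy signals: the enhanced signal has a 10% amplitude error,
    # so the energy ratio is 100, i.e. about 20 dB.
    clean = [1.0, -1.0] * 4
    enhanced = [0.9, -0.9] * 4
    print(snr_db(clean, enhanced))
    print(segmental_snr_db(clean, enhanced, frame_len=4))
```

Clamping each frame keeps silent or perfectly reconstructed frames from dominating the average, which is the main reason segmental SNR is usually reported alongside the global figure.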