Abstract:
Speech enhancement is the processing of speech that has
been degraded by noise, with the aim of improving its
intelligibility to human listeners.
Deep learning techniques have drawn
considerable attention for speech enhancement in recent years,
but they require clean speech alongside noisy speech for
training. However, clean speech signals are difficult to
obtain in naturalistic scenarios. To address
this limitation, this study proposes a deep neural
network-based speech enhancement approach that uses
self-supervised learning and therefore requires no clean
speech to train the model. In
the proposed framework, two CNN-based speech enhancement
models are deployed for two noise conditions (babble
noise and machinery noise). The work was carried out
on two datasets: the IEEE speech corpus corrupted
with real-time noise, and speech signals recorded
in a naturalistic environment. Experimental results demonstrate
that the proposed framework achieves significant
improvement in both subjective and objective measures.