[IPOL announce] new article: A Presentation and Short Discussion of rVAD-fast, a Fast Voice Activity Detector
announcements about the IPOL journal
announce at list.ipol.im
Tue Oct 11 13:38:41 CEST 2022
A new article is available in IPOL: https://www.ipol.im/pub/art/2022/427/
Sam Perochon,
A Presentation and Short Discussion of rVAD-fast, a Fast Voice Activity Detector,
Image Processing On Line, 12 (2022), pp. 404–419.
https://doi.org/10.5201/ipol.2022.427
Abstract
Voice activity detection (VAD) usually refers to the detection of human
voices in acoustic signals and is often used as a pre-processing step in
numerous audio signal processing tasks. The unsupervised method presented
here was originally developed by Zheng-Hua Tan, Achintya Kr. Sarkar and
Najim Dehak [Computer Speech & Language, 2020] and follows a robust
segment-based approach. The voice activity detection stage is preceded by
two denoising steps. The first detects high-energy segments using an
a posteriori SNR-weighted energy difference, and the second enhances the
speech using the MSNE-mod approach. Use cases and downstream tasks
include intrusion detection, speech-to-text, speaker diarization and
emotion estimation.
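To illustrate the idea behind the first denoising step, here is a minimal sketch of an a posteriori SNR-weighted energy difference. It assumes a simplified form D(t) = sqrt(|e(t) - e(t-1)| * max(SNR(t), 0)), where e(t) is the frame log energy in dB and SNR(t) is the frame energy relative to a noise floor. The 25 ms / 10 ms framing at 16 kHz (frame_len=400, hop=160) and the noise floor, taken here as a low quantile of the frame energies, are illustrative assumptions and not the estimator used in rVAD-fast. The helper names are hypothetical. The published article and code define the exact procedure.

```python
import numpy as np


def frame_energies(x, frame_len=400, hop=160):
    """Hypothetical helper: energy of each full frame of the 1-D signal x."""
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row i holds the sample indices of frame i.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    # Small constant avoids log(0) on silent frames.
    return np.sum(x[idx] ** 2, axis=1) + 1e-10


def snr_weighted_energy_difference(x, frame_len=400, hop=160, noise_quantile=0.1):
    """Sketch of D(t) = sqrt(|e(t) - e(t-1)| * max(SNR(t), 0)) per frame.

    Assumption: the noise energy is a crude low-quantile estimate of the
    frame energies, chosen for illustration only.
    """
    energy = frame_energies(np.asarray(x, dtype=float), frame_len, hop)
    log_e = 10.0 * np.log10(energy)                     # frame log energy (dB)
    noise_energy = np.quantile(energy, noise_quantile)  # assumed noise floor
    snr_post = 10.0 * np.log10(energy / noise_energy)   # a posteriori SNR (dB)
    # Energy change from the previous frame; the first frame gets 0.
    diff = np.abs(np.diff(log_e, prepend=log_e[0]))
    # Negative SNRs are clipped, so frames at or below the noise floor give 0.
    return np.sqrt(diff * np.maximum(snr_post, 0.0))


if __name__ == "__main__":
    # Quiet segment followed by a loud one: D(t) peaks at the onset frame.
    x = np.concatenate([0.01 * np.ones(1600), np.ones(1600)])
    d = snr_weighted_energy_difference(x)
    print(int(np.argmax(d)))
```

Weighting the energy change by the SNR makes a jump count only when the frame also stands clearly above the estimated noise. This is why such a measure can flag high-energy segments, while small fluctuations in the noise floor contribute little.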