Measuring the success of Voice Activity Detection algorithms: HR0 and HR1
Posted on Fri 06 February 2015 in Notes
When measuring the effectiveness of a Voice Activity Detection (VAD) algorithm, looking at 0-1 accuracy is rarely enough. We typically also look at the Nonspeech Hit Rate (HR0) and the Speech Hit Rate (HR1).
- HR0 is computed as the ratio of the number of correctly detected nonspeech frames to the number of real nonspeech frames.
- HR1 is computed as the ratio of the number of correctly detected speech frames to the number of real speech frames.
Park et al. 2014 [1]
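In symbols, the two definitions can be written as below. The notation is introduced here for illustration and is not taken from the paper: N_0 and N_1 are the numbers of real nonspeech and speech frames, and N_{0,0} and N_{1,1} are the numbers of those frames that the detector labels correctly.

```latex
\mathrm{HR0} = \frac{N_{0,0}}{N_0}, \qquad \mathrm{HR1} = \frac{N_{1,1}}{N_1}
```

If speech is treated as the positive class, HR1 is the true positive rate and HR0 is the true negative rate.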
Another way to put it is the percentage of nonspeech and speech frames that are correctly predicted. In Python, this can be calculated in the following way:
import numpy as np
import our_vad_library as VAD  # placeholder name for your own VAD module
X = VAD.load_data()
y = VAD.load_targets()
y_hat = VAD.predict(X)
# Find nonspeech and speech hit rates:
index0 = np.where(y == 0)  # positions of nonspeech frames
index1 = np.where(y == 1)  # positions of speech frames
hr0 = (y_hat[index0] == y[index0]).mean()
hr1 = (y_hat[index1] == y[index1]).mean()
First we create two indexes into y using numpy's where() function. index0 holds all the positions of y that represent a silent (nonspeech) frame in our data. Say y = [0,0,0,1,1,0], then index0 = [0,1,2,5], since y[0] = y[1] = y[2] = y[5] = 0.
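The indexing step can be checked on the toy labels from the text. This is a minimal sketch; the variable names positions0 and positions1 are only for illustration. One detail worth knowing is that np.where, when called with just a condition, returns a tuple holding one index array per dimension, so the positions themselves sit in the first element.

```python
import numpy as np

# Toy labels from the text: 0 = nonspeech, 1 = speech.
y = np.array([0, 0, 0, 1, 1, 0])

# With only a condition, np.where returns a tuple of index arrays,
# one per dimension of y (here just one, since y is 1-D).
index0 = np.where(y == 0)
index1 = np.where(y == 1)

# Pull the positions out of the tuple as plain Python lists.
positions0 = index0[0].tolist()
positions1 = index1[0].tolist()

print(positions0)  # [0, 1, 2, 5]
print(positions1)  # [3, 4]
```

The tuple can be passed straight back into y[...] or y_hat[...], which is why the original code never needs to unpack it.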
This means that
print(y[index0])
# -> [0 0 0 0]
This in itself is not interesting. However, we can use the same index to pull out the corresponding predictions in ŷ and compare them to the ground truth in y:
y_hat[index0] == y[index0]
# -> (True, True, False ... , dtype=bool)
This gives us a new boolean array with one True or False value per selected frame. Each True represents a correct prediction and each False an incorrect one. A neat Python trick is that boolean values are treated as 1 and 0, so taking the mean of this boolean array with the .mean() function gives the fraction of predictions that are correct.
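To tie the pieces together, here is a self-contained sketch on made-up labels. The helper hit_rates is hypothetical (it is not part of any VAD library), it uses boolean masks instead of np.where, and it assumes both classes occur at least once in y. The data is deliberately imbalanced to show why 0-1 accuracy alone can mislead: a detector that always predicts nonspeech still scores 90% accuracy.

```python
import numpy as np


def hit_rates(y, y_hat):
    """Return (hr0, hr1) for 0/1 label arrays of equal length.

    Hypothetical helper for illustration only.
    Assumes y contains at least one 0 and at least one 1.
    """
    y = np.asarray(y)
    y_hat = np.asarray(y_hat)
    correct = (y_hat == y)                # True where the prediction matches
    hr0 = float(correct[y == 0].mean())   # share of nonspeech frames predicted correctly
    hr1 = float(correct[y == 1].mean())   # share of speech frames predicted correctly
    return hr0, hr1


# Made-up ground truth: 9 nonspeech frames and 1 speech frame.
y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1])
# A degenerate detector that always predicts nonspeech.
y_hat = np.zeros_like(y)

accuracy = float((y_hat == y).mean())
hr0, hr1 = hit_rates(y, y_hat)

print(accuracy, hr0, hr1)  # 0.9 1.0 0.0
```

Accuracy looks respectable, but HR1 is 0: the detector never finds a single speech frame, which is exactly what reporting both hit rates exposes.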
[1] Park, Jinsoo, Wooil Kim, David K. Han, and Hanseok Ko. “Voice Activity Detection in Noisy Environments Based on Double-Combined Fourier Transform and Line Fitting.” The Scientific World Journal 2014 (August 6, 2014): e146040. doi:10.1155/2014/146040.