Measuring success of Voice Activity Detection algorithms. HR0 and HR1

Posted on Fri 06 February 2015 in Notes

When measuring the effectiveness of a Voice Activity Detection (VAD) algorithm, looking at 0-1 accuracy is rarely enough. We typically also look at the Nonspeech Hit Rate (HR0) and the Speech Hit Rate (HR1).

  1. HR0 is computed as the ratio of the number of correctly detected nonspeech frames to the number of real nonspeech frames.
  2. HR1 is computed as the ratio of the number of correctly detected speech frames to the number of real speech frames.

Park et al. 2014 [1]
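
Written as formulas, the two definitions could look like the following. The symbols are notation introduced here for illustration and are not taken from Park et al.: N_0 and N_1 are the numbers of real nonspeech and speech frames, and N_{0,0} and N_{1,1} are the numbers of those frames that the detector also labels as nonspeech and speech.

```latex
\mathrm{HR0} = \frac{N_{0,0}}{N_{0}}, \qquad
\mathrm{HR1} = \frac{N_{1,1}}{N_{1}}
```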

Another way to put it is the percentage of nonspeech and speech frames, respectively, that are correctly predicted. In Python, this can be calculated in the following way:

import numpy as np
import our_vad_library as VAD  # placeholder name for your own VAD library

X = VAD.load_data()
y = VAD.load_targets()

y_hat = VAD.predict(X)

# Find nonspeech and speech hit rates:
index0 = np.where(y == 0)
index1 = np.where(y == 1)

hr0 = (y_hat[index0] == y[index0]).mean()
hr1 = (y_hat[index1] == y[index1]).mean()

First we create two index arrays into y using numpy's where() function. index0 holds all the positions in y that represent a nonspeech (silent) frame in our data. Say y = [0,0,0,1,1,0]; then index0 = [0,1,2,5], since y[0] = y[1] = y[2] = y[5] = 0. (Strictly speaking, where() returns a one-element tuple holding that array, which can be used directly as an index.)

This means that:

print(y[index0])
# -> [0 0 0 0]

On its own this is not interesting. However, we can use the same index to pull out the corresponding predictions in ŷ and compare them to the ground truth in y:

y_hat[index0] == y[index0]
# -> array([ True,  True, False, ...])

This gives us a new array with one boolean value per nonspeech frame. Each True represents a correct prediction and each False an incorrect one. A neat Python trick is that boolean values are treated as 1 and 0, so taking the mean of this boolean array with the .mean() function gives the fraction of correct predictions, which is exactly HR0. The same steps with index1 give HR1.
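
The steps above can be collected into a small helper and tried on toy data, without needing a VAD library at all. This is only a minimal sketch: the function name hit_rates and the toy arrays are made up for illustration, and it uses a boolean mask (y == label) in place of np.where(), which selects the same frames. It also assumes labels are coded as 0 for nonspeech and 1 for speech, and returns NaN when a class does not occur in y.

```python
import numpy as np

def hit_rates(y, y_hat):
    """Return (hr0, hr1) for binary frame labels: 0 = nonspeech, 1 = speech."""
    y = np.asarray(y)
    y_hat = np.asarray(y_hat)
    if y.shape != y_hat.shape:
        raise ValueError("y and y_hat must have the same shape")

    rates = []
    for label in (0, 1):
        mask = (y == label)  # frames whose true class is `label`
        if not mask.any():
            # Class never occurs in y, so its hit rate is undefined.
            rates.append(float("nan"))
        else:
            # Mean of booleans = fraction of these frames predicted correctly.
            rates.append(float((y_hat[mask] == label).mean()))
    return tuple(rates)

# Toy ground truth and predictions (illustrative values only).
y = np.array([0, 0, 0, 1, 1, 0])
y_hat = np.array([0, 0, 1, 1, 0, 0])

hr0, hr1 = hit_rates(y, y_hat)
print(hr0, hr1)  # 0.75 0.5
```

Here three of the four nonspeech frames and one of the two speech frames are predicted correctly, so HR0 = 0.75 and HR1 = 0.5, while plain accuracy would be 4/6 and hide the difference between the two classes.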


[1] Park, Jinsoo, Wooil Kim, David K. Han, and Hanseok Ko. “Voice Activity Detection in Noisy Environments Based on Double-Combined Fourier Transform and Line Fitting.” The Scientific World Journal 2014 (August 6, 2014): e146040. doi:10.1155/2014/146040.