# Measuring success of Voice Activity Detection algorithms. HR0 and HR1

Posted on Fri 06 February 2015 in Notes

When measuring the effectiveness of a Voice Activity Detection (VAD) algorithm, looking at 0-1 accuracy alone is rarely enough. We typically also look at the Nonspeech Hit Rate (HR0) and the Speech Hit Rate (HR1).

1. HR0 is computed as the ratio of the number of correctly detected nonspeech frames to the number of real nonspeech frames.
2. HR1 is computed as the ratio of the number of correctly detected speech frames to the number of real speech frames.

Park et al. 2014 
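To see why accuracy alone can mislead, here is a minimal sketch with made-up labels (not data from the paper): a degenerate detector that marks every frame as speech, evaluated on a recording that is 90% speech.

```python
import numpy as np

# Hypothetical ground truth: 9 speech frames (1) and 1 nonspeech frame (0).
y = np.array([1, 1, 1, 1, 0, 1, 1, 1, 1, 1])
# A degenerate detector that always predicts speech.
y_hat = np.ones_like(y)

accuracy = (y_hat == y).mean()
hr0 = (y_hat[y == 0] == 0).mean()  # fraction of nonspeech frames detected as nonspeech
hr1 = (y_hat[y == 1] == 1).mean()  # fraction of speech frames detected as speech

print(accuracy, hr0, hr1)  # -> 0.9 0.0 1.0
```

The accuracy looks respectable, but HR0 shows that the detector never identifies a nonspeech frame.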

Another way to put it: HR0 and HR1 are the percentages of nonspeech and speech frames, respectively, that are correctly predicted. In Python, this can be calculated in the following way:

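```
import numpy as np

# y holds the ground-truth labels and y_hat the predictions,
# both as NumPy arrays (0 = nonspeech, 1 = speech).

# Find nonspeech and speech hit rates:
index0 = np.where(y == 0)
index1 = np.where(y == 1)

hr0 = (y_hat[index0] == y[index0]).mean()
hr1 = (y_hat[index1] == y[index1]).mean()
```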

First we create two indexes into `y` using NumPy's `where()` function. `index0` holds all the positions of `y` that represent a silent (nonspeech) frame in our data. Say `y = [0,0,0,1,1,0]`, then `index0 = [0,1,2,5]`, since `y[0] = y[1] = y[2] = y[5] = 0`.
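A quick way to check that example, as a small sketch. Note that when called with a single argument, `np.where` returns a tuple holding one index array per dimension, so the positions themselves sit in its first element:

```python
import numpy as np

y = np.array([0, 0, 0, 1, 1, 0])
index0 = np.where(y == 0)

# index0 is a 1-tuple because y is one-dimensional.
positions = index0[0]
print(positions.tolist())  # -> [0, 1, 2, 5]
```

The tuple can be used directly as an index, which is why `y[index0]` works without unpacking it.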

This means that

```
print(y[index0])
# -> [0 0 0 0]
```

On its own, this is not interesting. However, we can use the same index to pull out the corresponding predictions in ŷ and compare them to the ground truth in y:

```
y_hat[index0] == y[index0]
# -> array([ True,  True, False, ...])
```

This gives us a new array with one boolean `True` or `False` value per nonspeech frame. Each `True` represents a correct prediction and each `False` an incorrect one. A neat Python trick is that boolean values are treated as 1 and 0, so taking the mean of this boolean array with `.mean()` gives the fraction of predictions that are correct, which is exactly the hit rate.
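Putting the steps together, one possible way to package the computation is sketched below. The function name `hit_rates`, the use of boolean masks instead of `np.where`, and the choice to return NaN when a class has no frames are assumptions made for illustration, not part of the original recipe or of Park et al.

```python
import numpy as np

def hit_rates(y, y_hat):
    """Return (HR0, HR1) for binary frame labels: 0 = nonspeech, 1 = speech."""
    y = np.asarray(y)
    y_hat = np.asarray(y_hat)
    if y.shape != y_hat.shape:
        raise ValueError("y and y_hat must have the same shape")

    rates = []
    for label in (0, 1):
        mask = y == label  # True for frames whose ground-truth label is `label`
        # With no frames of this class the rate is undefined; NaN is one possible convention.
        rates.append((y_hat[mask] == label).mean() if mask.any() else float("nan"))
    return tuple(rates)

y = np.array([0, 0, 0, 1, 1, 0])
y_hat = np.array([0, 1, 0, 1, 1, 0])  # made-up predictions for illustration
hr0, hr1 = hit_rates(y, y_hat)
print(hr0, hr1)  # -> 0.75 1.0
```

Here three of the four nonspeech frames are predicted correctly (HR0 = 0.75) and both speech frames are predicted correctly (HR1 = 1.0).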

Park, Jinsoo, Wooil Kim, David K. Han, and Hanseok Ko. “Voice Activity Detection in Noisy Environments Based on Double-Combined Fourier Transform and Line Fitting.” The Scientific World Journal 2014 (August 6, 2014): e146040. doi:10.1155/2014/146040.