# Measuring succes of Voice Activity Detection algorithms. HR0 and HR1

Posted on Fri 06 February 2015 in Notes

When measuring the effectiveness of a Voice Activity Detection algorithm (VAD) looking at 0-1 accuracy is rarely enough. We typically also look at *Nonspeech Hit Rate* (HR0) and *Speech Hit Rate* (HR1).

**HR0**is computed as the ratio of the number of correctly detected nonspeech frames to the number of real nonspeech frames.**HR1**is computed as the ratio of the number of correctly detected speech frames to the number of real speech frames.

*Park et al. 2014 [1]*

Another way to put it is _the percentage of nonspeech and speech frames that are correctly predicted. In Python, this can be calculated in the following way:

```
import numpy as np
import our-vad-library as VAD
X = VAD.load_data()
y = VAD.load_targets()
y_hat = VAD.predict(X)
# Find nonspeech and speech hit rates:
index0 = np.where(y ==0)
index1 = np.where(y ==1)
hr0 = (y_hat[index0] == y[index0]).mean()
hr1 = (y_hat[index1] == y[index1]).mean()
```

First we create 2 indexes of y using numpy's `where()`

function (see more). `index0`

is a vector of all the positions of y that represents a silent frame in our data. Say `y = [0,0,0,1,1,0]`

, then `index0 = [0,1,2,5]`

, since `y[0] = y[1] = y[2] = y[5] = 0.`

this means that

```
print y[index0]
# -> [0,0,0,0]
```

Which in and of itself is not interesting. However, we can use the same index to pull out all the predictions in ŷ and compare them to the ground-truth in y

```
y_hat[index0] == y[index0]
# -> (True, True, False ... , dtype=bool)
```

This gives us a new array of the same dimensions with boolean `True`

or `False`

values. Each `True`

represents a correct prediction and each `False`

an incorrect. A neat python trick is that boolean values are treated as 0 and 1, so we can take the mean of this boolean result array to get the ratio between correct and incorrect prediction using the `.mean()`

function.

[1] Park, Jinsoo, Wooil Kim, David K. Han, and Hanseok Ko. “Voice Activity Detection in Noisy Environments Based on Double-Combined Fourier Transform and Line Fitting.” *The Scientific World Journal* 2014 (August 6, 2014): e146040. doi:10.1155/2014/146040.