import random

import scipy.io.wavfile as wav
from pysndfx import AudioEffectsChain

from util.audio import audioToInputVector  # adjust to wherever audioToInputVector lives in your tree


def audiofile_to_input_vector(audio_filename, numcep, numcontext):
    r"""
    Given a WAV audio file at ``audio_filename``, calculates ``numcep`` MFCC features
    at every 0.01s time step with a window length of 0.025s. Appends ``numcontext``
    context frames to the left and right of each time step, and returns this data
    in a numpy array.
    """
    # Load wav file
    fs, audio = wav.read(audio_filename)

    # Randomly perturb the tempo of 90% of the samples before featurization.
    aug_fx = AudioEffectsChain()
    if random.random() < 0.9:
        aug_fx.tempo(random.uniform(0.8, 1.2))
    aug_out = aug_fx(audio)

    return audioToInputVector(aug_out, fs, numcep, numcontext)
The accuracy improved by more than 5%.
Unfortunately, training is now roughly 10x slower. I noticed that during training the GPU sometimes sits idle while the CPU is near 100%. My guess is that this wrapper is the bottleneck (it only builds a sox command that is executed later), and that it doesn't take advantage of DeepSpeech's parallel threads.
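As a rough sanity check of that guess, one can compare the cost of spawning a process per call (which is what a sox-based wrapper does under the hood) with doing an equivalent amount of in-process numpy work. This is a hypothetical sketch, not a measurement of pysndfx itself:

```python
import subprocess
import sys
import time

import numpy as np

audio = np.random.randn(16000).astype(np.float32)  # 1 s of fake 16 kHz audio

# In-process numpy work: scale the signal a few times.
t0 = time.time()
for _ in range(10):
    _ = audio * 0.5
numpy_time = time.time() - t0

# Spawning a fresh process per call, roughly what a sox-backed chain does.
t0 = time.time()
for _ in range(10):
    subprocess.run([sys.executable, "-c", "pass"], check=True)
spawn_time = time.time() - t0

print(f"numpy: {numpy_time:.4f}s, process spawns: {spawn_time:.4f}s")
```

The per-call process-launch overhead dwarfs the arithmetic, which is consistent with the GPU starving while the CPU is busy shelling out.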
DeepSpeech makes that call as it's processing the audio, each time it grabs a batch of audio. (It is, however, using multiple threads.)
Balancing the GPU and CPU is a bit of an art. We took some time to make sure our GPU wasn't starved waiting for the CPU.
One idea would be to increase the threads per queue [1]:
class ModelFeeder(object):
    '''
    Feeds data into a model.
    Feeding is parallelized by independent units called tower feeders (usually one per GPU).
    Each tower feeder provides data from three runtime switchable sources (train, dev, test).
    These sources are to be provided by three DataSet instances whose references are kept.
    Creates, owns and delegates to tower_feeder_count internal tower feeder objects.
    '''
    def __init__(self,
                 train_set,
                 dev_set,
                 test_set,
                 numcep,
                 numcontext,
                 alphabet,
                 tower_feeder_count=-1,
                 threads_per_queue=2):
        self.train = train_set
        self.dev = dev_set
        ...
So the CPU can keep up with the GPU.
However, there's also the question of how many threads your CPU supports. If that's already maxed out, increasing threads_per_queue will not help.
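For reference, the number of hardware threads the OS reports can be checked from Python before raising threads_per_queue (a minimal sketch):

```python
import os

# Logical CPU count; raising threads_per_queue much beyond this
# (times the number of queues) mostly adds scheduling overhead.
n_threads = os.cpu_count()
print(n_threads)
```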
I've tried threads_per_queue=8, and it improved training speed by about 20-25%. Nonetheless, it is still much slower than running without augmentation.
I don't think the augmentations I'm using are computationally expensive; it must be something related to how the library works.
I'll keep trying to make it faster, but even as slow as it is now, the accuracy improvement makes it worthwhile.
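If the per-batch sox subprocess really is the cost, one workaround is to approximate the tempo change in-process. Below is a crude, hypothetical sketch using linear resampling with numpy; note that unlike sox's tempo effect, plain resampling also shifts pitch, so it is only a rough stand-in for the pysndfx chain (the function name and defaults are made up for illustration):

```python
import random

import numpy as np


def cheap_tempo_perturb(audio, low=0.8, high=1.2, p=0.9):
    """Resample the waveform to a random rate in [low, high].

    Unlike sox's tempo effect, this shifts pitch as well as tempo,
    so it is only a rough in-process approximation.
    """
    if random.random() >= p:
        return audio  # leave this sample unaugmented
    rate = random.uniform(low, high)
    n_out = int(len(audio) / rate)
    # Linearly interpolate the signal onto a stretched/compressed grid.
    idx = np.linspace(0, len(audio) - 1, n_out)
    return np.interp(idx, np.arange(len(audio)), audio.astype(np.float32))
```

Since everything stays in numpy, no process is spawned per batch, which should keep the CPU side cheap enough not to starve the GPU.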