How can I run inference on multiple files using the pre-trained model?

I have been testing my data with the DeepSpeech pre-trained model, version 0.6.1, and I want to know how to run inference in parallel on about 1000 files. I am currently using the code from client.py (which calls ds.stt(audio) to generate output), but running inference in a loop takes a lot of time. Using multiprocessing instead fails with an error that SwigPy objects cannot be pickled. I need to know how to speed up the inference. Any help would be great.
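
For context, this is roughly the serial loop I am running now, adapted from client.py with the 0.6.1 API (wav_files is a placeholder for my list of ~1000 paths):

  import wave
  import numpy as np
  from deepspeech import Model

  ds = Model('deepspeech-0.6.1-models/output_graph.pbmm', 500)  # beam width 500
  ds.enableDecoderWithLM('deepspeech-0.6.1-models/lm.binary',
                         'deepspeech-0.6.1-models/trie', 0.75, 1.85)

  for filename in wav_files:  # wav_files: my list of wav paths
      with wave.open(filename, 'rb') as fin:
          audio = np.frombuffer(fin.readframes(fin.getnframes()), np.int16)
      print(ds.stt(audio))  # each stt() call blocks, so the loop is fully serial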

Please be more descriptive about your use case. Search the forum; this has already been explained several times. You can run mass evaluation using evaluate.py as well as evaluate_tflite.py, depending on your requirements.
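
Both scripts consume a CSV in the standard DeepSpeech import format (wav_filename, wav_filesize, transcript). If your data is not in that shape yet, here is a minimal sketch for building one; wav_files and transcripts stand in for your own lists:

  import csv
  import os

  with open('test.csv', 'w', newline='') as f:
      writer = csv.writer(f)
      writer.writerow(['wav_filename', 'wav_filesize', 'transcript'])
      for wav, transcript in zip(wav_files, transcripts):
          writer.writerow([wav, os.path.getsize(wav), transcript])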

  from __future__ import absolute_import, division, print_function

  import os
  import wave

  import numpy as np

  from deepspeech import Model
  from multiprocessing import JoinableQueue, Process, Manager

  model = 'deepspeech-0.6.1-models/output_graph.pbmm'
  lm    = 'deepspeech-0.6.1-models/lm.binary'
  trie  = 'deepspeech-0.6.1-models/trie'

  def tflite_worker(model, lm, trie, queue_in, queue_out, gpu_mask):
      os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_mask)
      ds = Model(model, 500)  # beam width 500
      ds.enableDecoderWithLM(lm, trie, 1.5, 2.1)  # lm_alpha=1.5, lm_beta=2.1
      while True:
          msg = queue_in.get()
          try:
              filename = msg['filename']
              print(filename)
              fin = wave.open(filename, 'rb')
              audio = np.frombuffer(fin.readframes(fin.getnframes()), np.int16)
              fin.close()
              decoded = ds.stt(audio)
              print(decoded)
              queue_out.put({'wav': filename, 'prediction': decoded, 'ground_truth': msg['transcript']})
          except Exception as ex:
              # Catch all errors, not just FileNotFoundError: if a worker dies on
              # an unhandled exception, task_done() is never called for the
              # remaining items and queue_in.join() blocks forever.
              print('Error processing {}: {}'.format(msg['filename'], ex))
          finally:
              print(queue_out.qsize(), end='\r')  # update the current progress
              queue_in.task_done()

  manager = Manager()
  work_todo = JoinableQueue()  # input queue: files waiting to be transcribed
  work_done = manager.Queue()  # output queue: finished transcriptions

  processes = []
  for i in range(4):
      worker_process = Process(target=tflite_worker, args=(model, lm, trie, work_todo, work_done, i),
                               daemon=True, name='tflite_process_{}'.format(i))
      worker_process.start()  # launch tflite_worker() as a separate Python process
      processes.append(worker_process)

  print([x.name for x in processes])

  wavlist = []
  ground_truths = []
  predictions = []
  losses = []
  wav_filenames = []
  count = 0
  # audio_order: list of wav file paths; txt_truth: matching transcripts
  for i in range(0, 100):
      count += 1
      work_todo.put({'filename': audio_order[i], 'transcript': txt_truth[i]})
      wav_filenames.append(audio_order[i])  # append, not extend: extend() would split the path into characters

  print('Queued %d wav files\n' % count)
  work_todo.join()  # blocks until every queued item has been marked task_done()
  print('\nTranscribed %d wav files' % work_done.qsize())

  while not work_done.empty():
      msg = work_done.get()
      losses.append(0.0)  # placeholder; no real loss is computed here
      ground_truths.append(msg['ground_truth'])
      predictions.append(msg['prediction'])
      wavlist.append(msg['wav'])

I have used this code to run the task in parallel, where audio_order is the list of wav filenames and txt_truth is the list of corresponding transcripts. But the code gets stuck at the join() call, and I do not understand why this is happening.

@AANCHAL_VARMA Please use proper code formatting …

It looks like a hacked version of evaluate_tflite.py. Do you understand how it works? Why hack it like that instead of using it properly?

Please be more descriptive here. This code runs a lot of parallel processes, each using the deepspeech module; it's not as efficient as using a TensorFlow session, and it requires intensive computing power. "Gets stuck during join" does feel expected, and unless you provide more details on how long it hangs, whether you see the processes running, etc., we can't help.
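
For example, a quick way to gather those details before blaming join() — a generic multiprocessing sketch reusing the names from your snippet, nothing DeepSpeech-specific:

  import time

  # If a worker died (e.g. the model failed to load in that process), its
  # exitcode is set, its queue items are never task_done()'d, and join()
  # blocks forever.
  for p in processes:
      print(p.name, 'alive:', p.is_alive(), 'exitcode:', p.exitcode)

  # Poll progress instead of blocking blindly; qsize() is approximate (and
  # unavailable on some platforms) but fine as a sanity check.
  while work_done.qsize() < count:
      print('done so far:', work_done.qsize())
      time.sleep(10)
  work_todo.join()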

Hey, I tried running the code as given in evaluate_tflite.py, but it took around 2 hours for about 2600 files of 5 seconds each. Is there any way I can speed this up? I am running on a Google Colab notebook.

Also, I wanted to know: if I start training on my own data from the existing checkpoint, I read that I cannot change alphabet.txt and will have to alter my transcript files instead. Will I have to alter the file used for generating the language model as well? Is there any way I can predict numbers and special characters like '.' and '-' when training from the existing checkpoint?