I recently started using DeepSpeech for audio transcription and have a few questions to clarify.
Here is my setup:
- Training or Inference - Both
- DeepSpeech branch/version - 0.7.4
- OS Platform and Distribution (e.g., Linux Ubuntu 18.04) - Linux - Ubuntu 18.04
- Python version - 3.6.9
- TensorFlow version - tensorflow-gpu==1.15.2
1. What are the requirements for the audio files used for inference? I have files in different formats (.mp3, .wav, etc.).
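For question 1, this is how I'm currently sanity-checking files before inference. It's a rough sketch assuming the 0.7.x English models' default input of 16 kHz, mono, 16-bit PCM WAV; `check_wav` and `is_deepspeech_ready` are my own helper names, not part of the DeepSpeech API:

```python
import wave

def check_wav(path):
    """Return (sample_rate, channels, sample_width_bytes, duration_s) for a WAV file."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        channels = w.getnchannels()
        width = w.getsampwidth()
        duration = w.getnframes() / rate
    return rate, channels, width, duration

def is_deepspeech_ready(path, expected_rate=16000):
    """True if the file matches what I assume the default model expects:
    16 kHz sample rate, mono, 16-bit (2-byte) PCM samples."""
    rate, channels, width, _ = check_wav(path)
    return rate == expected_rate and channels == 1 and width == 2
```

For .mp3 inputs I convert them first (e.g. with ffmpeg) to 16 kHz mono WAV before running this check.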
2. Is there any restriction on the length of the audio (in minutes)?
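For question 2, in case there is a practical limit, this is how I'm splitting long recordings into fixed-length pieces before inference. `split_wav` is my own helper (not a DeepSpeech function) and the 60-second default is an arbitrary choice on my side:

```python
import wave

def split_wav(path, chunk_seconds=60):
    """Split a WAV file into consecutive chunks of at most chunk_seconds.

    Writes <path>.chunk<N>.wav files next to the input and returns their paths.
    """
    outputs = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(src.getframerate() * chunk_seconds)
        idx = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            out_path = f"{path}.chunk{idx}.wav"
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)  # same rate/channels/width as the source
                dst.writeframes(frames)
            outputs.append(out_path)
            idx += 1
    return outputs
```

One caveat with this naive approach: a chunk boundary can fall mid-word, so I'd be interested to know if there is a recommended way to handle long files instead.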
I have also started training a custom model, and I want to understand the basic requirements for the training data:
- I understand the audio should be mono-channel .wav files. Is there a maximum WAV file size?
- Can we provide a custom scorer file while training the custom model?
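Regarding the scorer question, this is the kind of training invocation I have in mind. I'm assuming `--scorer_path` is the flag that points the decoder (used during test-epoch evaluation) at a custom scorer in 0.7.x; the CSV names and paths below are placeholders from my setup:

```shell
python3 DeepSpeech.py \
  --train_files train.csv \
  --dev_files dev.csv \
  --test_files test.csv \
  --scorer_path /path/to/my-custom.scorer \
  --epochs 10
```

Please correct me if the scorer is only applied at inference/evaluation time rather than influencing training itself.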
Thanks in advance!