My hardware doesn’t support avx2 (this machine is on ivy bridge), so i spent the day building native-client from source, including tensorflow.
The process was a bit arduous, but I ended up getting everything working. When i run the binary with the pretrained model, I can now get some output without anything throwing an error.
The problem is that the output is laughably bad. Its not even like… close. I think i’m doing something wrong.
For example: one input i’ve been using is a .wav (single channel, 16bit) of me saying “testing testing, 123”. The output from the pretrained model is just ‘oo’
Another file, much clearer, in the same format: "hi, im amy, one of the available high quality text to speech voices, select download now to install my voice’.
A smattering of outputs i got from deepspeech on that file when messing around with .wav encoding parameters:
“har omm one omhumho wommen”
“har o awi won o veembo hoa homten ta”
“a am won a vemhomable han wontontun”
I think i’m encoding the files incorrectly, or something. I’m not sure what else to do to get some clean output.