Invalid String

Matthew_Tan · July 21, 2020, 7:47pm

Hello all,

I’m new to the forum and DeepSpeech in general, and I was trying to train my own model. I’ve prepared the 3 .csv files and am running the DeepSpeech.py script on them as described in the documentation. However, I’m getting an “invalid string” error. I’m sure this has to do with the .csv file formatting, but I’m not sure what the issue is. I’d attach the csv file but I can’t as I’m a new user. Any thoughts or advice?

Thanks in advance!
Matthew

Matthew_Tan · July 21, 2020, 7:50pm

Here are the text contents of the csv file:

wav_filename,wav_filesize,transcript
/home/matthew/github/test/Data Ingest/video/audio/403260.wav,74924,United 800
/home/matthew/github/test/Data Ingest/video/audio/404281.wav,498902,The smoke seems to be dissipating but just have the trucks out if you can
/home/matthew/github/test/Data Ingest/video/audio/410293.wav,241712,United 800 trucks are already in position

I’ve cleared out punctuation (can one have punctuation in the transcript?), but I’m not sure why I still get an Invalid String error.

othiele · July 21, 2020, 8:00pm

Please format code/text you insert. Could it be the whitespace in “Data Ingest”? Spaces in paths are never a good idea. If you need more support, we need much more info.

othiele · July 21, 2020, 8:03pm

The transcript can contain any character that is also in the alphabet.txt.
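A quick way to see which characters would trip this rule is to compare each transcript against the alphabet before training. The following is only a rough sketch, not part of DeepSpeech: `load_alphabet` and `find_invalid_chars` are hypothetical helpers, the alphabet format (one symbol per line, `#` lines as comments) is an assumption, and the sample paths are made up.

```python
import csv
import io


def load_alphabet(lines):
    """Collect one symbol per line, skipping '#' comment lines (assumed format)."""
    symbols = set()
    for line in lines:
        line = line.rstrip("\n")
        if line == "" or line.startswith("#"):
            continue
        symbols.add(line)
    return symbols


def find_invalid_chars(csv_file, alphabet):
    """Map CSV row number -> sorted characters in the transcript not in the alphabet."""
    problems = {}
    reader = csv.DictReader(csv_file)
    # Row 1 is the header, so data rows start at 2.
    for row_number, row in enumerate(reader, start=2):
        bad = sorted(set(row["transcript"]) - alphabet)
        if bad:
            problems[row_number] = bad
    return problems


# In-memory stand-ins for alphabet.txt and a training CSV (hypothetical data).
alphabet_text = "# comment line\n \n" + "\n".join("abcdefghijklmnopqrstuvwxyz") + "\n'\n"
sample_csv = (
    "wav_filename,wav_filesize,transcript\n"
    "/tmp/a.wav,74924,United 800\n"
    "/tmp/b.wav,241712,the smoke seems to be dissipating\n"
)

alphabet = load_alphabet(io.StringIO(alphabet_text))
problems = find_invalid_chars(io.StringIO(sample_csv), alphabet)
print(problems)  # {2: ['0', '8', 'U']}
```

With real files, the two `io.StringIO` objects would be replaced by `open(...)` calls on your own alphabet.txt and CSV.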

Matthew_Tan · July 21, 2020, 8:13pm

Ok, thanks for your quick response!

For the background info:
I am trying to train a model
Latest DeepSpeech version
Pop OS (basically ubuntu)
Python 3.6.11
TF 1.15 (I think whatever version is required in the setup docs)

Looks like there is no punctuation in the alphabet.txt, so the transcripts should be formatted fine.

I removed the space in the Data Ingest folder and updated the .csv files, but it is still giving me an invalid string error.

Matthew_Tan · July 21, 2020, 8:27pm

Ah! Could it be that the data has capital letters? I don’t see capital letters or numbers in the alphabet.txt (the one in /data is the one that is used when running DeepSpeech.py, right?)

othiele · July 21, 2020, 8:32pm

Yep, could well be. Also, use a release branch like 0.7.4; master can be incompatible at times.

Matthew_Tan · July 21, 2020, 9:03pm

Ok, I took out all the capitals and the network now says that the Invalid String is “3,” so it WAS the capitals and numbers! What a silly mistake.
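For anyone hitting the same thing, the cleanup can be sketched roughly as below. This is an illustration only: `normalize_transcript` and `has_digits` are hypothetical helper names, not DeepSpeech functions. Digits are only flagged, not converted, since how a number should be spelled out depends on how it was spoken in the audio.

```python
import re


def normalize_transcript(text):
    """Lowercase the text and collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def has_digits(text):
    """True if any digit remains and still needs to be written out as words."""
    return any(ch.isdigit() for ch in text)


transcripts = ["United 800", "The smoke seems to be dissipating"]
for original in transcripts:
    cleaned = normalize_transcript(original)
    note = "  <- still contains digits" if has_digits(cleaned) else ""
    print(cleaned + note)
```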

Thanks for your help!
Matthew