We just released DeepSpeech 0.4.1!
In the Known Issues tab of the DeepSpeech 0.4.0 release you mentioned:
Incorrect model was uploaded to release which will be fixed in 0.4.1
…
Does this (0.4.1) release have the correct model? Will it produce better results than DeepSpeech 0.3.0
…
and will it solve the issues discussed here?
This is the link to 0.4.1, and, as stated in the 0.4.0 release notes, the correct model is uploaded in the 0.4.1 release.
As to 0.4.1 vs 0.3.0 results, see the release notes for both for a comparison.
Hi @kdavis, I just did a WER test for the Windows client; here's the result:
Stable RAM usage of 1.7 GB.
The test took about 3 h on a virtual Intel Xeon Platinum 8168 @ 2.7 GHz with 16 vcores.
WER of 8.87% with the LM enabled.
You can see the tool that I wrote here.
I noticed a considerable amount of errors related to apostrophes, for example with "i'm" and "i am". Should this happen? Yes, the WER increases, but in the end the meaning is the same.
At the moment I can't build for CUDA; hopefully I'll soon get access to a CUDA device.
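For anyone wanting to sanity-check WER numbers like the ones above: word-level WER is just the Levenshtein edit distance over tokens divided by the reference length. Here is a minimal sketch in Python (this is the standard textbook approach, not the actual tool linked above):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("i am here", "i'm here"))  # 2 edits over 3 reference words
```

Note how "i am" vs "i'm" already costs two edits here, which is exactly why the apostrophe mismatches inflate WER even though the meaning is the same.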
Cool! Nice having a second pair of eyes on the WER.
We had a slightly lower value, 8.26%, but basically it seems about the same.
As to the problems with apostrophes, yes we've noted the same. I'd guess it's a hard problem to solve as when spoken quickly it can sometimes be unclear if a person said "i am" or "i'm".
If you have any ideas on how one could solve it, we're "all ears."
Well, not at the moment.
What about this one, "perform'd"? There are a few with 'd.
I'm still collecting Spanish from Librivox, so I'm not experienced with the creation of the LM; if I think I've got something that can improve the apostrophe issue, of course I'll share it.
"Perform'd", interesting. I wasn't aware that this was a word until just looking it up. We build our language model from SLR11; I wonder if there is some prevalence of "perform'd" there?
I just grep'ed the SLR11 text and there are 175 lines that contain "perform'd". So that's the source of the "perform'd" problem.
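The same check generalizes beyond a single word: a small Python sketch can tally every 'd-style ending in a corpus at once. The demo lines below are just samples; for the real thing you would pass an open file handle over your local copy of the SLR11 text.

```python
import re
from collections import Counter

# Matches archaic contractions like perform'd, emerg'd, poison'd.
ARCHAIC_D = re.compile(r"\b[a-z]+'d\b")

def count_archaic_d(lines):
    """Count words ending in 'd across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(ARCHAIC_D.findall(line.lower()))
    return counts

# Demo on two sample lines; for the corpus, pass open("path/to/corpus.txt").
demo = ["a cleric lately poison'd his own mother",
        "the moony flood dimm'd the vermillion'd woodbine"]
print(count_archaic_d(demo).most_common())  # most frequent first
```

Sorting by frequency gives a ranked worklist, so the most damaging forms can be reviewed first.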
Here's a list with the 'd issue:
million'd
emerg'd
poison'd
impress'd
pierc'd
remov'd
rebuk'd
steel'd
I'd better share the result; I'm not a native speaker, so I may be missing a couple more.
I ran the WER test again and noticed that a few of them also appear in the LibriSpeech clean test corpus.
Here's the WER result: https://pastebin.com/1Wrp3pVH
I don't know if boy's, infant's, and thee's are correct.
When I validate I pay special attention to small things like whether the person said "I'm" or "I am", but I have no idea if other validators do. I think if you're clicking through quickly you may miss stuff like that. So there may perhaps be incorrectly transcribed clips in the dataset contributing to this.
I haven't looked for all the strings you mention, but I've found examples of all the ones I've searched for in SLR11. For example…
beyond the green within its western close a little vine hung leafy arbor rose where the pale lustre of the moony flood dimm'd the vermillion'd woodbine's scarlet bud and glancing through the foliage fluttering round in tiny circles gemm'd the freckled ground
…
amidst them next a light of so clear amplitude emerg'd that winter's month were but a single day were such a crystal in the cancer's sign
…
a cleric lately poison'd his own mother and being brought before the courts of the church they but degraded him
…
It seems like this is a common construct in older forms of English, and SLR11 contains many texts that are old enough to have passed into the public domain and thus reflect this old construct.
It seems like we could get a pretty good boost by simply using newer texts in place of SLR11. However, then we have the legal question of how to obtain modern texts that are still open.
What if we correct the existing text? I think it's not too hard, since these forms are easy to spot. The question is: is there any legal issue with editing the existing text?
Editing shouldn't be problematic.
Well, I said they would be easy to spot, but not that they'd be easy to correct, hahaha; it's worse than I thought.
I can take it on, but I'll need the help of native speakers; for example, "worse'n" I changed to "worse than".
Changing to "better than" here makes no sense:
A GIRL LIKE THAT OUGHT TO DO SOMETHIN BETTER'N THAN STAY HERE IN SOUTH HARNISS AND KEEP STORE
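Since blanket rules like 'n → "than" misfire on lines like that one, a safer approach is a hand-vetted mapping table that only rewrites contractions a human has already checked. A sketch, with purely illustrative entries:

```python
import re

# Hand-checked expansions only; each entry should be reviewed by a native
# speaker, because blind rules ('d -> ed, 'n -> than) can misfire.
EXPANSIONS = {
    "perform'd": "performed",
    "emerg'd": "emerged",
    "poison'd": "poisoned",
    "worse'n": "worse than",
}

CONTRACTION = re.compile(r"[a-z]+'[a-z]+")

def expand_archaic(text: str) -> str:
    """Replace only contractions present in the vetted table; leave the rest alone."""
    return CONTRACTION.sub(lambda m: EXPANSIONS.get(m.group(0), m.group(0)), text)

print(expand_archaic("a cleric lately poison'd his own mother"))
# a cleric lately poisoned his own mother
```

Anything not in the table, including legitimate modern contractions like "i'm", passes through untouched, which keeps the edit conservative.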
I found Spanish and French sentences; should I remove them, or is there any reason to mix languages? Sometimes it's mixed within a sentence, for example "he said hola amigos".
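One rough way to flag such lines is a stopword-count heuristic: mark a line as foreign only when Spanish/French function words outnumber English ones. The word lists below are tiny illustrative samples, not a real language detector, and genuinely mixed lines like "he said hola amigos" survive because English still dominates:

```python
# Tiny illustrative stopword samples; a real filter would use much larger lists
# or a proper language-identification library.
ENGLISH = {"the", "and", "of", "to", "a", "in", "is", "he", "she", "it", "said"}
FOREIGN = {"el", "la", "los", "las", "que", "une", "les", "des", "est", "y"}

def looks_foreign(line: str) -> bool:
    """Heuristic: more foreign function words than English ones."""
    words = line.lower().split()
    en = sum(w in ENGLISH for w in words)
    fr = sum(w in FOREIGN for w in words)
    return fr > en

print(looks_foreign("el perro corre en la calle"))  # flagged as foreign
print(looks_foreign("he said hola amigos"))         # kept: English dominates
```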
I have an idea, which I've not had time to test yet.
I've made a number of language models with different parameters. In particular, in some of them I've limited the vocabulary to the N most frequent words of SLR11, for various values of N.
As the various "million'd", "emerg'd"… are not common, they'll likely not be among the N most frequent words, and the language model will not exhibit this strange behavior.
But I need to test this idea with some WER runs.
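The vocabulary-limiting idea can be sketched as a preprocessing pass: keep the N most frequent words and map everything else to an unknown token before building the LM. The function names and the `<unk>` convention here are illustrative, not part of any actual tooling:

```python
from collections import Counter

def top_n_vocab(lines, n):
    """Return the set of the n most frequent words across an iterable of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return {w for w, _ in counts.most_common(n)}

def restrict(lines, vocab, unk="<unk>"):
    """Map out-of-vocabulary words to an unknown token before LM training."""
    for line in lines:
        yield " ".join(w if w in vocab else unk for w in line.lower().split())

corpus = ["the deed was perform'd at dawn",
          "the deed was done at dawn",
          "the sun rose at dawn"]
vocab = top_n_vocab(corpus, 5)
print(list(restrict(corpus, vocab)))  # rare words like perform'd become <unk>
```

Because "perform'd" and friends sit in the long tail of the frequency distribution, they fall below almost any reasonable N and never make it into the model's vocabulary.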
If you share them, I can run the WER test.
Here's the librispeech-lm-norm.txt.gz cleaned; I've removed a lot of Spanish and French. One more to test.
What about using Google's BERT to brute-force sentences? We could even try spaCy to generate sentences that make sense from the existing ones.
Unfortunately it's 16 TB of language models; sharing is a bit hard.