Reference model: missing apostrophes affecting WER

Hey there,

I am trying to run the Common Voice dataset through DeepSpeech in order to get a WER estimate for the test set on the reference model provided by Mozilla. I am currently getting a WER of ~34%.

python3 DeepSpeech.py --n_hidden 2048 --initialize_from_frozen_model models/output_graph.pb --notrain --test --display_step=1 --epoch 1 --test_files=/data/CV/cv-valid-test.csv --dev_files=/data/CV/cv-valid-dev.csv --train_files=/data/CV/cv-valid-train.csv

I Initializing from frozen model: models/output_graph.pb 
I Test of Epoch 0 - WER: 0.344074, loss: 30.61848204136957, mean edit distance: 0.173512 
I --------------------------------------------------------------------------------
I WER: 0.090909, loss: 0.131743, mean edit distance: 0.019231 
I  - src: "it wasn't clear to him how to spend his morning time" 
I  - res: "it wasnt clear to him how to spend his morning time"
I --------------------------------------------------------------------------------
I WER: 0.125000, loss: 0.132348, mean edit distance: 0.026316 
I  - src: "you can't talk to her like that though"
I  - res: "you cant talk to her like that though" 
I --------------------------------------------------------------------------------
I WER: 0.166667, loss: 0.093715, mean edit distance: 0.080000
I  - src: "i don't care what you say"
I  - res: "i dont care what you say "
I --------------------------------------------------------------------------------
I WER: 0.200000, loss: 0.018371, mean edit distance: 0.034483
I  - src: "he doesn't have anything else"
I  - res: "he doesnt have anything else"
I --------------------------------------------------------------------------------

However, I’ve noticed that most of the WER penalties come from missing apostrophes in contractions (like don't or can't). I have yet to find a single example of a correctly predicted apostrophe, despite the ' character being present in data/alphabet.txt.

Are there any special precautions or flags I should pass to DeepSpeech.py so that apostrophes are correctly predicted (or ignored) and my mean WER decreases?
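In case it helps others quantify the effect: if the goal is just to measure how much the missing apostrophes are costing, one option is to strip apostrophes from both the reference and the hypothesis before scoring. A rough sketch below, assuming standard word-level WER; the `wer` and `edit_distance` helpers are my own, not part of DeepSpeech:

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def wer(ref, hyp, ignore_apostrophes=False):
    """WER = word edit distance / number of reference words."""
    if ignore_apostrophes:
        ref = ref.replace("'", "")
        hyp = hyp.replace("'", "")
    ref_words, hyp_words = ref.split(), hyp.split()
    return edit_distance(ref_words, hyp_words) / len(ref_words)
```

On the third example from the log above, `wer("i don't care what you say", "i dont care what you say")` gives 1/6 ≈ 0.1667, matching the reported 0.166667; with `ignore_apostrophes=True` it drops to 0.0, so that whole penalty is the apostrophe.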

The newer language model should be able to deal with that; check https://github.com/mozilla/DeepSpeech/issues/955

Are the newer language models notably improved? The ones I tested from release 0.1.1 seemed to get most words only vaguely right, with "Hello World" becoming "helo rld", for example.

Another thing I noticed is that simple capitalization cases (notably "i" and the first word of a recording) were left uncapitalized. I've been debating writing something to clean that up…
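A minimal cleanup sketch for those two cases, uppercasing standalone "i" and the first character of the transcript; `recapitalize` is a hypothetical helper of my own, not a DeepSpeech utility:

```python
import re

def recapitalize(text):
    """Capitalize standalone 'i' (including contractions like i'm via the
    word boundary) and the first letter of the transcript."""
    text = re.sub(r"\bi\b", "I", text)
    return text[:1].upper() + text[1:] if text else text
```

For example, `recapitalize("it wasnt clear to him")` returns `"It wasnt clear to him"`, and `recapitalize("i dont care")` returns `"I dont care"`. Proper nouns would still need a dictionary or a language model; a character-level regex cannot recover those.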