Generally this is a good set of guidelines. Nice work @Michael_Maggs!
However, we should be careful to separate the guidelines for validating text from those for validating speech. The “Problems with the written text” section is geared more towards text.
One note on the “Ignore minor problems of punctuation if they don’t affect the recording” part…
The example ‘“the giant dinosaurs of the Triassic,’ is given as one in which the punctuation does not affect the reading. This is actually not quite the case.
Think of sentences which include commas. Generally when a comma is used correctly, it indicates a pause. A speech-to-text engine trained on text which uses commas correctly and which is read with the associated pause would learn to insert commas at the appropriate pauses.
However, if sentences similar to the above ‘“the giant dinosaurs of the Triassic,’ were used to train the system, it would never learn to insert commas in the correct place as there would be no correlation between commas and pauses. Sometimes commas would occur at the ends of sentences, sometimes at the start, sometimes randomly within sentences.
So it’s better to reject such sentences, as they will cause the engine to learn an incorrect model of where commas belong.
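
To make the idea concrete, here is a rough sketch of the kind of check a validator (or a pre-filter on the text corpus) could apply. The function name `has_misplaced_punctuation` and the specific rules are my own assumptions for illustration, not part of any existing Common Voice tooling, and it only catches the obvious cases: a comma dangling at the start or end of a sentence, and unbalanced double quotes.

```python
# Hypothetical helper: flags sentences whose punctuation cannot line up
# with a natural pause when read aloud. The rules below are assumptions
# chosen for illustration, not an official validation rule set.

QUOTE_CHARS = "\"'\u201c\u201d\u2018\u2019"  # straight and curly quotes


def has_misplaced_punctuation(sentence: str) -> bool:
    text = sentence.strip()

    # Look past any surrounding quote marks so a comma hidden behind
    # one is still detected.
    core = text.strip(QUOTE_CHARS).strip()

    # A comma at the very start or end cannot mark a pause inside the sentence.
    if core.startswith(",") or core.endswith(","):
        return True

    # Straight double quotes should come in pairs.
    if text.count('"') % 2 != 0:
        return True

    # Curly double quotes should have matching opening and closing marks.
    if text.count("\u201c") != text.count("\u201d"):
        return True

    return False


if __name__ == "__main__":
    samples = [
        "\u201cthe giant dinosaurs of the Triassic,",
        "The giant dinosaurs of the Triassic, which were huge, died out.",
    ]
    for s in samples:
        verdict = "reject" if has_misplaced_punctuation(s) else "keep"
        print(f"{verdict}: {s}")
```

A check like this would only be a first pass. Commas placed randomly in the middle of a sentence still need a human reviewer to spot.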