Might some “quick and dirty” automated screening of submitted sentences be possible using the kind of language models DeepSpeech relies on?
KenLM and similar tools give you the probability of a sentence; if the model is trained on a large enough, representative corpus, that estimate should be reasonably accurate, and you could then set a threshold below which sentences are either excluded or flagged to the user.
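To make the thresholding idea concrete, here is a minimal sketch using a toy bigram model in pure Python (the tiny corpus, the smoothing, and the threshold value are all illustrative stand-ins, not KenLM itself):

```python
import math
from collections import Counter

# Toy corpus standing in for a large, representative text collection.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat and a dog played",
    "the cat and the dog sat together",
]

def train_bigrams(sentences):
    """Count unigrams and bigrams, with sentence-boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def avg_log_prob(sentence, unigrams, bigrams):
    """Average per-token log probability (add-one smoothing), so longer
    sentences aren't penalised merely for their length."""
    vocab = len(unigrams)
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab)
        total += math.log(p)
    return total / (len(tokens) - 1)

unigrams, bigrams = train_bigrams(corpus)

good = avg_log_prob("the cat sat on the rug", unigrams, bigrams)
typo = avg_log_prob("the cta sat on the rgu", unigrams, bigrams)  # misspellings

THRESHOLD = -2.0  # hypothetical cut-off; in practice tuned on held-out data
print(good > THRESHOLD > typo)  # the misspelled sentence scores lower
```

With the real KenLM Python bindings the scoring step would just be something like `kenlm.Model("lm.arpa").score(sentence)` against a pre-built model, so the per-sentence cost stays tiny.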
Typically they give sentences with spelling mistakes much lower scores. There is some risk of legitimate sentences being caught, but perhaps at the margins that’s not such a big deal (since the goal is a dataset for pronunciation, not coverage of every possible phrase).
These models tend to be quick to run (scoring is fast, even if training is not), so I’m hoping it could be tacked on without killing the user experience. And whilst I agree something sophisticated like @jf99 envisages would be great, with the resources and time available, “good enough” may be more practical.
Finally, I see some LMs also expose the probabilities of individual words within a sentence (there’s a nice visualisation here: https://colinmorris.github.io/lm-sentences/#/), so perhaps a heuristic could filter on those (and this may provide a cheap way to flag the problem word(s) to users in a simple manner).
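As a sketch of that per-word idea, the same toy bigram setup can flag individual words whose conditional probability falls below a cutoff (again, the corpus and the cutoff value are hypothetical; a real system would take per-word scores from the actual LM):

```python
from collections import Counter

# Toy corpus standing in for a large, representative text collection.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat and a dog played",
    "the cat and the dog sat together",
]

def train_bigrams(sentences):
    """Count unigrams and bigrams, with sentence-boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def flag_words(sentence, unigrams, bigrams, cutoff=0.06):
    """Return the words whose smoothed conditional probability given the
    previous token falls below `cutoff` (boundary markers excluded)."""
    vocab = len(unigrams)
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    flagged = []
    for prev, cur in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab)
        if p < cutoff and cur != "</s>":
            flagged.append(cur)
    return flagged

unigrams, bigrams = train_bigrams(corpus)
print(flag_words("the cta sat on the rgu", unigrams, bigrams))  # ['cta', 'rgu']
print(flag_words("the cat sat on the rug", unigrams, bigrams))  # []
```

The UI could then simply highlight the returned words, which would be a cheap way of showing the contributor what tripped the filter.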