What if people are using text-to-speech to record?

I’ve added this to the draft reviewing guidelines, here: