I think it's time to talk about AI-generated sentences again

I don’t think it is a good idea for the following reasons:

  • For languages where there is enough training data (for LLMs):
    • there is enough text to select real sentences
    • a better approach to expanding the corpus is to work with specific domains
    • the sentences will need to be checked anyway
  • For languages where there isn’t enough training data (for LLMs):
    • the sentences have to be checked anyway, and most of them are going to be rubbish (we tried recently with varieties of Nahuatl, and it was a disaster), which will cause headaches for reviewers

If you wanted to use GPT to generate sentences and then run them through the normal review process, I don’t see a problem, but it seems to me that working on a specific task/application for larger languages (English, German, Esperanto) would be more productive.