Sentence Extractor - Current Status and Workflow Summary

I have added two manual triggers to our GitHub Actions workflows.

Run Blocklist Generation

Until now you had to run a full extraction locally to generate the blocklist. Now this can be done through a comment on any issue. If you already have a PR open, comment there. If not, create your own issue so we don’t spam unrelated issues.

To trigger the job, add the following line in a new comment:

/action blocklist [language code] [max occurrences of words]

For example: /action blocklist en 80

The job will then post a link to the GitHub Action run where you will find the resulting files (artifacts).

You can see an example here: https://github.com/Common-Voice/cv-sentence-extractor/issues/108#issuecomment-653887129

Trigger a full extraction

In addition to the full extraction that can run when a PR gets merged, a comment in the following format will also trigger a full extraction. Usually the sample extraction that runs when you push new commits to a PR should be enough. However, if you need more sentences to verify your rules than the sample extraction provides, you can use this method.

/action extract [language code]

Example: /action extract en
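
If you want to check a comment before posting it, here is a rough Python sketch of how the two comment formats could be validated. This is not the code the GitHub Action actually runs. The names `COMMAND_RE`, `ActionCommand` and `parse_action_comment` are made up for illustration, and the language code pattern (letters and hyphens, e.g. en or zh-CN) is my assumption.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for the two comment formats described in this post.
# The real workflow may parse comments differently.
COMMAND_RE = re.compile(
    r"^/action\s+(?P<command>blocklist|extract)"
    r"\s+(?P<lang>[A-Za-z-]+)"   # assumed shape of a language code
    r"(?:\s+(?P<max>\d+))?\s*$"  # optional max occurrences of words
)


class ActionCommand(NamedTuple):
    command: str
    language: str
    max_occurrences: Optional[int]


def parse_action_comment(body: str) -> Optional[ActionCommand]:
    """Return the first /action command found in a comment, or None.

    A command line with the wrong number of arguments also yields None.
    """
    for line in body.splitlines():
        match = COMMAND_RE.match(line.strip())
        if match is None:
            continue  # not a command line, keep looking
        command = match.group("command")
        language = match.group("lang")
        max_raw = match.group("max")
        if command == "blocklist":
            if max_raw is None:
                return None  # blocklist needs the max occurrences value
            return ActionCommand(command, language, int(max_raw))
        if max_raw is not None:
            return None  # extract only takes a language code
        return ActionCommand(command, language, None)
    return None


if __name__ == "__main__":
    print(parse_action_comment("/action blocklist en 80"))
    print(parse_action_comment("/action extract en"))
```

The sketch treats a blocklist command without a number, or an extract command with one, as invalid so that typos show up early instead of starting a job with the wrong arguments.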
