Hello.
It’s time for another short update.
We’re currently preparing a new training run and found some “issues” with phoneme handling for mixed English/German wording.
This warning occurs quite often:
[WARNING] fount 1 utterances containing language switches on lines 1
[WARNING] extra phones may appear in the "de" phoneset
[WARNING] language switch flags have been kept (applying "keep-flags" policy)
We analyzed where these warnings come from and found that our dataset (metadata.csv) contains 408 phrases with words of non-German origin that are common in everyday German.
Some examples:
- server
- opensource
- song
- chat
- team
- computer
- party
- cool
- …
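A rough way to find such phrases in a dataset is sketched below. It is only an illustration, not the analysis we actually ran: it assumes a pipe-delimited metadata.csv with the clip id in the first column and the text in the second, and the word list is just the examples above. The names `LOANWORDS` and `rows_with_loanwords` are made up for this sketch. Matching is on whole words only, so inflected or compound forms would be missed.

```python
import csv
import re

# Hypothetical word list: only the examples named in this post.
LOANWORDS = {"server", "opensource", "song", "chat", "team",
             "computer", "party", "cool"}


def rows_with_loanwords(lines, words=LOANWORDS):
    """Return (clip_id, matched_words) for every row containing a listed word.

    Assumes rows look like "id|text" (pipe-delimited, no quoting).
    """
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    hits = []
    for row in reader:
        if len(row) < 2:
            continue  # skip malformed or empty rows
        # Lowercase whole-word tokens of the transcript column.
        tokens = set(re.findall(r"\w+", row[1].lower()))
        found = sorted(tokens & words)
        if found:
            hits.append((row[0], found))
    return hits


if __name__ == "__main__":
    sample = [
        "a|Wie kann man den Song so verschandeln?",
        "b|Das Wetter ist heute schön.",
    ]
    for clip_id, found in rows_with_loanwords(sample):
        print(clip_id, found)
```

For a real file, an open file handle can be passed in place of the sample list, for example `open("metadata.csv", encoding="utf-8", newline="")`.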
Here are a few phoneme samples produced with the default config (keep-flags):
- Auf der Couch könnte sie es sich gemütlich machen.
- aʊf dɛɾ (en)kaʊtʃ(de) kœntə ziː ɛs zɪç ɡəmyːtlɪç maxən
- Wie kann man den Song so verschandeln?
- viː kan man deːn (en)sɒŋ(de) zoː fɛɾʃandəln
- Nicht alle Teenager sind so.
- nɪçt alə (en)tiːneɪdʒə(de) zɪnt zoː
- Währenddessen spricht sie mit ihrem Computer.
- vɛːrəndɛsən ʃpɾɪçt ziː mɪt iːrəm (en)kəmpjuːtə(de)
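To inspect the flagged segments in output like the samples above, a small helper along these lines could be used. This is a minimal sketch that works on already phonemized strings; it assumes the flags always look like `(en)…(de)` as in the samples and does not reproduce how the phonemizer handles them internally. `find_switches` and `strip_flags` are hypothetical names.

```python
import re

# A flagged segment such as "(en)kaʊtʃ(de)": a language flag, the phonemes
# produced in that language, then the flag switching back.
SWITCH = re.compile(r"\((?P<lang>[a-z-]+)\)(?P<phones>[^()]*)\([a-z-]+\)")

# A single language flag such as "(en)" or "(de)".
FLAG = re.compile(r"\([a-z-]+\)")


def find_switches(phonemes):
    """Return (language, phonemes) for each flagged segment in the string."""
    return [(m.group("lang"), m.group("phones"))
            for m in SWITCH.finditer(phonemes)]


def strip_flags(phonemes):
    """Remove the language flags but keep the phonemes between them."""
    return FLAG.sub("", phonemes)


if __name__ == "__main__":
    line = "viː kan man deːn (en)sɒŋ(de) zoː fɛɾʃandəln"
    print(find_switches(line))
    print(strip_flags(line))
```

Stripping the flags this way keeps the English phones in the string, so phones outside the German phoneset still end up in the training data.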
We’re currently discussing whether to run training with the default option “--language-switch keep-flags” (which produces these warnings) or with phoneme usage disabled in the config file.
Wishing you all a nice weekend