Pre-specified symbol embeddings

maneeshkyadav · August 11, 2020, 1:30am

I’ve adjusted the code a bit so that I can specify multicharacter, space-delimited phonemes that map to pre-specified dvectors, in the hope of better incorporating external information about phonemes (mouth positions etc.) and getting better performance (without success so far). Would anyone else find this useful? If so, I think the way input ultimately gets to the network might need some reorganization to accommodate a ‘straight in’ path that skips cleaning and any phoneme generation etc.

I use a pickle file that holds the symbol embeddings; it is all that is needed in ‘characters’ in the config JSON, and the code replaces the nn.Embedding layer in the model with an nn.Linear of the appropriate dims.

It seems to work ok, but bypassing the other config checks etc. appropriately makes things a little inelegant. I haven’t yet designed the right way to do this, but I’m happy to do so if anyone else wants the capability.
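
As a rough illustration of the pickle-based lookup described in this post, here is a minimal sketch using only the Python standard library. It is not the poster's actual code: the symbol names, the vector values, and the helpers `save_embeddings`, `load_embeddings` and `encode` are all hypothetical, and an in-memory buffer stands in for the pickle file on disk.

```python
import io
import pickle

# Hypothetical symbol table: multicharacter phoneme -> pre-specified vector.
# The symbols and values are made up for illustration only.
symbol_embeddings = {
    "AA": [1.0, 0.0, 0.5],
    "B": [0.0, 1.0, 0.0],
    "CH": [0.2, 0.8, 0.1],
}


def save_embeddings(table, fileobj):
    """Pickle the symbol table to an open binary file object."""
    pickle.dump(table, fileobj)


def load_embeddings(fileobj):
    """Load the table and check that every vector has the same length."""
    table = pickle.load(fileobj)
    dims = {len(vector) for vector in table.values()}
    if len(dims) != 1:
        raise ValueError("all symbol vectors must share one dimension")
    return table, dims.pop()


def encode(text, table):
    """Split on whitespace and look up each symbol, with no text cleaning."""
    vectors = []
    for symbol in text.split():
        if symbol not in table:
            raise KeyError(f"unknown symbol: {symbol!r}")
        vectors.append(table[symbol])
    return vectors


# Round trip through an in-memory buffer instead of a real file.
buffer = io.BytesIO()
save_embeddings(symbol_embeddings, buffer)
buffer.seek(0)
table, embedding_dim = load_embeddings(buffer)

# Space-delimited, multicharacter phonemes go 'straight in'.
sequence = encode("B AA CH", table)
```

Under these assumptions, `sequence` would be converted to a float tensor of shape (number of symbols, `embedding_dim`) and passed through an `nn.Linear` whose input size is `embedding_dim`, in place of the index lookup that `nn.Embedding` performs.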

nmstoker · August 11, 2020, 12:27pm

It would be useful to be able to insert additional steps around this whole area (both in training and in inference). In the past I’d manually make a few code edits to allow me to intercept the phoneme inputs and substitute alternatives when the original text was a heteronym. I haven’t got the code to hand but could point out where it is applied this evening after work, although I expect you’re making changes in the same locations.

I also experimented with the server (for inference) so that it would check the input text for the presence of phonemes and, if any were found, skip sending it to be turned into phonemes. This meant you could normally use regular text but still handle edge-case pronunciations (although my version only worked on a whole sentence level for simplicity). It’s quite useful for testing.
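
A whole-sentence check of that kind might look something like the sketch below. The detection rule is an assumption, not what the server code actually did: a sentence is treated as phonemic if it contains any character from a small set of IPA symbols that rarely appear in ordinary English text. `IPA_ONLY_CHARS`, `looks_like_phonemes`, `prepare_input` and `fake_phonemizer` are hypothetical names, and `fake_phonemizer` merely stands in for the real text-to-phoneme step.

```python
# Assumed marker set: IPA characters that rarely occur in ordinary English text.
IPA_ONLY_CHARS = set("ɑɐɒæɔəɛɜɪʊʌθðʃʒŋɹˈˌː")


def looks_like_phonemes(sentence):
    """Return True if the sentence contains any assumed IPA-only character."""
    return any(ch in IPA_ONLY_CHARS for ch in sentence)


def prepare_input(sentence, text_to_phonemes):
    """Skip phonemization when the whole sentence already looks phonemic."""
    if looks_like_phonemes(sentence):
        return sentence
    return text_to_phonemes(sentence)


def fake_phonemizer(sentence):
    """Stand-in for a real phonemizer, used only to show the control flow."""
    return "<phonemes for: " + sentence + ">"


already_phonemic = prepare_input("ðə kæt", fake_phonemizer)
regular_text = prepare_input("the cat", fake_phonemizer)
```

Here `already_phonemic` is passed through untouched, while `regular_text` goes through the stand-in phonemizer. Because the decision is made per sentence, mixing regular text and phonemes in one sentence is not handled, which matches the whole-sentence limitation mentioned above.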

Making the processing around those steps a more configurable pipeline would be ideal, but it comes at the cost of extra complexity.

Any strong views either way on this?
