I’ve adjusted the code a bit so that I can specify multi-character, space-delimited phonemes that map to pre-specified dvectors, in the hope of better incorporating external information about phonemes (mouth positions etc.) and getting better performance (without success so far). Would this be useful to anyone? If so, I think the way input ultimately gets to the network might need some reorganization to accommodate a ‘straight in’ path that avoids cleaning, the potential for phoneme generation, etc.
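As a rough illustration of the mapping described above (a minimal sketch, not the actual patch): the input string is split on whitespace so that symbols can be longer than one character, and each symbol is looked up in a table of fixed vectors. The table contents, the 3-dimensional vectors and the name `phonemes_to_vectors` are all made up for the example.

```python
from typing import Dict, List

# Hypothetical lookup table: each multi-character phoneme symbol maps to a
# pre-specified feature vector (values here are invented for illustration).
PHONEME_VECTORS: Dict[str, List[float]] = {
    "AA": [1.0, 0.0, 0.0],
    "CH": [0.0, 1.0, 0.5],
    "sil": [0.0, 0.0, 0.0],
}


def phonemes_to_vectors(text: str, table: Dict[str, List[float]]) -> List[List[float]]:
    """Split a space-delimited phoneme string and look up each symbol's vector.

    No text cleaning or phonemization is applied: the string goes 'straight in'.
    """
    vectors = []
    for symbol in text.split():
        if symbol not in table:
            raise KeyError(f"unknown phoneme symbol: {symbol!r}")
        vectors.append(list(table[symbol]))  # copy so callers cannot mutate the table
    return vectors


sequence = phonemes_to_vectors("CH AA sil", PHONEME_VECTORS)
```

Failing loudly on an unknown symbol seems safer than silently dropping it, since nothing upstream is cleaning the input.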
I use a pickle file that holds the symbol embeddings. That file is all that needs to be specified under ‘characters’ in the config JSON, and the code then replaces the nn.Embedding layer in the model with an nn.Linear of the appropriate dims.
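A minimal sketch of that swap, assuming PyTorch and a pickle that holds a plain dict from symbol to vector (the real file layout and model code may differ). The names `PrecomputedInput`, `symbol_vectors` and `hidden_dim` are hypothetical, and the pickle is round-tripped through an in-memory buffer only to keep the example self-contained.

```python
import io
import pickle

import torch
from torch import nn

# Hypothetical symbol table; in practice this would be loaded from the pickle file.
symbol_vectors = {
    "AA": [1.0, 0.0, 0.0],
    "CH": [0.0, 1.0, 0.5],
    "sil": [0.0, 0.0, 0.0],
}

# Round-trip through pickle in memory to stand in for the file on disk.
buffer = io.BytesIO()
pickle.dump(symbol_vectors, buffer)
buffer.seek(0)
loaded = pickle.load(buffer)

feature_dim = len(next(iter(loaded.values())))  # size of each pre-specified vector
hidden_dim = 8  # assumed size the rest of the network expects


class PrecomputedInput(nn.Module):
    """Stand-in for nn.Embedding: projects pre-specified vectors with nn.Linear."""

    def __init__(self, feature_dim: int, hidden_dim: int) -> None:
        super().__init__()
        self.proj = nn.Linear(feature_dim, hidden_dim)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, time, feature_dim) -> (batch, time, hidden_dim)
        return self.proj(features)


symbols = "CH AA sil".split()
features = torch.tensor([[loaded[s] for s in symbols]], dtype=torch.float32)
layer = PrecomputedInput(feature_dim, hidden_dim)
output = layer(features)
```

The linear layer takes float vectors rather than integer symbol IDs, so the output keeps the `(batch, time, hidden_dim)` shape an embedding lookup would have produced.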
It seems to work OK, but bypassing the other config checks etc. appropriately makes things a little inelegant. I haven’t yet designed the right way to do this, but I’m happy to do so if anyone else wants the capability.