Hi everyone,
Reading the FastSpeech 2 paper, I saw that they compute pitch moments to evaluate their model. Does anyone know a Python package that does this, or a link to a detailed explanation of how these are computed so that I can implement it myself?
Thank you
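For reference, FastSpeech 2 reports the standard deviation, skewness and kurtosis of the pitch distribution as its pitch moments. Below is a minimal NumPy sketch of those three statistics. It assumes unvoiced frames are marked with 0 (as many pitch trackers do) and that kurtosis means excess (Fisher) kurtosis, which is also the `scipy.stats.kurtosis` default. The function name `pitch_moments` is hypothetical, and whether the paper pools raw Hz values over the whole test set is also an assumption here.

```python
import numpy as np

def pitch_moments(f0):
    """Return (std, skewness, excess kurtosis) of the voiced F0 values.

    Assumption: unvoiced frames are encoded as 0 and are dropped first.
    """
    f0 = np.asarray(f0, dtype=float)
    voiced = f0[f0 > 0]                  # keep voiced frames only
    dev = voiced - voiced.mean()
    m2 = np.mean(dev ** 2)               # second central moment (population variance)
    m3 = np.mean(dev ** 3)               # third central moment
    m4 = np.mean(dev ** 4)               # fourth central moment
    std = np.sqrt(m2)
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2 - 3.0        # Fisher (excess) kurtosis: 0 for a Gaussian
    return std, skewness, kurtosis

# Usage: pool the F0 contours of all utterances, then compute the moments once.
contours = [np.array([0, 100, 200, 300, 0]), np.array([0, 180, 220, 0])]
print(pitch_moments(np.concatenate(contours)))
```

Computing the moments on the pooled distribution, rather than averaging per-utterance values, is one reasonable reading of "pitch distribution"; check against the paper before comparing numbers.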
Don't know if this is exactly what you are looking for, but Parselmouth/Praat has some pitch contour extraction functionality.