I want to add an option to train Chassidic Yiddish specifically, as it’s the dialect spoken by the majority of Yiddish speakers nowadays.
How can I add it as an option?
Hey @Abey_Baruchov, welcome…
It is fairly simple to add language variants to the system through a PR on GitHub. Here are some examples I posted:
- Circassian languages (see the comment for the actual merged PR)
- Megrelian
- Laz
On the other hand, working on variants and/or accents might need some linguistic work, as the borders can be blurry and are often debated. E.g. our work with language experts for Circassian took a month or so to reach a decision, and even then we missed one case with respect to Common Voice flow-related restrictions (see 1 and 2). It is usually not easy for diaspora languages because of mixing with the prominent language of the country.
I don’t know anything about Yiddish (I’m not a linguist, and I have only read the Wikipedia article on Yiddish dialects), but a bit of research and working with linguists to get a more complete list might be a good idea.
Shalom Abey!
Ikh hob a bisl Yiddish gelernt (I have learned a bit of Yiddish). From what I know, Chassidic Yiddish differs from community to community, as each one was founded by speakers of different subdialects of the Eastern dialect (Litvaker Yiddish, Polish Yiddish, Ukrainian Yiddish). So (again, as far as I know, but I can ask my ex-colleagues who remained linguists) we can’t really speak of a Chassidic variant; it’s more of a mix, and not a homogeneous entity.
I think (cautiously) that adding Polish and Ukrainian as variants would be of more use here; but again, if you really want to add some subdivision to Yiddish, I would be happy to ask my former colleagues who studied Yiddish dialectology!