Looking for YouTube channels for various languages

Hi, a friend is combining Common Voice’s labeled (audio + text) data with unlabeled (audio-only) data from YouTube to train STT models for various languages. They are looking for YouTube channels with lots of long videos in the languages below. If you know any, please send a link. Thanks.

Albanian
Azerbaijani
Belarusian
Bosnian
Bulgarian
Croatian
Czech
Greek
Hungarian
Kazakh
Kyrgyz
Macedonian
Moldova
Polish
Romanian
Serbian
Slovak
Slovenian
Tajik
Turkish
Ukrainian
Uzbek

I can’t help with that, but note that you have Moldova in the list: the Moldovan language is the same as Romanian, so you can just use Romanian channels for it. In addition, if you’re only using unlabelled data, you should be able to mix Bosnian, Croatian and Serbian for a more robust model.


I think each country’s news channels on YouTube could be useful for that; most prominent media outlets in Turkey have them.
Of course, it might depend on the domain you are working on…


Also, Bosnian, Serbian, and Croatian are the same language. So-called “Standard Serbo-Croatian” was based on a Bosnian prestige dialect/accent. The modern standards have been diverging since the war, but they are all the same language; the main visible difference is that Bosnians and Croatians write in the Latin alphabet with diacritics, while Serbs mostly use Cyrillic (the Latin alphabet is also widely used in Serbia). The spoken language, pronunciation, and vocabulary are very similar.