CommonVoice Hebrew

Hello.
Who manages CommonVoice in Hebrew?

There is a lot of material from the Bible in the data set. This material includes cantillation marks (“Taamey hamikra”), which are non-standard characters and should not be in the data set.
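
For anyone who wants to check sentences for these marks, here is a minimal sketch. It assumes that the cantillation marks are the Hebrew accent code points U+0591 to U+05AF in Unicode; the function names `has_cantillation` and `strip_cantillation` are only illustrative and are not part of any Common Voice tooling.

```python
# Assumption: cantillation marks (Taamey hamikra) are the Unicode
# Hebrew accents U+0591..U+05AF. Vowel points (niqqud) start at U+05B0
# and are deliberately left untouched here.
CANTILLATION = {chr(cp) for cp in range(0x0591, 0x05B0)}


def has_cantillation(sentence: str) -> bool:
    """Return True if the sentence contains at least one cantillation mark."""
    return any(ch in CANTILLATION for ch in sentence)


def strip_cantillation(sentence: str) -> str:
    """Return the sentence with every cantillation mark removed."""
    return "".join(ch for ch in sentence if ch not in CANTILLATION)


# The first word of Genesis, written with escapes so the marks are explicit:
# it carries niqqud plus one cantillation mark (U+0596, tipeha).
sample = "\u05d1\u05b0\u05bc\u05e8\u05b5\u05d0\u05e9\u05b4\u05c1\u0596\u05d9\u05ea"

if has_cantillation(sample):
    cleaned = strip_cantillation(sample)
    print(len(sample), len(cleaned))
```

A check like this only flags candidate sentences; whether they should stay in the data set is still decided by the rules and by validation.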

There is also a lot of vocalized material, i.e. text with niqqud (vowel points, written as dots and lines). Does niqqud comply with the rules?

Hey @Musicode, FYI: Hebrew rules are here:

Here is the related issue which resulted in the above rules:

Hope these help…


Hi @Musicode

Thanks @Bulent_Daldalan.

Kindly note that community members are responsible for quality assurance and validation. To exclude sentences of this nature from the dataset, community members can downvote them as they encounter them during validation.

Thanks