Among the Korean sentences, someone added many quotes from the Korean Bible, specifically the New Korean Revised Version (NKRV), and they were accepted.
The problem is that the Bible itself is in the public domain, but the Korean translation is not. Although it is not well known, the translation is copyrighted by the KOREAN BIBLE SOCIETY.
How can these sentences be removed from the sentence database?
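(Restated in Korean; translated:)
The Bible itself is essentially in the public domain, but the Korean translation is copyrighted.
These sentences need to be removed from the sentence DB.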
Dear @gofeel, thank you for reporting this. Can you let us know which sentences these are (or at least some of them), so we can identify the contributor and remove them along with any related sentences?
There are many different Korean Bible translations, and AFAIK Common Voice took its sentences from the KRV (1952/1961), not the NKRV (1998).
For example, the source sentence of common_voice_ko_36880175.mp3 is “여호와 하나님이 에덴동산에서 그 사람을 내어 보내어” (Genesis 3:23), which is identical to the KRV text. The NKRV text is “여호와 하나님이 에덴 동산에서 그를 내보내어”. Very similar, but slightly different.
The Common Voice team could check the old Sentence Collector database, which recorded the source of each submission. About a year ago I had the same concern as gofeel. But then another participant let me know that the sentences were not from the NKRV but from the KRV, whose copyright is known to have expired.
Useful links from the KOREAN BIBLE SOCIETY (written solely in Korean, though):
States that the NKRV (개역개정) is under copyright protection but the KRV (개역한글) is not, as of 2011.12.31.
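(Just in case, I also restated my opinion in Korean; translated:)
There are several Korean translations of the Bible, but as far as I know, the one used in Mozilla Common Voice is the KRV (개역한글, 1952/1961), not the NKRV (개역개정, 1998).
For example, the source text of the file common_voice_ko_36880175.mp3 is Genesis 3:23, “여호와 하나님이 에덴동산에서 그 사람을 내어 보내어”, which is the KRV translation. The NKRV renders it as "여호와 하나님이 에덴 동산에서 그를 내보내어". Very similar, but slightly different.
The Common Voice team should be able to look up what the old Sentence Collector DB recorded as the source of this sentence. About a year ago I had a concern similar to gofeel's, but another participant in the Common Voice project told me that the sentences were taken from the KRV translation, whose copyright has expired. The NKRV's copyright has not expired, though.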
When the Bible verses first showed up on the Sentence Collector, I rejected several of them because their source and copyright status were not clearly stated. If the sentences are all from the KRV, that is a relief, and it's okay.
Some things I also wrote in the Korean Telegram chatroom:
As far as I understood, corporate works (team projects) have special copyright terms counted from the date of creation, and “author’s life + # years” applies to solo works.
I am not a lawyer, and so I do not know where the line lies between solo and corporate.
A Disney movie with 200 animators?
Corporate.
Edgar Allan Poe’s “The Raven”?
Solo.
Something translated by a group of 4 people? … I don’t know
A source of confusion is tied to the re-printing and re-issuing of various editions.
Unlike the case with source code, no one makes comments next to every line in a book.
Therefore, there is no easy way to know if a given page in the 2nd Edition of a book is wholly new content for the 2nd Edition or is content from the 1st edition.
The best thing I can recommend is to search for randomly chosen sentences and see whether they appeared in the same form in a much older edition of the book. Statistically speaking, if one were to pull 30 random Korean Bible sentences from Common Voice and all of them had already appeared in a very old edition of the Korean Bible, then we are in the clear. Otherwise, we would have to check individual sentences to identify which ones are a problem.
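In case it helps, here is a rough Python sketch of that spot check. Everything in it is an assumption for illustration: the function names are made up, the "old edition" is just a plain text string you would have to supply yourself, and matching is a simple substring test after collapsing whitespace. The last function estimates how large a share of problem sentences could still go unnoticed when every sampled sentence passes.

```python
import random


def normalize(text):
    """Collapse runs of whitespace so line breaks do not cause false mismatches."""
    return " ".join(text.split())


def spot_check(candidates, old_edition_text, sample_size=30, seed=0):
    """Return the sampled sentences that do NOT appear in the old edition text.

    candidates: list of sentences to check (hypothetical export of the
    Bible-derived Common Voice sentences).
    old_edition_text: full text of an old edition (supplied by the checker).
    """
    reference = normalize(old_edition_text)
    rng = random.Random(seed)  # fixed seed so the check can be repeated
    sample = rng.sample(candidates, min(sample_size, len(candidates)))
    return [s for s in sample if normalize(s) not in reference]


def upper_bound_if_all_pass(sample_size, confidence=0.95):
    """Largest problem fraction still consistent with zero misses in the sample."""
    return 1 - (1 - confidence) ** (1 / sample_size)


# Tiny illustration using only the two renderings of Genesis 3:23 quoted
# earlier in this thread; the KRV fragment stands in for the old edition.
krv = "여호와 하나님이 에덴동산에서 그 사람을 내어 보내어"
nkrv = "여호와 하나님이 에덴 동산에서 그를 내보내어"

missing = spot_check([krv, nkrv], old_edition_text=krv)
bound = upper_bound_if_all_pass(30)

print("Not found in old edition:", missing)
print("Upper bound on problem share after 30 clean samples:", round(bound, 3))
```

One caveat from the arithmetic: 30 clean samples out of 30 do not prove that every sentence is fine. They only suggest that the share of problem sentences is probably small, so a larger sample gives a stronger result.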