Add Esperanto


(Tirifto) #1

Original post


I’d like to propose that Esperanto be added among the next supported languages. It is the most widely used (neutral) international language, which makes it appropriate for inclusion in this open and international project.

Esperanto has a very regular and phonetic pronunciation, which might facilitate speech recognition. Its speakers also come from all over the world, so the many possible accents to compare would be very diverse, with none of them being native.

The language has been around for over a century now, so there should be some content in the public domain to take sentences from. The age of those texts shouldn’t be a problem, as the basic grammar and vocabulary are defined in the “Fundamento” and are supposed to remain unchanged (so that the language won’t fall apart). General usage has respected this, so the language hasn’t changed in any substantial way.

(Note: The “Fundamento” itself contains a set of exemplary sentences in Esperanto, which should all be in the public domain.)

Can I do something to help make this happen? Should I collect valid texts in the public domain and possibly restructure them in some way?

Text sources

Fundamenta Krestomatio (Contains example phrases, stories, dialogue, and poetry.)

(Michael Henretty) #2

Great idea! Our goal is to open up to multi-language in early 2018, so watch this space!

Yes! This would be extremely helpful if you can find public domain text, preferably conversational (e.g. movie scripts are better than poetry), in Esperanto. That way we can move faster!

(Tirifto) #3

Alright! Should I modify the first post whenever I find new sources, to add the links to them? (Or: Is that the best way I could submit text, and if yes, can I edit the post unlimited times?)

I suppose I could also write up some sentences or conversations myself and release them into the public domain. Would such a contribution be welcome and appropriate?

(Michael Henretty) #4

Sure, you can post links here! You can modify the original post, or send a new one.

And yes, personal sentences are definitely welcome!

But to make a good speech database, you need many thousands of sentences (10K is OK, 100K is better, 1 million is ideal). So writing all of that yourself would take some time, and it’s better to find a large source to pull from.
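As a rough illustration of what pulling sentences from a large source could involve, here is a minimal sketch, assuming the source is available as plain text. The function name `extract_sentences` and the word-count limits are hypothetical choices for this example, not part of any project tooling, and the punctuation-based split is naive (it will break on abbreviations).

```python
import re


def extract_sentences(text, min_words=3, max_words=14):
    """Split raw text into unique candidate sentences.

    Hypothetical helper: the word limits are illustrative assumptions,
    not official requirements.
    """
    # Collapse all whitespace runs (including line breaks) into single spaces.
    flat = re.sub(r"\s+", " ", text).strip()
    # Naive split: break after '.', '!' or '?' when whitespace follows.
    parts = re.split(r"(?<=[.!?])\s+", flat)
    seen = set()
    result = []
    for part in parts:
        word_count = len(part.split())
        # Keep sentences of a readable length and skip exact duplicates.
        if min_words <= word_count <= max_words and part not in seen:
            seen.add(part)
            result.append(part)
    return result


sample = "Mi amas vin. Mi amas vin.\nĈu vi parolas Esperanton? Jes."
for sentence in extract_sentences(sample):
    print(sentence)
```

Counting the length of the returned list for each source gives a quick sense of how far it goes toward the 10K to 100K range.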

(Pablo Busto) #5

Here you can find more sentences:

(Pablo Busto) #6

Are those numbers for the number of collaborators or for the number of sentences?

(Michael Henretty) #7

If I understand your question correctly, the numbers I mentioned are indeed for the number of sentences. For the number of people reading those sentences, the more the merrier! For instance, for English we have over 20K speakers.

(Tirifto) #8

Can the works included in “Tekstaro” be accessed whole, without having to search for keywords? Also, could you please point out a statement or indication that all of them are in the public domain? (I couldn’t find either on the website, so I haven’t added it to the list yet.)

(Pablo Busto) #9

It seems they can’t be accessed directly through the website, but if this project goes ahead, it probably won’t be difficult to get access to them.

(Daniele Scasciafratte) #10

I missed that post!

(Nicola Ruggiero) #11

I would like to help add Esperanto!

(Daniele Scasciafratte) #12

Pli itala esperantistoj? (More Italian Esperantists?)
There are many videos on YouTube entirely in Esperanto; maybe we can organize a working group to find all these resources? I remember that on Duolingo and on Reddit there were lists of a lot of material (including podcasts).

(Lior Samuel) #13

There are also Esperanto ebooks on Project Gutenberg.

Now that this dataset exists, is there a pre-trained DeepSpeech model for it? It seems that only an English model is available pre-trained.