It’s been relatively quiet here and on the chat/mailing list, so I just wanted to check for signs of life from Intellego. Is there any news? Plans? Some activity I missed?
Last I heard, it was dead.
It’s not quite dead. Our “fearless leader” is on hiatus. He mentioned that someone was interested in taking on more leadership, but we didn’t get a chance to get more into it before he took his time away.
Thanks for the update, good to know.
When I replied to this, I also took the time to reply to another thread about Intellego on another forum. It looks like we’ll have a meeting in May to figure out where things are and where they’re going. I’ll share those details with you when they’re confirmed.
So, is there any news? I couldn’t attend, but I saw it was a private meeting between you and Jeff.
Jeff is going to write up a bit of a status update, which is going to include specifics on what the project actually needs at this point. One thing is that we need someone who can actually get Moses set up. Jeff will provide the details, but if I’m summarizing correctly there are some issues with advanced configuration to get it working that Jeff and Axel couldn’t figure out.
I think having this summary of what types of contributions the project is currently in need of to move forward will really help us get traction again. It’ll certainly make it easier for me to try to “recruit” people since I’ll know what I’m looking for.
Thanks for the update, Kensie. I’m looking forward to seeing it.
The one thing I’m curious about is where Jeff plans to install Moses.
I mean, you ideally need a decent server somewhere. Just to give you a rough idea, here are two quotes from the Moses website:
As an indication, a Europarl trained model, using 2000 sentences for tuning, takes 1-2 days to tune using 15 CPUs. 10-15 iterations are typical.
The single-most important thing you need to run Moses fast is MEMORY. Lots of MEMORY. (For example, the Edinburgh group have servers with 144GB of RAM).
So, of course, I or Jeff could try to install a little Moses instance on our small 2 GB laptops to train toy examples during the night. It is possible, but it would hog our laptops day and night, and the result would be a toy translation system of very poor quality.
By the way, I’d also like to point out that you’d need such a server for every language pair.
This is related to:
…I think I could install moses on mine …training a decent one is a whole other matter.
Jeff isn’t planning on installing Moses anywhere, and neither am I.
I tried a while back, and failed. There’s no reason to think about server resourcing if we fail at test installs locally.
We need folks that have a plan and the technical capabilities to execute it. Discussing server resources is still just a distraction.
Well, I have installed Moses locally. Surprisingly, it went smoothly, without trouble. KenLM required some tweaking to work, but Moses itself appears fine. But that’s probably a distraction as well.
On a larger scale, I totally agree with you that the bottleneck is certainly going to be the know-how and the dedication required to deal with such engines. Installing Moses is only the tip of the iceberg; training the engines well is far more time- and know-how-intensive.
To summarize, IMHO, what’s required would be:
1. people with the know-how and the dedication to train and maintain such engines;
2. decent server resources to train and host them.
The question is: how do you intend to obtain that? While 2 could easily be provided by Mozilla (it’s probably peanuts for them), 1 is definitely more difficult.
Please don’t take this the wrong way, but I think you underestimate the amount of work and know-how required to set up such engines effectively. I’m skeptical whether average volunteers could set up engines providing decent translations, if at all. Installing the required tools is just one of many steps. That more than a year has passed for Intellego and it’s still at the “fails to install” step kind of gives a hint, right?
Actually, I’m kind of a retired MT researcher, but it’s still an interesting topic for me and I’d like to contribute to Intellego. I mean, building a “Mozilla Translate” sounds great.
However, currently, I have the feeling there are several show-stoppers:
It’s as if everyone is waiting for the system to build itself, or for a few volunteers to come along some day, make a few clicks, and voilà. Don’t you think it requires a bit more than that? Otherwise it’ll slowly drift into the abandoned-projects area as time goes by, like so many other projects out there.
That said, there are a few things I think I could provide:
First and foremost, let’s move past the idea that we’re creating an MT engine (or installing one, or setting one up, or configuring one) on Mozilla servers and calling that Intellego. That is 100% outside the scope of Intellego and something we’re entirely uninterested in. We know we don’t have the expertise there and know that Intellego can’t win in that direction.
If you’re interested in participating in Intellego, I encourage you to concentrate on what Intellego is supposed to be: a central, web-based MT marketplace for a multitude of MT engines that are live on the Web, specifically targeting free and open source engines (see https://wiki.mozilla.org/Intellego#Intellego_Platform). That web platform and its accompanying APIs can be developed locally until we’ve set up a central code repo that you can push to. If you’re here to set up MT engines, I think your skills would be better suited to contributing to Moses, Apertium, or other MT engine dev projects.
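To make the “one web platform in front of many MT engines” idea more concrete, here is a minimal Python sketch of the routing logic such a platform might need. All names here (provider names, language pairs, the dict-based registry) are hypothetical illustrations, not the actual Intellego design:

```python
# Minimal sketch of a "marketplace" proxy: one client-facing call,
# many pluggable back-end MT providers. All names are hypothetical.

PROVIDERS = {
    # provider name -> set of language pairs it claims to support
    "apertium-demo": {("en", "es"), ("es", "en")},
    "moses-demo": {("en", "fr")},
}

def pick_provider(source_lang, target_lang):
    """Return the first registered provider supporting the pair, or None."""
    for name, pairs in PROVIDERS.items():
        if (source_lang, target_lang) in pairs:
            return name
    return None

def translate(text, source_lang, target_lang):
    """Route a request; a real platform would call the provider's web API here."""
    provider = pick_provider(source_lang, target_lang)
    if provider is None:
        raise ValueError(f"no provider for {source_lang}->{target_lang}")
    # Placeholder: the platform would now POST `text` to the provider's endpoint.
    return {"provider": provider, "text": text}
```

The point of the sketch is that the platform itself holds no MT logic at all; it only keeps a registry of providers and forwards requests, which matches the “proxy, not engine host” framing above.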
This might seem contradictory, considering the Q1 goals listed in https://wiki.mozilla.org/Intellego/Goals_Milestones . The Q1 goals were intended to discover where our talent lies, not as a fixture to Intellego as a platform. The goals and timeline now need to switch gears to refocus on Mozilla and the community’s strengths and what will contribute to the core focus of the Intellego project.
Yes, Intellego has been on hiatus and I’ve realized that my own personal hiatus has created a blocker to Intellego’s progression. The call held a week and a half ago with Kensie was helpful to determine the direction to go while eliminating blockers. I don’t have much to announce besides that at this time, but I’m hoping to see a revival during the second half of 2015 and will add more details to that between now and July.
Thanks for joining the discussion. I think you’re one of the key actors here. Thanks also for clarifying the goals; this was troubling for quite some time due to contradictory statements here, on the wiki, and in other threads like:
I think it would indeed be very helpful to clearly state what intellego is and is not meant to be.
I get it from your response that it’s intended as a “proxy”. This leads of course to the following topic:
Basically: “Who the heck will be providing the translations? …and why?”
I mean, it costs manpower and infrastructure. What would be their incentive to give it away for free?
Honestly, I’m still a bit skeptical that anyone would provide it for free, but who knows; I’m curious to be proven otherwise.
In any case, before building a “marketplace”, API, web services and whatever, wouldn’t it make more sense to first find at least a single partner providing a translation system?
I don’t mean to sink the ship or blame anyone. It’s just that I feel there are serious flaws that should be addressed. It’s also a question of priority for Mozilla. I think that some backing, and it doesn’t have to be much, could have a great impact, whether it’s offering “bounties” for certain tasks or incentives for providers. But I guess I’m knocking at the wrong door here. It’s also still a bit unclear to me what exactly “a central, web-based MT marketplace for a multitude of MT engines” is meant to be. By the way, “marketplace” may be poorly chosen, as it usually implies buying and selling, while here it’s more about being a proxy, right?
And being a proxy/platform/whatever, does it mean anyone could consume these translation services freely? Then how can the providers survive and pay the rent? Sorry to talk about money all the time, but I still think it’s a non-negligible aspect of reality.
On 5/19/15 at 3:08 PM, Arnaud Dagnelies wrote:
Thanks for joining the discussion. I think you’re one of the key actors here. Thanks also for clarifying the goals; this was troubling for quite some time due to contradictory statements here, on the wiki, and in other threads like:
What are intellego’s goals?
As I described here: https://wiki.mozilla.org/Intellego https://wiki.mozilla.org/Talk:Intellego ...what intellego is meant to be seems rather unclear to me. It says a “platform” but it’s hard to be more vague. Is it meant to be: ...self-hosted translation engines? ...a proxy where users can select among various translation providers? ...various tools, standards, widgets related to translation? ...helping research? Are the goals everything? or nothing? or not yet precisely defined? wouldn’t it…
I think it would indeed be very helpful to clearly state what intellego is and is not meant to be.
I get it from your response that it’s intended as a “proxy”. This leads of course to the following topic:
Intellego Platform: who will be the providers?
Hi, Is this the direction intellego platform wants to go? At least that’s the impression I got from the various wiki, chat discussions, etc. If I understood correctly, the aim of the intellego “platform” is not to host/run engines directly. It is rather an API / website where the actual translation task is handed out to some “provider” responsible for the translation. Now, that’s all good and nice. Seems like a gateway to all translation services ...however, before making such a gateway,…
Basically: “Who the heck will be providing the translations? …and why?”
I believe we’ve had this conversation before.
Existing free and open source MT engines. Many make their services available for free (as in cost) because they believe in free (uninhibited) accessibility. They don’t reach a wide enough audience because large players like Google & Microsoft make their own work harder to find. With Mozilla, they can reach a wider audience, not only increasing their user bases but also demonstrating a real demand for open source MT solutions, which they can take into their fundraising campaigns. We’ve participated in collaborations with other open source projects before where our involvement has helped them secure funding. I believe this can follow that same path.
I mean, it costs manpower and infrastructure. What would be their incentive to give it away for free? Honestly, I’m still a bit skeptical that anyone would provide it for free, but who knows; I’m curious to be proven otherwise. In any case, before building a “marketplace”, API, web services and whatever, wouldn’t it make more sense to first find at least a single partner providing a translation system?
We have had discussions with a partner who was enthusiastic about this project (Apertium). They already have a web-facing platform, as well as a Python-based API, which they offer for free.
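As an illustration of how the platform might talk to a web-facing engine like Apertium’s, here is a small Python sketch. The base URL is the instance mentioned later in this thread; the query parameters and JSON shape follow the style of Apertium’s APy service, but they should be double-checked against the instance actually targeted:

```python
from urllib.parse import urlencode

# Sketch of a request against an Apertium APy-style endpoint.
# The base URL and response shape are assumptions to verify
# against the instance you actually target.
BASE = "http://apertium.wmflabs.org"  # instance mentioned in this thread

def translate_url(text, pair="en|es"):
    """Return the GET URL for a translation request."""
    return f"{BASE}/translate?" + urlencode({"langpair": pair, "q": text})

def extract_translation(response_json):
    """Pull the translated text out of an APy-style JSON response."""
    return response_json["responseData"]["translatedText"]
```

Building the URL and parsing the response are kept separate from any actual network call, so the same parsing can be reused no matter which HTTP client the platform ends up using.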
I don’t mean to sink the ship or blame anyone. It’s just that I feel there are serious flaws. It’s also a question of priority for Mozilla. I think that some backing, and it doesn’t have to be much, could have a great impact, whether it’s offering “bounties” for certain tasks or incentives for providers. But I guess I’m knocking at the wrong door here. It’s also still a bit unclear to me what exactly “a central, web-based MT marketplace for a multitude of MT engines” is meant to be. By the way, “marketplace” may be poorly chosen, as it usually implies buying and selling, while here it’s more about being a proxy, right?
And being a proxy/platform/whatever, does it mean anyone could consume their translation services? Then how can they survive and pay the rent? Sorry if I talk about money all the time, but I still think it’s a non-negligible aspect of reality.
It is, certainly, but it’s not my primary preoccupation. If they’re already accessible and interested in Intellego, they must have some form of funding and scaling to meet additional demand. I feel that proving the concept is a higher priority right now. If we can’t prove the concept, we’re dead in the water anyway.
Oh, that looks very interesting. I wasn’t aware of those. Could you please share the links?
Heck, you should even post them on the wiki; people would be glad to access free translation services/APIs!
On 5/19/15 at 11:57 PM, Arnaud Dagnelies wrote:
Existing free and open source MT engines. Many make their services available for free...
Oh, that looks very interesting. I wasn’t aware of those. Could you please share the links?
Apertium is one. When Gordon and I went to LREC last year, we learned about a few others. I’ll have to look through my notes to find them. I remember too that there are a few Moses instances out there that have made their services accessible. Again, it’s been a while since I looked at this, so I’ll follow up.
Another thing to keep in mind here is that we’re also trying to lower the barrier of accessibility to these MT engines with Intellego. So while we certainly want to aim to partner with those projects that already offer these services, we also want to promote the use of the Intellego platform to MT engines that have not yet created these services but have it as part of their vision to do so.
Well, as it stands now, I’ll stop my contributions. It doesn’t make sense for me to rent a server and maintain a prototype if it isn’t going to be used anyway because paying for a little server is too much.
I think it is simply delusional to expect partners to offer MT services, infrastructure and support free of charge. Google asked for millions, and you expect someone to fall from the sky and give it to you for free? Just as free CPU/RAM resources rarely extend beyond a small trial amount, MT services will be the same: providers need to pay for their servers and feed their engineers. By the way, this has nothing to do with being open source or not.
As for Apertium, I think the translation quality (ahem) speaks for itself; calling it a “translation” is already courageous. You also mentioned that “there are a few Moses instances out there that have made their services accessible”. Maybe. But I have no doubt they will pull the plug once you start pushing more traffic to them.
Good luck with your endeavors!
I hope Mozilla is aware of Wikimedia’s initiatives on open MT, and I hope the efforts can be unified. Wikimedia uses MT for l10n and content translation by encouraging post-editing. They are also building a freely licensed parallel corpus.
https://www.mediawiki.org/wiki/Content_translation/Published_translations describes the freely licensed parallel-corpus API.
http://apertium.wmflabs.org/listPairs provides an Apertium-based MT API; it is a public MT API, now in beta.
http://thottingal.in/documents/eamt2015_cx.pdf is a research paper on Wikimedia’s MT-based tool.
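For anyone wanting to experiment with the listPairs endpoint above, here is a small Python sketch of turning its response into usable language pairs. The sample payload is illustrative, not live data, and the field names follow the APy-style JSON format this kind of service returns (worth verifying against the actual endpoint):

```python
# Parse an APy-style /listPairs response into (source, target) tuples.
# The sample payload below is illustrative, not live data.

def available_pairs(payload):
    """Return (source, target) tuples from a /listPairs JSON payload."""
    return [(p["sourceLanguage"], p["targetLanguage"])
            for p in payload["responseData"]]

sample = {
    "responseStatus": 200,
    "responseData": [
        {"sourceLanguage": "eng", "targetLanguage": "spa"},
        {"sourceLanguage": "spa", "targetLanguage": "eng"},
    ],
}
```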
I see a lot of potential. If Wikimedia and Mozilla can join hands on MT work based on Apertium, it will make a lot of sense for open source and the open web. I am also interested in this MT development, in both its organizational and technical aspects.
Thanks for the pointers. I had my share of giggles reading “…and no tool was mentioned by more than one participant” in the research paper.
A few questions, I guess:
Are you training the MT on Wikipedia content? Are there tricks to find out which texts are actually aligned?
Also, if so, any idea yet how such a tool can help small Wikipedias? I.e., is there potential to use the humans’ post-editing for training?
@Pike It’s using Apertium under the hood, which is not statistical machine translation but rule-based. Hence there is no training corpus involved, as far as I know.
We don’t train, but Wikipedia will soon provide aligned translations, at least at the paragraph level (at the sentence level if possible), via a freely licensed API for MT engines. As per my interactions with Santhosh Thottingal, a lead developer on the project, Wikipedia is also interested in building a Moses instance inside their cluster with this training data, but not immediately. At present Wikimedia is capable of capturing a 200×200 language corpus. Soon it will be 280×280 (280 languages against 280 languages, since there are 280 language wikis).
Yes, that is the important point: the content translation tool presents MT output as a template to translators, who have to improve it. The final manually corrected corpus is what they provide via APIs. And yes, small languages are important for Wikimedia’s content translation work.
For example, English→Spanish Apertium is bad in quality, but translators use it heavily: they take the MT output, improve it by editing, then publish it. This yields a better translation pair, which is fed back into machine learning systems for training, so better translations come out next time.
Wikimedia just provides a way to pull the corpora from their systems, so that anybody can take it. They collect words lacking translations and give them back to Apertium, which improves its vocabulary dictionaries. I think they are also using various MT services and deriving a freely licensed corpus from that.
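The post-editing loop described above can be sketched in a few lines of Python. The record structure and field names here are my own illustration, not Wikimedia’s actual schema; the point is only that the human-corrected text, not the raw MT output, becomes the training target:

```python
# Sketch of the post-editing loop: MT output is shown to a translator,
# and only the human-corrected version enters the training corpus.
# Field names are illustrative, not Wikimedia's actual schema.

def record_post_edit(corpus, source, mt_output, post_edited):
    """Store one translation unit; the human correction is the target."""
    corpus.append({
        "source": source,
        "mt_output": mt_output,   # raw engine suggestion (kept for analysis)
        "target": post_edited,    # human-corrected text used for training
    })

def training_pairs(corpus):
    """Extract (source, target) pairs suitable for retraining an engine."""
    return [(u["source"], u["target"]) for u in corpus if u["target"]]
```

Comparing `mt_output` against `target` is also what lets a project measure how much editing each engine requires per language pair.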
I may not be 100% accurate on these details; this is my understanding from observing their project and various interactions with its developers. I will initiate a mail thread with Santhosh Thottingal, with @Pike and @gueroJeff.