20050522

Google Translator: The Universal Language

At the end of the 19th century, L. L. Zamenhof proposed Esperanto, intended as a global language that everyone could speak and understand. Its inventor hoped that a common language would help resolve the global problems that lead to conflict. Esperanto has had some success as a planned language, but today English is far more universal. Thirty countries have it as an official language, and in many others it is taught in school and understood fairly well. The internet will probably increase the adoption of English further.
Still, many people around the world cannot speak or communicate in English, so the collective information distributed across the web is not available to them. The reverse also holds: when you browse the web or search for information, you may come across documents written in languages you cannot read, such as Chinese, Arabic, French, or Spanish. Those documents are effectively inaccessible to you. Many translation systems have been developed, but they are often not accurate enough to convey even the gist of a document.
At the recent webcast of the Google Factory Tour, researcher Franz Och presented the current state of Google's machine translation systems. He compared translations from the current Google translator with those from the system under development in Google's research labs. The results were highly impressive. An Arabic sentence that the current translator renders as "Alpine white new presence tape registered for coffee confirms Laden" is translated by the research system as "The White House Confirmed the Existence of a New Bin Laden Tape".
How is it done?
Such a system is complex to program, but the principle behind it is simple. It is so simple, in fact, that the researchers were able to build a Chinese-to-English translator without any of them being able to speak Chinese. The translation system treats every language the same way, and there is no manually created rule set for grammar, metaphors, and the like. Instead, the system learns from existing human translations. Google relies on a large corpus of texts that are available in multiple languages.
Let's take a simple example: if a book is titled "Thus Spoke Zarathustra" in English and "Also sprach Zarathustra" in German, the system can begin to infer that "thus spoke" can be translated as "also sprach". (This approach would even work for metaphors; presumably the Google researchers take the longest available phrase that has a high statistical match across different works.) So the researchers need to feed the system pairs of texts, one in language A and one in language B, and tell it that the two say the same thing. The body of text must be immense; otherwise the system would pair up many unrelated phrases.
Google used United Nations documents to train its system, feeding it 200 billion words in all. This is brute-force AI, if you like: it relies on statistical learning alone and has little real "understanding" of anything but patterns. We can expect Google to release its translation system soon (this year or next). So where could Google integrate it?
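To make the idea concrete, here is a minimal sketch in Python. It is not Google's system, whose internals were not described in the talk beyond the general principle. It assumes a tiny hand-made corpus of aligned sentence pairs, counts how often each short English phrase appears alongside each short German phrase, and scores every pairing with the Dice coefficient, a simple co-occurrence measure chosen here for illustration. The function names and the toy corpus are invented for this example. Ties are broken in favour of the longer phrase, echoing the "longest available phrase" idea above.

```python
from collections import Counter
from itertools import product


def ngrams(tokens, max_n=2):
    """Return the set of all contiguous phrases of 1..max_n words."""
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def learn_phrase_table(pairs, max_n=2):
    """Score every (source phrase, target phrase) pair by co-occurrence.

    `pairs` is a list of (source sentence, target sentence) tuples that
    are known to be translations of each other.
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_phrases = ngrams(src.lower().split(), max_n)
        tgt_phrases = ngrams(tgt.lower().split(), max_n)
        src_count.update(src_phrases)
        tgt_count.update(tgt_phrases)
        pair_count.update(product(src_phrases, tgt_phrases))
    # Dice coefficient: 1.0 means the two phrases always appear together.
    return {
        (s, t): 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in pair_count.items()
    }


def best_translation(table, phrase):
    """Pick the highest-scoring target phrase, preferring longer phrases on ties."""
    candidates = [(score, t) for (s, t), score in table.items() if s == phrase]
    if not candidates:
        return None
    return max(candidates, key=lambda c: (c[0], len(c[1].split())))[1]


# Toy parallel corpus (invented for illustration only).
corpus = [
    ("thus spoke zarathustra", "also sprach zarathustra"),
    ("thus spoke the teacher", "also sprach der lehrer"),
    ("the teacher laughed", "der lehrer lachte"),
]

table = learn_phrase_table(corpus)
print(best_translation(table, "thus spoke"))   # also sprach
print(best_translation(table, "the teacher"))  # der lehrer
```

With only three sentence pairs the scores are fragile, which is exactly why the real corpus has to be enormous: the more text the counts are drawn from, the less likely it is that two unrelated phrases keep turning up together by chance.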
Google Search
Documents found through Google Search could carry a link to an automatic translation of the text. You would just specify your language in your Google preferences, and the text would be translated into it automatically. Also, if you searched for a particular keyword or phrase, it could be automatically translated into other languages, and all web pages containing that keyword (in any language) would be presented. So if you searched for "thus spoke", you would also get results containing "also sprach".
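The cross-language search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the phrase table is a hand-written dictionary standing in for whatever the translation system has learned, the "index" is a plain list of strings, and the function names are made up for this example. Nothing here reflects how Google Search actually works.

```python
def expand_query(query, phrase_table):
    """Return the query together with its known translations."""
    return [query] + phrase_table.get(query.lower(), [])


def search(documents, query, phrase_table):
    """Return documents containing the query or any of its translations."""
    variants = [v.lower() for v in expand_query(query, phrase_table)]
    return [
        doc for doc in documents
        if any(variant in doc.lower() for variant in variants)
    ]


# Hypothetical learned translations and a toy document collection.
phrase_table = {"thus spoke": ["also sprach"]}
documents = [
    "Thus Spoke Zarathustra, a book by Nietzsche",
    "Also sprach Zarathustra ist ein Buch von Nietzsche",
    "An unrelated page about coffee",
]

results = search(documents, "thus spoke", phrase_table)
for doc in results:
    print(doc)
```

Both the English and the German page are returned, while the unrelated page is not.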
Gmail
All incoming mail could be translated into your language, so you would automatically understand what your German friend wrote.
Google Browser
If Google ever brings out a browser, it could build the translation tool right in, so that the web page you are viewing is automatically translated into your preferred language. If you know only English and you are viewing a French page, the browser could automatically show you the English version. You would be entirely unaware of the translation, and every web resource would appear to you in English.
Google Instant Messenger
If Google brings out a Google Instant Messenger (GIM?), it could automatically translate a conversation between two people who speak different languages. If A and B communicate with each other, GIM could translate what A says into B's language and vice versa.
Google BabelFish
This is a very advanced device, straight out of a sci-fi movie: a small plug for your ear that automatically translates what the people around you are saying. Even if they speak various languages, all you would hear is your mother tongue.

Who knows, Google might already be working on such projects. We will have to wait and watch.
