Machine Translation – A Complete and Useful Guide

Hello readers! This article covers a rather different topic: Machine Translation, an important research area. We all know what translation is, right? It is the conversion of words and sentence structure from one language to another without changing the meaning of the sentence. People from different regions use translation in day-to-day life to make communication easier and more comfortable.

But the term Machine Translation might sound odd to many people. The reason is simple: it is hard to imagine a machine or system that can learn a human language and translate it into another human language. I call this topic different from the others because it has been treated as a research-oriented topic for many years. During my master's degree I worked in the Machine Translation area, and I strongly believe it has very good scope in the near future. If you want to start a project in Machine Translation, I recommend reading this article till the end.

So before we go into the details of how you can start your own Machine Translation project, let's look at what Machine Translation actually is!

What is Machine Translation?

Machine translation is the process of translating one human language into another with the help of a machine or computing device. Consider a situation where you, as an Indian, need to communicate with a Chinese person. How would you start? Neither of you understands the other's language. One option is to use a common language known to both of you, for example English. Now assume that neither of you understands English and no other common language is available. How do you communicate then? Normally we would use a translator: someone who knows both Hindi and Chinese. Whenever I say something in Hindi, the translator takes the sentence, translates it into Chinese, and passes it on to the Chinese person. A Machine Translation system works in the same way. The only difference is that a machine performs the translation task, and its output can then be used in real human communication.

Challenges in Machine Translation:

Although the translation process seems easy, it actually is not. According to the Ethnologue report, there are 6,909 spoken languages worldwide. Each language has its own sentence structure, which makes translation more difficult. To implement a Machine Translation system for any language pair, you must have in-depth knowledge of both languages. Words can also have multiple meanings in different contexts, and if this is not handled properly, the system's translations become ambiguous. Finally, to implement such a system, the developer must understand complex human language behavior and be able to interpret the meaning of a sentence programmatically, where 100% accuracy is not achievable. This is probably the main reason why so much research is going on in Machine Translation.

Machine Translation System – Basic Idea:

In this section we will explain how a Machine Translation system works. Let us look at the following image, which shows the basic mechanism of a Machine Translation system.

The image above shows a black-box view of an English-to-Hindi translation system. We keep things simple here because some readers might find the complex internals hard to visualize. As a user, you do not need to know what is inside the black box. From a developer's perspective, however, we will discuss the internal mechanism of a Machine Translation system.

As you can see, the system takes English text as input, processes it, and produces the translated output as Hindi text.

So how does the system translate English text into Hindi text?

Most of you are probably aware of the term Machine Learning, right? If not, let me explain briefly. Machine Learning is a field of Computer Science and a subfield of Artificial Intelligence. It aims to give a computer system the ability to understand human behavior and make new decisions based on the knowledge it gained during training. In simple words, it is the study of methods by which you can make a computer system learn things.

How can Machine Learning be used in Machine Translation?

In a machine translation task, the system must have extensive knowledge of words, parts of speech, verb, noun, and object combinations in a sentence, and many other linguistic resources. Moreover, the system should be able to add new vocabulary or rules whenever a user inputs something. After a successful translation, the system should add the translated output to its database so that it can be reused in future translations. Machine Learning thus enables the system to adapt to new user inputs and successful translations.
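Storing successful translations for later reuse is often called a translation memory. Below is a minimal Python sketch of that idea, assuming an in-memory dictionary stands in for the system database; the class `TranslationMemory` and its methods are hypothetical names chosen for illustration.

```python
from typing import Dict, Optional


class TranslationMemory:
    """Hypothetical store of confirmed translations (source sentence -> target)."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    @staticmethod
    def _normalize(sentence: str) -> str:
        # Lower-case and collapse whitespace so trivial differences still match.
        return " ".join(sentence.lower().split())

    def add(self, source: str, target: str) -> None:
        # Record a translation the user has accepted as correct.
        self._entries[self._normalize(source)] = target

    def lookup(self, source: str) -> Optional[str]:
        # Return a previously confirmed translation, or None if the sentence is new.
        return self._entries.get(self._normalize(source))


memory = TranslationMemory()
memory.add("India is big", "भारत बड़ा है")
print(memory.lookup("india   is BIG"))  # reuses the stored Hindi sentence
print(memory.lookup("India is a country"))  # None: not translated yet
```

A real system would persist these entries in a database and fall back to word-level translation when `lookup` returns `None`.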

Another important term in the context of Machine Translation is corpus. A corpus is a collection of sentences; you can think of it as a kind of database. Corpora play an important role in computer-aided translation. In a corpus of two languages, sentences are organized so that every sentence in one language has a translated sentence in the other language. The following diagram will help you understand a corpus.
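In code, such a two-language (parallel) corpus can be as simple as a list of sentence pairs. The following is a minimal Python sketch, assuming a three-sentence toy English-Hindi corpus (`PARALLEL_CORPUS`, written here for illustration and not taken from any real dataset); the hypothetical helper `pairs_containing` lists every pair whose English side contains a given word.

```python
# Toy English-Hindi parallel corpus written for illustration:
# each entry pairs an English sentence with its Hindi translation.
PARALLEL_CORPUS = [
    ("India is big", "भारत बड़ा है"),
    ("India is a country", "भारत एक देश है"),
    ("The country is big", "देश बड़ा है"),
]


def pairs_containing(corpus, english_word):
    """Return every (English, Hindi) pair whose English side contains the word."""
    word = english_word.lower()
    return [(en, hi) for en, hi in corpus if word in en.lower().split()]


for en, hi in pairs_containing(PARALLEL_CORPUS, "India"):
    print(f"{en}  ->  {hi}")
```

Listing the pairs for "India" shows भारत on the Hindi side of both matches, which is the kind of co-occurrence the probability calculation relies on.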

As you can see in the corpus above, each English sentence has a translated sentence in Hindi. From this corpus, the system estimates the probability that each English word translates into a particular Hindi word. This might sound difficult! Simply put, from the corpus the system will conclude (on the basis of probability calculation) that for a particular English word, say “India”, the relevant Hindi word is “भारत”. Using this probability calculation, the system gets an idea of which Hindi word corresponds to each English word. However, to cover a large set of words and translations, a huge corpus is a major requirement.
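The article does not name a specific probability model. One well-known way to estimate word translation probabilities from a sentence-aligned corpus is IBM Model 1, trained with expectation maximization (EM). The sketch below is a simplified version of that technique (no NULL word, plain whitespace tokenization) running on a hypothetical toy corpus; `train_ibm_model1` and `best_translation` are illustrative names, not a library API.

```python
from collections import defaultdict

# Toy English-Hindi parallel corpus (illustrative only).
CORPUS = [
    ("India is big", "भारत बड़ा है"),
    ("India is a country", "भारत एक देश है"),
    ("The country is big", "देश बड़ा है"),
]


def train_ibm_model1(corpus, iterations=10):
    """Estimate t[(hindi, english)] = P(hindi word | english word) with EM."""
    pairs = [(en.lower().split(), hi.split()) for en, hi in corpus]
    hindi_vocab = {h for _, hi in pairs for h in hi}
    # Start uniform: every Hindi word is an equally likely translation.
    start = 1.0 / len(hindi_vocab)
    t = defaultdict(lambda: start)
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (hindi, english) links
        total = defaultdict(float)  # expected count of links for each English word
        for en_words, hi_words in pairs:
            for h in hi_words:
                # E-step: split this Hindi word among the English words of its pair,
                # in proportion to the current translation probabilities.
                norm = sum(t[(h, e)] for e in en_words)
                for e in en_words:
                    share = t[(h, e)] / norm
                    count[(h, e)] += share
                    total[e] += share
        # M-step: turn the expected counts into probabilities P(h | e).
        t = defaultdict(float, {(h, e): c / total[e] for (h, e), c in count.items()})
    return t


def best_translation(t, english_word):
    """Pick the Hindi word with the highest P(hindi | english_word), or None."""
    candidates = {h: p for (h, e), p in t.items() if e == english_word}
    return max(candidates, key=candidates.get) if candidates else None


t = train_ibm_model1(CORPUS)
for word in ["india", "big", "country"]:
    print(word, "->", best_translation(t, word))
```

On this toy corpus, “है” is best explained by “is”, which appears in every pair, so EM gradually shifts the probability mass of “india” toward “भारत”. With real data you need far more sentences, as noted above.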

How to Prepare a CORPUS for Your Machine Translation Project?

You might think that preparing a corpus is a painful task, right? Believe me, it's not! Prepare an Excel sheet with two columns: put sentences of one language in the first column, and put their translations in the other language in the second column.
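Once the sheet is filled in, you can export it as a CSV file (for example, Excel's “CSV UTF-8” format, which keeps Devanagari text intact) and load it in a few lines. This is a minimal Python sketch using the standard `csv` module; it assumes column one holds the source sentence, column two holds the translation, and there is no header row. `load_parallel_corpus` is a hypothetical helper name.

```python
import csv
import io


def load_parallel_corpus(lines):
    """Read (source, target) sentence pairs from two-column CSV text.

    `lines` can be any iterable of text lines, such as an open file or io.StringIO.
    Rows where either column is missing or empty are skipped.
    """
    pairs = []
    for row in csv.reader(lines):
        if len(row) < 2:
            continue
        source, target = row[0].strip(), row[1].strip()
        if source and target:
            pairs.append((source, target))
    return pairs


# Demo with in-memory CSV text; a real export would be read from a file instead.
sample = io.StringIO(
    "India is big,भारत बड़ा है\n"
    "\"India is a country\",भारत एक देश है\n"
    "incomplete row,\n"
)
for source, target in load_parallel_corpus(sample):
    print(source, "=>", target)
```

To read a real export, pass an open file instead, for example `open("corpus.csv", encoding="utf-8-sig", newline="")`, where the file name is a placeholder and `utf-8-sig` also strips the byte-order mark that Excel writes at the start of UTF-8 CSV files.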

To make your task easier, here are some links from which you can download ready-made corpora for different language pairs.

 Free Corpus Download 1

 Free Corpus Download 2

 Free Corpus Download 3

What else do you need to do?

I assume that you now have your CORPUS ready. You can choose any programming language you know and start writing your project from scratch. If you are not sure where to start, here is what you need to do after getting the CORPUS.

Generally, you need to:

  1. Write a program that takes a sentence as user input.
  2. Have a CORPUS of both source- and target-language sentences.
  3. Define the probability calculation criteria for word translation.
  4. Add some rules for sentence formation (e.g., S+V+O).
  5. Train the system with the help of your corpus.
  6. Take the input sentence from the user, match each word of the sentence against the CORPUS, and find its relevant translation based on probability.
  7. Once you have the translated words, produce the target sentence by combining them according to the target language's structure.
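Steps 6 and 7 can be sketched in Python as follows. This is a toy illustration, not a complete system: the `TRANSLATION_TABLE` probabilities and the `VERBS` set are hypothetical hand-written values standing in for what steps 3 and 5 would learn from a corpus, and the only sentence-formation rule is moving the verb to the end, because Hindi typically follows Subject+Object+Verb order while English follows Subject+Verb+Object.

```python
# Hypothetical toy values: English word -> list of (Hindi translation, probability).
# In a real system these would be learned from the corpus (steps 3 and 5).
TRANSLATION_TABLE = {
    "ram": [("राम", 0.9), ("मेढ़ा", 0.1)],  # a name, or a male sheep
    "eats": [("खाता है", 0.8), ("खाती है", 0.2)],  # masculine vs feminine subject
    "mango": [("आम", 1.0)],
}
# Hypothetical part-of-speech knowledge: which English words are verbs.
VERBS = {"eats"}


def translate_word(word):
    """Step 6: pick the most probable Hindi translation of one English word."""
    candidates = TRANSLATION_TABLE.get(word)
    if not candidates:
        return word  # unknown word: leave it untranslated
    return max(candidates, key=lambda pair: pair[1])[0]


def translate_sentence(sentence):
    """Steps 6-7: translate word by word, then reorder English SVO into Hindi SOV."""
    tokens = sentence.lower().strip(".!?").split()
    translated = [(translate_word(tok), tok in VERBS) for tok in tokens]
    # Sentence-formation rule: keep non-verbs in order and move verbs to the end.
    non_verbs = [hi for hi, is_verb in translated if not is_verb]
    verbs = [hi for hi, is_verb in translated if is_verb]
    return " ".join(non_verbs + verbs)


print(translate_sentence("Ram eats mango."))  # राम आम खाता है
```

For step 1 you could feed the result of Python's built-in `input()` into `translate_sentence`. The ambiguous word “ram” is resolved here purely by the higher probability, which is exactly the kind of ambiguity described under the challenges above; real systems also use the surrounding context.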

However, you can follow any approach to obtain the translation. It's totally up to you.

NOTE: Before you start your own Machine Translation program, we recommend going through the following links, which will make things easier.

Final Words:

I hope you now have an idea of Machine Translation and its mechanisms, and that you can start developing your own Machine Translation system. There is plenty of documentation available on the web; explore it to get a better idea. I truly believe that Machine Translation will be extremely helpful in the near future. Many people in villages cannot communicate in modern languages and are therefore unable to convey their messages to the government or officials. A Machine Translation system can help these people a lot and contribute to the overall development of a country. Thanks for your time, and keep visiting our blog. Have a great time!
