![]() It has mainly four diacritic which are being used to represent photonics. There are total fifteen (15) vowels in Urdu, however, to describe a particular vowel, Urdu has set of diacritical marks that are above or below a character which defines a particular vowel. NLP for English language is quite mature, but there is a lot of gap in terms of work to be done on Urdu as well as in Roman Urdu language. Tasks of NLP include morphological analysis, Parts of Speech (POS) tagging, removal of stop words, parsing, and so on. In order to make computers to learn and understand Natural Language, the term NLP ispredominately used to achieve this task by making interaction between humans and computers. ![]() Such as a (SMS) Short Service Messaging,Facebook, Whatsapp, Twitter, etc. Due to its ease in usage, therefore, it isbeing heavily used as a digital communication medium on different applications. Nonetheless, due to the emergence of mobile phone technology, people communicate with each other and share their views inUrdu language format with the use of English alphabets and so called Roman Urdu. This gap isdue to lack of interest by the engineering community along with insufficient availability of linguistic resources. In past, less research has been done in Urdu language compared to European Languages. The Urdu language family evolved from Indo-European, Indo–Iranian and Indo-Aryan.To a further extend it has roots in Persian, Arabic and exhibit most similarity with South Asian languages and has structuralsimilarity with Hindi. European countries like USA, includingUK, Canada have also Urdu speakers. It is estimated that roughly200 million people in Pakistan and 1.65 billion people in India speak Urdu language. Urdu is the national language of Pakistan and also is an official language in some states of India. Keywords: Natural Language Processing (NLP), Waikato Environment for Knowledge Analysis(WEKA), Roman-Urdu (UR), Named Entity Recognition (NER), Parts of Speech (POS) Finally, in conclusion, research findings and future directions are highlighted as to give novel ideas in this area. Furthermore, performance metrics, algorithms techniques have also been illustrated. Further, we discuss grammatical structure of Urdu languages, pre-processing techniques, software tools and database used in Roman-Urdu and Urdu language. The objective of this study is to provide comprehensive review on Roman-Urdu and Urdu language, which covers previous works done by authors in this area. Due to this fact, least work is reported in Urdu language by the NLP researcher’s community. Nonetheless, one major issue in Roman-Urdu text processing is that it has no standardized lexicon, resulting one word in Urdu have different spelling variations. Ameen Chahjro1ġDepartment of Software Engineering, SMI University, Karachi, PakistanĢDepartment of Computer Science, SMI University, Karachi, PakistanģDepartment of Business Administration, SMI University, Karachi, PakistanĪbstract:- Lately, there has been growing interest of users in Roman-Urdu text processing by using English language keyboard over the various social network platforms in Pakistan as well as in some other states of Indo-European. Madan Lal1, Kamlesh Kumar1, Asif Ali Wagan2, Asif Ali Laghari2, Mansoor Ahmed Khuhro2, A Systematic Study of Urdu Language Processing its Tools and Techniques: A Review
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |