
Sentiment Analysis of Code Mixed Text Consisting of English-Punjabi Lexicon


Affiliations
1 Department of Computer Science, Punjabi University, Patiala, India
2 School of Management Studies, Punjabi University, Patiala, India
 

Sentiment analysis is the field of study that analyzes people's emotions, such as happiness, sadness, and anger, towards the entities and attributes expressed in written text. In this study, textual data was collected from different sources such as Facebook, YouTube, Twitter, and WhatsApp, and then pre-processed. Language identification was then performed on the code-mixed text, a task that involves tokenization and the handling of word-play, misspelled words, abbreviations, slang words, phonetic typing, etc. After the identification task, an English-Punjabi dictionary was created, consisting of lists of opinionated words: positive, negative, and neutral. The remaining words were stored in an unsorted word list. Finally, a statistical technique was applied to determine the sentence-level sentiment polarity of the English-Punjabi code-mixed dataset. The results of the tri-gram and five-gram approaches were found to be similar.

Keywords

Code Mixed Text, Romanized Text, Natural Language Processing, Text Processing, Sentiment Analysis, Microblogging.
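
The dictionary-based, sentence-level step outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the word lists, the function names (`tokenize`, `ngrams`, `sentence_polarity`), the `max_n` parameter, and the simple count-difference scoring rule are all assumptions made for illustration, since the abstract does not specify the statistical technique or the contents of the lexicon.

```python
import re
from collections import Counter

# Hypothetical toy lexicons of Romanized Punjabi and English entries.
# The actual word lists built in the study are not given in the abstract.
POSITIVE = {"good", "happy", "vadiya", "changa", "bahut vadiya"}
NEGATIVE = {"bad", "sad", "maada", "bura"}
NEUTRAL = {"ok", "theek"}


def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def ngrams(tokens, max_n):
    """Yield every n-gram of length 1..max_n as a space-joined string."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def sentence_polarity(sentence, max_n=3):
    """Return (label, unsorted_words) for one sentence.

    The label is decided by comparing the number of positive and
    negative lexicon hits (an assumed scoring rule). Single words
    found in no list are collected as the "unsorted" words.
    """
    counts = Counter()
    unsorted_words = []
    for gram in ngrams(tokenize(sentence), max_n):
        if gram in POSITIVE:
            counts["positive"] += 1
        elif gram in NEGATIVE:
            counts["negative"] += 1
        elif gram in NEUTRAL:
            counts["neutral"] += 1
        elif " " not in gram:
            unsorted_words.append(gram)
    score = counts["positive"] - counts["negative"]
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return label, unsorted_words


if __name__ == "__main__":
    for text in ["Movie bahut vadiya si", "Service bad si", "Theek hai"]:
        print(text, "->", sentence_polarity(text, max_n=3))
```

In this toy setup, raising `max_n` from 3 to 5 does not change the labels of the sample sentences, because the hypothetical lexicon contains no entry longer than two words. This is a property of the sketch only and should not be read as an explanation of the results reported in the study.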

Authors

Mukhtiar Singh
Department of Computer Science, Punjabi University, Patiala, India
Vishal Goyal
Department of Computer Science, Punjabi University, Patiala, India
Sahil Raj
School of Management Studies, Punjabi University, Patiala, India

