Hate-Speech and Offensive Language Detection in Roman Urdu

Summary:

The task of automatic hate-speech and offensive language detection in social media content is of utmost importance due to its implications in unprejudiced society concerning race, gender, or religion. Existing research in this area, however, is mainly focused on the English language, limiting the applicability to particular demographics. Despite its prevalence, Roman Urdu (RU) lacks language resources, annotated datasets, and language models for this task. In this study, we: (1) Present a lexicon of hateful words in RU, (2) Develop an annotated dataset called RUHSOLD consisting of 10,012 tweets in RU with both coarse-grained and fine-grained labels of hate-speech and offensive language, (3) Explore the feasibility of transfer learning of five existing embedding models to RU, (4) Propose a novel deep learning architecture called CNN-gram for hate-speech and offensive language detection and compare its performance with seven current baseline approaches on RUHSOLD dataset, and (5) Train domain-specific embeddings on more than 4.7 million tweets and make them publicly available. We conclude that transfer learning is more beneficial as compared to training embedding from scratch and that the proposed model exhibits greater robustness as compared to the baselines. (Rizwan et al., 2020)

Overview

First, we provide a lexicon base of 621 hateful words for the RU language.
Second, we develop a gold-standard dataset, called Roman Urdu Hate-Speech and Offensive Language Detection (RUHSOLD), from tweets in RU with binary coarse-grained as well as multi-class fine-grained labels.
Third, we explore the transfer learning capabilities of five existing multilingual embedding models to RU language through extensive experiments.
Fourth, we propose a novel deep learning model called Convolutional Neural Network n-gram (CNN-gram) and compare its performance with seven baseline models on the RUHSOLD dataset. We demonstrate that CNN-gram displays greater robustness across both the coarse-grained and the fine-grained classification tasks.
Fifth, to exhibit contrast with transfer learning of embedding models, we train domain-specific embeddings called “RomUrEm” on more than 4.7 million tweets and make them publicly available.

Dataset

First, we construct our own lexicon of hateful words by searching for such keywords online and interviewing people. This lexicon consists of abusive and derogatory terms along with slurs or terms pertaining to religious hate and sexist language. Using this lexicon along with a separate collection of common RU words, we search for and collect 20,000 tweets and perform a manual preliminary analysis to find new slang and abuses and to identify frequently occurring common terms. The common RU words are added in order to extract random inoffensive tweets as well as tweets that are offensive but do not contain any offensive words.

Using this updated lexicon, we search for and collect 50,000 new tweets. From this updated tweet base, around 10,000 tweets are randomly sampled for annotation. To avoid user distribution bias, we allow a maximum of 120 tweets per user.
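The sampling step with a per-user cap can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `sample_with_user_cap`, the `(user_id, text)` tuple layout, and the choice to enforce the cap while drawing the sample (rather than before it) are all assumptions.

```python
import random
from collections import defaultdict

def sample_with_user_cap(tweets, sample_size, max_per_user=120, seed=0):
    """Randomly sample tweets while keeping at most max_per_user per author.

    tweets is an iterable of (user_id, text) pairs (hypothetical layout).
    """
    rng = random.Random(seed)
    shuffled = list(tweets)
    rng.shuffle(shuffled)              # random order gives a random sample
    per_user = defaultdict(int)
    sample = []
    for user_id, text in shuffled:
        if per_user[user_id] >= max_per_user:
            continue                   # this author already reached the cap
        per_user[user_id] += 1
        sample.append((user_id, text))
        if len(sample) == sample_size:
            break
    return sample

# Toy pool: one very active user plus 100 users with 10 tweets each.
pool = [("heavy_user", f"tweet {i}") for i in range(500)]
pool += [(f"user_{u}", f"tweet {i}") for u in range(100) for i in range(10)]

sample = sample_with_user_cap(pool, sample_size=1000, max_per_user=120)
heavy_count = sum(1 for user_id, _ in sample if user_id == "heavy_user")
```

Without the cap, the single active user would be expected to contribute roughly a third of the toy sample; with it, that user contributes at most 120 tweets.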

The dataset is annotated for two sub-tasks. The first sub-task is based on binary labels of "Hate-Offensive" content and "Normal/Neutral" content (i.e., inoffensive language). These labels are self-explanatory. We refer to this sub-task as “coarse-grained classification”. The second sub-task divides Hate-Offensive content into four labels at a granular level. These labels are the most relevant for the demographic of users who converse in RU and are defined in related literature. We refer to this sub-task as “fine-grained classification”. The objective behind creating two sub-tasks is to enable researchers to evaluate hate-speech detection approaches on both easier (coarse-grained) and more challenging (fine-grained) scenarios. All labels and their definitions are summarized as follows:

Abusive/Offensive: Profanity, strongly impolite, rude or vulgar language expressed with fighting or hurtful words in order to insult a targeted individual or group.
Sexism: Language used to express hatred towards a targeted individual or group based on gender or sexual orientation.
Religious Hate: Language used to express hatred towards a targeted individual or group based on their religious beliefs or lack of any religious beliefs, and the use of religion to incite violence or propagate hatred against a targeted individual or group.
Profane: The use of vulgar, foul or obscene language without an intended target.
Normal: This contains text that does not fall into the above categories.
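Because the four granular labels subdivide the Hate-Offensive class, a coarse-grained label can be derived from a fine-grained one. The mapping below is a small sketch of that relationship; the constant and function names are hypothetical and do not come from the released dataset.

```python
# The five fine-grained labels defined above.
FINE_LABELS = ("Abusive/Offensive", "Sexism", "Religious Hate", "Profane", "Normal")

def to_coarse(fine_label):
    """Collapse a fine-grained label into the binary coarse-grained label."""
    if fine_label not in FINE_LABELS:
        raise ValueError(f"unknown label: {fine_label!r}")
    return "Normal" if fine_label == "Normal" else "Hate-Offensive"

coarse = [to_coarse(label) for label in FINE_LABELS]
```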

Samples with translations are provided in Table 1 and the dataset statistics are provided in Table 2.

⚠️ Warning: The following table contains offensive / explicit language

Tweet	Translation	Target Label
randi ke bache tu apne hashar ki fikar kar	you son of a prostitute, you should worry for what will happen to you.	Abusive/Offensive
Hindu bhenchod hi ki gaand ma hi keerra hota hay Tum hindu ho hi harami tumhara kabhi 1 baap nhi hota	There are always insects in asses of Hindu sisterfu**kers. These hindus have multiple fathers instead of 1.	Religious Hate
No wonder you can’t make it to First Lady. At least you managed to grab the title of FIRST RANDDI	No wonder you can’t make it to First Lady. At least you managed to grab the title of FIRST PROSTITUTE.	Sexism
bahria central park karachi forms sold out in two days. Abhi tax maango bhenchodo ka rona shru hojayega	bahria central park karachi forms sold out in two days. Now ask them for tax these motherf**kers start crying.	Profane
pakistan me ptv news or ptv parliment ne hi mulk k liye acha kam kia	in pakistan, only ptv news and ptv parliament have done good work for the country.	Neutral

Table 1. Samples of tweets for each label from RUHSOLD dataset.

Label	Tweet Count
Abusive/Offensive	2,402
Sexism	839
Religious Hate	782
Profane	640
Normal	5,349
Total	10,012

Table 2. Tweet counts with respect to labels in the RUHSOLD dataset.
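The class imbalance implied by Table 2 can be made explicit with a short calculation. The counts are taken directly from the table; the variable names are illustrative.

```python
# Tweet counts per label, copied from Table 2.
counts = {
    "Abusive/Offensive": 2402,
    "Sexism": 839,
    "Religious Hate": 782,
    "Profane": 640,
    "Normal": 5349,
}

total = sum(counts.values())                 # 10012
hate_offensive = total - counts["Normal"]    # 4663 tweets in the coarse-grained positive class
shares = {label: n / total for label, n in counts.items()}

for label, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{label:<18} {share:6.1%}")
```

Normal tweets make up about 53.4% of the dataset, while the smallest class, Profane, accounts for about 6.4%.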

Experimental Setup

The experiments were conducted on the RUHSOLD dataset in two settings: a coarse-grained binary classification between normal and hate/offensive content, and a fine-grained five-class setup (Normal, Abusive/Offensive, Profane, Sexism, and Religious Hate). The data was split into 7,209 training tweets, 801 validation tweets, and 2,003 test tweets, with a class imbalance favoring the normal category.

We tested six embeddings: LASER, ELMo, multilingual BERT, XLM-RoBERTa, FastText, and RomUrEm, the last being domain-specific Roman Urdu embeddings trained on approximately 4.7 million tweets. Seven baseline models were implemented: LSTM with gradient boosted decision trees, Bi-LSTM with attention, FastText with CNN, domain embeddings with CNN, an ensemble classifier combining SVM, random forest and AdaBoost, BERT with the LAMB optimizer, and BERT with LASER features combined with LightGBM.

To improve upon the baseline approaches, we propose the CNN-gram model (Figure 1). This model stacks convolutional blocks to learn unigram, bigram, trigram, and four-gram patterns, followed by pooling layers and dense layers for classification. CNN-gram was tested with BERT, XLM-RoBERTa, FastText, and RomUrEm embeddings.

Figure 1: CNN-gram model for hate-speech and offensive language detection in Roman Urdu.
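The n-gram idea behind CNN-gram can be illustrated with a dependency-free sketch. It is not the authors' architecture: it assumes parallel convolution branches with kernel widths 1 to 4, ReLU activations, global max pooling, and a single dense softmax layer. The filter count, embedding size, and random (untrained) weights are placeholders chosen only to show how the shapes fit together.

```python
import math
import random

NGRAM_WIDTHS = (1, 2, 3, 4)   # unigram to four-gram windows
EMBED_DIM = 8                 # toy size; real embeddings are much larger
NUM_FILTERS = 3               # hypothetical number of filters per branch
NUM_CLASSES = 5               # fine-grained setting

def conv1d_relu(tokens, filters):
    """Slide each filter (width n x EMBED_DIM) over the token sequence, then apply ReLU."""
    width = len(filters[0])
    feature_map = []
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        row = []
        for kernel in filters:
            total = sum(w * x
                        for k_row, t_row in zip(kernel, window)
                        for w, x in zip(k_row, t_row))
            row.append(max(0.0, total))
        feature_map.append(row)
    return feature_map

def global_max_pool(feature_map):
    """Keep the strongest activation of each filter across all positions."""
    return [max(column) for column in zip(*feature_map)]

def dense_softmax(features, weights, biases):
    """One dense layer followed by a numerically stable softmax."""
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    norm = sum(exps)
    return [e / norm for e in exps]

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

# One bank of filters per n-gram width.
branches = {n: [rand_matrix(n, EMBED_DIM) for _ in range(NUM_FILTERS)]
            for n in NGRAM_WIDTHS}

tweet = rand_matrix(6, EMBED_DIM)   # six token embeddings standing in for a tweet

features = []
for n in NGRAM_WIDTHS:
    features.extend(global_max_pool(conv1d_relu(tweet, branches[n])))

dense_w = rand_matrix(NUM_CLASSES, len(features))
dense_b = [0.0] * NUM_CLASSES
probs = dense_softmax(features, dense_w, dense_b)
```

Each branch responds to patterns spanning a different number of consecutive tokens, and max pooling makes the resulting features independent of where in the tweet the pattern occurs. In practice such a model would be built and trained in a deep learning framework, with the embeddings supplied by one of the models listed above.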

Coarse-grained Classification

Embedding	Without Fine-tuning				With Fine-tuning
Embedding	Accuracy	Precision	Recall	F1-score	Accuracy	Precision	Recall	F1-score
LASER	0.74	0.74	0.74	0.74	0.76	0.76	0.76	0.76
ELMo	0.80	0.80	0.80	0.80	0.79	0.79	0.79	0.79
BERT	0.68	0.70	0.68	0.67	0.89	0.90	0.89	0.89
XLM-RoBERTa	0.53	0.27	0.50	0.35	0.85	0.85	0.85	0.85
FastText	0.74	0.75	0.73	0.73	0.88	0.88	0.88	0.88
RomUrEm	0.85	0.84	0.84	0.84	0.88	0.88	0.88	0.88

Table 3. Out-of-the-box performance of different embeddings for coarse-grained classification.

Model	Accuracy	Precision	Recall	F1-score
LSTM+GBDT	0.54	0.58	0.51	0.38
BERT+LASER+GBDT	0.89	0.89	0.89	0.89
FastText+CNN	0.87	0.87	0.87	0.87
SVM+RF+AB	0.90	0.90	0.90	0.90
BERT+LAMB	0.90	0.90	0.89	0.89
Domain Embeddings+CNN	0.88	0.89	0.88	0.88
BiLSTM with Attention	0.86	0.86	0.85	0.85
BERT+CNN-gram	0.90	0.90	0.90	0.90
XLM-RoBERTa+CNN-gram	0.88	0.88	0.88	0.88
FastText+CNN-gram	0.81	0.81	0.80	0.80
RomUrEm+CNN-gram	0.89	0.89	0.89	0.89

Table 4. Comparisons of the proposed approach with baseline models on coarse-grained classification.

Fine-grained Classification

Embedding	Without Fine-tuning				With Fine-tuning
Embedding	Accuracy	Precision	Recall	F1-score	Accuracy	Precision	Recall	F1-score
LASER	0.66	0.62	0.42	0.46	0.67	0.59	0.52	0.54
ELMo	0.70	0.64	0.52	0.56	0.60	0.66	0.50	0.55
BERT	0.61	0.60	0.36	0.37	0.77	0.72	0.65	0.67
XLM-RoBERTa	0.53	0.11	0.20	0.14	0.79	0.70	0.75	0.72
FastText	0.62	0.55	0.33	0.35	0.77	0.69	0.63	0.66
RomUrEm	0.70	0.69	0.51	0.56	0.79	0.76	0.63	0.67

Table 5. Out-of-the-box performance of different embeddings for fine-grained classification.
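As a reading aid for the XLM-RoBERTa row without fine-tuning in Tables 3 and 5, the sketch below computes the scores of a classifier that always predicts the majority class. It rests on two assumptions that the source does not state: that precision, recall, and F1 are macro-averaged, and that the test split has the same label proportions as Table 2. The function name is hypothetical.

```python
def majority_baseline_macro(counts):
    """Accuracy and macro precision/recall/F1 when always predicting the largest class."""
    total = sum(counts.values())
    majority = max(counts, key=counts.get)
    accuracy = counts[majority] / total
    precisions, recalls, f1s = [], [], []
    for label in counts:
        if label == majority:
            p, r = accuracy, 1.0      # every prediction is this label
        else:
            p, r = 0.0, 0.0           # never predicted; precision taken as 0 by convention
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    n = len(counts)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

fine_counts = {"Abusive/Offensive": 2402, "Sexism": 839, "Religious Hate": 782,
               "Profane": 640, "Normal": 5349}
coarse_counts = {"Normal": 5349, "Hate-Offensive": 10012 - 5349}

coarse_scores = [round(x, 2) for x in majority_baseline_macro(coarse_counts)]
fine_scores = [round(x, 2) for x in majority_baseline_macro(fine_counts)]
print(coarse_scores)   # [0.53, 0.27, 0.5, 0.35]
print(fine_scores)     # [0.53, 0.11, 0.2, 0.14]
```

Under these assumptions the computed values coincide with the reported scores for XLM-RoBERTa without fine-tuning (0.53, 0.27, 0.50, 0.35 coarse-grained; 0.53, 0.11, 0.20, 0.14 fine-grained), which would be consistent with that configuration predicting Normal for nearly every tweet.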

Model	Accuracy	Precision	Recall	F1-score
LSTM+GBDT	0.44	0.28	0.29	0.27
BERT+LASER+GBDT	0.73	0.67	0.63	0.64
FastText+CNN	0.71	0.62	0.57	0.58
SVM+RF+AB	0.76	0.68	0.63	0.65
BERT+LAMB	0.76	0.67	0.63	0.64
Domain Embeddings+CNN	0.74	0.64	0.60	0.61
BiLSTM with Attention	0.71	0.62	0.59	0.60
BERT+CNN-gram	0.78	0.69	0.66	0.67
XLM-RoBERTa+CNN-gram	0.78	0.70	0.67	0.68
FastText+CNN-gram	0.73	0.64	0.61	0.62
RomUrEm+CNN-gram	0.79	0.71	0.67	0.69

Table 6. Comparisons of the proposed approach with baseline models on fine-grained classification.

Conclusion

In this work, we presented a dataset in Roman Urdu for the task of hate-speech detection in social media content, annotated with five fine-grained labels. We also make publicly available domain-specific embeddings trained on a parallel corpus of more than 4.7 million tweets. Furthermore, we carried out extensive experimentation with multiple embeddings, their capacity for transfer learning, and comparisons with existing baseline models. As future work, semantically challenging cases at the fine-grained level, particularly the distinction between Abusive/Offensive (targeted) and Profane (untargeted) language, demand further investigation.