Hate Speech Detection in low-resource languages through data augmentation and contextual entity replacement

Info about the Project

Hate Speech Detection

We apply our approach, data augmentation through contextual entity replacement, to generate training data for hate speech classification in Hindi and Vietnamese. Our findings show that a model trained on data generated this way outperforms simple translation on all tasks. When tested on a new dataset, it also outperforms a model trained on the original curated dataset. This method can be used to bootstrap hate speech detection models from scratch in low-resource language settings. The growth of social media in these contexts continues to outstrip response efforts, and this work advances our capacity for detection, understanding, and response.
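This page does not describe how contextual entity replacement works. The sketch below is only an illustration and is not the project's implementation. It assumes the technique swaps entity slots in labeled source sentences for entities relevant to the target context, while each generated sentence keeps its original hate/non-hate label. The `{SLOT}` placeholder syntax, the `replace_entities` and `augment` helpers, and the example entity lists are all hypothetical.

```python
import re
from itertools import product

# Hypothetical placeholder syntax (assumption): {SLOT} marks an entity slot in a sentence.
SLOT_PATTERN = re.compile(r"\{([A-Z_]+)\}")


def replace_entities(template: str, entities: dict[str, list[str]]) -> list[str]:
    """Expand one template into a sentence for every combination of candidate entities."""
    # Deduplicate slot names while preserving order, so a repeated slot gets one value.
    unique_slots = list(dict.fromkeys(SLOT_PATTERN.findall(template)))
    outputs = []
    # product() with no slots yields a single empty tuple, so the template is returned unchanged.
    for values in product(*(entities[slot] for slot in unique_slots)):
        mapping = dict(zip(unique_slots, values))
        outputs.append(SLOT_PATTERN.sub(lambda m: mapping[m.group(1)], template))
    return outputs


def augment(dataset: list[tuple[str, int]], entities: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Generate new (sentence, label) pairs; each generated sentence inherits its source label."""
    augmented = []
    for template, label in dataset:
        for sentence in replace_entities(template, entities):
            augmented.append((sentence, label))
    return augmented


if __name__ == "__main__":
    # Neutral placeholder entities stand in for context-specific ones (hypothetical).
    data = [("{GROUP} people from {CITY} are welcome here.", 0)]
    ents = {"GROUP": ["group A", "group B"], "CITY": ["city X", "city Y"]}
    for sentence, label in augment(data, ents):
        print(label, sentence)
```

Each source sentence expands into one example for every combination of candidate entities, so a small labeled seed set can grow quickly. In practice the entity lists would have to be chosen for the target language and social context.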

Faculty:

Michael Best

Students:

Daniel K Nkemelu, Cuong Nguyen, Aman Khullar

GVU Center
