Hate Speech Detection in Low-Resource Languages through Data Augmentation and Contextual Entity Replacement

Info about the Project

We apply our approach, which combines data augmentation with contextual entity replacement, to generate training data for a hate speech classification task in Hindi and Vietnamese. Our findings show that a model trained on data generated with this method outperforms a simple translation baseline on all tasks. It also performs better than a model trained on an original curated dataset when tested on a new dataset. This method can be used to bootstrap hate speech detection models from scratch in low-resource language settings. As the growth of social media in these contexts continues to outstrip response efforts, this work furthers our capacity for detection, understanding, and response.

Faculty: Michael Best
Students: Daniel K Nkemelu, Cuong Nguyen, Aman Khullar