Info about the Project
We apply our approach to generate training data for a hate speech classification task in Hindi and Vietnamese. Our findings show that a model trained using this method outperforms simple language translation for all tasks and performs better than a model trained on an original curated dataset when tested on a new dataset. This method can be used to bootstrap hate speech detection models from scratch in low-resource language settings. As the growth of social media within these contexts continues to outstrip response efforts, this work furthers our capacities for detection, understanding, and response.
Faculty: Michael Best
Students: Daniel K Nkemelu, Cuong Nguyen, Aman Khullar