IBM researchers are working to fix one of the internet’s most significant problems — offensive, and abusive language. With the help of deep learning, the team trained their neural network to automatically turn offensive comments into non-offensive ones.
“The use of offensive language is a common problem of abusive behavior on online social media networks,” the researchers stated in their paper. “This work is a first step in the direction of a new promising approach for fighting abusive posts on social media.”
Using NVIDIA Tesla GPUs and the cuDNN-accelerated TensorFlow deep learning framework, the team trained an encoder-decoder neural network on a dataset comprised of thousands of offensive and non-offensive text they collected from Twitter and Reddit. Once trained the neural network was able to replace offensive language with a 99% accuracy.
“To the best of our knowledge, all previous work addressing the problem of offensive language on social media has focused on text classification only. Those methods can thus be used mainly to flag and filter out the offensive content, but our proposed approach goes one step forward and produces an alternative non-offensive version of the content,” the researchers explained.
The researchers say their approach allows users who plan to post an offensive message to receive an alert that the content is offensive and will be blocked, offering them a “more polite” version of the message they can post.
“We believe that improved versions of the proposed method, together with the use of much larger volumes of training data, will be able to cope with other abusive posts such as posts containing hate speech, racism, and sexism,” the team said.