Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

A Framework for Online Hate Speech Detection on Code-mixed Hindi-English Text and Hindi Text in Devanagari

A Framework for Online Hate Speech Detection on Code-mixed Hindi-English Text and Hindi Text in... Social Media has been growing and has provided the world with a platform to opine, debate, display, and discuss like never before. It has a major influence in research areas that analyze human behavior and social groups, and the phenomenon of social interactions is even being used in areas such as Internet of Things. This constant stream of data connecting individuals and organizations across the globe has had a tremendous impact on the functioning of society and even has the power to sway elections. Despite having numerous benefits, social media has certain issues such as the prevalence of fake news, which has also led to the rise of the hate speech phenomenon. Due to lax security throughout these social media platforms, these issues continue to exist without any repercussions. This leads to cyberbullying, defamation, and presents grave security concerns. Even though some work has been done independently on native scripts, hate speech detection, and code-mixed data, there exists a lack of academic work and research in the area of detecting hate speech in transliterated code-mixed data and in-text containing native language scripts. Research in this field is inhibited greatly due to the multiple variations in grammar and spelling and in general a lack of availability of annotated datasets, especially when it comes to native languages. This article comes up with a method to automate hate speech detection in code-mixed and native language text. The article presents an architecture containing a Tabnet classifier-based model trained on features extracted using MuRIL from transliterated code-mixed textual data. The article also shows that the same model works well on features extracted from text in Devanagari despite being trained on transliterated data. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) Association for Computing Machinery

A Framework for Online Hate Speech Detection on Code-mixed Hindi-English Text and Hindi Text in Devanagari

Loading next page...
 
/lp/association-for-computing-machinery/a-framework-for-online-hate-speech-detection-on-code-mixed-hindi-4YdY6EHXpW

References

References for this paper are not available at this time. We will be adding them shortly, thank you for your patience.

Publisher
Association for Computing Machinery
Copyright
Copyright © 2023 Association for Computing Machinery.
ISSN
2375-4699
eISSN
2375-4702
DOI
10.1145/3568673
Publisher site
See Article on Publisher Site

Abstract

Social Media has been growing and has provided the world with a platform to opine, debate, display, and discuss like never before. It has a major influence in research areas that analyze human behavior and social groups, and the phenomenon of social interactions is even being used in areas such as Internet of Things. This constant stream of data connecting individuals and organizations across the globe has had a tremendous impact on the functioning of society and even has the power to sway elections. Despite having numerous benefits, social media has certain issues such as the prevalence of fake news, which has also led to the rise of the hate speech phenomenon. Due to lax security throughout these social media platforms, these issues continue to exist without any repercussions. This leads to cyberbullying, defamation, and presents grave security concerns. Even though some work has been done independently on native scripts, hate speech detection, and code-mixed data, there exists a lack of academic work and research in the area of detecting hate speech in transliterated code-mixed data and in-text containing native language scripts. Research in this field is inhibited greatly due to the multiple variations in grammar and spelling and in general a lack of availability of annotated datasets, especially when it comes to native languages. This article comes up with a method to automate hate speech detection in code-mixed and native language text. The article presents an architecture containing a Tabnet classifier-based model trained on features extracted using MuRIL from transliterated code-mixed textual data. The article also shows that the same model works well on features extracted from text in Devanagari despite being trained on transliterated data.

Journal

ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)Association for Computing Machinery

Published: May 8, 2023

Keywords: Cyber security

References