Comparison of models for the automatic identification of hate in Spanish microtext comments
Date
2021-09
Type
master's thesis
Authors
Navarro Murillo, Noelia
Abstract
This research focuses on detecting hate in Spanish-language comments extracted from Twitter. It analyzes the effectiveness of SVM (Support Vector Machine) and CNN (Convolutional Neural Network) models for the automatic identification of hate in text. For SVM, results are analyzed using term-frequency features and word embeddings, along with the effect of applying oversampling; for the CNNs, word embeddings were used. The research provides a corpus of texts annotated by people following an annotation guide for the manual identification of hate in text. This work aims to contribute to Spanish-language research on hate detection by providing the annotated corpus and an analysis of the effectiveness of the SVM and CNN models for the automatic identification of hate.
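As an illustration of two steps named in the abstract (term-frequency features and oversampling), the following is a minimal sketch using only the Python standard library. It is not the thesis's implementation: the function names, the whitespace tokenization, the toy comments, and the choice of random duplication as the oversampling method are all assumptions made for this example.

```python
import random
from collections import Counter


def term_frequencies(text, vocabulary):
    """Return relative term frequencies of `text` over a fixed vocabulary."""
    tokens = text.lower().split()  # naive whitespace tokenization (assumption)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [counts[term] / total for term in vocabulary]


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until classes are balanced."""
    rng = random.Random(seed)
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_label.values())
    out_samples, out_labels = [], []
    for label, group in by_label.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy, made-up comments (not from the thesis corpus); 1 = hate, 0 = no hate.
texts = ["te odio", "buen dia a todos", "que buen partido", "gracias por todo"]
labels = [1, 0, 0, 0]

vocabulary = sorted({tok for t in texts for tok in t.lower().split()})
features = [term_frequencies(t, vocabulary) for t in texts]
balanced_features, balanced_labels = random_oversample(features, labels)
```

In a setup like this, the balanced feature vectors would then be passed to an SVM classifier, and oversampling would normally be applied to the training split only, so that duplicated samples do not leak into evaluation.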
Keywords
text classifier, hate detection, Support Vector Machine, Convolutional Neural Network