Toxic Comment Classification

Comment classification use-case

The multi-label comment classification training and prediction apps can be used to classify toxic comments. The data consists of comments labeled with the following toxicity types: toxic, severe toxic, obscene, threat, insult, and identity hate.
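As a rough illustration of this data layout, the sketch below parses a small comma-separated sample with Python's standard csv module. The column names are the ones used later in this guide. The two sample rows and the 0/1 encoding of the labels are assumptions made for illustration, not taken from the real data files. The second row reuses the example comment from the end of this guide.

```python
import csv
import io

# Label column names as used in this guide.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Invented sample rows (assumption: each label is stored as 0 or 1).
SAMPLE_CSV = """comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate
"Thanks for the helpful edit.",0,0,0,0,0,0
"I will kill you",1,0,0,1,0,0
"""

def read_rows(text):
    """Return (comments, label_rows) parsed from comma-separated text."""
    reader = csv.DictReader(io.StringIO(text))
    comments, label_rows = [], []
    for row in reader:
        comments.append(row["comment_text"])
        label_rows.append([int(row[name]) for name in LABELS])
    return comments, label_rows

def active_labels(label_row):
    """Names of the labels that are set to 1 in one row."""
    return [name for name, flag in zip(LABELS, label_row) if flag == 1]

comments, label_rows = read_rows(SAMPLE_CSV)
for comment, row in zip(comments, label_rows):
    print(comment, "->", active_labels(row))
```

A comment can carry several labels at once or none at all, which is what makes this a multi-label problem rather than a single-choice one.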

To use the general multi-label classification training app, follow these steps:

  • First, upload your data files to the dataset dashboard.

  • Open the app and import the training and test data files.

  • The data files are comma-separated, so nothing needs to change in the Read tabular data file nodes.

  • In the two Comments preprocessing nodes, fill in the required input, comments_column_name, with the name of the text column in your data; here it is comment_text.

  • In the Comments encoding node, also set comments_column_name to comment_text (the text column name in your data). You can also choose the vectorization method from the drop-down list.

  • In the Test comments encoding node, also set comments_column_name to comment_text (the text column name in your data).

  • In the Multi-label comments classification node, fill in the required input, labels_col_names, with the label column names separated by commas, exactly as they appear in the data; in our case: toxic, severe_toxic, obscene, threat, insult, identity_hate

  • In the Measure model performance node, enter the same labels_col_names as in the previous node.

  • To save the trained model, enter a file name in the name field of the Save model node; it should have the pickle extension (example: model1.pickle).

  • To save the transformer file, enter a file name in the file_name field of the Save file node; it should have the pickle extension (example: file1.pickle).

  • Execute the app. After it finishes successfully, two files are generated: the model file appears in the models dashboard, and the transformer file appears in the files dashboard.

  • Click the Measure model performance node; the Output tab shows the ROC-AUC score.
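To make the ROC-AUC output easier to interpret, here is a minimal sketch of how the metric can be computed for each label and then averaged. It uses the pairwise definition: the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counted as half. The sample labels and scores are invented, and averaging across labels (macro average) is an assumption; this guide does not state how the Measure model performance node aggregates its result.

```python
def roc_auc(y_true, y_score):
    """ROC-AUC for one label via pairwise comparison of positives and negatives."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

def macro_roc_auc(true_by_label, score_by_label):
    """Per-label ROC-AUC plus their unweighted mean (assumed aggregation)."""
    per_label = {
        label: roc_auc(true_by_label[label], score_by_label[label])
        for label in true_by_label
    }
    return per_label, sum(per_label.values()) / len(per_label)

# Invented example: four comments, two of the six labels.
true_by_label = {"toxic": [1, 0, 1, 0], "threat": [0, 0, 1, 0]}
score_by_label = {"toxic": [0.9, 0.2, 0.6, 0.7], "threat": [0.1, 0.3, 0.8, 0.2]}

per_label, macro = macro_roc_auc(true_by_label, score_by_label)
print(per_label)  # {'toxic': 0.75, 'threat': 1.0}
print(macro)      # 0.875
```

A value of 1.0 means every positive comment was scored above every negative one for that label, while 0.5 is what random scoring would give.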

------------------------------------------------------------------------------------------------------------------------

To use the general comment classification prediction app to classify new comments with the model trained on the toxic comment data, follow these steps:

  • Import the generated model file into the import model node.

  • Import the generated transformer file into the import file node.

  • Write the comment you want to classify and run the app to get the results.

Example: comment_text: I will kill you

The output indicates that this comment belongs to the toxic and threat categories.
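The sketch below illustrates why the prediction app needs both files: the transformer encodes a new comment the same way the training comments were encoded, and the model then scores that encoding for each label. The file names model1.pickle and file1.pickle come from the training steps. Everything else is a hypothetical stand-in chosen so the example runs: the two-word vocabulary, the hand-picked weights, and the 0.5 threshold are not the platform's actual model or transformer.

```python
import pickle
import tempfile
from pathlib import Path

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Hypothetical stand-ins for the artifacts produced by the training app.
transformer = {"vocabulary": {"kill": 0, "thanks": 1}}  # word -> feature index
model = {
    "weights": {"toxic": [1.0, 0.0], "threat": [1.0, 0.0]},  # label -> weight per feature
    "threshold": 0.5,
}

def encode(comment, transformer):
    """Bag-of-words counts using the vocabulary stored at training time."""
    vocab = transformer["vocabulary"]
    features = [0.0] * len(vocab)
    for word in comment.lower().split():
        if word in vocab:
            features[vocab[word]] += 1.0
    return features

def predict_labels(features, model):
    """Return the labels whose weighted score reaches the threshold."""
    predicted = []
    for label in LABELS:
        weights = model["weights"].get(label)
        if weights is None:
            continue  # this toy model has no weights for the label
        score = sum(w * x for w, x in zip(weights, features))
        if score >= model["threshold"]:
            predicted.append(label)
    return predicted

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model1.pickle"
    transformer_path = Path(tmp) / "file1.pickle"
    # Training side: persist both artifacts.
    model_path.write_bytes(pickle.dumps(model))
    transformer_path.write_bytes(pickle.dumps(transformer))
    # Prediction side: load them back before classifying.
    loaded_model = pickle.loads(model_path.read_bytes())
    loaded_transformer = pickle.loads(transformer_path.read_bytes())

result = predict_labels(encode("I will kill you", loaded_transformer), loaded_model)
print(result)  # ['toxic', 'threat']
```

Only unpickle files you generated yourself or otherwise trust, since loading a pickle can execute arbitrary code.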

You can also run it directly from the form.