To build new applications, the following scenario will be used:
Scenario: Sentiment analysis (SA).
Data: Amazon video games reviews.
We need to build two applications: one for training and another for prediction, following Baseet's idea of dividing the code into chunks (nodes) so they can be reused in other applications.
To build a SA training application we need the following tasks:
Import the dataset
Read JSON data
Drop unnecessary columns
Encode review’s texts
Create SA deep learning model
Compile the model
Train the model
Test the model
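The "Encode review's texts" task above can be sketched in pure Python. This is a minimal illustration, not Baseet's actual node code; the function names, vocabulary scheme, and maxlen value are assumptions chosen for the example:

```python
def build_vocab(texts):
    """Map each word to an integer id; 0 is reserved for padding/unknown."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab

def encode(texts, vocab, maxlen=6):
    """Turn each review into a fixed-length list of word ids, padded with 0."""
    encoded = []
    for text in texts:
        ids = [vocab.get(w, 0) for w in text.lower().split()][:maxlen]
        ids += [0] * (maxlen - len(ids))
        encoded.append(ids)
    return encoded

reviews = ["great game loved it", "terrible waste of money"]
vocab = build_vocab(reviews)
X = encode(reviews, vocab, maxlen=6)
```

In practice a library tokenizer (e.g. Keras's) would be used instead, but the idea is the same: texts become fixed-length integer sequences that the embedding layer can consume.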
After we determine the needed tasks, we create a node to perform each one, taking into consideration the inputs and outputs of each node. We built this app on Baseet.ai as Sentiment Analysis Deep Learning Training APP DL, as shown in the following figure:
You can explore the code of each node in the app by double-clicking on the node. Let's look at a few nodes from the app to explain the main ideas:
This node's task is to receive the original data frame and drop the unnecessary columns, keeping only the two required columns, so it has two inputs, Df_in and Cols:
Df_in: dataframe produced from the Read JSON node.
Cols: column names of review texts and rating in the original data.
For the Amazon data, the column names of the review text and rating are reviewText and overall respectively, so the input Cols will be: reviewText,overall. For the input Df_in, connect it to the output df of the previous node (Read JSON).
We must select the node so that its inputs appear on the right panel, as the following figures show:
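The column-dropping step can be sketched with pandas. This is a hypothetical stand-alone version of what the node does; the sample DataFrame contents are synthetic, while the Df_in/Cols names mirror the node's inputs:

```python
import pandas as pd

# Synthetic stand-in for the dataframe produced by the Read JSON node.
df_in = pd.DataFrame({
    "reviewerID": ["A1", "A2"],
    "reviewText": ["great game", "not worth it"],
    "overall": [5.0, 2.0],
    "summary": ["Great", "Meh"],
})

# The Cols input as entered in the panel: "reviewText,overall"
cols = "reviewText,overall".split(",")

# Keep only the required columns, dropping everything else.
df_out = df_in[cols]
```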
This node's task is to check whether the data is already labeled as 0/1 (negative/positive) or not (i.e. the labels are raw ratings). If it is not labeled, it maps each rating to the appropriate sentiment. The node's inputs are:
Labeled_flag: 0/1, where 0 indicates the data is not labeled for sentiment (i.e. ratings only).
Df_in: connected to the output of the previous node (Drop unnecessary columns).
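A rating-to-sentiment mapping can be sketched as follows. The threshold convention here (ratings of 4–5 are positive, 1–2 negative, 3 dropped as neutral) is a common choice but an assumption on our part, not necessarily what Baseet's node does:

```python
import pandas as pd

def map_ratings(df, labeled_flag=0):
    """If labeled_flag is 0, convert star ratings into 0/1 sentiment labels.

    Assumed convention: overall >= 4 -> positive (1), <= 2 -> negative (0),
    and neutral 3-star reviews are dropped.
    """
    if labeled_flag == 1:
        return df  # already labeled, nothing to do
    df = df[df["overall"] != 3.0].copy()
    df["sentiment"] = (df["overall"] >= 4.0).astype(int)
    return df

df_in = pd.DataFrame({"reviewText": ["good", "bad", "ok"],
                      "overall": [5.0, 1.0, 3.0]})
df_out = map_ratings(df_in, labeled_flag=0)
```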
In this node, a deep learning model is created, consisting of an embedding layer followed by a dropout layer, a convolutional layer, and a GlobalMaxPooling layer. You can create the model with whatever layers you want.
In this node, the model's optimizer and loss function are specified as inputs. The performance metric used is accuracy.
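The model-creation and compile nodes together can be sketched in Keras. The layer order matches the description above, but the vocabulary size, embedding dimension, filter count, and kernel size are illustrative values, not the app's actual settings:

```python
from tensorflow.keras import Sequential, Input
from tensorflow.keras.layers import (Embedding, Dropout, Conv1D,
                                     GlobalMaxPooling1D, Dense)

VOCAB_SIZE = 10000   # illustrative; depends on the tokenizer's vocabulary
MAXLEN = 100         # padded review length, also illustrative

model = Sequential([
    Input(shape=(MAXLEN,)),
    Embedding(VOCAB_SIZE, 64),          # embedding layer
    Dropout(0.2),                       # dropout layer
    Conv1D(128, 5, activation="relu"),  # convolutional layer
    GlobalMaxPooling1D(),               # GlobalMaxPooling layer
    Dense(1, activation="sigmoid"),     # binary sentiment output
])

# Compile node: optimizer and loss are the node's inputs;
# the metric is fixed to accuracy.
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```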
The task of this node is to train the model created by the previous nodes. The inputs of this node are:
X_train, y_train, and the created model, received through connections with previous nodes.
The batch size, number of epochs, and validation split percentage, entered by the user as hyper-parameters.
The output of this node is the trained model (weights), which will be saved as an H5 file by the Save model node.
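A stand-alone sketch of the training node: the batch size, epochs, and validation split appear as `fit` arguments, and the trained model is saved to an H5 file. The tiny model and random data here are synthetic stand-ins so the snippet is self-contained:

```python
import numpy as np
from tensorflow.keras import Sequential, Input
from tensorflow.keras.layers import Embedding, GlobalMaxPooling1D, Dense

# Tiny synthetic stand-ins for X_train / y_train.
X_train = np.random.randint(1, 50, size=(40, 20))
y_train = np.random.randint(0, 2, size=(40,))

model = Sequential([Input(shape=(20,)),
                    Embedding(50, 8),
                    GlobalMaxPooling1D(),
                    Dense(1, activation="sigmoid")])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Hyper-parameters mirror the node's user inputs:
# batch size, number of epochs, validation split percentage.
history = model.fit(X_train, y_train, batch_size=8, epochs=2,
                    validation_split=0.2, verbose=0)

# Save model node: the trained weights go to an H5 file.
model.save("model_1.h5")
```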
The task of this node is to test the model. Its inputs are the X_test and y_test data and the model, received through connections with previous nodes.
The output of this node is the test accuracy; for the Amazon video games data, the accuracy achieved is 90%. You can edit the model and experiment with the hyper-parameters to try to achieve higher accuracy.
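The testing node boils down to an `evaluate` call that returns the loss and accuracy. Again, the small model and random data below are synthetic placeholders to keep the sketch self-contained; real X_test/y_test would arrive from the upstream nodes:

```python
import numpy as np
from tensorflow.keras import Sequential, Input
from tensorflow.keras.layers import Embedding, GlobalMaxPooling1D, Dense

# Synthetic stand-ins for X_test / y_test.
X_test = np.random.randint(1, 50, size=(16, 20))
y_test = np.random.randint(0, 2, size=(16,))

model = Sequential([Input(shape=(20,)),
                    Embedding(50, 8),
                    GlobalMaxPooling1D(),
                    Dense(1, activation="sigmoid")])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# The node's output: loss and test accuracy.
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
```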
Do the same for the other nodes: connect them gradually and enter the input values until you get the final app.
You can create each node, save it, and compile it successfully; then create an application, drag and drop these nodes, connect them with links, save the app, and execute it.
When the app works correctly, you can deploy each node to the community.
If you want to share the whole app with the community, deploy all the nodes, create a new public app, drag and drop the deployed versions of the nodes, connect them, save the app, and execute it.
For the Save File node, the input filename must include the extension, for example: file_1.pickle
For the Save Model node, the input name must include the extension, for example: model_1.h5