Visual Programming with Orange Tool

This blog is a continuation of my series on learning the Orange tool. In this post, we will explore new options in the tool for splitting a dataset into training and testing sets, along with more features for measuring the accuracy of various models and comparing them.

Creating the Workflow

    • First, we place the File widget on the canvas and load the built-in iris dataset into the workflow.

    • Next, the Data Sampler widget implements several data sampling methods. It outputs a sampled dataset and a complementary dataset (containing the instances from the input that are not included in the sample). The output is produced once the input dataset is provided and Sample Data is pressed.
    • Here I sampled the data with a 75% ratio, so 75% of the instances form the sampled data and the remaining 25% form the complementary dataset.
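To make the widget's behaviour concrete, here is a minimal sketch (plain Python, not the widget's actual source) of what a 75/25 fixed-proportion sample over the iris dataset's 150 rows looks like:

```python
import random

# Sketch of what Data Sampler does: shuffle the dataset's row indices,
# then split them into a 75% sample and a 25% complementary set.
# The 0.75 ratio and 150 rows match the iris setup described above.
random.seed(42)  # Data Sampler offers a similar "replicable sampling" option

n_rows = 150
indices = list(range(n_rows))
random.shuffle(indices)

n_sample = int(0.75 * n_rows)      # 112 rows
sample = indices[:n_sample]        # -> the "Data Sample" output
complement = indices[n_sample:]    # -> the "Remaining Data" output

print(len(sample), len(complement))
```

Every row ends up in exactly one of the two outputs, which is why the complementary set can later serve as held-out test data.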


    • Send the data from the Data Sampler widget to the Test and Score widget. This widget puts learning algorithms to the test. Various sampling strategies are available, including the use of separate test data. The widget serves two purposes. First, it shows a table with various classifier performance metrics, such as classification accuracy (CA) and area under the curve (AUC).
    • Second, it generates evaluation data that other widgets, such as ROC Analysis and Confusion Matrix, can use to analyze classifier performance.
    • Further, the data is sent to three different learning algorithms: Neural Network, Naïve Bayes, and Logistic Regression.
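Outside the canvas, a comparable comparison can be scripted. The sketch below assumes scikit-learn is installed (Orange's learners wrap scikit-learn internally, though the widgets' default settings may differ from the ones used here):

```python
# Hedged, scripted stand-in for the Test and Score comparison of the
# three learners named above, using scikit-learn equivalents.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)

models = {
    "Neural Network": MLPClassifier(max_iter=1000, random_state=0),
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10)  # 10-fold CV, as in the widget
    results[name] = scores.mean()
    print(f"{name}: mean CA = {results[name]:.3f}")
```

Each model's mean classification accuracy plays the role of the CA column in the Test and Score table.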
    How to efficiently use cross-validation in Orange? What is its effect on model output/accuracy?

    Cross-validation is a technique used in applied machine learning to estimate a model's skill on unseen data: that is, to use a limited sample to assess how the model is expected to perform in general when making predictions on data that was not used during training. The data is divided into a specified number of folds (usually 5 or 10); each fold is held out for testing once, while the model is trained on the remaining folds.
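The fold mechanics described above can be sketched in plain Python (a minimal illustration, not Orange's implementation):

```python
# Minimal sketch of k-fold cross-validation: every row is used for
# testing exactly once, and for training in the other k-1 folds.
def k_folds(n_rows, k):
    """Yield (train_indices, test_indices) pairs for k folds."""
    indices = list(range(n_rows))
    fold_size = n_rows // k
    for i in range(k):
        start = i * fold_size
        # the last fold absorbs the remainder when n_rows % k != 0
        end = start + fold_size if i < k - 1 else n_rows
        test = indices[start:end]
        train = indices[:start] + indices[end:]
        yield train, test

folds = list(k_folds(150, 10))  # 10-fold CV over the 150-row iris data
```

Because every instance serves as test data exactly once, the averaged accuracy over the folds is a less optimistic, more stable estimate than a single train/test split.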



    Splitting data into training and testing sets in Orange.

    To split the data into train and test datasets, click on the link between Data Sampler and Test and Score: we send the 75 percent sampled data from Data Sampler as training data and the remaining 25 percent as test data. As illustrated in the diagram below, connect the Data Sample output to the Data input and the Remaining Data output to the Test Data input.
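The same wiring can be sketched in a script. This assumes scikit-learn as a stand-in for the widgets (Orange's Logistic Regression wraps scikit-learn's); the 0.75 ratio mirrors the Data Sampler setting:

```python
# Hedged sketch of "Data Sample -> Data" and "Remaining Data -> Test Data":
# train on the 75% sample, then score on the 25% complement.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.75, random_state=0)  # 75% train / 25% test

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"Accuracy on the 25% test split: {test_acc:.3f}")
```
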



    Then compare the evaluation results of the three models on the training data by choosing the Test on train data option under Sampling in the Test and Score widget.



    To evaluate the learning models on the test data instead, choose the Test on test data option in the Test and Score widget.
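The gap between these two options is worth seeing directly: scores from Test on train data are usually optimistic, while Test on test data reflects performance on unseen instances. A small sketch, again using scikit-learn as a stand-in for the Orange widgets:

```python
# Contrast "Test on train data" with "Test on test data" using a model
# that can memorize the training set (an unpruned decision tree).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.75, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
train_acc = model.score(X_train, y_train)  # "Test on train data"
test_acc = model.score(X_test, y_test)     # "Test on test data"
print(f"train CA = {train_acc:.3f}, test CA = {test_acc:.3f}")
```

The training-set score is near perfect while the test-set score is lower, which is exactly why the held-out test data gives the more honest estimate.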




    So that’s all for now. 

    Thank you :)
