18IT100_Practical_Exam_Work
Task-1:
Dataset description using the Orange tool. What needs to be done to improve the classification accuracy on the given dataset? Obtain the maximum possible classification accuracy by applying the following methods.
-->Pre-processing
o Encoding
o Normalization
o Missing value handling
o Feature Selection
Compare your accuracy with and without applying pre-processing steps. Perform the Classification and visualize accuracy before and after preprocessing in Orange/Python.
What is Data Pre-processing?
In any machine learning process, data pre-processing is the step in which the data is transformed, or encoded, into a state that the machine can easily parse. In other words, after pre-processing the features of the data can be easily interpreted by the algorithm.
Encoding
Encoding is the process of converting data from one form to another. In pre-processing, this usually means converting categorical (text) values into numbers so that a learning algorithm can work with them.
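As an illustration of encoding in the pre-processing sense, here is a minimal sketch of one-hot encoding in plain Python. The function name `one_hot_encode` and the `colors` column are made up for this example and are not part of the exam dataset.

```python
def one_hot_encode(values):
    """Map each categorical value to a 0/1 vector with one position per category."""
    categories = sorted(set(values))  # fixed, reproducible column order
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # mark the position that belongs to this category
        encoded.append(row)
    return categories, encoded


# Toy, made-up column used only for illustration
colors = ["red", "green", "red", "blue"]
categories, encoded = one_hot_encode(colors)
print(categories)
print(encoded)
```

Each text value becomes a row of zeros with a single 1, so the classifier never has to treat the category names as if they were ordered numbers.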
Normalization
In data pre-processing, normalization is the process of rescaling numeric features to a common range, typically 0 to 1, so that features measured on a large scale do not dominate features measured on a small scale. This is different from database normalization, which organizes tables and their relationships to eliminate redundancy and inconsistent dependency.
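In the pre-processing sense, normalization means rescaling a numeric feature. A minimal sketch of min-max normalization in plain Python follows. The helper name `min_max_normalize` and the sample values are assumptions made for illustration only.

```python
def min_max_normalize(column):
    """Rescale a list of numbers linearly so the smallest is 0.0 and the largest is 1.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.0
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Toy, made-up values used only for illustration
values = [2.0, 4.0, 6.0, 10.0]
scaled = min_max_normalize(values)
print(scaled)
```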
Feature Selection
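Feature selection means keeping only the columns that are useful for the classifier. One simple approach, shown here as a hedged sketch and not as the method used in this exam, is to drop features whose variance is at or below a threshold. The function `select_by_variance` and the toy rows are hypothetical.

```python
from statistics import pvariance


def select_by_variance(rows, threshold):
    """Keep only the columns whose population variance is above the threshold."""
    columns = list(zip(*rows))  # transpose rows into columns
    keep = [j for j, col in enumerate(columns) if pvariance(col) > threshold]
    reduced = [[row[j] for j in keep] for row in rows]
    return keep, reduced


# Toy, made-up rows: the middle column is constant and carries no information
rows = [
    [1.0, 5.0, 0.0],
    [2.0, 5.0, 1.0],
    [3.0, 5.0, 0.0],
    [4.0, 5.0, 1.0],
]
keep, reduced = select_by_variance(rows, threshold=0.0)
print(keep)
print(reduced)
```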
Let's start with the Orange tool.
Step 1: Add the File widget and load the spambase dataset.
Step 2: Connect a Data Table widget to see the number of rows and columns in the dataset.
Step 3: Pre-process that table and pass it to the Data Sampler for testing.
Step 4: Send the pre-processed data to the Test and Score widget for evaluation.
Here, the evaluation results show no values because of an issue in the dataset. I tried many times with different algorithms and evaluation techniques, but the dataset did not produce proper values for the evaluation.
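For the Python route that the task allows, the same flow (handle missing values, pre-process, hold out a sample, score) can be sketched in plain Python. Everything here is an assumption made for illustration: the eight toy rows are invented and deliberately exaggerated, they are not the spambase data, and a 1-nearest-neighbour classifier stands in for whatever learner is connected in Orange. The "before" score uses mean imputation only, because distances cannot be computed with missing values.

```python
def impute_mean(rows):
    """Replace None in each column with the mean of that column's known values."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        known = [r[j] for r in rows if r[j] is not None]
        means.append(sum(known) / len(known))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def min_max_scale(rows):
    """Rescale every column to the range 0..1 (fitted on all rows for brevity)."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # avoid division by zero
    return [[(v - lo) / sp for v, lo, sp in zip(r, lows, spans)] for r in rows]


def predict_1nn(train_x, train_y, x):
    """Return the label of the training row closest to x (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(t, x)) for t in train_x]
    return train_y[dists.index(min(dists))]


def holdout_accuracy(x, y, test_idx):
    """Train on every row outside test_idx and score on the rows inside it."""
    train_idx = [i for i in range(len(x)) if i not in test_idx]
    train_x = [x[i] for i in train_idx]
    train_y = [y[i] for i in train_idx]
    hits = sum(predict_1nn(train_x, train_y, x[i]) == y[i] for i in test_idx)
    return hits / len(test_idx)


# Invented rows: [small-scale ratio, large-scale length], with one missing value
raw = [
    [0.10, 300.0], [0.20, 100.0], [0.15, None],
    [0.80, 280.0], [0.90, 120.0], [0.85, 200.0],
    [0.12, 285.0], [0.88, 105.0],
]
labels = [0, 0, 0, 1, 1, 1, 0, 1]
test_idx = {6, 7}  # the last two rows are held out for testing

imputed = impute_mean(raw)
acc_before = holdout_accuracy(imputed, labels, test_idx)
acc_after = holdout_accuracy(min_max_scale(imputed), labels, test_idx)
print("accuracy without normalization:", acc_before)
print("accuracy with normalization:", acc_after)
```

In this constructed example the large-scale column dominates the distance until it is rescaled, which is the kind of before and after comparison the task asks for. A real run would fit the imputation and scaling on the training rows only and would use the actual dataset with its target column set.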
Scenario
Task-2:
Generate a dashboard of the pre-processed dataset from Task-1.
Find the maximum data insights by plotting a bar chart, box plot, pie chart, and stacked plot using Power BI dashboard visualization.
The following answers need to be submitted in a single PDF file:
1. Provide a screenshot of the data description and explain in brief.
Here is the dataset. No proper description is provided with it, which makes the evaluation difficult to perform. We created a target variable ourselves as feature 57; apart from that, no information is given.
2. Provide screenshot(s) of data pre-processing steps showing their significance.
Steps 2, 3 and 4 above are examples of the pre-processing scenario.
3. Provide a screenshot showing accuracy before and after pre-processing.
The fault is in the dataset, so no accuracy is shown for the test model.