18IT100_Practical_Exam

18IT100_Practical_Exam_Work

November 15, 2021

Task-1:

Dataset Description using Orange tool.
What needs to be done to improve the classification accuracy on the given dataset? Obtain the maximum possible classification accuracy by applying the following methods.
Pre-processing:
- Encoding
- Normalization
- Missing value handling
- Feature Selection

Compare your accuracy with and without applying pre-processing steps. Perform the Classification and visualize accuracy before and after preprocessing in Orange/Python.

What is Data Pre-processing?

In any machine learning process, data pre-processing is the step in which raw data is transformed, or encoded, into a state that the machine can parse easily. In other words, after pre-processing, the features of the data can be readily interpreted by the algorithm.

Encoding

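Encoding is the process of converting data from one form to another. While "encoding" can be used as a verb, it is often used as a noun, and refers to a specific type of encoded data. In machine learning, encoding usually means converting categorical (text) values into numbers, for example with label encoding or one-hot encoding, so that the algorithm can work with them.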
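As an illustration of encoding in the pre-processing sense, here is a minimal one-hot encoding sketch in plain Python. The function name `one_hot_encode` and the sample column are hypothetical and not taken from Spambase, whose attributes are numeric, so this step matters mainly for datasets that contain categorical columns.

```python
def one_hot_encode(values):
    """Return (categories, rows); each row is a 0/1 list with one slot per category."""
    categories = sorted(set(values))               # fixed, reproducible column order
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1                      # mark the slot of this category
        rows.append(row)
    return categories, rows


# Hypothetical categorical column, used only for illustration.
colors = ["red", "green", "red", "blue"]
categories, encoded = one_hot_encode(colors)
print(categories)  # ['blue', 'green', 'red']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

Each category gets its own 0/1 column, so no artificial ordering is introduced between the values.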

Normalization

In data pre-processing, normalization is the process of rescaling numeric features to a common range, typically [0, 1], so that features with large values do not dominate features with small values. This is different from database normalization, which organizes data into tables and relationships according to rules designed to eliminate redundancy and inconsistent dependency.
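For classification work, normalization usually refers to feature scaling. A minimal min-max scaling sketch in plain Python follows; the function name `min_max_normalize` and the sample values are assumptions made for illustration only.

```python
def min_max_normalize(column):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]   # constant column: nothing to rescale
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical feature values on a large scale.
run_lengths = [0, 5, 10]
print(min_max_normalize(run_lengths))  # [0.0, 0.5, 1.0]
```

After scaling, every feature lies in the same range, which matters for distance-based classifiers such as k-nearest neighbours.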

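Feature Selection

Feature selection is the process of keeping only the features that contribute most to predicting the target and dropping irrelevant or redundant ones.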
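One simple form of feature selection is to drop columns that never change, since a constant column cannot help separate the classes. The sketch below is a plain-Python variance filter; the names `select_by_variance` and `keep_columns` and the sample rows are hypothetical, and it is only one of many possible selection criteria.

```python
from statistics import pvariance


def select_by_variance(rows, threshold=0.0):
    """Return the indices of columns whose variance exceeds the threshold."""
    columns = list(zip(*rows))   # transpose rows into columns
    return [i for i, col in enumerate(columns) if pvariance(col) > threshold]


def keep_columns(rows, indices):
    """Return the rows restricted to the selected column indices."""
    return [[row[i] for i in indices] for row in rows]


# Hypothetical data: the middle column is constant.
rows = [
    [0.1, 1.0, 5.0],
    [0.4, 1.0, 3.0],
    [0.9, 1.0, 8.0],
]
kept = select_by_variance(rows)
print(kept)                      # [0, 2]
print(keep_columns(rows, kept))  # [[0.1, 5.0], [0.4, 3.0], [0.9, 8.0]]
```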

DataSet: https://archive.ics.uci.edu/ml/datasets/Spambase

Let's start with the Orange Tool

Step:1 Add the File widget and load the spambase dataset into it

Step:2 Connect a Data Table widget to see the number of rows and columns in the dataset

Step:3 Pre-process the table further and pass it to the Data Sampler to prepare the data for testing

Step:4 Send the pre-processed data to the Test and Score widget for evaluation

Here, the evaluation results are empty because of an issue with the dataset. I tried many times with different algorithms and evaluation techniques, but the dataset did not yield proper evaluation values.
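For the Python side of the before/after comparison, the measurement itself can be sketched as follows. This is an illustration on synthetic data, not a Spambase result: it uses a hand-written 1-nearest-neighbour classifier and min-max scaling, and every name in it (`make_data`, `fit_min_max`, `apply_min_max`, `predict_1nn`, `accuracy`) is a hypothetical helper. The synthetic data has one informative small-scale feature and one large-scale noise feature, which is the situation where normalization is expected to help a distance-based classifier.

```python
import random


def make_data(n, rng):
    """Synthetic rows: one informative small-scale feature, one large-scale noise feature."""
    rows, labels = [], []
    for i in range(n):
        label = i % 2
        informative = label + rng.gauss(0, 0.1)   # clusters around 0 or 1
        noise = rng.uniform(0, 1000)              # unrelated to the label
        rows.append([informative, noise])
        labels.append(label)
    return rows, labels


def fit_min_max(rows):
    """Learn the (min, max) of every column."""
    return [(min(col), max(col)) for col in zip(*rows)]


def apply_min_max(rows, ranges):
    """Rescale every column with previously learned (min, max) pairs."""
    return [[(x - lo) / (hi - lo) if hi > lo else 0.0
             for x, (lo, hi) in zip(row, ranges)] for row in rows]


def predict_1nn(train_rows, train_labels, row):
    """Return the label of the closest training row (squared Euclidean distance)."""
    best = min(range(len(train_rows)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_rows[i], row)))
    return train_labels[best]


def accuracy(train_rows, train_labels, test_rows, test_labels):
    hits = sum(predict_1nn(train_rows, train_labels, r) == y
               for r, y in zip(test_rows, test_labels))
    return hits / len(test_labels)


rng = random.Random(0)
train_x, train_y = make_data(200, rng)
test_x, test_y = make_data(100, rng)

acc_raw = accuracy(train_x, train_y, test_x, test_y)

ranges = fit_min_max(train_x)   # learn the scaling from the training split only
acc_scaled = accuracy(apply_min_max(train_x, ranges), train_y,
                      apply_min_max(test_x, ranges), test_y)

print(f"accuracy without normalization: {acc_raw:.2f}")
print(f"accuracy with normalization:    {acc_scaled:.2f}")
```

The two printed numbers are the kind of pair the task asks to compare and visualize. The scaling ranges are learned on the training split only, so the test split does not leak into pre-processing.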

Scenario

Task-2:
Generate the Dashboard of the preprocessed dataset from task-1.
Find the maximum data insights by plotting a bar chart, box plot, pie plot, and stack plot using Power BI dashboard visualization.

The following answers need to be submitted in a single PDF file:
1. Provide a screenshot of the data description and explain in brief.

Here is the dataset. It comes with no proper description, which makes the evaluation a critical task to perform. We created the target variable ourselves as feature 57; apart from that, no information about the contents is given.
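Because the raw file has no header row, column names and the target role have to be assigned by hand. A minimal Python sketch of that step is shown below, assuming the class label is the last value on each line, which should be checked against the dataset's own documentation. The function name `load_headerless`, the generated names `feature_0`, `feature_1`, ... and the three shortened sample rows are made up for illustration. In Orange, the corresponding step would be to set that column's role to target in the File widget, since Test and Score needs a target variable to produce results.

```python
import csv
import io


def load_headerless(text):
    """Parse headerless CSV text into (feature_names, X, y), using the last column as target."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    n_features = len(rows[0]) - 1
    feature_names = [f"feature_{i}" for i in range(n_features)]   # generated names
    X = [[float(v) for v in row[:-1]] for row in rows]
    y = [int(float(row[-1])) for row in rows]
    return feature_names, X, y


# Made-up rows with three features each, followed by a 0/1 class label.
sample = "0.0,0.64,278,1\n0.21,0.28,1028,1\n0.0,0.0,15,0\n"
names, X, y = load_headerless(sample)
print(names)  # ['feature_0', 'feature_1', 'feature_2']
print(y)      # [1, 1, 0]
```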