Open Access Open Access  Restricted Access Subscription Access
Open Access Open Access Open Access  Restricted Access Restricted Access Subscription Access

Pearson Correlation Coefficient k-Nearest Neighbor Outlier Classification on Real-Time Data Set


Affiliations
1 Department of Computer Science, Nandha Arts and Science College, Erode, Tamilnadu, India
     

   Subscribe/Renew Journal


Detection and classification of data that do not meet the expected behavior (outliers) plays the major role in wide variety of applications such as military surveillance, intrusion detection in cyber security, fraud detection in on-line transactions. Nowadays, an accurate detection of outliers with high dimension is the major issue. The trade-off between the high-accuracy and low computational time is the major requirement in outlier prediction and classification. The presence of large size diverse features need the reduction mechanism prior to classification approach. To achieve this, the Distance-based Outlier Classification (DOC) is proposed in this paper. The proposed work utilizes the Pearson Correlation Coefficient (PCC) to measure the correlation between the data instances. The minimum instance learning through PCC estimation reduces the dimensionality. The proposed work is split up into two phases namely training and testing.  During the training process, the labeling of most frequent samples isolates them from the infrequent reduce the data size effectively. The testing phase employs the k-Nearest Neighborhood (k-NN) scheme to classify the frequent samples effectively. The dimensionality and the k-value are inversely proportional to each other. In proposed work, the selection of large value of k offers the significant reduction in dimensionality. The combination of PCC-based instance learning and the high value of k reduces the dimensionality and noise respectively. The comparative analysis between the proposed PCC-k-NN with the conventional algorithms such as Decision Tree, Naïve Bayes, Instance-Based K-means (IBK), Triangular Boundary-based Classification (TBC) regarding sensitivity, specificity, accuracy, precision, and recall proves its effectiveness in OC. Besides, the experimental validation of proposed PCC-k-NN with the state-of art methods regarding the execution time assures trade-off between the low-time consumption and high-accuracy.


Keywords

Data Mining, Distance-based Instance Learning, Outlier Detection, Outlier Classification, Pearson Correlation Coefficient, k-Nearest Neighbor.
User
Subscription Login to verify subscription
Notifications
Font Size

Abstract Views: 208

PDF Views: 1




  • Pearson Correlation Coefficient k-Nearest Neighbor Outlier Classification on Real-Time Data Set

Abstract Views: 208  |  PDF Views: 1

Authors

Dr. D. Rajakumari
Department of Computer Science, Nandha Arts and Science College, Erode, Tamilnadu, India
S. Karthika
Department of Computer Science, Nandha Arts and Science College, Erode, Tamilnadu, India

Abstract


Detection and classification of data that do not meet the expected behavior (outliers) plays the major role in wide variety of applications such as military surveillance, intrusion detection in cyber security, fraud detection in on-line transactions. Nowadays, an accurate detection of outliers with high dimension is the major issue. The trade-off between the high-accuracy and low computational time is the major requirement in outlier prediction and classification. The presence of large size diverse features need the reduction mechanism prior to classification approach. To achieve this, the Distance-based Outlier Classification (DOC) is proposed in this paper. The proposed work utilizes the Pearson Correlation Coefficient (PCC) to measure the correlation between the data instances. The minimum instance learning through PCC estimation reduces the dimensionality. The proposed work is split up into two phases namely training and testing.  During the training process, the labeling of most frequent samples isolates them from the infrequent reduce the data size effectively. The testing phase employs the k-Nearest Neighborhood (k-NN) scheme to classify the frequent samples effectively. The dimensionality and the k-value are inversely proportional to each other. In proposed work, the selection of large value of k offers the significant reduction in dimensionality. The combination of PCC-based instance learning and the high value of k reduces the dimensionality and noise respectively. The comparative analysis between the proposed PCC-k-NN with the conventional algorithms such as Decision Tree, Naïve Bayes, Instance-Based K-means (IBK), Triangular Boundary-based Classification (TBC) regarding sensitivity, specificity, accuracy, precision, and recall proves its effectiveness in OC. Besides, the experimental validation of proposed PCC-k-NN with the state-of art methods regarding the execution time assures trade-off between the low-time consumption and high-accuracy.


Keywords


Data Mining, Distance-based Instance Learning, Outlier Detection, Outlier Classification, Pearson Correlation Coefficient, k-Nearest Neighbor.