
Clustering Data Based on Probability Distribution Similarity


Affiliations
1 Vickram College of Engineering, Enathi, India
     




Authors

J. Priyadharshini
Vickram College of Engineering, Enathi, India
S. Akila Devi
Vickram College of Engineering, Enathi, India
A. Askerunisa
Vickram College of Engineering, Enathi, India

Abstract


Clustering based on distribution similarity is an essential task in data mining. Previous methods, which extend traditional partitioning-based clustering methods such as k-means and density-based clustering methods such as DBSCAN, rely on geometric distances between objects. Probability distributions have not been considered when measuring the similarity between objects. In this paper, objects are systematically modeled in discrete domains, and the Kullback-Leibler (KL) divergence is used to measure the similarity between the probability distributions over discrete values. This measure is then integrated into partitioning-based and density-based clustering methods to cluster the objects. Finally, the execution time, the mean square error, and the number of noise points detected are measured and compared for the partitioning-based and density-based clustering algorithms. Partitioning-based and density-based clustering using KL divergence reduced the execution time to 68 sec and the mean square error to 0.001, and detected 22 noise points. Distribution-based clustering is therefore more efficient than distance-based clustering.


Keywords


Partitioning Based Clustering Methods, Density Based Clustering Method, Distribution Based Clustering, Kullback-Leibler Divergence.
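
The idea described in the abstract, using KL divergence in place of a geometric distance, can be illustrated with a minimal sketch. For two discrete distributions P and Q over the same values, the KL divergence is D(P || Q) = sum over x of P(x) log(P(x) / Q(x)). The abstract does not give the authors' implementation, so everything below is an assumption made for illustration: the function names, the smoothing constant, the choice of direction D(object || representative), and the sample data are all hypothetical. It shows an assignment step of the kind a partitioning-based method might use and a neighborhood query of the kind a density-based method might use.

```python
import math


def kl_divergence(p, q, smoothing=1e-12):
    """Return D(p || q) for two discrete distributions of equal length.

    A small smoothing constant (an assumption of this sketch) is added to
    every entry and both vectors are renormalized, so that zero
    probabilities do not cause division by zero or log(0).
    """
    if len(p) != len(q):
        raise ValueError("distributions must be defined over the same values")
    ps = [x + smoothing for x in p]
    qs = [x + smoothing for x in q]
    zp, zq = sum(ps), sum(qs)
    return sum((a / zp) * math.log((a / zp) / (b / zq)) for a, b in zip(ps, qs))


def assign_to_representatives(objects, representatives):
    """Partitioning-style step: label each object with the index of the
    representative that has the smallest KL divergence from it."""
    return [
        min(range(len(representatives)),
            key=lambda j: kl_divergence(obj, representatives[j]))
        for obj in objects
    ]


def eps_neighborhood(objects, i, radius):
    """Density-style step: indices of objects whose KL divergence from
    object i is at most `radius` (the object itself is included)."""
    return [j for j, other in enumerate(objects)
            if kl_divergence(objects[i], other) <= radius]


# Hypothetical data: four objects, each a distribution over three discrete values.
objects = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.3, 0.6],
]
representatives = [objects[0], objects[2]]

labels = assign_to_representatives(objects, representatives)
neighbors = eps_neighborhood(objects, 0, radius=0.1)
print(labels)     # [0, 0, 1, 1]
print(neighbors)  # [0, 1]
```

Note that KL divergence is not symmetric, so D(P || Q) and D(Q || P) generally differ. A real implementation has to fix one direction or symmetrize the measure, and this sketch simply fixes one.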