Clustering Mixed Datasets with Multi-Swarm Optimization and K-Prototype

Create Date October 17, 2020
Last Updated December 29, 2020

CONFPRO/V27/JULY2016/019

Description

ABSTRACT:

Clustering groups objects so that those with a high degree of relationship are separated from those with a low degree of relationship: objects in the same group are highly similar and share common attributes that are distinct from those of other groups. One major problem in data clustering is determining accurately the number of clusters that can be formed from a set of data. Clustering mixed datasets is another problem faced by clustering models. This article proposes a hybrid clustering algorithm, called the Multi-swarmK-prototype clustering algorithm, for clustering mixed datasets. The algorithm combines multi-swarm optimization with the k-prototype clustering algorithm. Multi-swarm optimization is used to improve the convergence accuracy of the objective function of the traditional k-prototype algorithm by a random search for the initial global best value of k. Six datasets (Yeast, Soybean, Hepatitis, Australian Credit Approval, German Credit Data, and Statlog Heart) obtained from the University of California, Irvine (UCI) Machine Learning Repository were used to demonstrate the clustering performance of the algorithm. On the Soybean and Yeast datasets, the proposed Multi-swarmK-prototype algorithm achieved accuracies of 0.971 and 0.973, respectively, while MixK-meansKFon achieved 0.86 and 0.84. On the Hepatitis, Australian Credit Approval, German Credit Data, and Statlog Heart datasets, the Multi-swarmK-prototype algorithm achieved accuracies of 0.9169, 0.9789, 0.9495, and 0.9208, respectively, while the PSO-based K-prototype algorithm achieved 0.7521, 0.8229, 0.6261, and 0.8387. The proposed hybrid clustering algorithm is more proficient at clustering large mixed datasets than MixK-meansKFon and the PSO-based K-prototype algorithm. It gave very efficient clustering convergence and reduced the time complexity of clustering large datasets.

Keywords: Clustering, k-prototype algorithm, Mixed Dataset, Multi Swarm Optimization, Multi-model problems.
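
The abstract refers to the objective function of the traditional k-prototype algorithm without stating it. As a rough illustration only, the sketch below uses the commonly cited k-prototype dissimilarity for mixed data: squared Euclidean distance on the numeric attributes plus a weight times the number of mismatched categorical attributes. The function names, the weight `gamma`, and the toy data are assumptions made for this example and are not taken from the paper; the multi-swarm search for k is not shown.

```python
import numpy as np


def mixed_dissimilarity(x_num, x_cat, proto_num, proto_cat, gamma=1.0):
    """Dissimilarity between one object and one prototype on mixed data.

    Numeric part: squared Euclidean distance.
    Categorical part: count of mismatched attributes, weighted by gamma.
    gamma is an assumed weighting parameter chosen for illustration.
    """
    diff = np.asarray(x_num, dtype=float) - np.asarray(proto_num, dtype=float)
    numeric_part = float(np.sum(diff ** 2))
    categorical_part = sum(a != b for a, b in zip(x_cat, proto_cat))
    return numeric_part + gamma * categorical_part


def assign_and_cost(X_num, X_cat, protos_num, protos_cat, gamma=1.0):
    """Assign each object to its nearest prototype and total the cost.

    The returned cost is the kind of quantity a search over candidate
    values of k could compare; how the paper's multi-swarm step uses it
    is not specified in the abstract.
    """
    labels = []
    total_cost = 0.0
    for x_num, x_cat in zip(X_num, X_cat):
        dists = [
            mixed_dissimilarity(x_num, x_cat, p_num, p_cat, gamma)
            for p_num, p_cat in zip(protos_num, protos_cat)
        ]
        best = int(np.argmin(dists))
        labels.append(best)
        total_cost += dists[best]
    return labels, total_cost


# Toy data (hypothetical): two numeric and two categorical attributes.
X_num = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]]
X_cat = [["a", "x"], ["a", "y"], ["b", "y"]]
protos_num = [[0.0, 0.0], [10.0, 10.0]]
protos_cat = [["a", "x"], ["b", "y"]]

labels, cost = assign_and_cost(X_num, X_cat, protos_num, protos_cat, gamma=0.5)
```

A larger `gamma` makes categorical mismatches count for more relative to numeric distance, so its value would normally be tuned to the scale of the numeric attributes.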