Cluster analysis for data mining and system identification
More about the book
Data clustering is a common technique for statistical data analysis, which is used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics. Clustering is the classification of similar objects into different groups, or more precisely, the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait, often proximity according to some defined distance measure. The aim of this book is to illustrate that advanced fuzzy clustering algorithms can be used not only for partitioning data, but also for visualization, regression, classification and time-series analysis. Fuzzy cluster analysis is therefore a good approach to solving complex data mining and system identification problems.

Overview

In the last decade the amount of stored data has increased rapidly in almost all areas of life. The most recent survey of the amount of data was published by the University of California, Berkeley. According to that survey, the data produced in 2002 and stored on printed media, film and electronic devices alone amount to about 5 exabytes. For comparison, if all 17 million volumes of the Library of Congress of the United States of America were digitized, they would take up about 136 terabytes. Hence, 5 exabytes corresponds to about 37,000 Libraries of Congress. Spread across the 6.3 billion inhabitants of the Earth, this roughly means that each person generates 800 megabytes of data every year. It is interesting to compare this amount with Shakespeare's life work, which can be stored in as little as 5 megabytes.
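The figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming decimal (SI) units, which the text does not specify; the variable names are illustrative only.

```python
# Sanity check of the storage figures quoted in the overview.
# Assumption: decimal (SI) units, i.e. 1 TB = 10**12 bytes, 1 EB = 10**18 bytes.
MEGABYTE = 10**6
TERABYTE = 10**12
EXABYTE = 10**18

data_2002 = 5 * EXABYTE               # data produced and stored in 2002
library_of_congress = 136 * TERABYTE  # 17 million volumes, digitized
world_population = 6.3e9              # inhabitants of the Earth
shakespeare_mb = 5                    # Shakespeare's life work, in megabytes

# How many Libraries of Congress fit into 5 exabytes?
libraries = data_2002 / library_of_congress

# How much data does that correspond to per person, in megabytes?
per_person_mb = data_2002 / world_population / MEGABYTE

# How many copies of Shakespeare's life work is that per person?
shakespeare_multiples = per_person_mb / shakespeare_mb

print(f"{libraries:,.0f} Libraries of Congress")
print(f"{per_person_mb:.0f} MB per person per year")
print(f"{shakespeare_multiples:.0f} times Shakespeare's life work")
```

Under these assumptions the ratio comes out just under 37,000 Libraries of Congress and just under 800 megabytes per person, consistent with the rounded values in the text.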