Clustering Algorithms Used in Data Mining

Jiang Yuan, Zhang Zhao-yang, Qiu Pei-liang, Zhou Dong-fang

Citation: Jiang Yuan, Zhang Zhao-yang, Qiu Pei-liang, Zhou Dong-fang. Clustering Algorithms Used in Data Mining[J]. Journal of Electronics and Information Technology, 2005, 27(4): 655-662.

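  • Abstract: Data mining extracts information of interest from very large databases. Clustering is an important data-mining tool. It partitions a database into multiple classes (clusters) according to the similarity between data items, so that the items within each class are as similar as possible. From a machine-learning point of view, clusters correspond to hidden patterns, and finding them is an unsupervised learning process. Dozens of clustering algorithms are already in use across fields such as statistics, pattern recognition, and machine learning. This paper summarizes and classifies the clustering algorithms used in data mining, groups them into seven categories, and analyzes the performance characteristics of each category.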
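  • Related articles
    [1] Zhang Xiongtao, Jiang Yunliang, Pan Xingguang, Hu Wenjun, Wang Shitong. Integrated TSK fuzzy classifier based on an iterative fuzzy clustering algorithm, k-nearest neighbors, and a data dictionary. Journal of Electronics and Information Technology. doi: 10.11999/JEIT190214
    [2] Gao Yunlong, Wang Zhihao, Pan Jinyan, Luo Sizhe, Wang Dexin. Robust fuzzy C-means clustering algorithm based on adaptive relaxation. Journal of Electronics and Information Technology. doi: 10.11999/JEIT190556
    [3] Wang Ju, Liu Fuxian. A motif discovery algorithm for multi-attribute uncertain data streams. Journal of Electronics and Information Technology. doi: 10.11999/JEIT160247
    [4] Huang Li, You Hongjian. Clustering-based removal of mismatched points in non-collinear multi-CCD remote sensing images. Journal of Electronics and Information Technology. doi: 10.11999/JEIT170043
    [5] Zhao Xuejian, Sun Zhixin, Yuan Yuan. Efficient association rule mining algorithm based on pre-judgment screening. Journal of Electronics and Information Technology. doi: 10.11999/JEIT151107
    [6] Gao Fang, Sun Changjian, Shao Qinglong, Guo Shuxu. Lossless compression of hyperspectral images based on K-means clustering and conventional recursive least squares. Journal of Electronics and Information Technology. doi: 10.11999/JEIT151439
    [7] Xu Xiaolong, Li Yongping. A MapReduce-based knowledge clustering and statistics mechanism. Journal of Electronics and Information Technology. doi: 10.11999/JEIT150247
    [8] Zhi Weimei, Zhang Ting, Fan Ming. k-nearest neighbor classification based on influence functions. Journal of Electronics and Information Technology. doi: 10.11999/JEIT141433
    [9] Chen Limin, Yang Jing, Zhang Jianpei. A fast clustering algorithm for heterogeneous information networks based on embedding techniques. Journal of Electronics and Information Technology. doi: 10.11999/JEIT150106
    [10] Li Qiufu, Shen Derong, He Guanglin, Feng Hui, Yang Liuxin. A clustering-based hyperspectral image compression algorithm with controllable maximum error. Journal of Electronics and Information Technology. doi: 10.11999/JEIT140451
    [11] Sun Lijuan, Chen Xiaodong, Han Chong, Guo Jian. A new fuzzy clustering method for data streams. Journal of Electronics and Information Technology. doi: 10.11999/JEIT141415
    [12] Zhang Zhen, Wang Binqiang, Yi Peng, Lan Julong. A hierarchical combined semi-supervised affinity propagation clustering algorithm. Journal of Electronics and Information Technology. doi: 10.3724/SP.J.1146.2012.00673
    [13] Jiang Yiming, Lan Julong, Guo Tong, Tian Ming. A service clustering method for reconfigurable networks. Journal of Electronics and Information Technology. doi: 10.3724/SP.J.1146.2012.00973
    [14] Wang Li, Wu Chengdong, Chen Dongyue. Linear pattern mining based on density-weighted expectation maximization and a split-and-merge strategy. Journal of Electronics and Information Technology. doi: 10.3724/SP.J.1146.2011.01014
    [15] Su Xin, Zhang Dafang, Luo Zhangqi, Zeng Bin, Li Wenwei. Botnet detection based on clustering the traffic attributes of Command and Control communication channels. Journal of Electronics and Information Technology. doi: 10.3724/SP.J.1146.2011.01098
    [16] Tang Chenglong, Wang Shigang, Xu Wei. An improved fuzzy clustering algorithm based on a data weighting strategy. Journal of Electronics and Information Technology. doi: 10.3724/SP.J.1146.2009.00857
    [17] Wang Shenchao, Miao Duoqian, Chen Min, Wang Ruizhi. A covering-based rough clustering algorithm. Journal of Electronics and Information Technology. doi: 10.3724/SP.J.1146.2007.00450
    [18] Guo Wei, Wang Shitong, Cheng Ke, Han Bin. VSC: a visual sampling clustering method. Journal of Electronics and Information Technology.
    [19] Liu Haihua, Zhang Wu, Chen Xinhao, Chen Yaguang. Research on moving-object segmentation algorithms based on fuzzy clustering. Journal of Electronics and Information Technology.
    [20] Kong Xiao, Liu Danghui, Shen Lansun. Skin-color segmentation based on fuzzy clustering. Journal of Electronics and Information Technology.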
  • Publication history
    • Received: 2003-12-22
    • Revised: 2004-04-26
    • Published: 2005-04-19
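The abstract above describes clustering as partitioning data by similarity, without class labels, so that items within a cluster are as alike as possible. The following is a minimal illustrative sketch of that idea, not code from the paper. It implements k-means, a classic partitioning algorithm, in plain Python. Several choices here are assumptions rather than anything taken from the article: squared Euclidean distance as the dissimilarity measure, deterministic farthest-first seeding (used for reproducibility), and all function names (`k_means`, `farthest_first_init`, `within_cluster_sse`), which are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance, the dissimilarity measure assumed here."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def farthest_first_init(data: List[Point], k: int) -> List[Point]:
    """Assumed seeding rule: start at data[0], then repeatedly add the point
    whose nearest already-chosen seed is farthest away."""
    seeds = [data[0]]
    while len(seeds) < k:
        seeds.append(max(data, key=lambda p: min(squared_distance(p, s) for s in seeds)))
    return seeds


def k_means(data: List[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's k-means: alternate assignment and update steps until the
    partition stops changing. No class labels are used (unsupervised)."""
    if not 1 <= k <= len(data):
        raise ValueError("k must satisfy 1 <= k <= len(data)")
    centroids = farthest_first_init(data, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j])) for p in data]
        if new_labels == labels:
            break  # partition unchanged, so the centroids are already its means
        labels = new_labels
        # Update step: move each centroid to the mean of its members,
        # keeping the old centroid if a cluster has become empty.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            updated.append(mean(members) if members else centroids[j])
        centroids = updated
    return labels, centroids


def within_cluster_sse(data: List[Point], labels: List[int], centroids: List[Point]) -> float:
    """Sum of squared distances to assigned centroids (the k-means objective)."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(data, labels))


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
    labels, centroids = k_means(data, k=2)
    print(labels)  # [0, 0, 0, 1, 1, 1]
    print(round(within_cluster_sse(data, labels, centroids), 3))  # 0.193
```

On the two well-separated groups in the usage example, the first three points form one cluster and the last three form the other. Farthest-first seeding makes each run reproducible, but it is sensitive to outliers. Random restarts or k-means++ seeding are common alternatives. k-means also assumes numeric attributes and roughly convex clusters.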
