高级搜索

一种基于属性空间相似性的模糊聚类算法

施伟锋 卓金宝 兰莹

引用本文: 施伟锋, 卓金宝, 兰莹. 一种基于属性空间相似性的模糊聚类算法[J]. 电子与信息学报, doi: 10.11999/JEIT180974 shu
Citation:  Weifeng SHI, Jinbao ZHUO, Ying LAN. A Novel Fuzzy Clustering Algorithm Based on Similarity of Attribute Space[J]. Journal of Electronics and Information Technology, doi: 10.11999/JEIT180974 shu

一种基于属性空间相似性的模糊聚类算法

    作者简介: 施伟锋: 男,1963年生,博士,教授,主要研究方向为电力系统自动化;
    卓金宝: 男,1991年生,博士生,研究方向为智能故障诊断与预测;
    兰莹: 女,1985年生,博士,讲师,研究方向为多自主体与混杂系统研究;
    通讯作者: 施伟锋, wfshi@shmtu.edu.cn
  • 基金项目: 国家自然科学基金(61503240),上海海事大学研究生创新基金(2016ycx078)

摘要: 模糊C均值(FCM)聚类算法及其相关改进算法基于最大模糊隶属度原则确定聚类结果,没有充分利用迭代后的模糊隶属度矩阵和簇类中心的样本属性特征信息,影响聚类准确度。针对这个问题,该文提出一种新的改进思路:改进FCM算法输出定类原则。给出二元属性拓扑子空间中属性相似度的定义,最终提出一种基于属性空间相似性的改进FCM算法:首先,选择FCM算法聚类后模糊隶属度低于聚类置信度的样本作为存疑样本;然后,计算存疑样本与聚类后聚类中心的属性相似度;最后,基于最大属性相似度原则更新存疑样本的簇类标签。通过UCI数据集实验,证明算法不仅有效,还较一些基于最大模糊隶属度原则定类的改进算法具有更优的聚类评价指标。

English

    1. [1]

      BEZDEK J C. Pattern Recognition with Fuzzy Objective Function Algorithms[M]. Boston: Springer, 1981: 155–201.

    2. [2]

      FRIGUI H and KRISHNAPURAM R. A robust competitive clustering algorithm with applications in computer vision[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1999, 21(5): 450–465. doi: 10.1109/34.765656

    3. [3]

      YANG M and NATALIANI Y. Robust-learning fuzzy C-means clustering algorithm with unknown number of clusters[J]. Pattern Recognition, 2017, 71: 45–59. doi: 10.1016/j.patcog.2017.05.017

    4. [4]

      SON L H and TIEN N D. Tune up fuzzy C-means for big data: Some novel hybrid clustering algorithms based on initial selection and incremental clustering[J]. International Journal of Fuzzy Systems, 2017, 19(5): 1585–1602. doi: 10.1007/s40815-016-0260-3

    5. [5]

      HUANG Chengquan, CHUNG F, and WANG Shitong. Generalized competitive agglomeration clustering algorithm[J]. International Journal of Machine Learning and Cybernetics, 2017, 8(6): 1945–1969. doi: 10.1007/s13042-016-0572-5

    6. [6]

      SINGH C and BALA A. A DCT-based local and non-local fuzzy C-means algorithm for segmentation of brain magnetic resonance images[J]. Applied Soft Computing, 2018, 68: 447–457. doi: 10.1016/j.asoc.2018.03.054

    7. [7]

      SAHA A and DAS S. Geometric divergence based fuzzy clustering with strong resilience to noise features[J]. Pattern Recognition Letters, 2016, 79: 60–67. doi: 10.1016/j.patrec.2016.04.013

    8. [8]

      肖满生, 肖哲, 文志诚, 等. 一种空间相关性与隶属度平滑的FCM改进算法[J]. 电子与信息学报, 2017, 39(5): 1123–1129. doi: 10.11999/JEIT160710
      XIAO Mansheng, XIAO Zhe, WEN Zhicheng, et al. Improved fcm clustering algorithm based on spatial correlation and membership smoothing[J]. Journal of Electronics &Information Technology, 2017, 39(5): 1123–1129. doi: 10.11999/JEIT160710

    9. [9]

      WANG Xizhao, WANG Yadong, and WANG Lijuan. Improving fuzzy C-means clustering based on feature-weight learning[J]. Pattern Recognition Letters, 2004, 25(10): 1123–1132. doi: 10.1016/j.patrec.2004.03.008

    10. [10]

      YANG M S and NATALIANI Y. A feature-reduction fuzzy clustering algorithm based on feature-weighted entropy[J]. IEEE Transactions on Fuzzy Systems, 2018, 26(2): 817–835. doi: 10.1109/TFUZZ.2017.2692203

    11. [11]

      KUO R J, LIN T C, ZULVIA F E, et al. A hybrid metaheuristic and kernel intuitionistic fuzzy c-means algorithm for cluster analysis[J]. Applied Soft Computing, 2018, 67: 299–308. doi: 10.1016/j.asoc.2018.02.039

    12. [12]

      JIANG Zhaohui, LI Tingting, MIN Wengfang, et al. Fuzzy c-means clustering based on weights and gene expression programming[J]. Pattern Recognition Letters, 2017, 90: 1–7. doi: 10.1016/j.patrec.2017.02.015

    13. [13]

      JIE Lilin, LIU Weidong, SUN Zheng, et al. Hybrid fuzzy clustering methods based on improved self-adaptive cellular genetic algorithm and optimal-selection-based fuzzy c-means[J]. Neurocomputing, 2017, 249: 140–156. doi: 10.1016/j.neucom.2017.03.068

    14. [14]

      KIM E H, OH S K, and PEDRYCZ W. Reinforced hybrid interval fuzzy neural networks architecture: Design and analysis[J]. Neurocomputing, 2018, 303: 20–36. doi: 10.1016/j.neucom.2018.04.003

    15. [15]

      DAGHER I. Fuzzy clustering using multiple Gaussian kernels with optimized-parameters[J]. Fuzzy Optimization and Decision Making, 2018, 17(2): 159–176. doi: 10.1007/s10700-017-9268-x

    16. [16]

      王骏, 刘欢, 蒋亦樟, 等. 堆叠隐空间模糊C均值聚类算法[J]. 控制与决策, 2016, 31(9): 1671–1677. doi: 10.13195/j.kzyjc.2015.0768
      WANG Jun, LIU Huan, JIANG Yizhang, et al. Cascaded hidden space fuzzy C means clustering algorithm[J]. Control and Decision, 2016, 31(9): 1671–1677. doi: 10.13195/j.kzyjc.2015.0768

    17. [17]

      LUO Xiong, XU Yang, WANG Weiping, et al. Towards enhancing stacked extreme learning machine with sparse autoencoder by correntropy[J]. Journal of the Franklin Institute, 2018, 355(4): 1945–1966. doi: 10.1016/j.jfranklin.2017.08.014

    18. [18]

      DAI Jianhua, HU Qinghua, HU Hu, et al. Neighbor inconsistent pair selection for attribute reduction by rough set approach[J]. IEEE Transactions on Fuzzy Systems, 2018, 26(2): 937–950. doi: 10.1109/TFUZZ.2017.2698420

    19. [19]

      BU Fanyu. An efficient fuzzy c-means approach based on canonical polyadic decomposition for clustering big data in IoT[J]. Future Generation Computer Systems, 2018, 88: 675–682. doi: 10.1016/j.future.2018.04.045

    20. [20]

      MACIEJEWSKI R, JANG Y, WOO I, et al. Abstracting attribute space for transfer function exploration and design[J]. IEEE Transactions on Visualization and Computer Graphics, 2013, 19(1): 94–107. doi: 10.1109/TVCG.2012.105

    21. [21]

      李保珍, 张亭亭. 成对属性关联分析及其属性空间构建[J]. 情报学报, 2014, 33(11): 1194–1203.
      LI Baozhen and ZHANG Tingting. Association analysis of pairwise attributes and construction of attribute space[J]. Journal of the China Society for Scientific and Technical Information, 2014, 33(11): 1194–1203.

    22. [22]

      WEI Cuiping, WANG Pei, and ZHANG Yuzhong. Entropy, similarity measure of interval-valued intuitionistic fuzzy sets and their applications[J]. Information Sciences, 2011, 181(19): 4273–4286. doi: 10.1016/j.ins.2011.06.001

    23. [23]

      BACHE K and LICHMAN M. UCI irvine machine learning repository[EB/OL]. Irvine, CA: University of California, School of Information and Computer Science, http://archive.ics.uci.edu/ml2013.

    24. [24]

      RAND W M. Objective criteria for the evaluation of clustering methods[J]. Journal of the American Statistical Association, 1971, 66(336): 846–850. doi: 10.1080/01621459.1971.10482356

    25. [25]

      COOMBS C H, DAWES R M, and TVERSKY A. Mathematical Psychology: An Elementary Introduction[M]. Englewood Cliffs, USA: Prentice-Hall, 1970: 391–406.

    26. [26]

      ZHANG Daoqiang and CHEN Songcan. Clustering incomplete data using kernel-based fuzzy C-means algorithm[J]. Neural Processing Letters, 2003, 18(3): 155–162. doi: 10.1023/B:NEPL.0000011135.19145.1b

    1. [1]

      张杰鑫庞建民张铮邰铭刘浩. 基于非相似余度架构的网络空间安全系统异构性量化方法. 电子与信息学报, doi: 10.11999/JEIT180764

    2. [2]

      眭萍郭英李红光王宇宙. 基于混沌吸引子重构和Low-rank聚类的跳频信号电台分选. 电子与信息学报, doi: 10.11999/JEIT180947

    3. [3]

      王艳薛改娜李顺波惠飞飞. 一类新的周期为2pmq阶二元广义分圆序列的线性复杂度. 电子与信息学报, doi: 10.11999/JEIT180884

    4. [4]

      叶磊王勇杨强邓维波. 基于对数行列式散度与对称对数行列式散度的高频地波雷达目标检测器. 电子与信息学报, doi: 10.11999/JEIT181078

    5. [5]

      徐金甫吴缙李军伟曲彤洲董永兴. 基于敏感度混淆机制的控制型物理不可克隆函数研究. 电子与信息学报, doi: 10.11999/JEIT180775

    6. [6]

      陈鸿昶吴彦丞李邵梅高超. 基于行人属性分级识别的行人再识别. 电子与信息学报, doi: 10.11999/JEIT180740

    7. [7]

      杜小妮李丽张福军. 基于模2pm的欧拉商的二元序列的线性复杂度. 电子与信息学报, doi: 10.11999/JEIT190071

    8. [8]

      赵建高海英胡斌. 基于容错学习的属性基加密方案的具体安全性分析. 电子与信息学报, doi: 10.11999/JEIT180824

    9. [9]

      张玉磊刘文静刘祥震张永洁王彩芬. 基于授权的多服务器可搜索密文策略属性基加密方案. 电子与信息学报, doi: 10.11999/JEIT180944

    10. [10]

      杨英杰冷强常德显潘瑞萱胡浩. 基于属性攻击图的网络动态威胁分析技术研究. 电子与信息学报, doi: 10.11999/JEIT181025

    11. [11]

      臧鸿雁黄慧芳柴宏玉. 一类2次多项式混沌系统的均匀化方法研究. 电子与信息学报, doi: 10.11999/JEIT180735

    12. [12]

      杜小妮吕红霞王蓉. 一类四重和六重线性码的构造. 电子与信息学报, doi: 10.11999/JEIT180939

    13. [13]

      谢显中黎佳黄倩陈杰. 机器类通信中基于NOMA短编码块传输的高可靠低迟延无线资源分配优化方案. 电子与信息学报, doi: 10.11999/JEIT190128

    14. [14]

      达新宇王浩波罗章凯胡航倪磊潘钰. 基于双层多参数加权类分数阶傅里叶变换的双极化卫星安全传输方案. 电子与信息学报, doi: 10.11999/JEIT181135

    15. [15]

      陈少真张怡帆任炯炯. 具有最小异或数的最大距离可分矩阵的构造. 电子与信息学报, doi: 10.11999/JEIT181113

    16. [16]

      殷茗王文杰张煊宇姜继娇. 一种基于邻接表的最大频繁项集挖掘算法. 电子与信息学报, doi: 10.11999/JEIT180692

    17. [17]

      马彬李尚儒谢显中. 异构无线网络中基于模糊逻辑的分级垂直切换算法. 电子与信息学报, doi: 10.11999/JEIT190190

    18. [18]

      蒲磊冯新喜侯志强余旺盛. 基于空间可靠性约束的鲁棒视觉跟踪算法. 电子与信息学报, doi: 10.11999/JEIT180780

    19. [19]

      黄国策王桂胜任清华董淑福高维廷魏帅. 基于Hilbert信号空间的未知干扰自适应识别方法. 电子与信息学报, doi: 10.11999/JEIT180891

    20. [20]

      徐丹符吉祥孙光才邢孟道苏涛保铮. 空间目标的短时三维几何重构方法. 电子与信息学报, doi: 10.11999/JEIT180936

  • 表 1  FCM-SAS算法具体步骤

    输入样本集${{X}}$、样本数$n$,聚类个数$c$、加权指数$m$、迭代阈
    $\varepsilon $、最大迭代次数T、聚类存疑率$\xi $、属性占比率$\kappa $
    输出样本标签集${{{X}}_l}'$
    1表1的传统FCM算法步骤得到迭代后模糊隶属度矩阵${{U}}$、簇类中心${{V}}$和样本标签集${{{X}}_l}$,令$j = 0$
    2计算所有样本点${{x}}$的模糊隶属度最大值,按递增顺序排序并组成数组,选出此数组中第$\left[ {\xi \times n} \right]$个元素的数值作为聚类置信度$\eta $
    3$j = j + 1$,判断第$j$个样本的模糊隶属度$\max \left( {\left\{ {{u_{ij}}\left| {i = 1, 2, ·\!·\!· , c} \right.} \right\}} \right)$是否不大于聚类置信度$\eta $,若是则此样本为存疑样本,转步骤4;否则转步骤8;
    4按式(8)计算第$j$个样本与各个簇类中心在2元属性拓扑子空间中的拓扑相似度集合$\gamma \left( {{{{x}}_j}, {{{v}}_i}} \right)$,将所有集合中的元素取绝对值后按递增的顺序排序并组成数组,计算此数组中第$\left[ {n \times {\gamma _{dis}} \times \kappa } \right]$个元素与数值1之间差的绝对值作为邻域半径$\delta $
    5$\delta $为邻域半径,按式(10)计算第$j$个样本与各个簇类中心的属性相似度$\psi \left( {{{{x}}_j}, {{{v}}_i}} \right)$
    6若最大属性相似度$\max \left( {\psi \left( {{{{x}}_j}, {{{v}}_i}} \right)} \right)$只有一个,则选出最大属性相似度时的簇类中心所在的类别作为此样本更新后的标签$x_{lj}'$,转步骤8;否则,转步骤7;
    7若最大属性相似度不止一个,则选择这些簇类中最大拓扑相似度集合之和$\widehat S = \max \left( {\Sigma \gamma \left( {{{{x}}_j}, {{{v}}_i}} \right)} \right)$时的簇类中心所在的类别作为$x_{lj}'$
    8判断$j < n$,若是则转步骤3,否则输出更新后的样本标签集${{{X}}_l}'$
    下载: 导出CSV

    表 2  算法输入参数设置

    参数数值
    加权指数$m$2
    迭代阈值$\varepsilon $${10^{ - 3}}$
    最大迭代次数$T\;$$100$
    聚类存疑率$\xi $$0.3$
    属性占比率$\kappa $$0.5$
    下载: 导出CSV

    表 3  UCI数据集的统计描述

    数据集样本数维数簇类各类占比
    Iris1504350:50:50
    Wine17813359:71:48
    Seeds2107370:70:70
    Breast68392444:239
    Glass2149670:17:76:13:9:29
    下载: 导出CSV

    表 4  UCI数据集聚类结果评价指标对比(1)

    FCMRL-FCMRCAWFCMFRCMFCM-SAS(标准化样本)FCM-SAS(未标准化样本)
    IrisAR0.8930.9070.9670.9570.9600.9870.953
    RI0.8800.8920.9580.9340.9520.9830.942
    NMI0.7500.8310.8730.9490.8498
    SeedsAR0.8950.8950.9030.8950.8950.9190.900
    RI0.8740.8740.8840.8730.8760.8990.877
    NMI0.6950.6770.6970.7170.671
    BreastAR0.9370.9530.6550.9380.9470.9650.946
    RI0.8760.9100.5480.8840.9110.9320.897
    NMI0.7300.7360.7550.7820.688
    下载: 导出CSV

    表 5  UCI数据集聚类结果评价指标对比(2)

    FCMPSO-IFCMGA-IFCMABC-IFCMKFCMWGFCMFCM-SAS(标准化样本)FCM-SAS(未标准化样本)
    IrisAR0.8930.8070.8490.7870.8950.9730.9870.953
    WineAR0.9490.6550.6520.6420.9420.9660.9550.781
    GlassAR0.4210.4190.3930.4670.4600.7330.5330.472
    下载: 导出CSV

    表 6  FCM-SAS算法聚类过程统计数据

    样本集1次定类错误样本数1次定类正确的存疑样本数1次定类错误的存疑样本数存疑样本数2次定类正确样本数2次定类错误样本数
    Iris16291645432
    Seeds214419634815
    Breast43173272001919
    Wine945853476
    Glass1262142634518
    下载: 导出CSV
  • 加载中
计量
  • PDF下载量:  15
  • 文章访问数:  233
  • HTML全文浏览量:  166
  • 引证文献数: 0
文章相关
  • 通讯作者:  施伟锋, wfshi@shmtu.edu.cn
  • 收稿日期:  2018-10-17
  • 录用日期:  2019-02-28
  • 网络出版日期:  2019-04-25
通讯作者: 陈斌, bchen63@163.com
  • 1. 

    沈阳化工大学材料科学与工程学院 沈阳 110142

  1. 本站搜索
  2. 百度学术搜索
  3. 万方数据库搜索
  4. CNKI搜索

/

返回文章