
Group-Label-Specific Features Learning Based on Label-Density Classification Margin

Yibin WANG, Gensheng PEI, Yusheng CHENG

Citation: Yibin WANG, Gensheng PEI, Yusheng CHENG. Group-Label-Specific Features Learning Based on Label-Density Classification Margin[J]. Journal of Electronics and Information Technology, doi: 10.11999/JEIT190343


    About the authors: Yibin WANG: male, born in 1970, professor; his research interests include multi-label learning, machine learning, and software security.
    Gensheng PEI: male, born in 1992, master's degree; his research interests include machine learning, data mining, and statistics.
    Yusheng CHENG: male, born in 1969, professor; his research interests include data mining and machine learning.
    Corresponding author: Yusheng CHENG, chengyshaq@163.com
  • Foundation item: The Key Scientific Research Project of Universities in Anhui Province (KJ2017A352)

Abstract: Label-specific feature learning avoids predicting all labels from the same feature set; it is a framework that extracts, for each label, the features most discriminative for that label, and it is widely used in multi-label learning. However, when the label space is high-dimensional and the label distribution density is imbalanced, existing label-specific-feature multi-label algorithms are generally time-consuming and achieve low classification accuracy. To improve multi-label classification performance, this paper proposes Group-Label-Specific Features Learning based on the Label-Density Classification Margin (GLSFL-LDCM). First, a label correlation matrix is built from cosine similarity, and spectral clustering partitions the labels into groups; label-specific features are then extracted per group, which reduces the cost of computing label-specific features for every individual label. Next, the density of each label is computed to update the label-space matrix: the density information is added to the original labels, enlarging the margin between positive and negative labels, so that the label-density classification margin effectively handles imbalanced label density distributions. Finally, the group-specific features and the label-density matrix are fed into an extreme learning machine to obtain the final classification model. Comparative experiments fully verify the feasibility and stability of the proposed algorithm.
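The first step of the pipeline, building the label correlation matrix from cosine similarity, can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the paper's reference implementation: the function name `label_correlation` and the toy label matrix are ours, and the paper's Eqs. (1) and (2) define the exact form.

```python
import numpy as np

def label_correlation(Y):
    """Cosine-similarity label correlation matrix LC.

    Y: (n_samples, n_labels) matrix with entries in {-1, +1}
    (0/1 labels work as well).  Column j holds the assignments of
    label j; LC[i, j] is the cosine of the angle between the
    column vectors of labels i and j.
    """
    Y = np.asarray(Y, dtype=float)
    norms = np.linalg.norm(Y, axis=0)
    norms[norms == 0] = 1.0            # guard against an all-zero label
    Yn = Y / norms                     # unit-normalise each label column
    return Yn.T @ Yn

# Toy label space: 4 samples, 3 labels
Y = np.array([[+1, -1, +1],
              [+1, +1, -1],
              [-1, +1, -1],
              [+1, -1, +1]])
LC = label_correlation(Y)
```

Spectral clustering can then be run on `LC` as a precomputed affinity matrix (after shifting or clipping it to nonnegative values) to obtain the label groups G1, ···, GK.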


    References

    [1] ZHANG Minling and ZHOU Zhihua. ML-KNN: A lazy learning approach to multi-label learning[J]. Pattern Recognition, 2007, 40(7): 2038–2048. doi: 10.1016/j.patcog.2006.12.019
    [2] LIU Yang, WEN Kaiwen, GAO Quanxue, et al. SVM based multi-label learning with missing labels for image annotation[J]. Pattern Recognition, 2018, 78: 307–317. doi: 10.1016/j.patcog.2018.01.022
    [3] ZHANG Junjie, WU Qi, SHEN Chunhua, et al. Multilabel image classification with regional latent semantic dependencies[J]. IEEE Transactions on Multimedia, 2018, 20(10): 2801–2813. doi: 10.1109/TMM.2018.2812605
    [4] AL-SALEMI B, AYOB M, and NOAH S A M. Feature ranking for enhancing boosting-based multi-label text categorization[J]. Expert Systems with Applications, 2018, 113: 531–543. doi: 10.1016/j.eswa.2018.07.024
    [5] ZHANG Minling and ZHOU Zhihua. Multilabel neural networks with applications to functional genomics and text categorization[J]. IEEE Transactions on Knowledge and Data Engineering, 2006, 18(10): 1338–1351. doi: 10.1109/TKDE.2006.162
    [6] GUAN Renchu, WANG Xu, YANG M Q, et al. Multi-label deep learning for gene function annotation in cancer pathways[J]. Scientific Reports, 2018, 8(1): 267. doi: 10.1038/s41598-017-17842-9
    [7] SAMY A E, EL-BELTAGY S R, and HASSANIEN E. A context integrated model for multi-label emotion detection[J]. Procedia Computer Science, 2018, 142: 61–71. doi: 10.1016/j.procs.2018.10.461
    [8] ALMEIDA A M G, CERRI R, PARAISO E C, et al. Applying multi-label techniques in emotion identification of short texts[J]. Neurocomputing, 2018, 320: 35–46. doi: 10.1016/j.neucom.2018.08.053
    [9] TSOUMAKAS G and KATAKIS I. Multi-label classification: An overview[J]. International Journal of Data Warehousing and Mining, 2007, 3(3): 1. doi: 10.4018/jdwm.2007070101
    [10] ZHANG Minling and ZHOU Zhihua. A review on multi-label learning algorithms[J]. IEEE Transactions on Knowledge and Data Engineering, 2014, 26(8): 1819–1837. doi: 10.1109/TKDE.2013.39
    [11] CRAMMER K, DREDZE M, GANCHEV K, et al. Automatic code assignment to medical text[C]. Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing, Stroudsburg, USA, 2007: 129–136.
    [12] ZHANG Minling and WU Lei. Lift: Multi-label learning with label-specific features[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(1): 107–120. doi: 10.1109/TPAMI.2014.2339815
    [13] XU Suping, YANG Xibei, YU Hualong, et al. Multi-label learning with label-specific feature reduction[J]. Knowledge-Based Systems, 2016, 104: 52–61. doi: 10.1016/j.knosys.2016.04.012
    [14] SUN Lu, KUDO M, and KIMURA K. Multi-label classification with meta-label-specific features[C]. Proceedings of 2016 IEEE International Conference on Pattern Recognition, Cancun, Mexico, 2016: 1612–1617. doi: 10.1109/ICPR.2016.7899867
    [15] HUANG Jun, LI Guorong, HUANG Qingming, et al. Joint feature selection and classification for multilabel learning[J]. IEEE Transactions on Cybernetics, 2018, 48(3): 876–889. doi: 10.1109/TCYB.2017.2663838
    [16] WENG Wei, LIN Yaojin, WU Shunxiang, et al. Multi-label learning based on label-specific features and local pairwise label correlation[J]. Neurocomputing, 2018, 273: 385–394. doi: 10.1016/j.neucom.2017.07.044
    [17] HUANG Jun, LI Guorong, HUANG Qingming, et al. Learning label-specific features and class-dependent labels for multi-label classification[J]. IEEE Transactions on Knowledge and Data Engineering, 2016, 28(12): 3309–3323. doi: 10.1109/TKDE.2016.2608339
    [18] HUANG Guangbin, ZHU Qinyu, and SIEW C K. Extreme learning machine: Theory and applications[J]. Neurocomputing, 2006, 70(1/3): 489–501. doi: 10.1016/j.neucom.2005.12.126
    [19] HUANG Guangbin, ZHOU Hongming, DING Xiaojian, et al. Extreme learning machine for regression and multiclass classification[J]. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2012, 42(2): 513–529. doi: 10.1109/TSMCB.2011.2168604
    [20] ZHAO Xiaoqiang and LIU Xiaoli. An improved spectral clustering algorithm based on axiomatic fuzzy set[J]. Journal of Electronics & Information Technology, 2018, 40(8): 1904–1910. doi: 10.11999/JEIT170904
    [21] BOYD S, PARIKH N, CHU E, et al. Distributed optimization and statistical learning via the alternating direction method of multipliers[J]. Foundations and Trends in Machine Learning, 2010, 3(1): 1–122. doi: 10.1561/2200000016
    [22] LIU Xinwang, WANG Lei, HUANG Guangbin, et al. Multiple kernel extreme learning machine[J]. Neurocomputing, 2015, 149: 253–264. doi: 10.1016/j.neucom.2013.09.072
    [23] DENG Wanyu, ZHENG Qinghua, CHEN Lin, et al. Research on extreme learning of neural networks[J]. Chinese Journal of Computers, 2010, 33(2): 279–287. doi: 10.3724/SP.J.1016.2010.00279
    [24] ZHOU Zhihua, ZHANG Minling, HUANG Shengjun, et al. Multi-instance multi-label learning[J]. Artificial Intelligence, 2012, 176(1): 2291–2320. doi: 10.1016/j.artint.2011.10.002
    [25] PAPINENI K, ROUKOS S, WARD T, et al. BLEU: A method for automatic evaluation of machine translation[C]. Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, Philadelphia, USA, 2002: 311–318. doi: 10.3115/1073083.1073135


  • Figure 1  Label-density margin surface

    Figure 2  Performance comparison of the algorithms

    Figure 3  Comparison of the label-specific feature extraction coefficient matrices

    Table 1  Synthetic label-space dataset

    Label index | Original labels (Y1 Y2 Y3 Y4) | Density labels (Y1 Y2 Y3 Y4)
    1  | +1 –1 –1 +1 | +1.333 –1.273 –1.318 +1.278
    2  | +1 –1 –1 –1 | +1.333 –1.273 –1.318 –1.227
    3  | –1 +1 –1 –1 | –1.182 +1.222 –1.318 –1.227
    4  | +1 –1 –1 +1 | +1.333 –1.273 –1.318 +1.278
    5  | –1 –1 +1 +1 | –1.182 –1.273 +1.167 +1.278
    6  | +1 –1 +1 –1 | +1.333 –1.273 +1.167 –1.227
    7  | +1 +1 –1 +1 | +1.333 +1.222 –1.318 +1.278
    8  | –1 +1 –1 –1 | –1.182 +1.222 –1.318 –1.227
    9  | +1 –1 –1 +1 | +1.333 –1.273 –1.318 +1.278
    10 | –1 +1 +1 –1 | –1.182 +1.222 +1.167 –1.227
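The density labels in Table 1 can be reproduced from the original ±1 labels. The following NumPy sketch uses one reading that matches every entry of the table: a positive entry of label j is pushed up by the fraction of all positive entries that fall in column j, and a negative entry is pushed down analogously. The paper's Eqs. (7) and (8) give the authoritative form.

```python
import numpy as np

# Original label matrix from Table 1 (10 samples, 4 labels, entries in {-1, +1}).
Y = np.array([
    [+1, -1, -1, +1],
    [+1, -1, -1, -1],
    [-1, +1, -1, -1],
    [+1, -1, -1, +1],
    [-1, -1, +1, +1],
    [+1, -1, +1, -1],
    [+1, +1, -1, +1],
    [-1, +1, -1, -1],
    [+1, -1, -1, +1],
    [-1, +1, +1, -1],
], dtype=float)

def density_labels(Y):
    """Widen the gap between positive and negative labels by label density.

    A +1 entry in column j becomes 1 + n_j^+ / N^+, and a -1 entry becomes
    -(1 + n_j^- / N^-), where n_j^+ / n_j^- count the +1 / -1 entries of
    label j and N^+ / N^- are the totals over the whole label matrix.
    """
    pos = (Y > 0).sum(axis=0)                    # n_j^+ for each label
    neg = (Y < 0).sum(axis=0)                    # n_j^- for each label
    return np.where(Y > 0,
                    1.0 + pos / pos.sum(),       # positives pushed above +1
                    -(1.0 + neg / neg.sum()))    # negatives pushed below -1

YD = density_labels(Y)   # matches the density-label columns of Table 1
```

For example, label Y1 has 6 positive entries out of the 18 positives in the whole matrix, so its positive entries become 1 + 6/18 = +1.333, as in the table.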

    Table 2  The GLSFL-LDCM algorithm

     Input: training set $D = \left\{ {{{{x}}_i},{{{Y}}_i}} \right\}_{i = 1}^N$, test set ${D^*} = \left\{ {{{x}}_j^*} \right\}_{j = 1}^{{N^*}}$, RBF kernel parameter γ, penalty factor C, label-specific-feature parameters α, β, μ, number of clusters K
     Output: predicted labels Y*.
     Training (on data set D):
     (1) Compute cosine similarities via Eq. (1) and Eq. (2) and build the label correlation matrix LC
     (2) Group the labels by spectral clustering via Eq. (3): G = [G1, G2, ···, GK]
     (3) Build the label-specific feature extraction matrix S via Eq. (5) and Eq. (6)
     (4) Update the label space via Eq. (7) and Eq. (8) and build the label-density matrix YD
     (5) For k = 1, 2, ···, K do
     ${{\varOmega}} _{{\rm{ELM}}}^k = {{{\varOmega}} _{{\rm{ELM}}}}({{x}}(:,{{{S}}^k} \ne 0))$
     ${\bf{YD}}{^k} = {\bf{YD}}({{{G}}_k})$
     ${ {{\beta} } ^k} = {\left(\dfrac{ {{I} } }{C} + {{\varOmega} } _{ {\rm{ELM} } }^k\right)^{ - 1} }{\bf{YD} }{^k}$
     Prediction (on data set D*):
     (1) For k = 1, 2, ···, K do
     ${{G}}_k^* = {{{\varOmega}} _{{\rm{ELM}}}}({{{x}}^*}(:,{{{S}}^k} \ne 0)){{{\beta}} ^k}$
     (2) ${{{Y}}^*} = \left[ {{{G}}_1^*,{{G}}_2^*,...,{{G}}_K^*} \right]$
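Step (5) of the training phase is a standard kernel ELM solve, $\beta^k = (I/C + \varOmega_{\rm ELM}^k)^{-1}{\bf YD}^k$. A minimal NumPy sketch for a single label group, assuming an RBF kernel as in the experimental setup (function names and the toy data are illustrative, not from the paper):

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """RBF kernel matrix: K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def kelm_train(X, YD, gamma, C):
    """Kernel ELM output weights: beta = (I/C + Omega_ELM)^(-1) YD."""
    omega = rbf_kernel(X, X, gamma)            # Omega_ELM = K(X, X)
    n = X.shape[0]
    return np.linalg.solve(np.eye(n) / C + omega, YD)

def kelm_predict(X_train, beta, X_test, gamma):
    """Real-valued label scores for new samples: K(X*, X) beta."""
    return rbf_kernel(X_test, X_train, gamma) @ beta

# Toy group: 6 samples, 2 features, a group of 2 density-coded labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
YD = np.where(rng.random((6, 2)) > 0.5, 1.3, -1.2)
beta = kelm_train(X, YD, gamma=1.0, C=10.0)
pred = kelm_predict(X, beta, X, gamma=1.0)
```

Since the solve enforces $(I/C + \varOmega)\beta = {\bf YD}$ exactly, the training-set scores $\varOmega\beta$ equal ${\bf YD} - \beta/C$: a larger penalty factor C weakens the regularization and fits the density labels more closely.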

    Table 3  Description of the multi-label data sets

    Dataset     | Samples | Features | Labels | Label cardinality | Domain
    Emotions 1) | 593  | 72   | 6  | 1.869 | MUSIC
    Genbase 1)  | 662  | 1186 | 27 | 1.252 | BIOLOGY
    Medical 1)  | 978  | 1449 | 45 | 1.245 | TEXT
    Enron 3)    | 1702 | 1001 | 53 | 4.275 | TEXT
    Image 2)    | 2000 | 294  | 5  | 1.236 | IMAGE
    Scene 1)    | 2407 | 294  | 6  | 1.074 | IMAGE
    Yeast 1)    | 2417 | 103  | 14 | 4.237 | BIOLOGY
    Slashdot 3) | 3782 | 1079 | 22 | 0.901 | TEXT

    Table 4  Experimental results of the compared algorithms

    HL
    Dataset | ML-kNN | LIFT | FRS-LIFT | FRS-SS-LIFT | LLSF-DL | GLSFL-LDCM
    Emotions | 0.1998±0.0167● | 0.1854±0.0260● | 0.1798±0.0290● | 0.1809±0.0310● | 0.2035±0.0082● | 0.1782±0.0154
    Genbase | 0.0043±0.0017● | 0.0011±0.0016● | 0.0015±0.0009● | 0.0017±0.0011● | 0.0008±0.0014● | 0.0006±0.0005
    Medical | 0.0158±0.0015● | 0.0115±0.0013● | 0.0087±0.0014 | 0.0089±0.0013 | 0.0092±0.0004 | 0.0089±0.0021
    Enron | 0.0482±0.0043● | 0.0365±0.0034○ | 0.0341±0.0032 | 0.0372±0.0034○ | 0.0369±0.0034○ | 0.0468±0.0021
    Image | 0.1701±0.0141● | 0.1567±0.0136● | 0.1479±0.0103● | 0.1468±0.0097● | 0.1828±0.0152● | 0.1397±0.0133
    Scene | 0.0852±0.0060● | 0.0772±0.0047● | 0.0740±0.0052● | 0.0751±0.0057● | 0.1008±0.0059● | 0.0682±0.0084
    Yeast | 0.1934±0.0116● | 0.1919±0.0083● | 0.1875±0.0114● | 0.1869±0.0111● | 0.2019±0.0060● | 0.1855±0.0079
    Slashdot | 0.0221±0.0010● | 0.0159±0.0009○ | 0.0159±0.0011○ | 0.0160±0.0011○ | 0.0158±0.0012 | 0.0196±0.0010
    win/tie/loss | 8/0/0 | 6/0/2 | 5/0/3 | 5/1/2 | 5/1/2 |

    OE
    Dataset | ML-kNN | LIFT | FRS-LIFT | FRS-SS-LIFT | LLSF-DL | GLSFL-LDCM
    Emotions | 0.2798±0.0441● | 0.2291±0.0645● | 0.2155±0.0608 | 0.2223±0.0651● | 0.2583±0.0201● | 0.2157±0.0507
    Genbase | 0.0121±0.0139● | 0.0015±0.0047 | 0.0015±0.0047 | 0.0030±0.0094● | 0.0000±0.0000 | 0.0015±0.0048
    Medical | 0.2546±0.0262● | 0.1535±0.0258● | 0.1124±0.0279 | 0.1186±0.0231○ | 0.1285±0.0271● | 0.1226±0.0383
    Enron | 0.5158±0.0417● | 0.4279±0.0456● | 0.3084±0.0444● | 0.3256±0.0437● | 0.2704±0.0321● | 0.2221±0.0227
    Image | 0.3195±0.0332● | 0.2680±0.0256● | 0.2555±0.0334● | 0.2490±0.0226● | 0.3180±0.0326● | 0.2365±0.0224
    Scene | 0.2185±0.0313● | 0.1924±0.0136● | 0.1841±0.0156● | 0.1836±0.0195● | 0.2323±0.0267● | 0.1562±0.0316
    Yeast | 0.2251±0.0284● | 0.2177±0.0255● | 0.2147±0.0171● | 0.2085±0.0156● | 0.2267±0.0239● | 0.2072±0.0250
    Slashdot | 0.0946±0.0143● | 0.0898±0.0134● | 0.0858±0.0162 | 0.0864±0.0138○ | 0.0887±0.0123● | 0.0874±0.0107
    win/tie/loss | 8/0/0 | 7/1/0 | 4/2/2 | 6/0/2 | 7/0/1 |

    RL
    Dataset | ML-kNN | LIFT | FRS-LIFT | FRS-SS-LIFT | LLSF-DL | GLSFL-LDCM
    Emotions | 0.1629±0.0177● | 0.1421±0.0244● | 0.1401±0.0299● | 0.1406±0.0280● | 0.1819±0.0166● | 0.1375±0.0226
    Genbase | 0.0062±0.0082● | 0.0034±0.0065● | 0.0043±0.0071● | 0.0051±0.0077● | 0.0071±0.0031● | 0.0017±0.0025
    Medical | 0.0397±0.0093● | 0.0262±0.0072● | 0.0248±0.0108● | 0.0236±0.0074● | 0.0218±0.0080● | 0.0148±0.0096
    Enron | 0.1638±0.0222● | 0.1352±0.0190● | 0.0953±0.0107● | 0.1046±0.0099● | 0.0927±0.0069● | 0.0735±0.0084
    Image | 0.1765±0.0202● | 0.1425±0.0169● | 0.1378±0.0149● | 0.1323±0.0171● | 0.1695±0.0162● | 0.1294±0.0127
    Scene | 0.0760±0.0100● | 0.0604±0.0047● | 0.0601±0.0061● | 0.0592±0.0072● | 0.0803±0.0133● | 0.0515±0.0093
    Yeast | 0.1666±0.0149● | 0.1648±0.0121● | 0.1588±0.0150● | 0.1560±0.0138● | 0.1716±0.0145● | 0.1551±0.0100
    Slashdot | 0.0497±0.0072● | 0.0418±0.0062● | 0.0289±0.0038● | 0.0311±0.0038● | 0.0307±0.0058● | 0.0126±0.0018
    win/tie/loss | 8/0/0 | 8/0/0 | 8/0/0 | 8/0/0 | 8/0/0 |

    AP
    Dataset | ML-kNN | LIFT | FRS-LIFT | FRS-SS-LIFT | LLSF-DL | GLSFL-LDCM
    Emotions | 0.7980±0.0254● | 0.8236±0.0334● | 0.8280±0.0411● | 0.8268±0.0400● | 0.7504±0.0120● | 0.8316±0.0265
    Genbase | 0.9873±0.0121● | 0.9958±0.0078● | 0.9944±0.0078● | 0.9935±0.0085● | 0.9928±0.0024● | 0.9962±0.0057
    Medical | 0.8068±0.0248● | 0.8784±0.0145● | 0.9096±0.0176● | 0.9087±0.0155● | 0.9028±0.0172● | 0.9122±0.0281
    Enron | 0.5134±0.0327● | 0.5620±0.0321● | 0.6611±0.0408● | 0.6481±0.0287● | 0.6632±0.0182● | 0.6923±0.0159
    Image | 0.7900±0.0203● | 0.8240±0.0169● | 0.8314±0.0177● | 0.8364±0.0162● | 0.7943±0.0177● | 0.8444±0.0118
    Scene | 0.8687±0.0164● | 0.8884±0.0081● | 0.8913±0.0084● | 0.8921±0.0101● | 0.8609±0.0182● | 0.9082±0.0173
    Yeast | 0.7659±0.0194● | 0.7685±0.0148● | 0.7762±0.0172● | 0.7790±0.0167● | 0.7633±0.0160● | 0.7798±0.0140
    Slashdot | 0.8835±0.0116● | 0.8927±0.0091● | 0.9045±0.0098● | 0.9038±0.0074● | 0.9017±0.0095● | 0.9247±0.0059
    win/tie/loss | 8/0/0 | 8/0/0 | 8/0/0 | 8/0/0 | 8/0/0 |

    Table 5  Runtime comparison of the algorithms (s)

    Dataset  | 1    | 2    | 3       | 4      | 5   | 6
    Emotions | 0.2  | 0.4  | 54.0    | 8.7    | 0.1 | 0.1
    Genbase  | 1.0  | 2.9  | 15.0    | 1.7    | 0.9 | 0.2
    Medical  | 4.3  | 12.5 | 66.3    | 14.8   | 2.3 | 0.4
    Enron    | 6.5  | 48.1 | 1292.7  | 182.7  | 0.6 | 0.6
    Image    | 3.4  | 8.1  | 1805.2  | 320.5  | 0.1 | 0.2
    Scene    | 5.4  | 7.9  | 2174.1  | 404.2  | 0.1 | 0.2
    Yeast    | 3.5  | 44.3 | 13113.4 | 3297.7 | 0.2 | 0.3
    Slashdot | 34.1 | 84.5 | 11895.5 | 2650.0 | 1.1 | 0.8
    Average  | 7.3  | 26.1 | 3802.0  | 860.0  | 0.7 | 0.4

    Table 6  Model decomposition (ablation) comparison

    HL
    Dataset | KELM | LSFL-KELM | GLSFL-KELM | LDCM-KELM
    Emotions | 0.1840±0.0275 | 0.1837±0.0253 | 0.1824±0.0196 | 0.1802±0.0295
    Genbase | 0.0010±0.0008 | 0.0008±0.0005 | 0.0006±0.0006 | 0.0007±0.0006
    Medical | 0.0094±0.0030 | 0.0093±0.0017 | 0.0091±0.0016 | 0.0092±0.0019
    Scene | 0.0706±0.0051 | 0.0693±0.0079 | 0.0683±0.0059 | 0.0682±0.0062

    AP
    Dataset | KELM | LSFL-KELM | GLSFL-KELM | LDCM-KELM
    Emotions | 0.8144±0.0369 | 0.8223±0.0252 | 0.8296±0.0278 | 0.8306±0.0429
    Genbase | 0.9926±0.0046 | 0.9928±0.0048 | 0.9961±0.0046 | 0.9956±0.0038
    Medical | 0.9077±0.0262 | 0.9092±0.0229 | 0.9124±0.0205 | 0.9126±0.0306
    Scene | 0.9010±0.0127 | 0.9024±0.0186 | 0.9059±0.0132 | 0.9033±0.0152
Article history
  • Received: 2019-05-18
  • Accepted: 2019-09-30
  • Published online: 2020-01-29
