Group-Label-Specific Features Learning Based on Label-Density Classification Margin

WANG Yibin, PEI Gensheng, CHENG Yusheng

Citation: Yibin WANG, Gensheng PEI, Yusheng CHENG. Group-Label-Specific Features Learning Based on Label-Density Classification Margin[J]. Journal of Electronics and Information Technology, 2020, 42(5): 1179-1187. doi: 10.11999/JEIT190343

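    About the authors: WANG Yibin: male, born in 1970, professor; research interests include multi-label learning, machine learning, and software security;
    PEI Gensheng: male, born in 1992, master's degree; research interests include machine learning, data mining, and statistics;
    CHENG Yusheng: male, born in 1969, professor; research interests include data mining and machine learning
    Corresponding author: CHENG Yusheng, chengyshaq@163.com
  • Funding: Key Scientific Research Project of Universities in Anhui Province (KJ2017A352)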

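Abstract: Label-specific features learning avoids predicting all labels from the same features. It is a framework that extracts the features specific to each label for classification, and it is widely used in multi-label learning. However, when the label dimension is large and the label distribution density is imbalanced, existing multi-label learning algorithms based on label-specific features generally suffer from high time cost and low classification accuracy. To improve multi-label classification performance, this paper proposes a Group-Label-Specific Features Learning method based on Label-Density Classification Margin (GLSFL-LDCM). First, cosine similarity is used to construct a label correlation matrix, and the labels are grouped by spectral clustering so that label-specific features are extracted per label group, which reduces the time spent computing label-specific features for all labels. Then, the density of each label is computed to update the label space matrix. Adding the label-density information to the original labels enlarges the margin between positive and negative labels, and this label-density classification margin effectively addresses the imbalance of the label distribution density. Finally, the group-label-specific features and the label-density matrix are fed into an extreme learning machine to obtain the final classification model. Comparative experiments verify the feasibility and stability of the proposed algorithm.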
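
The first step of the abstract, building a label correlation matrix from cosine similarity, can be sketched as follows. This is only an illustration: the paper's own Eqs. (1) and (2) are not reproduced on this page, so the function name `label_correlation`, the 0/1 label encoding, and the toy matrix `Y` are assumptions. The subsequent spectral clustering of the labels is not shown.

```python
import math

def label_correlation(Y):
    """Cosine similarity between every pair of label columns of Y.

    Y is a list of rows, one per sample, with 1 for a relevant label
    and 0 otherwise (this 0/1 encoding is an assumption of the sketch).
    """
    n_labels = len(Y[0])
    cols = [[row[j] for row in Y] for j in range(n_labels)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in cols]
    LC = [[0.0] * n_labels for _ in range(n_labels)]
    for a in range(n_labels):
        for b in range(n_labels):
            if norms[a] == 0.0 or norms[b] == 0.0:
                continue  # a label with no positive sample is left at 0
            dot = sum(x * y for x, y in zip(cols[a], cols[b]))
            LC[a][b] = dot / (norms[a] * norms[b])
    return LC

# Toy label matrix: 4 samples, 3 labels (illustrative values only).
Y = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [1, 1, 0],
]
LC = label_correlation(Y)
for row in LC:
    print([round(v, 3) for v in row])
```

In this toy matrix the first two labels co-occur twice and get a similarity of 2/3, while the first and third never co-occur and get 0. A matrix of this kind could then serve as the affinity input of a spectral clustering routine that splits the labels into groups.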

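  • Figure 1  Label-density margin surface

    Figure 2  Comparison of algorithm performance

    Figure 3  Comparison of the label-specific feature extraction coefficient matrices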

    Table 1  Virtual data set of the label space

    No.   Original labels           Density labels
          Y1    Y2    Y3    Y4      Y1       Y2       Y3       Y4
    1     +1    -1    -1    +1      +1.333   -1.273   -1.318   +1.278
    2     +1    -1    -1    -1      +1.333   -1.273   -1.318   -1.227
    3     -1    +1    -1    -1      -1.182   +1.222   -1.318   -1.227
    4     +1    -1    -1    +1      +1.333   -1.273   -1.318   +1.278
    5     -1    -1    +1    +1      -1.182   -1.273   +1.167   +1.278
    6     +1    -1    +1    -1      +1.333   -1.273   +1.167   -1.227
    7     +1    +1    -1    +1      +1.333   +1.222   -1.318   +1.278
    8     -1    +1    -1    -1      -1.182   +1.222   -1.318   -1.227
    9     +1    -1    -1    +1      +1.333   -1.273   -1.318   +1.278
    10    -1    +1    +1    -1      -1.182   +1.222   +1.167   -1.227
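
The density labels in Table 1 are consistent with a simple rule: a positive entry of label j becomes 1 + P_j / ΣP, and a negative entry becomes -(1 + N_j / ΣN), where P_j and N_j are the numbers of positive and negative samples of label j and the sums run over all labels. This rule is inferred from the numbers in Table 1 only. The paper's Eqs. (7) and (8) are not shown on this page, so the sketch below should be read as a reconstruction under that assumption, and the function name `density_labels` is hypothetical.

```python
def density_labels(Y):
    """Rescale a {+1, -1} label matrix by per-label density.

    Assumed rule (inferred from Table 1, not taken from the paper's equations):
      +1 -> 1 + P_j / sum(P)      -1 -> -(1 + N_j / sum(N))
    Both signs must occur somewhere in Y, otherwise a division by zero occurs.
    """
    n_labels = len(Y[0])
    pos = [sum(1 for row in Y if row[j] == 1) for j in range(n_labels)]
    neg = [sum(1 for row in Y if row[j] == -1) for j in range(n_labels)]
    total_pos, total_neg = sum(pos), sum(neg)
    return [
        [1 + pos[j] / total_pos if y == 1 else -(1 + neg[j] / total_neg)
         for j, y in enumerate(row)]
        for row in Y
    ]

# Original labels of the ten samples in Table 1.
Y = [
    [+1, -1, -1, +1], [+1, -1, -1, -1], [-1, +1, -1, -1], [+1, -1, -1, +1],
    [-1, -1, +1, +1], [+1, -1, +1, -1], [+1, +1, -1, +1], [-1, +1, -1, -1],
    [+1, -1, -1, +1], [-1, +1, +1, -1],
]
YD = density_labels(Y)
print([round(v, 3) for v in YD[0]])  # [1.333, -1.273, -1.318, 1.278]
```

Rounded to three decimals, the output matches the density-label columns of Table 1 for all ten samples. Labels with more positive samples receive larger positive values, so the gap between positive and negative targets widens by an amount that depends on how frequent each label is.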

    Table 2  Steps of the GLSFL-LDCM algorithm

     Input: training data set $D = \{x_i, Y_i\}_{i=1}^{N}$, test data set $D^* = \{x_j^*\}_{j=1}^{N^*}$, RBF kernel parameter γ, penalty factor C, label-specific feature parameters α, β, μ, number of clusters K
     Output: predicted labels Y*.
     Training: training data set D
     (1) Compute the cosine similarity with Eqs. (1) and (2) and construct the label correlation matrix LC
     (2) Group the labels by spectral clustering with Eq. (3): G = [G1, G2, ···, GK]
     (3) Construct the label-specific feature extraction matrix S with Eqs. (5) and (6)
     (4) Update the label space with Eqs. (7) and (8) and construct the label-density matrix YD
     (5) For k = 1, 2, ···, K do
         $\varOmega_{\mathrm{ELM}}^k = \varOmega_{\mathrm{ELM}}(x(:, S^k \ne 0))$
         $\mathbf{YD}^k = \mathbf{YD}(G_k)$
         $\beta^k = \left(\dfrac{I}{C} + \varOmega_{\mathrm{ELM}}^k\right)^{-1} \mathbf{YD}^k$
     Prediction: testing data set D*
     (a) For k = 1, 2, ···, K do
         $G_k^* = \varOmega_{\mathrm{ELM}}(x^*(:, S^k \ne 0)) \beta^k$
     (b) $Y^* = [G_1^*, G_2^*, \ldots, G_K^*]$
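
Step (5) and the prediction steps of Table 2 can be illustrated with a dependency-free sketch of a kernel extreme learning machine trained once per label group. Several things here are assumptions rather than the authors' implementation: the toy data, the lists `groups` and `features` (stand-ins for the label groups G_k and for the feature columns where S^k is nonzero), the reading of Ω_ELM at prediction time as the RBF kernel between new and training samples, and the small Gauss-Jordan solver used in place of a linear algebra library.

```python
import math

def rbf_kernel(A, B, gamma):
    """Kernel matrix with entries exp(-gamma * ||a - b||^2)."""
    return [[math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))
             for b in B] for a in A]

def solve(M, R):
    """Solve M X = R by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + list(R[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0.0:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def kelm_fit(X, T, gamma, C):
    """beta = (I / C + Omega)^(-1) T, with Omega the RBF kernel on X."""
    omega = rbf_kernel(X, X, gamma)
    for i in range(len(X)):
        omega[i][i] += 1.0 / C
    return solve(omega, T)

def kelm_predict(X_new, X_train, beta, gamma):
    """Real-valued outputs Omega(X_new, X_train) times beta."""
    K = rbf_kernel(X_new, X_train, gamma)
    return [[sum(k * b[j] for k, b in zip(row, beta))
             for j in range(len(beta[0]))] for row in K]

def select(M, cols):
    """Keep only the given column indices of a row-major matrix."""
    return [[row[c] for c in cols] for row in M]

# Toy data (illustrative only): 4 samples, 3 features, 2 density-style labels.
X = [[0.0, 0.1, 1.0], [0.1, 0.0, 0.9], [1.0, 0.9, 0.0], [0.9, 1.0, 0.1]]
YD = [[1.3, -1.2], [1.3, -1.2], [-1.2, 1.3], [-1.2, 1.3]]
groups = [[0], [1]]        # hypothetical label groups G_k
features = [[0, 1], [2]]   # hypothetical feature columns with S^k != 0
gamma, C = 1.0, 100.0

scores = [[0.0] * len(YD[0]) for _ in X]
for cols, labels in zip(features, groups):
    Xk = select(X, cols)                        # group-specific features
    beta = kelm_fit(Xk, select(YD, labels), gamma, C)
    out = kelm_predict(Xk, Xk, beta, gamma)     # here: predict on the training samples
    for i, row in enumerate(out):
        for j, lab in enumerate(labels):
            scores[i][lab] = row[j]

pred = [[1 if s > 0 else -1 for s in row] for row in scores]
print(pred)
```

To score unseen samples, `kelm_predict` would be called with the test samples restricted to the same feature columns as the first argument. Each group solves its own small linear system, so the per-group outputs can simply be placed side by side, as in step (b).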

    Table 3  Description of the multi-label data sets

    Dataset       Instances   Features   Labels   Label cardinality   Domain
    Emotions 1)   593         72         6        1.869               MUSIC
    Genbase 1)    662         1186       27       1.252               BIOLOGY
    Medical 1)    978         1449       45       1.245               TEXT
    Enron 3)      1702        1001       53       4.275               TEXT
    Image 2)      2000        294        5        1.236               IMAGE
    Scene 1)      2407        294        6        1.074               IMAGE
    Yeast 1)      2417        103        14       4.237               BIOLOGY
    Slashdot 3)   3782        1079       22       0.901               TEXT

    Table 4  Experimental results of the compared algorithms

    HL↓
    Dataset       ML-kNN           LIFT             FRS-LIFT         FRS-SS-LIFT      LLSF-DL          GLSFL-LDCM
    Emotions      0.1998±0.0167●   0.1854±0.0260●   0.1798±0.0290●   0.1809±0.0310●   0.2035±0.0082●   0.1782±0.0154
    Genbase       0.0043±0.0017●   0.0011±0.0016●   0.0015±0.0009●   0.0017±0.0011●   0.0008±0.0014●   0.0006±0.0005
    Medical       0.0158±0.0015●   0.0115±0.0013●   0.0087±0.0014    0.0089±0.0013    0.0092±0.0004    0.0089±0.0021
    Enron         0.0482±0.0043●   0.0365±0.0034○   0.0341±0.0032    0.0372±0.0034○   0.0369±0.0034○   0.0468±0.0021
    Image         0.1701±0.0141●   0.1567±0.0136●   0.1479±0.0103●   0.1468±0.0097●   0.1828±0.0152●   0.1397±0.0133
    Scene         0.0852±0.0060●   0.0772±0.0047●   0.0740±0.0052●   0.0751±0.0057●   0.1008±0.0059●   0.0682±0.0084
    Yeast         0.1934±0.0116●   0.1919±0.0083●   0.1875±0.0114●   0.1869±0.0111●   0.2019±0.0060●   0.1855±0.0079
    Slashdot      0.0221±0.0010●   0.0159±0.0009○   0.0159±0.0011○   0.0160±0.0011○   0.0158±0.0012    0.0196±0.0010
    win/tie/loss  8/0/0            6/0/2            5/0/3            5/1/2            5/1/2

    OE↓
    Dataset       ML-kNN           LIFT             FRS-LIFT         FRS-SS-LIFT      LLSF-DL          GLSFL-LDCM
    Emotions      0.2798±0.0441●   0.2291±0.0645●   0.2155±0.0608    0.2223±0.0651●   0.2583±0.0201●   0.2157±0.0507
    Genbase       0.0121±0.0139●   0.0015±0.0047    0.0015±0.0047    0.0030±0.0094●   0.0000±0.0000    0.0015±0.0048
    Medical       0.2546±0.0262●   0.1535±0.0258●   0.1124±0.0279    0.1186±0.0231○   0.1285±0.0271●   0.1226±0.0383
    Enron         0.5158±0.0417●   0.4279±0.0456●   0.3084±0.0444●   0.3256±0.0437●   0.2704±0.0321●   0.2221±0.0227
    Image         0.3195±0.0332●   0.2680±0.0256●   0.2555±0.0334●   0.2490±0.0226●   0.3180±0.0326●   0.2365±0.0224
    Scene         0.2185±0.0313●   0.1924±0.0136●   0.1841±0.0156●   0.1836±0.0195●   0.2323±0.0267●   0.1562±0.0316
    Yeast         0.2251±0.0284●   0.2177±0.0255●   0.2147±0.0171●   0.2085±0.0156●   0.2267±0.0239●   0.2072±0.0250
    Slashdot      0.0946±0.0143●   0.0898±0.0134●   0.0858±0.0162    0.0864±0.0138○   0.0887±0.0123●   0.0874±0.0107
    win/tie/loss  8/0/0            7/1/0            4/2/2            6/0/2            7/0/1

    RL↓
    Dataset       ML-kNN           LIFT             FRS-LIFT         FRS-SS-LIFT      LLSF-DL          GLSFL-LDCM
    Emotions      0.1629±0.0177●   0.1421±0.0244●   0.1401±0.0299●   0.1406±0.0280●   0.1819±0.0166●   0.1375±0.0226
    Genbase       0.0062±0.0082●   0.0034±0.0065●   0.0043±0.0071●   0.0051±0.0077●   0.0071±0.0031●   0.0017±0.0025
    Medical       0.0397±0.0093●   0.0262±0.0072●   0.0248±0.0108●   0.0236±0.0074●   0.0218±0.0080●   0.0148±0.0096
    Enron         0.1638±0.0222●   0.1352±0.0190●   0.0953±0.0107●   0.1046±0.0099●   0.0927±0.0069●   0.0735±0.0084
    Image         0.1765±0.0202●   0.1425±0.0169●   0.1378±0.0149●   0.1323±0.0171●   0.1695±0.0162●   0.1294±0.0127
    Scene         0.0760±0.0100●   0.0604±0.0047●   0.0601±0.0061●   0.0592±0.0072●   0.0803±0.0133●   0.0515±0.0093
    Yeast         0.1666±0.0149●   0.1648±0.0121●   0.1588±0.0150●   0.1560±0.0138●   0.1716±0.0145●   0.1551±0.0100
    Slashdot      0.0497±0.0072●   0.0418±0.0062●   0.0289±0.0038●   0.0311±0.0038●   0.0307±0.0058●   0.0126±0.0018
    win/tie/loss  8/0/0            8/0/0            8/0/0            8/0/0            8/0/0

    AP↑
    Dataset       ML-kNN           LIFT             FRS-LIFT         FRS-SS-LIFT      LLSF-DL          GLSFL-LDCM
    Emotions      0.7980±0.0254●   0.8236±0.0334●   0.8280±0.0411●   0.8268±0.0400●   0.7504±0.0120●   0.8316±0.0265
    Genbase       0.9873±0.0121●   0.9958±0.0078●   0.9944±0.0078●   0.9935±0.0085●   0.9928±0.0024●   0.9962±0.0057
    Medical       0.8068±0.0248●   0.8784±0.0145●   0.9096±0.0176●   0.9087±0.0155●   0.9028±0.0172●   0.9122±0.0281
    Enron         0.5134±0.0327●   0.5620±0.0321●   0.6611±0.0408●   0.6481±0.0287●   0.6632±0.0182●   0.6923±0.0159
    Image         0.7900±0.0203●   0.8240±0.0169●   0.8314±0.0177●   0.8364±0.0162●   0.7943±0.0177●   0.8444±0.0118
    Scene         0.8687±0.0164●   0.8884±0.0081●   0.8913±0.0084●   0.8921±0.0101●   0.8609±0.0182●   0.9082±0.0173
    Yeast         0.7659±0.0194●   0.7685±0.0148●   0.7762±0.0172●   0.7790±0.0167●   0.7633±0.0160●   0.7798±0.0140
    Slashdot      0.8835±0.0116●   0.8927±0.0091●   0.9045±0.0098●   0.9038±0.0074●   0.9017±0.0095●   0.9247±0.0059
    win/tie/loss  8/0/0            8/0/0            8/0/0            8/0/0            8/0/0
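
For readers unfamiliar with the four column groups in Table 4, the sketch below computes them on a toy example, assuming that HL, OE, RL, and AP stand for the usual multi-label metrics Hamming loss, one-error, ranking loss, and average precision. The page does not spell the abbreviations out, so this reading, the function names, and the toy data are assumptions. Labels use the +1/-1 encoding, and real-valued scores rank the labels of each sample.

```python
def hamming_loss(Y_true, Y_pred):
    """Fraction of (sample, label) entries predicted with the wrong sign."""
    total = sum(len(row) for row in Y_true)
    wrong = sum(t != p for tr, pr in zip(Y_true, Y_pred) for t, p in zip(tr, pr))
    return wrong / total

def one_error(Y_true, scores):
    """Fraction of samples whose top-scored label is not relevant."""
    misses = 0
    for tr, sc in zip(Y_true, scores):
        top = max(range(len(sc)), key=sc.__getitem__)
        misses += tr[top] != 1
    return misses / len(Y_true)

def ranking_loss(Y_true, scores):
    """Average fraction of (relevant, irrelevant) label pairs ordered wrongly."""
    losses = []
    for tr, sc in zip(Y_true, scores):
        rel = [s for t, s in zip(tr, sc) if t == 1]
        irr = [s for t, s in zip(tr, sc) if t != 1]
        if not rel or not irr:
            continue  # undefined for this sample; skipped in this sketch
        bad = sum(r <= i for r in rel for i in irr)
        losses.append(bad / (len(rel) * len(irr)))
    return sum(losses) / len(losses)

def average_precision(Y_true, scores):
    """Mean over samples of the precision at each relevant label's rank."""
    vals = []
    for tr, sc in zip(Y_true, scores):
        order = sorted(range(len(sc)), key=lambda j: -sc[j])
        hits, prec = 0, 0.0
        for rank, j in enumerate(order, start=1):
            if tr[j] == 1:
                hits += 1
                prec += hits / rank
        if hits:
            vals.append(prec / hits)
    return sum(vals) / len(vals)

# Toy example: 2 samples, 3 labels (illustrative values only).
Y_true = [[1, -1, 1], [-1, 1, -1]]
scores = [[0.9, 0.2, -0.4], [0.3, 0.1, -0.5]]
Y_pred = [[1 if s > 0 else -1 for s in row] for row in scores]

HL = hamming_loss(Y_true, Y_pred)
OE = one_error(Y_true, scores)
RL = ranking_loss(Y_true, scores)
AP = average_precision(Y_true, scores)
print(HL, OE, RL, round(AP, 4))  # 0.5 0.5 0.5 0.6667
```

The first three metrics are losses, which matches the ↓ arrows in the table (lower is better), whereas average precision carries ↑ (higher is better).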

    Table 5  Running time of each algorithm (s)

    Dataset     1      2      3         4        5     6
    Emotions    0.2    0.4    54.0      8.7      0.1   0.1
    Genbase     1.0    2.9    15.0      1.7      0.9   0.2
    Medical     4.3    12.5   66.3      14.8     2.3   0.4
    Enron       6.5    48.1   1292.7    182.7    0.6   0.6
    Image       3.4    8.1    1805.2    320.5    0.1   0.2
    Scene       5.4    7.9    2174.1    404.2    0.1   0.2
    Yeast       3.5    44.3   13113.4   3297.7   0.2   0.3
    Slashdot    34.1   84.5   11895.5   2650.0   1.1   0.8
    Average     7.3    26.1   3802.0    860.0    0.7   0.4

    Table 6  Model decomposition comparison experiments

    HL↓
    Dataset     KELM            LSFL-KELM       GLSFL-KELM      LDCM-KELM
    Emotions    0.1840±0.0275   0.1837±0.0253   0.1824±0.0196   0.1802±0.0295
    Genbase     0.0010±0.0008   0.0008±0.0005   0.0006±0.0006   0.0007±0.0006
    Medical     0.0094±0.0030   0.0093±0.0017   0.0091±0.0016   0.0092±0.0019
    Scene       0.0706±0.0051   0.0693±0.0079   0.0683±0.0059   0.0682±0.0062

    AP↑
    Dataset     KELM            LSFL-KELM       GLSFL-KELM      LDCM-KELM
    Emotions    0.8144±0.0369   0.8223±0.0252   0.8296±0.0278   0.8306±0.0429
    Genbase     0.9926±0.0046   0.9928±0.0048   0.9961±0.0046   0.9956±0.0038
    Medical     0.9077±0.0262   0.9092±0.0229   0.9124±0.0205   0.9126±0.0306
    Scene       0.9010±0.0127   0.9024±0.0186   0.9059±0.0132   0.9033±0.0152
Article information
  • Corresponding author:  CHENG Yusheng, chengyshaq@163.com
  • Received:  2019-05-18
  • Accepted:  2019-09-30
  • Published online:  2020-01-29
  • Issue date:  2020-05-01