Large Scale Spectral Clustering Based on Fast Landmark Sampling

YE Mao, LIU Wenfen

Citation: YE Mao, LIU Wenfen. Large Scale Spectral Clustering Based on Fast Landmark Sampling[J]. Journal of Electronics and Information Technology, 2017, 39(2): 278-284. doi: 10.11999/JEIT160260

doi: 10.11999/JEIT160260

Funds:

The National 973 Program of China (2012CB315905), The National Natural Science Foundation of China (61502527, 61379150)

  • Abstract: To avoid the high computational complexity that limits the application of traditional spectral clustering, landmark-based spectral clustering algorithms use the similarity matrix between a set of landmark points and all points of the dataset, which effectively reduces the cost of computing the spectral embedding. On large datasets, however, the existing strategy of selecting landmarks uniformly at random makes the clustering results unstable, while the k-means-centers strategy suffers from an unknown convergence time and repeated passes over the data. This paper applies approximate Singular Value Decomposition (SVD) to landmark-based spectral clustering and designs a fast landmark-sampling algorithm. The algorithm samples landmarks with probabilities computed from the lengths (norms) of the row vectors of the approximate singular vector matrix. Compared with the uniform random sampling strategy, it preserves the stability and accuracy of the clustering results; compared with the k-means-centers strategy, it reduces the algorithmic complexity. The paper also gives a theoretical analysis of how well the sampled points preserve the information of the original data, and verifies the performance of the algorithm experimentally.
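The sampling step the abstract describes — an approximate SVD followed by drawing landmarks with probabilities proportional to the squared row norms of the approximate singular vector matrix — can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name, the randomized range-finder construction, and the oversampling constant (10) are all hypothetical choices for the sketch.

```python
import numpy as np

def fast_landmark_sampling(X, p, k, rng=None):
    """Sample p landmark rows of X, with probability proportional to the
    squared row norms of an approximate rank-k left singular vector matrix.

    X : (n, d) data matrix; p : number of landmarks; k : target rank.
    Returns (landmarks, indices)."""
    rng = np.random.default_rng(rng)
    n, d = X.shape

    # Randomized range finder: sketch the column space of X with a
    # small Gaussian test matrix (oversampled beyond rank k).
    l = min(k + 10, d)
    omega = rng.standard_normal((d, l))
    Y = X @ omega                       # (n, l) sketch
    Q, _ = np.linalg.qr(Y)              # orthonormal basis of the sketch

    # Approximate top-k left singular vectors from the small matrix B = Q^T X.
    B = Q.T @ X                         # (l, d), cheap to decompose exactly
    Ub, _, _ = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub[:, :k]                   # approximate (n, k) singular vector matrix

    # Sampling probabilities from the row lengths (leverage-score style).
    probs = np.sum(U ** 2, axis=1)
    probs /= probs.sum()

    idx = rng.choice(n, size=p, replace=False, p=probs)
    return X[idx], idx
```

Rows whose approximate singular-vector components are large receive proportionally higher sampling probability, so the selected landmarks concentrate on directions that carry most of the spectral information; the landmarks would then feed the similarity matrix of a landmark-based spectral clustering pipeline. Unlike the k-means-centers strategy, this requires only a constant number of passes over the data.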
  • Publication History
    • Received: 2016-03-21
    • Revised: 2016-07-18
    • Published: 2017-02-19
