Motif Discovery Algorithm for Multiple Attributes Uncertain Data Stream

WANG Ju, LIU Fuxian

Citation: WANG Ju, LIU Fuxian. Motif Discovery Algorithm for Multiple Attributes Uncertain Data Stream[J]. Journal of Electronics and Information Technology, 2017, 39(1): 159-166. doi: 10.11999/JEIT160247

doi: 10.11999/JEIT160247

Funds: The National Natural Science Foundation of China (61272011)

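Abstract: This paper addresses the problem of discovering frequent patterns in multi-attribute uncertain data streams. Drawing on the idea of motif discovery from bioinformatics, it proposes a motif discovery algorithm for multi-attribute uncertain data streams based on MEME (Multiple Expectation-maximization for Motif Elicitation). To suit the characteristics of uncertain data streams, the algorithm designs a hybrid-model-based method for updating uncertain sliding windows, improves the symbolization strategy of SAX (Symbolic Aggregate approXimation), and proposes a method for analyzing the similarity of multi-attribute motifs across different sliding windows. In the experiments, the algorithm's function is verified on a set of uncertain data streams from an air and missile defense intelligence sensor network, its discovery accuracy is tested by planting different numbers of motifs, and it is compared with existing algorithms with the tuple validity probability set to 1. The results show that the algorithm can discover frequent patterns in multi-attribute uncertain data streams with reasonable accuracy.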
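For readers unfamiliar with SAX, the following is a minimal sketch of the standard SAX transform on a single deterministic series: z-normalization, piecewise aggregate approximation (PAA), then discretization with equiprobable Gaussian breakpoints. It is not the improved symbolization strategy proposed in the paper, which the abstract does not specify. The function name `sax`, its parameters, and the default alphabet are assumptions made for illustration only.

```python
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet="abcd"):
    """Standard SAX word for one series (illustrative sketch, not the paper's variant)."""
    if n_segments < 1 or len(series) < n_segments:
        raise ValueError("need at least one point per segment")

    # Step 1: z-normalize. A constant series has zero variance, so map it to zeros.
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]

    # Step 2: PAA, the mean of each equal-length chunk.
    # Assumption: trailing points that do not fill a whole chunk are dropped.
    seg_len = len(z) // n_segments
    paa = [mean(z[i * seg_len:(i + 1) * seg_len]) for i in range(n_segments)]

    # Step 3: breakpoints that cut the standard normal into equiprobable regions.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]

    # Each PAA value maps to the symbol indexed by the number of breakpoints below it.
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in paa)


word = sax([1, 1, 1, 1, 5, 5, 5, 5], n_segments=2)
print(word)
```

With a four-letter alphabet the low half of this example series falls in the lowest region and the high half in the highest, so symbolic words from different windows can be compared as strings. Handling tuple-level uncertainty would need additional machinery that this sketch does not attempt.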
Publication history
  • Received: 2016-03-17
  • Revised: 2016-08-16
  • Published: 2017-01-19
