Performance Prediction Based on Random Forest for the Stream Processing Checkpoint

Zheng CHU, Jiong YU

Citation: Zheng CHU, Jiong YU. Performance Prediction Based on Random Forest for the Stream Processing Checkpoint[J]. Journal of Electronics and Information Technology, 2020, 42(6): 1452-1459. doi: 10.11999/JEIT190552

doi: 10.11999/JEIT190552
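Funds: The National Natural Science Foundation of China (61862060, 61462079, 61562086, 61562078); Xinjiang University Doctoral Student Science and Technology Innovation Project (XJUBSCX-201901)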
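    About the authors:

    Zheng CHU: male, born in 1991, Ph.D. candidate; research interests: distributed computing, in-memory computing, and machine learning.

    Jiong YU: male, born in 1966, professor; research interests: distributed computing, in-memory computing, and green computing.

    Corresponding author:

    Jiong YU, yujiong@xju.edu.cn

  • CLC number: TN919; TP311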

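  • Abstract:

    The development of the Internet of Things (IoT) has led to continuous growth of stream data in both volume and variety. Because real-time processing scenarios keep multiplying and configuration strategies based on empirical knowledge have shortcomings, configuring checkpoints for stream processing faces major challenges: manual tuning is time-consuming and labor-intensive, and poor settings can easily cause system anomalies. To address these challenges, this paper proposes a regression-based method for predicting checkpoint performance. The method first analyzes six types of features that affect checkpoint performance, then feeds the feature vectors of the training set into a Random Forest regression algorithm for training, and finally uses the trained model to make predictions on the test dataset. Experimental results show that, compared with other machine learning algorithms, Random Forest regression predicts checkpoint performance with lower error, higher accuracy, and higher runtime efficiency on CPU-intensive, memory-intensive, and network-intensive benchmarks.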
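The abstract outlines the pipeline: feature vectors from a training set are used to fit a Random Forest regressor, which then predicts checkpoint performance on held-out data. The paper's own implementation is not shown here. The following is a minimal pure-Python sketch that assumes synthetic data with three hypothetical load features and a hypothetical checkpoint-duration target. It illustrates the three ingredients of random forest regression: bootstrap sampling of rows, a random feature subset at each split, and averaging over trees. `TinyRandomForest` and its default parameters are illustrative choices, not the authors' settings.

```python
import random


def _sse(values):
    """Sum of squared deviations from the mean (CART regression split criterion)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def _fit_tree(X, y, depth, min_leaf, n_sub, rng):
    """Grow one regression tree: a leaf is a float, an internal node is a dict."""
    if depth == 0 or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return sum(y) / len(y)
    best = None  # (score, feature index, threshold)
    # Random-forest ingredient 1: try only a random subset of features at this split.
    for f in rng.sample(range(len(X[0])), n_sub):
        for t in sorted({row[f] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no admissible split, so this node becomes a leaf
        return sum(y) / len(y)
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return {
        "f": f,
        "t": t,
        "left": _fit_tree([X[i] for i in li], [y[i] for i in li], depth - 1, min_leaf, n_sub, rng),
        "right": _fit_tree([X[i] for i in ri], [y[i] for i in ri], depth - 1, min_leaf, n_sub, rng),
    }


def _predict_tree(node, x):
    """Walk from the root to a leaf and return the leaf's mean target."""
    while isinstance(node, dict):
        node = node["left"] if x[node["f"]] <= node["t"] else node["right"]
    return node


class TinyRandomForest:
    """Illustrative random forest regressor (hypothetical name, not a library API)."""

    def __init__(self, n_trees=20, max_depth=5, min_leaf=2, seed=0):
        self.n_trees = n_trees
        self.max_depth = max_depth
        self.min_leaf = min_leaf
        self.seed = seed
        self.trees = []

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n, d = len(X), len(X[0])
        n_sub = max(1, d // 3)  # common regression heuristic: about d/3 features per split
        self.trees = []
        for _ in range(self.n_trees):
            # Random-forest ingredient 2: each tree sees a bootstrap sample (with replacement).
            idx = [rng.randrange(n) for _ in range(n)]
            tree = _fit_tree([X[i] for i in idx], [y[i] for i in idx],
                             self.max_depth, self.min_leaf, n_sub, rng)
            self.trees.append(tree)
        return self

    def predict(self, X):
        # Random-forest ingredient 3: average the predictions of all trees.
        return [sum(_predict_tree(t, x) for t in self.trees) / len(self.trees) for x in X]


# Hypothetical synthetic data: three load features -> a checkpoint-duration-like target.
data_rng = random.Random(42)
X = [[data_rng.uniform(0, 10) for _ in range(3)] for _ in range(150)]
y = [20 * x[0] + 10 * x[1] + data_rng.gauss(0, 5) for x in X]
X_train, y_train, X_test, y_test = X[:120], y[:120], X[120:], y[120:]

model = TinyRandomForest().fit(X_train, y_train)
pred = model.predict(X_test)
mae = sum(abs(p - t) for p, t in zip(pred, y_test)) / len(y_test)
print(f"test MAE: {mae:.2f}")
```

In practice, a library implementation such as scikit-learn's `RandomForestRegressor` would normally replace this sketch. The hand-written version exists only to make the mechanics visible.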

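  • Fig. 1  Example of an improperly configured checkpoint strategy

    Fig. 2  Random Forest algorithm model

    Fig. 3  Benchmarks

    Fig. 4  Prediction accuracy of different regression algorithms and importance scores of different features

    Fig. 5  Execution efficiency of different regression algorithms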

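    Table 1  Summary of dynamic features

    | Feature | Description |
    |---|---|
    | Local incoming records | Number of local records the operator receives per second |
    | Remote incoming records | Number of remote records the operator receives per second |
    | Local buffered records | Number of local records the operator buffers per second |
    | Remote buffered records | Number of remote records the operator buffers per second |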
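Table 1 defines the dynamic features as per-second rates. The source does not say how these rates were collected. One plausible approach assumes that each operator exposes cumulative counters, which are sampled periodically, and divides the change in each counter by the sampling interval. The `CounterSnapshot` class, its field names, and the sample values below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CounterSnapshot:
    """Hypothetical cumulative per-operator counters sampled at time `ts` (seconds)."""
    ts: float
    local_in: int
    remote_in: int
    local_buffered: int
    remote_buffered: int


def rate_features(prev, curr):
    """Convert two snapshots into the four per-second features of Table 1."""
    dt = curr.ts - prev.ts
    if dt <= 0:
        raise ValueError("snapshots must be strictly increasing in time")
    return [
        (curr.local_in - prev.local_in) / dt,                # local incoming records/s
        (curr.remote_in - prev.remote_in) / dt,              # remote incoming records/s
        (curr.local_buffered - prev.local_buffered) / dt,    # local buffered records/s
        (curr.remote_buffered - prev.remote_buffered) / dt,  # remote buffered records/s
    ]


a = CounterSnapshot(ts=10.0, local_in=1000, remote_in=400, local_buffered=200, remote_buffered=50)
b = CounterSnapshot(ts=12.0, local_in=3000, remote_in=1000, local_buffered=260, remote_buffered=90)
features = rate_features(a, b)
print(features)  # [1000.0, 300.0, 30.0, 20.0]
```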

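    Table 2  Dataset description

    | Benchmark | Samples | Features | Training samples | Prediction samples |
    |---|---|---|---|---|
    | CKCPU | 47100 | 33 | 37680 | 9420 |
    | CKMEM | 10290 | 17 | 8232 | 2058 |
    | CKNET | 18900 | 52 | 15120 | 3780 |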
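The counts in Table 2 correspond exactly to an 80/20 split of each benchmark's samples into training and prediction sets. The source does not state whether the split was random or ordered, so the shuffling helper `split_samples` below is a hypothetical sketch. It reproduces the table's set sizes, not the authors' actual partition.

```python
import random

# (total samples, training samples, prediction samples) copied from Table 2
TABLE2 = {
    "CKCPU": (47100, 37680, 9420),
    "CKMEM": (10290, 8232, 2058),
    "CKNET": (18900, 15120, 3780),
}


def split_samples(samples, train_fraction=0.8, seed=0):
    """Shuffle a copy of `samples` and cut it into training and prediction parts.

    Hypothetical helper: the source does not say whether its split was random.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


for name, (total, n_train, n_pred) in TABLE2.items():
    # Integer check: training is exactly 80% and prediction exactly 20% of the samples.
    assert n_train + n_pred == total and 5 * n_train == 4 * total
    train_part, pred_part = split_samples(range(total))
    print(name, len(train_part), len(pred_part))  # sizes match the table row
```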

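    Table 3  Prediction errors of different regression algorithms

    | Benchmark | Regression algorithm | MAE | RMSE | MedianAE |
    |---|---|---|---|---|
    | CKCPU | SVR poly | 0.107006 | 1.900023 | 37.921288 |
    | CKCPU | SVR linear | 0.095006 | 27.063383 | 7.529361 |
    | CKCPU | KNN | 0.108006 | 0.323870 | 0.286494 |
    | CKCPU | BPNN | 0.042380 | 0.070043 | 0.129856 |
    | CKCPU | RF | 0.040178 | 0.068811 | 0.125560 |
    | CKMEM | SVR poly | 0.115007 | 0.037560 | 10.924428 |
    | CKMEM | SVR linear | 0.178010 | 2.524596 | 4.085918 |
    | CKMEM | KNN | 0.148008 | 0.370660 | 0.373577 |
    | CKMEM | BPNN | 0.097356 | 0.199461 | 0.214980 |
    | CKMEM | RF | 0.096046 | 0.196619 | 0.206272 |
    | CKNET | SVR poly | 0.091005 | 0.645619 | 0.634070 |
    | CKNET | SVR linear | 0.301017 | 0.545833 | 0.523365 |
    | CKNET | KNN | 0.102006 | 0.742873 | 0.742375 |
    | CKNET | BPNN | 0.020343 | 0.103857 | 0.147659 |
    | CKNET | RF | 0.019501 | 0.089315 | 0.089082 |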
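Table 3 reports three error measures without giving their formulas. The sketch below assumes the standard definitions of mean absolute error (MAE), root mean squared error (RMSE), and median absolute error (MedianAE), and applies them to a small made-up example. It may not match the paper's evaluation code exactly.

```python
import math
import statistics


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error; never smaller than MAE on the same predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def median_ae(y_true, y_pred):
    """Median absolute error: robust to a few very large misses."""
    return statistics.median(abs(t - p) for t, p in zip(y_true, y_pred))


y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]  # absolute errors: 0.5, 0.0, 2.0, 1.0
print(mae(y_true, y_pred))        # 0.875
print(rmse(y_true, y_pred))       # sqrt(5.25 / 4), about 1.1456
print(median_ae(y_true, y_pred))  # 0.75
```

RMSE squares each error before averaging, so it weighs large misses more heavily than MAE does. MedianAE ignores the size of the largest misses altogether.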
Figures (5) / Tables (3)
Publication history
  • Received: 2019-07-23
  • Revised: 2020-02-17
  • Published online: 2020-03-10
  • Issue date: 2020-06-22
