
A Perspective-independent Method for Behavior Recognition in Depth Video via Temporal-spatial Correlating

Peiliang WU, Xiao YANG, Bingyi MAO, Lingfu KONG, Zengguang HOU

Citation: Peiliang WU, Xiao YANG, Bingyi MAO, Lingfu KONG, Zengguang HOU. A Perspective-independent Method for Behavior Recognition in Depth Video via Temporal-spatial Correlating[J]. Journal of Electronics and Information Technology, 2019, 41(4): 904-910. doi: 10.11999/JEIT180477


    About the authors: WU Peiliang: male, born in 1981, associate professor; research interests include behavior recognition and learning for home service robots, and affordance cognition;
    YANG Xiao: male, born in 1993, M.S. candidate; research interest: behavior recognition for home service robots;
    MAO Bingyi: male, born in 1964, associate research fellow; research interest: home service robots;
    KONG Lingfu: male, born in 1957, professor; research interests include intelligent robotic systems and intelligent information processing;
    HOU Zengguang: male, born in 1969, research fellow; research interests include robotics and intelligent systems, rehabilitation robots, and minimally invasive interventional surgical robots
    Corresponding author: MAO Bingyi, ysdxmby@163.com
  • Funding: The National Natural Science Foundation of China (61305113), the Natural Science Foundation of Hebei Province (F2016203358), the China Postdoctoral Science Foundation (2018M631620), and the Doctoral Foundation of Yanshan University (BL18007)

Abstract: Existing behavior recognition methods suffer from low accuracy when the viewpoint changes, so this paper proposes a perspective-independent temporal-spatial correlating method for behavior recognition in depth video. First, the fully connected layers of a deep convolutional neural network map human poses observed from different viewpoints into a viewpoint-independent high-dimensional space, building a Human Pose Model (HPM) for depth behavior video in the spatial domain. Second, to exploit the temporal-spatial correlation between frames of the video sequence, a temporal Rank Pooling (RP) function is applied piecewise to the activation time series of each neuron, encoding the temporal subsequences of the video. Then, the Fourier Temporal Pyramid (FTP) algorithm is applied to each pooled time series, and the results are concatenated into the final temporal-spatial feature representation. Finally, behavior classification experiments are conducted on different datasets against several existing methods. The results show that the proposed method (HPM+RP+FTP) improves recognition accuracy on depth video across viewpoints, exceeding the best existing method by 18% on the UWA3DII dataset. The method also generalizes well, achieving 82.5% accuracy on the MSR Daily Activity3D dataset.
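To make the temporal encoding stages concrete, the sketch below illustrates piecewise rank pooling followed by a Fourier temporal pyramid, the two steps applied on top of the HPM activations. It is an illustration rather than the authors' implementation: the least-squares approximation of rank pooling, the number of temporal subsequences (8), the pyramid depth (3 levels), and the number of Fourier coefficients (4) are all assumptions made for the example.

    # Sketch of the temporal encoding: piecewise rank pooling + Fourier
    # temporal pyramid. The least-squares approximation of rank pooling and
    # all hyperparameters below are illustrative assumptions, not the
    # paper's settings.
    import numpy as np

    def rank_pooling(x):
        """Approximate rank pooling of a (T, D) activation sequence.

        Fits the frame indices as a linear function of the cumulative-mean
        smoothed features; the fitted weight vector (one weight per feature
        dimension) summarizes how the sequence evolves over time.
        """
        T = x.shape[0]
        m = np.cumsum(x, axis=0) / np.arange(1, T + 1)[:, None]  # smoothing
        t = np.arange(1, T + 1, dtype=float)                     # frame order
        w, *_ = np.linalg.lstsq(m, t, rcond=None)                # m @ w ~ t
        return w  # shape (D,)

    def fourier_temporal_pyramid(x, levels=3, n_coeffs=4):
        """Fourier temporal pyramid of a (T, D) sequence.

        At level l the sequence is split into 2**l contiguous segments; each
        segment keeps the magnitudes of its first n_coeffs low-frequency FFT
        coefficients per dimension, and all segments are concatenated.
        """
        feats = []
        for level in range(levels):
            for seg in np.array_split(x, 2 ** level, axis=0):
                spec = np.abs(np.fft.rfft(seg, axis=0))[:n_coeffs]
                if spec.shape[0] < n_coeffs:  # zero-pad short segments
                    spec = np.pad(spec, ((0, n_coeffs - spec.shape[0]), (0, 0)))
                feats.append(spec.ravel())
        return np.concatenate(feats)

    # Toy usage: 64 frames of 4096-dim fc7 activations (random stand-ins for
    # the real HPM features).
    rng = np.random.default_rng(0)
    fc7 = rng.standard_normal((64, 4096))

    segments = np.array_split(fc7, 8, axis=0)               # temporal subsequences
    pooled = np.stack([rank_pooling(s) for s in segments])  # (8, 4096)
    descriptor = fourier_temporal_pyramid(pooled)           # final representation

In the actual pipeline the activations would come from the fc6/fc7 layers of the trained HPM network rather than random data, and the resulting descriptor is what the classifier consumes.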


References

[1] ZHOU Yang, NI Bingbing, HONG Richang, et al. Interaction part mining: A mid-level approach for fine-grained action recognition[C]. IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, 2015: 3323–3331. doi: 10.1109/CVPR.2015.7298953.

[2] WANG Jiang, NIE Xiaohan, XIA Yin, et al. Cross-view action modeling, learning, and recognition[C]. IEEE Conference on Computer Vision and Pattern Recognition, Columbus, USA, 2014: 2649–2656. doi: 10.1109/CVPR.2014.339.

[3] LIU Peng and YIN Lijun. Spontaneous thermal facial expression analysis based on trajectory-pooled fisher vector descriptor[C]. IEEE International Conference on Multimedia and Expo, Hong Kong, China, 2017: 835–840. doi: 10.1109/ICME.2017.8019315.

[4] YANG Xiaodong and TIAN Yingli. Super normal vector for activity recognition using depth sequences[C]. IEEE Conference on Computer Vision and Pattern Recognition, Columbus, USA, 2014: 804–811. doi: 10.1109/CVPR.2014.108.

[5] ZHANG Baochang, YANG Yun, CHEN Chen, et al. Action recognition using 3D histograms of texture and a multi-class boosting classifier[J]. IEEE Transactions on Image Processing, 2017, 26(10): 4648–4660. doi: 10.1109/TIP.2017.2718189.

[6] YIN Xiaochuan and CHEN Qijun. Deep metric learning autoencoder for nonlinear temporal alignment of human motion[C]. IEEE International Conference on Robotics and Automation, Stockholm, Sweden, 2016: 2160–2166. doi: 10.1109/ICRA.2016.7487366.

[7] SHAHROUDY A, LIU Jun, NG T, et al. NTU RGB+D: A large scale dataset for 3D human activity analysis[C]. IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 1010–1019. doi: 10.1109/CVPR.2016.115.

[8] KARPATHY A, TODERICI G, SHETTY S, et al. Large-scale video classification with convolutional neural networks[C]. IEEE Conference on Computer Vision and Pattern Recognition, Columbus, USA, 2014: 1725–1732. doi: 10.1109/CVPR.2014.223.

[9] HAIDER F, CAMPBELL N, and LUZ S. Active speaker detection in human machine multiparty dialogue using visual prosody information[C]. IEEE Global Conference on Signal and Information Processing, Washington, D.C., USA, 2016: 1207–1211. doi: 10.1109/GlobalSIP.2016.7906033.

[10] SIMONYAN K and ZISSERMAN A. Two-stream convolutional networks for action recognition in videos[C]. Advances in Neural Information Processing Systems, Montreal, Canada, 2014: 568–576.

[11] TRAN D, BOURDEV L, FERGUS R, et al. Learning spatiotemporal features with 3D convolutional networks[C]. IEEE International Conference on Computer Vision, Santiago, Chile, 2015: 4489–4497. doi: 10.1109/ICCV.2015.510.

[12] DONAHUE J, HENDRICKS L A, ROHRBACH M, et al. Long-term recurrent convolutional networks for visual recognition and description[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 677–691. doi: 10.1109/TPAMI.2016.2599174.

[13] GUPTA S, GIRSHICK R, ARBELÁEZ P, et al. Learning rich features from RGB-D images for object detection and segmentation[C]. European Conference on Computer Vision, Zurich, Switzerland, 2014: 345–360. doi: 10.1007/978-3-319-10584-0_23.

[14] FERNANDO B, GAVVES E, ORAMAS J, et al. Rank pooling for action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 773–787. doi: 10.1109/TPAMI.2016.2558148.

[15] WANG Jiang, LIU Zicheng, WU Ying, et al. Learning actionlet ensemble for 3D human action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(5): 914–927. doi: 10.1109/TPAMI.2013.198.

[16] RAHMANI H and MIAN A. 3D action recognition from novel viewpoints[C]. IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 1506–1515. doi: 10.1109/CVPR.2016.167.

[17] RAHMANI H and MIAN A. Learning a non-linear knowledge transfer model for cross-view action recognition[C]. IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, 2015: 2458–2466. doi: 10.1109/CVPR.2015.7298860.

[18] RAHMANI H, MAHMOOD A, HUYNH D Q, et al. HOPC: Histogram of oriented principal components of 3D pointclouds for action recognition[C]. European Conference on Computer Vision, Zurich, Switzerland, 2014: 742–757. doi: 10.1007/978-3-319-10605-2_48.

[19] JALAL A, KAMAL S, and KIM D. A depth video sensor-based life-logging human activity recognition system for elderly care in smart indoor environments[J]. Sensors, 2014, 14(7): 11735–11759. doi: 10.3390/s140711735.

[20] MÜLLER M and RÖDER T. Motion templates for automatic classification and retrieval of motion capture data[C]. ACM SIGGRAPH/Eurographics Symposium on Computer Animation, Vienna, Austria, 2006: 137–146. doi: 10.1145/1218064.1218083.

[21] WANG Jiang, LIU Zicheng, WU Ying, et al. Mining actionlet ensemble for action recognition with depth cameras[C]. IEEE Conference on Computer Vision and Pattern Recognition, Providence, USA, 2012: 1290–1297.

[22] CAVAZZA J, ZUNINO A, BIAGIO M S, et al. Kernelized covariance for action recognition[C]. International Conference on Pattern Recognition, Cancun, Mexico, 2016: 408–413. doi: 10.1109/ICPR.2016.7899668.


  • Fig. 1  Overall model framework

    Fig. 2  Structure of the CNN model adopted in this paper

    Fig. 3  Comparison of the recognition accuracy of the two methods on specific actions

    Fig. 4  Confusion matrix of the 16 actions in the MSR Daily Activity3D dataset

    Table 1  Action recognition accuracy (%) on the UWA3D Multiview ActivityII dataset

    Training views   | V1&V2     | V1&V3     | V1&V4     | V2&V3     | V2&V4     | V3&V4     | Average
    Test view        | V3   V4   | V2   V4   | V2   V3   | V1   V4   | V1   V3   | V1   V2   |
    Ref. [6]         | 45.0 40.4 | 35.1 36.9 | 34.7 36.0 | 49.5 29.3 | 57.1 35.4 | 49.0 29.3 | 39.8
    Ref. [7]         | 49.4 42.8 | 34.6 39.7 | 38.1 44.8 | 53.3 33.5 | 53.6 41.2 | 56.7 32.6 | 43.4
    Ref. [18]        | 52.7 51.8 | 59.0 57.5 | 42.8 44.2 | 58.1 38.4 | 63.2 43.8 | 66.3 48.0 | 52.2
    Ref. [17]        | 60.1 61.3 | 57.1 65.1 | 61.6 66.8 | 70.6 59.5 | 73.2 59.3 | 72.5 54.5 | 63.5
    HPM(fc7)+RP      | 80.2 74.9 | 69.9 76.4 | 49.2 63.8 | 71.4 59.9 | 80.7 76.9 | 84.4 68.4 | 71.3
    HPM(fc7)+FTP     | 80.6 80.5 | 75.2 82.0 | 65.4 72.0 | 77.3 67.0 | 83.6 81.0 | 83.6 74.1 | 76.9
    HPM(fc6)+RP+FTP  | 83.9 81.3 | 74.8 82.0 | 66.2 72.8 | 78.8 70.0 | 83.3 79.1 | 85.9 75.9 | 77.8
    HPM(fc7)+RP+FTP  | 85.8 81.6 | 76.3 80.5 | 61.7 76.5 | 78.1 71.5 | 82.9 81.7 | 85.9 76.3 | 78.3
    Note: V1, V2, V3, and V4 denote the front, left, right, and top views, respectively.

    Table 2  Accuracy (%) of several methods on the MSR Daily Activity3D dataset

    Method           | Accuracy
    Ref. [19]        | 79.1
    Ref. [20]        | 54.0
    Ref. [21]        | 68.0
    Ref. [22]        | 73.8
    HPM(fc7)+RP      | 60.0
    HPM(fc7)+FTP     | 79.9
    HPM(fc6)+RP+FTP  | 81.3
    HPM(fc7)+RP+FTP  | 82.5
Article history
  • Received: 2018-05-21
  • Accepted: 2018-12-04
  • Available online: 2018-12-14
  • Published in issue: 2019-04-01