
Depth Estimation of Monocular Road Images Based on Pyramid Scene Analysis Network

Wujie ZHOU, Ting PAN, Pengli GU, Zhinian ZHAI

Citation: Wujie ZHOU, Ting PAN, Pengli GU, Zhinian ZHAI. Depth Estimation of Monocular Road Images Based on Pyramid Scene Analysis Network[J]. Journal of Electronics and Information Technology, 2019, 41(10): 2509-2515. doi: 10.11999/JEIT180957


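    About the authors: Wujie ZHOU: male, born in 1983, associate professor, Ph.D.; research interests: computer vision and pattern recognition, deep learning;
    Ting PAN: female, born in 1994, M.S.; research interests: computer vision and pattern recognition;
    Pengli GU: male, born in 1989, M.S.; research interests: computer vision and pattern recognition;
    Zhinian ZHAI: male, born in 1977, lecturer, Ph.D.; research interest: deep learning
    Corresponding author: Wujie ZHOU, wujiezhou@163.com
  • Funding: National Natural Science Foundation of China (61502429), Zhejiang Provincial Natural Science Foundation (LY18F0002)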

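Abstract: To address the insufficient prediction accuracy of depth estimation from monocular images, this paper proposes a road-scene depth estimation method based on a pyramid pooling network. The method extracts features from road-scene images with a combination of four residual network blocks and then gradually restores the feature maps to the original image size through upsampling; the multiple residual blocks increase the depth of the network model. Because the upsampling process involves information at diverse scales, the feature maps of each size produced during feature extraction are fused with the same-size feature maps produced during upsampling, which improves the accuracy of depth estimation. In addition, a pyramid pooling block performs scene parsing on the high-level features extracted by the four residual blocks; the feature maps it outputs are restored to the original image size and fed into the prediction layer together with the output of the upsampling module. Experiments on the KITTI dataset show that the proposed method outperforms existing estimation methods.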
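The pyramid pooling step mentioned in the abstract can be illustrated with a minimal pure-Python sketch on a single-channel feature map. This is not the authors' implementation: the bin sizes (1, 2, 3, 6) are assumed from the pyramid scene parsing network cited as [28], nearest-neighbour upsampling is an assumption made for simplicity, and the helper names `avg_pool_to_bins`, `upsample_nearest`, and `pyramid_pool` are hypothetical. The learned convolutions a real network would apply to each branch are omitted.

```python
def avg_pool_to_bins(fmap, bins):
    """Average-pool a 2-D map (list of rows) into a bins x bins grid."""
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(bins):
        r0 = (i * h) // bins                     # floor of the bin start
        r1 = ((i + 1) * h + bins - 1) // bins    # ceil of the bin end
        row = []
        for j in range(bins):
            c0 = (j * w) // bins
            c1 = ((j + 1) * w + bins - 1) // bins
            cells = [fmap[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def upsample_nearest(grid, h, w):
    """Resize a small grid back to h x w by nearest-neighbour lookup."""
    gh, gw = len(grid), len(grid[0])
    return [[grid[(y * gh) // h][(x * gw) // w] for x in range(w)]
            for y in range(h)]


def pyramid_pool(fmap, bin_sizes=(1, 2, 3, 6)):
    """Return the input map plus one context map per pyramid level.

    bin_sizes is an assumed setting, not a value taken from this article.
    The returned list plays the role of a channel-wise concatenation.
    """
    h, w = len(fmap), len(fmap[0])
    branches = [fmap]
    for bins in bin_sizes:
        branches.append(upsample_nearest(avg_pool_to_bins(fmap, bins), h, w))
    return branches
```

Each coarser level summarises a larger region of the scene, so stacking the levels gives every pixel access to both local and global context before the prediction layer.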


References:

    [1] LUO Yue, REN J, LIN Mude, et al. Single view stereo matching[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 155–163.
    [2] SILBERMAN N, HOIEM D, KOHLI P, et al. Indoor segmentation and support inference from RGBD images[C]. The 12th European Conference on Computer Vision, Florence, Italy, 2012: 746–760.
    [3] REN Xiaofeng, BO Liefeng, and FOX D. RGB-(D) scene labeling: Features and algorithms[C]. 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, USA, 2012: 2759–2766.
    [4] SHOTTON J, SHARP T, KIPMAN A, et al. Real-time human pose recognition in parts from single depth images[J]. Communications of the ACM, 2013, 56(1): 116–124. doi: 10.1145/2398356
    [5] ALP GÜLER R, NEVEROVA N, and KOKKINOS I. Densepose: Dense human pose estimation in the wild[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 7297–7306.
    [6] LUO Wenjie, SCHWING A G, and URTASUN R. Efficient deep learning for stereo matching[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 5695–5703.
    [7] FLINT A, MURRAY D, and REID I. Manhattan scene understanding using monocular, stereo, and 3D features[C]. 2011 International Conference on Computer Vision, Barcelona, Spain, 2011: 2228–2235.
    [8] KUNDU A, LI Yin, DELLAERT F, et al. Joint semantic segmentation and 3D reconstruction from monocular video[C]. The 13th European Conference on Computer Vision, Zurich, Switzerland, 2014: 703–718.
    [9] YAMAGUCHI K, MCALLESTER D, and URTASUN R. Efficient joint segmentation, occlusion labeling, stereo and flow estimation[C]. The 13th European Conference on Computer Vision, Zurich, Switzerland, 2014: 756–771.
    [10] BAIG M H and TORRESANI L. Coupled depth learning[C]. 2016 IEEE Winter Conference on Applications of Computer Vision, Lake Placid, USA, 2016: 1–10.
    [11] EIGEN D and FERGUS R. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture[C]. 2015 IEEE International Conference on Computer Vision, Santiago, Chile, 2015: 2650–2658.
    [12] SCHARSTEIN D and SZELISKI R. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms[J]. International Journal of Computer Vision, 2002, 47(1/3): 7–42. doi: 10.1023/A:1014573219977
    [13] UPTON K. A modern approach[J]. Manufacturing Engineer, 1995, 74(3): 111–113. doi: 10.1049/me:19950308
    [14] FLYNN J, NEULANDER I, PHILBIN J, et al. Deep stereo: Learning to predict new views from the world's imagery[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 5515–5524.
    [15] SAXENA A, CHUNG S H, and NG A Y. 3-D depth reconstruction from a single still image[J]. International Journal of Computer Vision, 2008, 76(1): 53–69.
    [16] KARSCH K, LIU Ce, and KANG S B. Depth transfer: Depth extraction from video using non-parametric sampling[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(11): 2144–2158. doi: 10.1109/TPAMI.2014.2316835
    [17] EIGEN D, PUHRSCH C, and FERGUS R. Depth map prediction from a single image using a multi-scale deep network[C]. The 27th International Conference on Neural Information Processing Systems, Montréal, Canada, 2014: 2366–2374.
    [18] LAINA I, RUPPRECHT C, BELAGIANNIS V, et al. Deeper depth prediction with fully convolutional residual networks[C]. The 4th International Conference on 3D Vision, Stanford, USA, 2016: 239–248.
    [19] FU Huan, GONG Mingming, WANG Chaohui, et al. Deep ordinal regression network for monocular depth estimation[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 2002–2011.
    [20] DIMITRIEVSKI M, GOOSSENS B, VEELAERT P, et al. High resolution depth reconstruction from monocular images and sparse point clouds using deep convolutional neural network[J]. SPIE, 2017, 10410: 104100H.
    [21] MANCINI M, COSTANTE G, VALIGI P, et al. Toward domain independence for learning-based monocular depth estimation[J]. IEEE Robotics and Automation Letters, 2017, 2(3): 1778–1785. doi: 10.1109/LRA.2017.2657002
    [22] GARG R, VIJAY KUMAR B G, CARNEIRO G, et al. Unsupervised CNN for single view depth estimation: Geometry to the rescue[C]. The 14th European Conference on Computer Vision, Amsterdam, The Netherlands, 2016: 740–756.
    [23] KUZNIETSOV Y, STUCKLER J, and LEIBE B. Semi-supervised deep learning for monocular depth map prediction[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 6647–6655.
    [24] GODARD C, MAC AODHA O, and BROSTOW G J. Unsupervised monocular depth estimation with left-right consistency[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 6602–6611.
    [25] ZORAN D, ISOLA P, KRISHNAN D, et al. Learning ordinal relationships for mid-level vision[C]. 2015 IEEE International Conference on Computer Vision, Santiago, Chile, 2015: 388–396.
    [26] CHEN Weifeng, FU Zhao, YANG Dawei, et al. Single-image depth perception in the wild[C]. The 30th Conference on Neural Information Processing Systems, Barcelona, Spain, 2016: 730–738.
    [27] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 770–778.
    [28] ZHAO Hengshuang, SHI Jianping, QI Xiaojuan, et al. Pyramid scene parsing network[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 6230–6239.
    [29] ZHOU Bolei, KHOSLA A, LAPEDRIZA A, et al. Object detectors emerge in deep scene CNNs[J]. arXiv preprint arXiv: 1412.6856, 2014.
    [30] SZEGEDY C, LIU Wei, JIA Yangqing, et al. Going deeper with convolutions[C]. 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, 2015: 1–9.
    [31] UHRIG J, SCHNEIDER N, SCHNEIDER L, et al. Sparsity invariant CNNs[C]. 2017 International Conference on 3D Vision, Qingdao, China, 2017: 11–20.
    [32] KINGMA D P and BA J. Adam: A method for stochastic optimization[J]. arXiv preprint arXiv: 1412.6980, 2014.


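    Fig. 1  The proposed neural network framework

    Fig. 2  Structures of the two types of residual network blocks

    Fig. 3  Upsampling module for scale recovery

    Fig. 4  Pyramid pooling module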

    Table 1  Errors and correlations between predicted and ground-truth depth maps

    Method             RMSE    Lg     Lg_rms  a1     a2     a3
    Fine_coarse[17]    2.6440  0.272  0.167   0.488  0.948  0.972
    ResNet50[18]       2.4618  0.243  0.126   0.674  0.943  0.972
    ResNet_fcn50[19]   2.5284  0.247  0.134   0.636  0.950  0.979
    D_U[20]            2.8246  0.305  0.127   0.634  0.916  0.945
    UVD_fcn[21]        2.6507  0.264  0.145   0.566  0.945  0.970
    Proposed method    2.3504  0.230  0.120   0.684  0.949  0.975
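The page does not define the column headers of Table 1. The sketch below assumes they follow the conventions commonly used in monocular depth estimation: root-mean-square error (RMSE), mean absolute log10 error (Lg), root-mean-square error of log depths (Lg_rms), and threshold accuracies a1, a2, a3, taken as the fraction of pixels whose ratio max(pred/truth, truth/pred) falls below 1.25, 1.25^2, and 1.25^3. The function name `depth_metrics` and the use of the natural logarithm for Lg_rms are assumptions, not details taken from the article.

```python
import math


def depth_metrics(pred, truth):
    """Compute common depth-estimation metrics over paired positive depths.

    pred and truth are equal-length sequences of strictly positive values.
    The metric definitions are assumed conventions, not quoted from the paper.
    """
    pairs = list(zip(pred, truth))
    n = len(pairs)
    rmse = math.sqrt(sum((p - t) ** 2 for p, t in pairs) / n)
    log10_err = sum(abs(math.log10(p) - math.log10(t)) for p, t in pairs) / n
    log_rms = math.sqrt(sum((math.log(p) - math.log(t)) ** 2 for p, t in pairs) / n)
    ratios = [max(p / t, t / p) for p, t in pairs]
    a1, a2, a3 = (sum(r < 1.25 ** k for r in ratios) / n for k in (1, 2, 3))
    return {"rmse": rmse, "lg": log10_err, "lg_rms": log_rms,
            "a1": a1, "a2": a2, "a3": a3}
```

Under these definitions, lower is better for the three error columns and higher is better for a1 to a3, which matches how the proposed method's row compares with the others.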

    Table 2  Results of different scale-recovery methods

    Scale-recovery method    RMSE    Lg     Lg_rms  a1     a2     a3
    Deconvolution layers     2.3716  0.237  0.125   0.673  0.946  0.973
    Convolution blocks       2.4724  0.240  0.129   0.646  0.948  0.974
    Upsampling layers        2.3504  0.230  0.120   0.684  0.949  0.975
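As a rough illustration of the upsampling-layer variant in Table 2 and of the fusion of same-size feature maps described in the abstract, the sketch below doubles the resolution of a single-channel map and then fuses it with an encoder map of matching size. Nearest-neighbour interpolation and element-wise addition are assumptions chosen for simplicity; the page does not state which interpolation or fusion operator the authors use, and the names `upsample2x_nearest` and `fuse_add` are hypothetical.

```python
def upsample2x_nearest(fmap):
    """Double the height and width of a 2-D map by repeating each value."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat along the width
        out.append(wide)
        out.append(list(wide))                     # repeat along the height
    return out


def fuse_add(decoder_map, encoder_map):
    """Fuse two same-size maps by element-wise addition (assumed operator)."""
    if len(decoder_map) != len(encoder_map) or \
            len(decoder_map[0]) != len(encoder_map[0]):
        raise ValueError("feature maps must have the same size to be fused")
    return [[a + b for a, b in zip(row_d, row_e)]
            for row_d, row_e in zip(decoder_map, encoder_map)]
```

A plain interpolation layer of this kind has no learnable parameters, unlike a deconvolution layer, which is one way the variants compared in Table 2 differ.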
Article information
  • Corresponding author: Wujie ZHOU, wujiezhou@163.com
  • Received: 2018-10-12
  • Accepted: 2019-05-21
  • Published online: 2019-05-28
  • Issue date: 2019-10-01
