Depth Estimation of Monocular Road Images Based on Pyramid Scene Analysis Network

Wujie ZHOU, Ting PAN, Pengli GU, Zhinian ZHAI

Citation: Wujie ZHOU, Ting PAN, Pengli GU, Zhinian ZHAI. Depth Estimation of Monocular Road Images Based on Pyramid Scene Analysis Network[J]. Journal of Electronics and Information Technology, 2019, 41(10): 2509-2515. doi: 10.11999/JEIT180957


doi: 10.11999/JEIT180957
Funds: The National Natural Science Foundation of China (61502429), The Zhejiang Provincial Natural Science Foundation (LY18F020012)
Details
    About the authors:

    ZHOU Wujie: male, born in 1983, Ph.D., associate professor. His research interests include computer vision, pattern recognition, and deep learning

    PAN Ting: female, born in 1994, M.S. Her research interests include computer vision and pattern recognition

    GU Pengli: male, born in 1989, M.S. His research interests include computer vision and pattern recognition

    ZHAI Zhinian: male, born in 1977, Ph.D., lecturer. His research interest is deep learning

    Corresponding author:

    ZHOU Wujie, wujiezhou@163.com

  • CLC number: TP391.4


  • Abstract: To address the limited accuracy of depth estimation from monocular images, this paper proposes a depth estimation method for road scenes based on a pyramid pooling network. The method extracts features from road-scene images with a cascade of four residual network blocks and then gradually restores the feature maps to the original image size by upsampling; stacking multiple residual blocks increases the depth of the network model. To exploit the diversity of scale information during upsampling, the feature maps of each size produced during feature extraction are fused with the feature maps of the same size in the upsampling path, which improves the accuracy of the depth estimates. In addition, the high-level features extracted by the four residual blocks are parsed by a pyramid pooling block for scene analysis; the output of the pyramid pooling block is then restored to the original image size and fed into the prediction layer together with the output of the upsampling module. Experiments on the KITTI dataset show that the proposed pyramid-pooling-based depth estimation method outperforms existing estimation methods.
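The pyramid pooling step described in the abstract can be illustrated numerically. The following is a minimal NumPy sketch, not the paper's implementation: the bin sizes (1, 2, 3, 6) and fusion by channel concatenation are assumptions borrowed from the PSPNet design the paper builds on.

```python
import numpy as np

def adaptive_avg_pool(feat, bins):
    """Average-pool a (C, H, W) feature map into a (C, bins, bins) grid."""
    c, h, w = feat.shape
    out = np.zeros((c, bins, bins))
    for i in range(bins):
        for j in range(bins):
            hs, he = i * h // bins, (i + 1) * h // bins
            ws, we = j * w // bins, (j + 1) * w // bins
            out[:, i, j] = feat[:, hs:he, ws:we].mean(axis=(1, 2))
    return out

def upsample_nearest(feat, h, w):
    """Nearest-neighbour upsampling back to the input resolution."""
    c, fh, fw = feat.shape
    rows = np.arange(h) * fh // h
    cols = np.arange(w) * fw // w
    return feat[:, rows][:, :, cols]

def pyramid_pooling(feat, bin_sizes=(1, 2, 3, 6)):
    """Pool at several scales, upsample each branch, concatenate with input."""
    c, h, w = feat.shape
    branches = [feat]
    for b in bin_sizes:
        branches.append(upsample_nearest(adaptive_avg_pool(feat, b), h, w))
    return np.concatenate(branches, axis=0)

feat = np.random.rand(4, 12, 12)   # a small high-level feature map
fused = pyramid_pooling(feat)
print(fused.shape)                 # (20, 12, 12): 4 input + 4x4 pooled channels
```

The bin-1 branch carries global context (each channel collapsed to its spatial mean), while the finer bins preserve coarse spatial layout; concatenating them lets the prediction layer see both.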
  • Figure 1. The proposed neural network framework

    Figure 2. Structure of the two types of residual network blocks

    Figure 3. The upsampling scale-restoration module

    Figure 4. The pyramid pooling module

    Table 1. Errors and correlations between predicted and ground-truth depth maps

    Method            RMSE    Lg     Lg_rms  a1     a2     a3
    Fine_coarse[17]   2.6440  0.272  0.167   0.488  0.948  0.972
    ResNet50[18]      2.4618  0.243  0.126   0.674  0.943  0.972
    ResNet_fcn50[19]  2.5284  0.247  0.134   0.636  0.950  0.979
    D_U[20]           2.8246  0.305  0.127   0.634  0.916  0.945
    UVD_fcn[21]       2.6507  0.264  0.145   0.566  0.945  0.970
    Proposed method   2.3504  0.230  0.120   0.684  0.949  0.975
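The columns in Table 1 follow the standard monocular-depth evaluation protocol: root-mean-square error, mean and RMS log error, and the threshold accuracies a1–a3. A minimal pure-Python sketch of how such columns are typically computed (the log base and the 1.25^k thresholds are the conventional choices, assumed here rather than taken from the paper):

```python
import math

def depth_metrics(pred, gt):
    """Standard depth-estimation error measures between predicted and
    ground-truth depths (two equal-length lists of positive floats)."""
    n = len(gt)
    # root-mean-square error in depth units
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / n)
    # mean absolute log error ("Lg") and its RMS counterpart ("Lg_rms")
    logs = [math.log10(p) - math.log10(g) for p, g in zip(pred, gt)]
    lg = sum(abs(d) for d in logs) / n
    lg_rms = math.sqrt(sum(d * d for d in logs) / n)
    # threshold accuracies a1/a2/a3: fraction with max(p/g, g/p) < 1.25**k
    ratios = [max(p / g, g / p) for p, g in zip(pred, gt)]
    acc = [sum(r < 1.25 ** k for r in ratios) / n for k in (1, 2, 3)]
    return rmse, lg, lg_rms, *acc

pred = [2.0, 4.5, 10.0]
gt = [2.2, 4.0, 9.0]
print(depth_metrics(pred, gt))
```

Lower is better for the first three columns, higher for a1–a3, which is why the proposed method's row dominates most columns of Table 1.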

    Table 2. Results of different scale-restoration methods

    Method                                 RMSE    Lg     Lg_rms  a1     a2     a3
    Deconvolution-layer scale restoration  2.3716  0.237  0.125   0.673  0.946  0.973
    Convolution-block scale restoration    2.4724  0.240  0.129   0.646  0.948  0.974
    Upsampling-layer scale restoration     2.3504  0.230  0.120   0.684  0.949  0.975
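Table 2 contrasts ways of restoring spatial resolution. Two of them can be sketched side by side; the 2x factor and 2x2 kernel below are illustrative assumptions, not the paper's configuration. A parameter-free upsampling layer simply repeats pixels, while a transposed convolution (deconvolution) layer learns a kernel that each input pixel "paints" onto the output:

```python
import numpy as np

def upsample_nearest_2x(x):
    """Parameter-free 2x nearest-neighbour upsampling of an (H, W) map."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def transposed_conv_2x(x, k):
    """Stride-2 transposed convolution of an (H, W) map with a 2x2 kernel:
    each input pixel adds a scaled copy of the kernel to the output."""
    h, w = x.shape
    out = np.zeros((2 * h, 2 * w))
    for i in range(h):
        for j in range(w):
            out[2 * i:2 * i + 2, 2 * j:2 * j + 2] += x[i, j] * k
    return out

x = np.arange(4.0).reshape(2, 2)
print(upsample_nearest_2x(x).shape)                  # (4, 4)
print(transposed_conv_2x(x, np.ones((2, 2))).shape)  # (4, 4)
```

With an all-ones kernel the two coincide; a learned kernel lets the deconvolution variant adapt, at the cost of extra parameters, which matches the close RMSE gap between the first and third rows of Table 2.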
  • [1] LUO Yue, REN J, LIN Mude, et al. Single view stereo matching[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 155–163.
    [2] SILBERMAN N, HOIEM D, KOHLI P, et al. Indoor segmentation and support inference from RGBD images[C]. The 12th European Conference on Computer Vision, Florence, Italy, 2012: 746–760.
    [3] REN Xiaofeng, BO Liefeng, and FOX D. RGB-(D) scene labeling: Features and algorithms[C]. 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, USA, 2012: 2759–2766.
    [4] SHOTTON J, SHARP T, KIPMAN A, et al. Real-time human pose recognition in parts from single depth images[J]. Communications of the ACM, 2013, 56(1): 116–124. doi:  10.1145/2398356
    [5] ALP GÜLER R, NEVEROVA N, and KOKKINOS I. Densepose: Dense human pose estimation in the wild[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 7297–7306.
    [6] LUO Wenjie, SCHWING A G, and URTASUN R. Efficient deep learning for stereo matching[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 5695–5703.
    [7] FLINT A, MURRAY D, and REID I. Manhattan scene understanding using monocular, stereo, and 3D features[C]. 2011 International Conference on Computer Vision, Barcelona, Spain, 2011: 2228–2235.
    [8] KUNDU A, LI Yin, DELLAERT F, et al. Joint semantic segmentation and 3D reconstruction from monocular video[C]. The 13th European Conference on Computer Vision, Zurich, Switzerland, 2014: 703–718.
    [9] YAMAGUCHI K, MCALLESTER D, and URTASUN R. Efficient joint segmentation, occlusion labeling, stereo and flow estimation[C]. The 13th European Conference on Computer Vision, Zurich, Switzerland, 2014: 756–771.
    [10] BAIG M H and TORRESANI L. Coupled depth learning[C]. 2016 IEEE Winter Conference on Applications of Computer Vision, Lake Placid, USA, 2016: 1–10.
    [11] EIGEN D and FERGUS R. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture[C]. 2015 IEEE International Conference on Computer Vision, Santiago, Chile, 2015: 2650–2658.
    [12] SCHARSTEIN D and SZELISKI R. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms[J]. International Journal of Computer Vision, 2002, 47(1/3): 7–42. doi:  10.1023/A:1014573219977
    [13] UPTON K. A modern approach[J]. Manufacturing Engineer, 1995, 74(3): 111–113. doi:  10.1049/me:19950308
    [14] FLYNN J, NEULANDER I, PHILBIN J, et al. Deep stereo: Learning to predict new views from the world's imagery[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 5515–5524.
    [15] SAXENA A, CHUNG S H, and NG A Y. 3-D depth reconstruction from a single still image[J]. International Journal of Computer Vision, 2008, 76(1): 53–69.
    [16] KARSCH K, LIU Ce, and KANG S B. Depth transfer: Depth extraction from video using non-parametric sampling[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(11): 2144–2158. doi:  10.1109/TPAMI.2014.2316835
    [17] EIGEN D, PUHRSCH C, and FERGUS R. Depth map prediction from a single image using a multi-scale deep network[C]. The 27th International Conference on Neural Information Processing Systems, Montréal, Canada, 2014: 2366–2374.
    [18] LAINA I, RUPPRECHT C, BELAGIANNIS V, et al. Deeper depth prediction with fully convolutional residual networks[C]. The 4th International Conference on 3D Vision, Stanford, USA, 2016: 239–248.
    [19] FU Huan, GONG Mingming, WANG Chaohui, et al. Deep ordinal regression network for monocular depth estimation[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 2002–2011.
    [20] DIMITRIEVSKI M, GOOSSENS B, VEELAERT P, et al. High resolution depth reconstruction from monocular images and sparse point clouds using deep convolutional neural network[J]. SPIE, 2017, 10410: 104100H.
    [21] MANCINI M, COSTANTE G, VALIGI P, et al. Toward domain independence for learning-based monocular depth estimation[J]. IEEE Robotics and Automation Letters, 2017, 2(3): 1778–1785. doi:  10.1109/LRA.2017.2657002
    [22] GARG R, VIJAY KUMAR B G, CARNEIRO G, et al. Unsupervised CNN for single view depth estimation: Geometry to the rescue[C]. The 14th European Conference on Computer Vision, Amsterdam, The Netherlands, 2016: 740–756.
    [23] KUZNIETSOV Y, STUCKLER J, and LEIBE B. Semi-supervised deep learning for monocular depth map prediction[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 6647–6655.
    [24] GODARD C, MAC AODHA O, and BROSTOW G J. Unsupervised monocular depth estimation with left-right consistency[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 6602–6611.
    [25] ZORAN D, ISOLA P, KRISHNAN D, et al. Learning ordinal relationships for mid-level vision[C]. 2015 IEEE International Conference on Computer Vision, Santiago, Chile, 2015: 388–396.
    [26] CHEN Weifeng, FU Zhao, YANG Dawei, et al. Single-image depth perception in the wild[C]. The 30th Conference on Neural Information Processing Systems, Barcelona, Spain, 2016: 730–738.
    [27] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 770–778.
    [28] ZHAO Hengshuang, SHI Jianping, QI Xiaojuan, et al. Pyramid scene parsing network[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 6230–6239.
    [29] ZHOU Bolei, KHOSLA A, LAPEDRIZA A, et al. Object detectors emerge in deep scene CNNs[J]. arXiv preprint arXiv: 1412.6856, 2014.
    [30] SZEGEDY C, LIU Wei, JIA Yangqing, et al. Going deeper with convolutions[C]. 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, 2015: 1–9.
    [31] UHRIG J, SCHNEIDER N, SCHNEIDER L, et al. Sparsity invariant CNNs[C]. 2017 International Conference on 3D Vision, Qingdao, China, 2017: 11–20.
    [32] KINGMA D P and BA J. Adam: A method for stochastic optimization[J]. arXiv preprint arXiv: 1412.6980, 2014.
    Publication history
    • Received: 2018-10-12
    • Revised: 2019-05-21
    • Published online: 2019-05-28
    • Issue date: 2019-10-01
