
A Review of Sign Language Recognition Based on Deep Learning

Shujun ZHANG, Qun ZHANG, Hui LI

Citation: Shujun ZHANG, Qun ZHANG, Hui LI. A Review of Sign Language Recognition Based on Deep Learning[J]. Journal of Electronics and Information Technology, doi: 10.11999/JEIT190416


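    About the authors: Shujun ZHANG: female, born in 1980, associate professor; research interest: computer vision;
    Qun ZHANG: female, born in 1994, master's student; research interest: computer vision;
    Hui LI: male, born in 1984, associate professor; research interest: computer vision
    Corresponding author: Shujun ZHANG, lindazsj@163.com
  • Funding: National Natural Science Foundation of China (61702295, 61672305); Key Research and Development Program of Shandong Province (2017GGX10127)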

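Abstract: Sign language recognition spans computer vision, pattern recognition, human-computer interaction, and related fields, and has substantial research significance and application value. The rapid development of deep learning has opened new opportunities for more accurate, real-time sign language recognition. This paper reviews recent deep-learning-based sign language recognition techniques, describing and analyzing the algorithms in detail along two branches: isolated word recognition and continuous sentence recognition. Isolated word recognition methods are grouped into three architectures: those based on Convolutional Neural Networks (CNNs), 3D convolutional networks, and Recurrent Neural Networks (RNNs). Continuous sentence recognition relies on more complex models that usually require an auxiliary long-term temporal modeling algorithm; by their main structure, these are grouped into bidirectional Long Short-Term Memory (BLSTM) models, 3D convolutional network models, and hybrid models. The paper also summarizes the sign language datasets commonly used in China and abroad, and discusses the research challenges and development trends of sign language recognition: robustness and practical deployment while maintaining high accuracy still need to be advanced.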
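The abstract notes that continuous sentence recognition, unlike isolated word recognition, usually needs an auxiliary long-term temporal modeling step, because the video is not pre-segmented into individual signs. One widely used option (not necessarily the one any specific surveyed method adopts) is Connectionist Temporal Classification (CTC), which lets a sequence model such as a BLSTM emit a score for every class at every frame and then collapses those frame-level outputs into a gloss sequence. The following is a minimal sketch of greedy (best-path) CTC decoding, assuming class index 0 is the CTC blank; the function name `ctc_greedy_decode` and the toy gloss vocabulary are hypothetical illustrations.

```python
from typing import List, Optional, Sequence

BLANK = 0  # assumption: class index 0 is reserved for the CTC "blank" label


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding (hypothetical helper).

    frame_scores[t][k] is the model's score for class k at frame t.
    Returns the decoded label sequence with repeats merged and blanks removed.
    """
    # Step 1: pick the highest-scoring class at every frame (the "best path").
    best_path = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]

    decoded: List[int] = []
    previous: Optional[int] = None
    for label in best_path:
        # Step 2: merge consecutive repeats; step 3: drop blanks.
        # A blank between two identical labels keeps them as separate outputs.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# Hypothetical toy gloss vocabulary; index 0 is the blank.
GLOSSES = ["<blank>", "I", "LOVE", "YOU"]

if __name__ == "__main__":
    # Six frames of made-up per-class scores.
    frame_scores = [
        [0.1, 0.8, 0.05, 0.05],  # I
        [0.2, 0.7, 0.05, 0.05],  # I (repeat, merged)
        [0.9, 0.05, 0.03, 0.02],  # blank
        [0.1, 0.1, 0.7, 0.1],  # LOVE
        [0.6, 0.1, 0.2, 0.1],  # blank
        [0.1, 0.1, 0.1, 0.7],  # YOU
    ]
    print([GLOSSES[i] for i in ctc_greedy_decode(frame_scores)])  # ['I', 'LOVE', 'YOU']
```

Greedy decoding is the simplest choice; beam search, optionally combined with a language model over glosses, is a common refinement when the per-frame scores are less confident.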
