Audio-Visual Continuous Speech Recognition Based on a Multi-Stream Multi-State Dynamic Bayesian Network

Lv Guo-Yun, Jiang Dong-Mei, Zhang Yan-Ning, Zhao Rong-Chun, H Sahli, Ilse Ravyse

Lv Guo-Yun, Jiang Dong-Mei, Zhang Yan-Ning, Zhao Rong-Chun, H Sahli, Ilse Ravyse. DBN Based Multi-stream Multi-states Model for Continue Audio-Visual Speech Recognition[J]. Journal of Electronics and Information Technology, 2008, 30(12): 2906-2911. doi: 10.3724/SP.J.1146.2007.00915

doi: 10.3724/SP.J.1146.2007.00915
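Funding:

Supported by the science and technology cooperation project between the Ministry of Science and Technology of China and the Flemish Region of Belgium ([2004] 487) and the Northwestern Polytechnical University Talent Training Program (04XD0102)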

DBN Based Multi-stream Multi-states Model for Continue Audio-Visual Speech Recognition

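  • Abstract: The asynchrony between speech and lip movement is a key problem in multimodal fusion for speech recognition. This paper first introduces a multi-stream asynchronous dynamic Bayesian network (MS-ADBN) model, which describes the asynchrony between the audio and video streams at the word level; both streams use a word-phone hierarchy. The multi-stream multi-state asynchronous DBN (MM-ADBN) model extends MS-ADBN, with both streams using a word-phone-state hierarchy. In essence, MS-ADBN is a whole-word model, whereas MM-ADBN is a phone model and is therefore suited to large-vocabulary continuous speech recognition. Experiments on a continuous audio-visual database show that, under clean speech conditions, MM-ADBN improves the recognition rate by 35.91% over the MS-ADBN model and by 9.97% over the multi-stream HMM.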
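The word-level asynchrony described in the abstract can be illustrated with a toy state space. The sketch below is not the authors' MS-ADBN or MM-ADBN implementation. It only assumes that, within one word, each stream moves left to right through its own phone sequence, and that the word can end only once both streams have reached their final phone. All function names and the example word are hypothetical.

```python
from itertools import product


def joint_states(n_audio, n_video):
    """List every (audio_index, video_index) pair inside one word.

    The two indices are free to differ, which is what lets the
    audio and video streams be out of step within the word.
    """
    return list(product(range(n_audio), range(n_video)))


def successors(state, n_audio, n_video):
    """Next joint states: each stream stays or advances by one phone."""
    a, v = state
    result = []
    for da, dv in product((0, 1), repeat=2):
        na, nv = a + da, v + dv
        if na < n_audio and nv < n_video:
            result.append((na, nv))
    return result


def is_word_end(state, n_audio, n_video):
    """The word may end only when both streams are in their last phone."""
    return state == (n_audio - 1, n_video - 1)


if __name__ == "__main__":
    # Hypothetical three-phone word, same phone count in both streams.
    phones = ["w", "ah", "n"]
    n = len(phones)
    print(len(joint_states(n, n)), "joint states inside the word")
    print("from (0, 0):", successors((0, 0), n, n))
    print("(2, 1) can end the word:", is_word_end((2, 1), n, n))
    print("(2, 2) can end the word:", is_word_end((2, 2), n, n))
```

A fully synchronous model would keep only the states with equal indices. Allowing every pair, while forcing both streams to finish before the word boundary, is one simple way to picture streams that drift apart inside a word and resynchronize at its end.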
Publication history
  • Received: 2007-06-11
  • Revised: 2007-11-27
  • Published: 2008-12-19
