
Recent Advances in Zero-Shot Learning

Hong LAN, Zhiyu FANG

Citation: Hong LAN, Zhiyu FANG. Recent Advances in Zero-Shot Learning[J]. Journal of Electronics and Information Technology, doi: 10.11999/JEIT190485

    About the authors: LAN Hong: female, born 1969, professor and master's supervisor; her research interests include computer vision, image processing, and pattern recognition.
    FANG Zhiyu: male, born 1993, master's student; his research interests include computer vision and deep learning.
    Corresponding author: LAN Hong, lanhong69@163.com
  • Funding: National Natural Science Foundation of China (61762046); Natural Science Foundation of Jiangxi Province (20161BAB212048)

Abstract: Deep learning has achieved outstanding results in artificial intelligence. In supervised recognition tasks, training deep learning algorithms on massive labeled data yields unprecedented recognition accuracy. However, annotating massive data is expensive, and collecting large amounts of data for rare categories is difficult, so recognizing unknown classes that are rarely or never seen during training remains a serious challenge. To address this problem, this paper reviews recent research on zero-shot image recognition, covering its research background, model analysis, datasets, and experimental comparisons. It also analyzes the technical difficulties in current research, proposes some solutions to the main open problems together with an outlook on future work, and serves as a reference for beginners and researchers in zero-shot learning.


References

[1] SUN Yi, CHEN Yuheng, WANG Xiaogang, et al. Deep learning face representation by joint identification-verification[C]. The 27th International Conference on Neural Information Processing Systems, Montreal, Canada, 2014: 1988–1996.
[2] LIU Chenxi, ZOPH B, NEUMANN M, et al. Progressive neural architecture search[C]. The 15th European Conference on Computer Vision, Munich, Germany, 2018: 19–35.
[3] LEDIG C, THEIS L, HUSZÁR F, et al. Photo-realistic single image super-resolution using a generative adversarial network[C]. The IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 105–114.
[4] BIEDERMAN I. Recognition-by-components: A theory of human image understanding[J]. Psychological Review, 1987, 94(2): 115–147. doi: 10.1037/0033-295X.94.2.115
[5] LAROCHELLE H, ERHAN D, and BENGIO Y. Zero-data learning of new tasks[C]. The 23rd National Conference on Artificial Intelligence, Chicago, USA, 2008: 646–651.
[6] PALATUCCI M, POMERLEAU D, HINTON G, et al. Zero-shot learning with semantic output codes[C]. The 22nd International Conference on Neural Information Processing Systems, Vancouver, Canada, 2009: 1410–1418.
[7] LAMPERT C H, NICKISCH H, and HARMELING S. Learning to detect unseen object classes by between-class attribute transfer[C]. The IEEE Conference on Computer Vision and Pattern Recognition, Miami, USA, 2009: 951–958. doi: 10.1109/CVPR.2009.5206594
[8] HARRINGTON P. Machine Learning in Action[M]. Greenwich, CT, USA: Manning Publications Co, 2012: 5–14.
[9] ZHOU Dengyong, BOUSQUET O, LAL T N, et al. Learning with local and global consistency[C]. The 16th International Conference on Neural Information Processing Systems, Whistler, Canada, 2003: 321–328.
[10] LIU Jianwei, LIU Yuan, and LUO Xionglin. Semi-supervised learning methods[J]. Chinese Journal of Computers, 2015, 38(8): 1592–1617. doi: 10.11897/SP.J.1016.2015.01592
[11] SUNG F, YANG Yongxin, LI Zhang, et al. Learning to compare: Relation network for few-shot learning[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 1199–1208.
[12] FU Yanwei, XIANG Tao, JIANG Yugang, et al. Recent advances in zero-shot recognition: Toward data-efficient understanding of visual content[J]. IEEE Signal Processing Magazine, 2018, 35(1): 112–125. doi: 10.1109/MSP.2017.2763441
[13] XIAN Yongqin, LAMPERT C H, SCHIELE B, et al. Zero-shot learning—A comprehensive evaluation of the good, the bad and the ugly[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(9): 2251–2265. doi: 10.1109/TPAMI.2018.2857768
[14] WANG Wenlin, PU Yunchen, VERMA V K, et al. Zero-shot learning via class-conditioned deep generative models[C]. The Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, USA, 2018: 4211–4218.
[15] FU Yanwei, HOSPEDALES T M, XIANG Tao, et al. Attribute learning for understanding unstructured social activity[C]. The 12th European Conference on Computer Vision, Florence, Italy, 2012: 530–543.
[16] ANTOL S, ZITNICK C L, and PARIKH D. Zero-shot learning via visual abstraction[C]. The 13th European Conference on Computer Vision, Zurich, Switzerland, 2014: 401–416.
[17] ROBYNS P, MARIN E, LAMOTTE W, et al. Physical-layer fingerprinting of LoRa devices using supervised and zero-shot learning[C]. The 10th ACM Conference on Security and Privacy in Wireless and Mobile Networks, Boston, USA, 2017: 58–63. doi: 10.1145/3098243.3098267
[18] YANG Yang, LUO Yadan, CHEN Weilun, et al. Zero-shot hashing via transferring supervised knowledge[C]. The 24th ACM International Conference on Multimedia, Amsterdam, The Netherlands, 2016: 1286–1295. doi: 10.1145/2964284.2964319
[19] PACHORI S, DESHPANDE A, and RAMAN S. Hashing in the zero shot framework with domain adaptation[J]. Neurocomputing, 2018, 275: 2137–2149. doi: 10.1016/j.neucom.2017.10.061
[20] LIU Jingen, KUIPERS B, and SAVARESE S. Recognizing human actions by attributes[C]. The IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, USA, 2011: 3337–3344.
[21] FU Yanwei, HOSPEDALES T M, XIANG Tao, et al. Learning multimodal latent attributes[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(2): 303–316. doi: 10.1109/TPAMI.2013.128
[22] JAIN M, VAN GEMERT J C, MENSINK T, et al. Objects2action: Classifying and localizing actions without any video example[C]. The IEEE International Conference on Computer Vision, Santiago, Chile, 2015: 4588–4596.
[23] XU Baohan, FU Yanwei, JIANG Yugang, et al. Video emotion recognition with transferred deep feature encodings[C]. The 2016 ACM on International Conference on Multimedia Retrieval, New York, USA, 2016: 15–22.
[24] JOHNSON M, SCHUSTER M, LE Q V, et al. Google’s multilingual neural machine translation system: Enabling zero-shot translation[J]. Transactions of the Association for Computational Linguistics, 2017, 5: 339–351. doi: 10.1162/tacl_a_00065
[25] PRATEEK VEERANNA S, JINSEOK N, ENELDO L M, et al. Using semantic similarity for multi-label zero-shot classification of text documents[C]. The 23rd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Bruges, Belgium, 2016: 423–428.
[26] DALAL N and TRIGGS B. Histograms of oriented gradients for human detection[C]. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, USA, 2005: 886–893.
[27] LOWE D G. Distinctive image features from scale-invariant keypoints[J]. International Journal of Computer Vision, 2004, 60(2): 91–110. doi: 10.1023/B:VISI.0000029664.99615.94
[28] BAY H, ESS A, TUYTELAARS T, et al. Speeded-up robust features (SURF)[J]. Computer Vision and Image Understanding, 2008, 110(3): 346–359. doi: 10.1016/j.cviu.2007.09.014
[29] ROMERA-PAREDES B and TORR P H S. An embarrassingly simple approach to zero-shot learning[C]. The 32nd International Conference on Machine Learning, Lille, France, 2015: 2152–2161.
[30] ZHANG Li, XIANG Tao, and GONG Shaogang. Learning a deep embedding model for zero-shot learning[C]. The IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 3010–3019.
[31] LI Yan, ZHANG Junge, ZHANG Jianguo, et al. Discriminative learning of latent features for zero-shot recognition[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 7463–7471.
[32] WANG Xiaolong, YE Yufei, and GUPTA A. Zero-shot recognition via semantic embeddings and knowledge graphs[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 6857–6866.
[33] WAH C, BRANSON S, WELINDER P, et al. The Caltech-UCSD birds-200-2011 dataset[R]. Technical Report CNS-TR-2010-001, 2011.
[34] MIKOLOV T, SUTSKEVER I, CHEN Kai, et al. Distributed representations of words and phrases and their compositionality[C]. The 26th International Conference on Neural Information Processing Systems, Lake Tahoe, USA, 2013: 3111–3119.
[35] LEE C, FANG Wei, YEH C K, et al. Multi-label zero-shot learning with structured knowledge graphs[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 1576–1585.
[36] JETLEY S, ROMERA-PAREDES B, JAYASUMANA S, et al. Prototypical priors: From improving classification to zero-shot learning[J]. arXiv: 1512.01192, 2015.
[37] KARESSLI N, AKATA Z, SCHIELE B, et al. Gaze embeddings for zero-shot image classification[C]. The IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 6412–6421.
[38] REED S, AKATA Z, LEE H, et al. Learning deep representations of fine-grained visual descriptions[C]. The IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 49–58.
[39] ELHOSEINY M, ZHU Yizhe, ZHANG Han, et al. Link the head to the "beak": Zero shot learning from noisy text description at part precision[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 6288–6297. doi: 10.1109/CVPR.2017.666
[40] LAZARIDOU A, DINU G, and BARONI M. Hubness and pollution: Delving into cross-space mapping for zero-shot learning[C]. The 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Beijing, China, 2015: 270–280.
[41] WANG Xiaoyang and JI Qiang. A unified probabilistic approach modeling relationships between attributes and objects[C]. The IEEE International Conference on Computer Vision, Sydney, Australia, 2013: 2120–2127.
[42] AKATA Z, PERRONNIN F, HARCHAOUI Z, et al. Label-embedding for attribute-based classification[C]. The IEEE Conference on Computer Vision and Pattern Recognition, Portland, USA, 2013: 819–826.
[43] JURIE F, BUCHER M, and HERBIN S. Generating visual representations for zero-shot classification[C]. The IEEE International Conference on Computer Vision Workshops, Venice, Italy, 2017: 2666–2673.
[44] FARHADI A, ENDRES I, HOIEM D, et al. Describing objects by their attributes[C]. 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, USA, 2009: 1778–1785. doi: 10.1109/CVPR.2009.5206772
[45] PATTERSON G, XU Chen, SU Hang, et al. The SUN attribute database: Beyond categories for deeper scene understanding[J]. International Journal of Computer Vision, 2014, 108(1/2): 59–81.
[46] XIAO Jianxiong, HAYS J, EHINGER K A, et al. SUN database: Large-scale scene recognition from abbey to zoo[C]. 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, USA, 2010: 3485–3492. doi: 10.1109/CVPR.2010.5539970
[47] NILSBACK M E and ZISSERMAN A. Delving deeper into the whorl of flower segmentation[J]. Image and Vision Computing, 2010, 28(6): 1049–1062. doi: 10.1016/j.imavis.2009.10.001
[48] NILSBACK M E and ZISSERMAN A. A visual vocabulary for flower classification[C]. 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, New York, USA, 2006: 1447–1454. doi: 10.1109/CVPR.2006.42
[49] NILSBACK M E and ZISSERMAN A. Automated flower classification over a large number of classes[C]. 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing, Bhubaneswar, India, 2008: 722–729. doi: 10.1109/ICVGIP.2008.47
[50] KHOSLA A, JAYADEVAPRAKASH N, YAO Bangpeng, et al. Novel dataset for fine-grained image categorization: Stanford dogs[C]. CVPR Workshop on Fine-Grained Visual Categorization, 2011.
[51] DENG Jia, DONG Wei, SOCHER R, et al. ImageNet: A large-scale hierarchical image database[C]. 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, USA, 2009: 248–255.
[52] CHAO Weilun, CHANGPINYO S, GONG Boqing, et al. An empirical study and analysis of generalized zero-shot learning for object recognition in the wild[C]. The 14th European Conference on Computer Vision, Amsterdam, The Netherlands, 2016: 52–68.
[53] SONG Jie, SHEN Chengchao, YANG Yezhou, et al. Transductive unbiased embedding for zero-shot learning[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 1024–1033.
[54] LI Yanan. Research on key technologies for zero-shot learning[D]. [Ph.D. dissertation], Zhejiang University, 2018: 40–43.
[55] FU Yanwei, HOSPEDALES T M, XIANG Tao, et al. Transductive multi-view zero-shot learning[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(11): 2332–2345. doi: 10.1109/TPAMI.2015.2408354
[56] KODIROV E, XIANG Tao, and GONG Shaogang. Semantic autoencoder for zero-shot learning[C]. The IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 4447–4456.
[57] STOCK M, PAHIKKALA T, AIROLA A, et al. A comparative study of pairwise learning methods based on kernel ridge regression[J]. Neural Computation, 2018, 30(8): 2245–2283. doi: 10.1162/neco_a_01096
[58] ANNADANI Y and BISWAS S. Preserving semantic relations for zero-shot learning[J]. arXiv: 1803.03049, 2018.
[59] LI Yanan, WANG Donghui, HU Huanhang, et al. Zero-shot recognition using dual visual-semantic mapping paths[C]. The IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 5207–5215.
[60] CHEN Long, ZHANG Hanwang, XIAO Jun, et al. Zero-shot visual recognition using semantics-preserving adversarial embedding networks[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 1043–1052.


  • Fig. 1  Structure of zero-shot learning techniques

    Fig. 2  Illustration of zero-shot learning

    Fig. 3  Classic inductive zero-shot model[7]

    Fig. 4  AwA class–attribute relation matrix[7]

    Fig. 5  Three types of visual–semantic mapping

    Fig. 6  Example of domain shift[55]

    Fig. 7  Example of the semantic gap

    Table 1  Comparison of machine learning paradigms

    | Paradigm | Training set $\{\mathcal{X},\mathcal{Y}\}$ | Test set $\{\mathcal{X},\mathcal{Z}\}$ | Relation $R$ between training classes $\mathcal{Y}$ and test classes $\mathcal{Z}$ | Final classifier $C$ |
    |---|---|---|---|---|
    | Unsupervised learning | Many unlabeled images | Known-class images | $\mathcal{Y} = \mathcal{Z}$ | $C:\mathcal{X} \to \mathcal{Y}$ |
    | Supervised learning | Many labeled images | Known-class images | $\mathcal{Y} = \mathcal{Z}$ | $C:\mathcal{X} \to \mathcal{Y}$ |
    | Semi-supervised learning | Few labeled and many unlabeled images | Known-class images | $\mathcal{Y} = \mathcal{Z}$ | $C:\mathcal{X} \to \mathcal{Y}$ |
    | Few-shot learning | Very few labeled and many unlabeled images | Known-class images | $\mathcal{Y} = \mathcal{Z}$ | $C:\mathcal{X} \to \mathcal{Y}$ |
    | Zero-shot learning | Many labeled images | Unknown-class images | $\mathcal{Y} \cap \mathcal{Z} = \varnothing$ | $C:\mathcal{X} \to \mathcal{Z}$ |
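The distinguishing case in Table 1 is the last row: the classifier must map images to classes it never saw during training, which attribute-based methods do by comparing a predicted semantic vector against per-class attribute signatures. The following is a minimal sketch of that idea, assuming hypothetical three-dimensional attribute vectors (striped, four-legged, aquatic) for two unseen classes; real systems such as DAP[7] learn the image-to-attribute mapping from seen classes.

```python
import numpy as np

# Hypothetical attribute signatures for two unseen classes (illustration only).
class_attributes = {
    "zebra":   np.array([1.0, 1.0, 0.0]),  # striped, four-legged, not aquatic
    "dolphin": np.array([0.0, 0.0, 1.0]),  # not striped, not four-legged, aquatic
}

def predict_unseen(semantic_vec):
    """Assign the unseen class whose attribute vector is closest in cosine similarity."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    return max(class_attributes, key=lambda c: cos(semantic_vec, class_attributes[c]))

# An image feature mapped to attribute space that activates "striped" and "four-legged":
print(predict_unseen(np.array([0.9, 0.8, 0.1])))  # → zebra
```

Because only the attribute signatures of the unseen classes are needed at test time, no training images of "zebra" or "dolphin" are ever required.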

    Table 2  Usage of deep convolutional neural network backbones in zero-shot learning papers

    | Network | Paper count |
    |---|---|
    | VGG | 501 |
    | GoogleNet | 271 |
    | ResNet | 397 |

    Table 3  Performance comparison of zero-shot learning methods (%)

    Conventional zero-shot learning (SS: standard split; PS: proposed split[13]):

    | Method | AwA SS | AwA PS | CUB SS | CUB PS | SUN SS | SUN PS |
    |---|---|---|---|---|---|---|
    | IAP | 46.9 | 35.9 | 27.1 | 24.0 | 17.4 | 19.4 |
    | DAP | 58.7 | 46.1 | 37.5 | 40.0 | 38.9 | 39.9 |
    | DeViSE | 68.6 | 59.7 | 53.2 | 52.0 | 57.5 | 56.5 |
    | ConSE | 67.9 | 44.5 | 36.7 | 34.3 | 44.2 | 38.8 |
    | SJE | 69.5 | 61.9 | 55.3 | 53.9 | 57.1 | 53.7 |
    | SAE | 80.7 | 54.1 | 33.4 | 33.3 | 42.4 | 40.3 |
    | SYNC | 71.2 | 46.6 | 54.1 | 55.6 | 59.1 | 56.3 |
    | LDF | 83.4 | – | 70.4 | – | – | – |
    | SP-AEN | – | 58.5 | – | 55.4 | – | 59.2 |
    | QFSL | 84.8 | 79.7 | 69.7 | 72.1 | 61.7 | 58.3 |

    Generalized zero-shot learning (U: unseen-class accuracy; S: seen-class accuracy; H: harmonic mean of U and S):

    | Method | AwA U | AwA S | AwA H | CUB U | CUB S | CUB H | SUN U | SUN S | SUN H |
    |---|---|---|---|---|---|---|---|---|---|
    | IAP | 0.9 | 87.6 | 1.8 | 0.2 | 72.8 | 0.4 | 1.0 | 37.8 | 1.8 |
    | DAP | 0.0 | 84.7 | 0.0 | 1.7 | 67.9 | 3.3 | 4.2 | 25.1 | 7.2 |
    | DeViSE | 17.1 | 74.7 | 27.8 | 23.8 | 53.0 | 32.8 | 16.9 | 27.4 | 20.9 |
    | ConSE | 0.5 | 90.6 | 1.0 | 1.6 | 72.2 | 3.1 | 6.8 | 39.9 | 11.6 |
    | SJE | 8.0 | 73.9 | 14.4 | 23.5 | 59.2 | 33.6 | 14.7 | 30.5 | 19.8 |
    | SAE | 1.1 | 82.2 | 2.2 | 7.8 | 54.0 | 13.6 | 8.8 | 18.0 | 11.8 |
    | SYNC | 10.0 | 90.5 | 18.0 | 11.5 | 70.9 | 19.8 | 7.9 | 43.3 | 13.4 |
    | SP-AEN | 23.3 | 90.9 | 37.1 | 34.7 | 70.6 | 46.6 | 24.9 | 38.6 | 30.3 |
    | QFSL | 66.2 | 93.1 | 77.4 | 71.5 | 74.9 | 73.2 | 51.3 | 31.2 | 38.8 |
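The H column in the generalized setting is the harmonic mean of the unseen- and seen-class accuracies, which is small whenever either accuracy is small; this is why methods with near-zero unseen-class accuracy (e.g. IAP, ConSE) score a near-zero H despite high seen-class accuracy. A one-line sketch of the metric:

```python
def harmonic_mean(u, s):
    """GZSL metric: H = 2*u*s/(u+s); 0 if both accuracies are 0."""
    return 2 * u * s / (u + s) if (u + s) else 0.0

# Reproducing H for DeViSE on AwA from Table 3 (U = 17.1, S = 74.7):
print(round(harmonic_mean(17.1, 74.7), 1))  # → 27.8
```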
Figures (7)  Tables (3)

Article history
  • Received: 2019-07-01
  • Accepted: 2019-11-03
  • Published online: 2019-11-13