
Recent Advances in Zero-Shot Learning

Hong LAN, Zhiyu FANG

Citation (Chinese): 兰红, 方治屿. 零样本图像识别[J]. 电子与信息学报, 2020, 42(5): 1188-1200. doi: 10.11999/JEIT190485
Citation: Hong LAN, Zhiyu FANG. Recent Advances in Zero-Shot Learning[J]. Journal of Electronics and Information Technology, 2020, 42(5): 1188-1200. doi: 10.11999/JEIT190485


    About the authors: Hong LAN: female, born 1969, professor and master's supervisor; research interests: computer vision, image processing, and pattern recognition.
    Zhiyu FANG: male, born 1993, master's student; research interests: computer vision and deep learning.
    Corresponding author: Hong LAN, lanhong69@163.com
  • Funding: National Natural Science Foundation of China (61762046); Natural Science Foundation of Jiangxi Province (20161BAB212048)

Abstract: Deep learning has achieved remarkable success in artificial intelligence: in supervised recognition tasks, training deep models on massive labeled data yields unprecedented accuracy. However, labeling massive data is expensive, and large datasets are hard to obtain for rare categories, so recognizing classes that are scarce or entirely unseen during training remains a serious challenge. To address this problem, this paper reviews recent research on zero-shot image recognition, covering its background, model analysis, benchmark datasets, and experimental comparisons. It also analyzes the open technical difficulties, proposes some solutions to the main problems, and outlines directions for future research, providing a reference for beginners and researchers in zero-shot learning.


    [1] SUN Yi, CHEN Yuheng, WANG Xiaogang, et al. Deep learning face representation by joint identification-verification[C]. The 27th International Conference on Neural Information Processing Systems, Montreal, Canada, 2014: 1988–1996.
    [2] LIU Chenxi, ZOPH B, NEUMANN M, et al. Progressive neural architecture search[C]. The 15th European Conference on Computer Vision, Munich, Germany, 2018: 19–35.
    [3] LEDIG C, THEIS L, HUSZÁR F, et al. Photo-realistic single image super-resolution using a generative adversarial network[C]. The IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 105–114.
    [4] BIEDERMAN I. Recognition-by-components: A theory of human image understanding[J]. Psychological Review, 1987, 94(2): 115–147. doi: 10.1037/0033-295X.94.2.115
    [5] LAROCHELLE H, ERHAN D, and BENGIO Y. Zero-data learning of new tasks[C]. The 23rd National Conference on Artificial Intelligence, Chicago, USA, 2008: 646–651.
    [6] PALATUCCI M, POMERLEAU D, HINTON G, et al. Zero-shot learning with semantic output codes[C]. The 22nd International Conference on Neural Information Processing Systems, Vancouver, Canada, 2009: 1410–1418.
    [7] LAMPERT C H, NICKISCH H, and HARMELING S. Learning to detect unseen object classes by between-class attribute transfer[C]. The IEEE Conference on Computer Vision and Pattern Recognition, Miami, USA, 2009: 951–958. doi: 10.1109/CVPR.2009.5206594
    [8] HARRINGTON P. Machine Learning in Action[M]. Greenwich, CT, USA: Manning Publications Co., 2012: 5–14.
    [9] ZHOU Dengyong, BOUSQUET O, LAL T N, et al. Learning with local and global consistency[C]. The 16th International Conference on Neural Information Processing Systems, Whistler, Canada, 2003: 321–328.
    [10] 刘建伟, 刘媛, 罗雄麟. 半监督学习方法[J]. 计算机学报, 2015, 38(8): 1592–1617. doi: 10.11897/SP.J.1016.2015.01592
    LIU Jianwei, LIU Yuan, and LUO Xionglin. Semi-supervised learning methods[J]. Chinese Journal of Computers, 2015, 38(8): 1592–1617. doi: 10.11897/SP.J.1016.2015.01592
    [11] SUNG F, YANG Yongxin, ZHANG Li, et al. Learning to compare: Relation network for few-shot learning[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 1199–1208.
    [12] FU Yanwei, XIANG Tao, JIANG Yugang, et al. Recent advances in zero-shot recognition: Toward data-efficient understanding of visual content[J]. IEEE Signal Processing Magazine, 2018, 35(1): 112–125. doi: 10.1109/MSP.2017.2763441
    [13] XIAN Yongqin, LAMPERT C H, SCHIELE B, et al. Zero-shot learning—A comprehensive evaluation of the good, the bad and the ugly[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(9): 2251–2265. doi: 10.1109/TPAMI.2018.2857768
    [14] WANG Wenlin, PU Yunchen, VERMA V K, et al. Zero-shot learning via class-conditioned deep generative models[C]. The 32nd AAAI Conference on Artificial Intelligence, New Orleans, USA, 2018: 4211–4218.
    [15] FU Yanwei, HOSPEDALES T M, XIANG Tao, et al. Attribute learning for understanding unstructured social activity[C]. The 12th European Conference on Computer Vision, Florence, Italy, 2012: 530–543.
    [16] ANTOL S, ZITNICK C L, and PARIKH D. Zero-shot learning via visual abstraction[C]. The 13th European Conference on Computer Vision, Zurich, Switzerland, 2014: 401–416.
    [17] ROBYNS P, MARIN E, LAMOTTE W, et al. Physical-layer fingerprinting of LoRa devices using supervised and zero-shot learning[C]. The 10th ACM Conference on Security and Privacy in Wireless and Mobile Networks, Boston, USA, 2017: 58–63. doi: 10.1145/3098243.3098267
    [18] YANG Yang, LUO Yadan, CHEN Weilun, et al. Zero-shot hashing via transferring supervised knowledge[C]. The 24th ACM International Conference on Multimedia, Amsterdam, The Netherlands, 2016: 1286–1295. doi: 10.1145/2964284.2964319
    [19] PACHORI S, DESHPANDE A, and RAMAN S. Hashing in the zero shot framework with domain adaptation[J]. Neurocomputing, 2018, 275: 2137–2149. doi: 10.1016/j.neucom.2017.10.061
    [20] LIU Jingen, KUIPERS B, and SAVARESE S. Recognizing human actions by attributes[C]. The IEEE Conference on Computer Vision and Pattern Recognition, Colorado, USA, 2011: 3337–3344.
    [21] FU Yanwei, HOSPEDALES T M, XIANG Tao, et al. Learning multimodal latent attributes[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(2): 303–316. doi: 10.1109/TPAMI.2013.128
    [22] JAIN M, VAN GEMERT J C, MENSINK T, et al. Objects2action: Classifying and localizing actions without any video example[C]. The IEEE International Conference on Computer Vision, Santiago, Chile, 2015: 4588–4596.
    [23] XU Baohan, FU Yanwei, JIANG Yugang, et al. Video emotion recognition with transferred deep feature encodings[C]. The 2016 ACM International Conference on Multimedia Retrieval, New York, USA, 2016: 15–22.
    [24] JOHNSON M, SCHUSTER M, LE Q V, et al. Google’s multilingual neural machine translation system: Enabling zero-shot translation[J]. Transactions of the Association for Computational Linguistics, 2017, 5: 339–351. doi: 10.1162/tacl_a_00065
    [25] PRATEEK VEERANNA S, JINSEOK N, ENELDO L M, et al. Using semantic similarity for multi-label zero-shot classification of text documents[C]. The 23rd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Bruges, Belgium, 2016: 423–428.
    [26] DALAL N and TRIGGS B. Histograms of oriented gradients for human detection[C]. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, USA, 2005: 886–893.
    [27] LOWE D G. Distinctive image features from scale-invariant keypoints[J]. International Journal of Computer Vision, 2004, 60(2): 91–110. doi: 10.1023/B:VISI.0000029664.99615.94
    [28] BAY H, ESS A, TUYTELAARS T, et al. Speeded-up robust features (SURF)[J]. Computer Vision and Image Understanding, 2008, 110(3): 346–359. doi: 10.1016/j.cviu.2007.09.014
    [29] ROMERA-PAREDES B and TORR P H S. An embarrassingly simple approach to zero-shot learning[C]. The 32nd International Conference on Machine Learning, Lille, France, 2015: 2152–2161.
    [30] ZHANG Li, XIANG Tao, and GONG Shaogang. Learning a deep embedding model for zero-shot learning[C]. The IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 3010–3019.
    [31] LI Yan, ZHANG Junge, ZHANG Jianguo, et al. Discriminative learning of latent features for zero-shot recognition[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 7463–7471.
    [32] WANG Xiaolong, YE Yufei, and GUPTA A. Zero-shot recognition via semantic embeddings and knowledge graphs[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 6857–6866.
    [33] WAH C, BRANSON S, WELINDER P, et al. The Caltech-UCSD Birds-200-2011 dataset[R]. Technical Report CNS-TR-2010-001, 2011.
    [34] MIKOLOV T, SUTSKEVER I, CHEN Kai, et al. Distributed representations of words and phrases and their compositionality[C]. The 26th International Conference on Neural Information Processing Systems, Lake Tahoe, USA, 2013: 3111–3119.
    [35] LEE C, FANG Wei, YEH C K, et al. Multi-label zero-shot learning with structured knowledge graphs[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 1576–1585.
    [36] JETLEY S, ROMERA-PAREDES B, JAYASUMANA S, et al. Prototypical priors: From improving classification to zero-shot learning[J]. arXiv preprint arXiv:1512.01192, 2015.
    [37] KARESSLI N, AKATA Z, SCHIELE B, et al. Gaze embeddings for zero-shot image classification[C]. The IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 6412–6421.
    [38] REED S, AKATA Z, LEE H, et al. Learning deep representations of fine-grained visual descriptions[C]. The IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 49–58.
    [39] ELHOSEINY M, ZHU Yizhe, ZHANG Han, et al. Link the head to the "beak": Zero shot learning from noisy text description at part precision[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 6288–6297. doi: 10.1109/CVPR.2017.666
    [40] LAZARIDOU A, DINU G, and BARONI M. Hubness and pollution: Delving into cross-space mapping for zero-shot learning[C]. The 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Beijing, China, 2015: 270–280.
    [41] WANG Xiaoyang and JI Qiang. A unified probabilistic approach modeling relationships between attributes and objects[C]. The IEEE International Conference on Computer Vision, Sydney, Australia, 2013: 2120–2127.
    [42] AKATA Z, PERRONNIN F, HARCHAOUI Z, et al. Label-embedding for attribute-based classification[C]. The IEEE Conference on Computer Vision and Pattern Recognition, Portland, USA, 2013: 819–826.
    [43] JURIE F, BUCHER M, and HERBIN S. Generating visual representations for zero-shot classification[C]. The IEEE International Conference on Computer Vision Workshops, Venice, Italy, 2017: 2666–2673.
    [44] FARHADI A, ENDRES I, HOIEM D, et al. Describing objects by their attributes[C]. 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, USA, 2009: 1778–1785. doi: 10.1109/CVPR.2009.5206772
    [45] PATTERSON G, XU Chen, SU Hang, et al. The SUN attribute database: Beyond categories for deeper scene understanding[J]. International Journal of Computer Vision, 2014, 108(1/2): 59–81.
    [46] XIAO Jianxiong, HAYS J, EHINGER K A, et al. SUN database: Large-scale scene recognition from abbey to zoo[C]. 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, USA, 2010: 3485–3492. doi: 10.1109/CVPR.2010.5539970
    [47] NILSBACK M E and ZISSERMAN A. Delving deeper into the whorl of flower segmentation[J]. Image and Vision Computing, 2010, 28(6): 1049–1062. doi: 10.1016/j.imavis.2009.10.001
    [48] NILSBACK M E and ZISSERMAN A. A visual vocabulary for flower classification[C]. 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, New York, USA, 2006: 1447–1454. doi: 10.1109/CVPR.2006.42
    [49] NILSBACK M E and ZISSERMAN A. Automated flower classification over a large number of classes[C]. The 6th Indian Conference on Computer Vision, Graphics & Image Processing, Bhubaneswar, India, 2008: 722–729. doi: 10.1109/ICVGIP.2008.47
    [50] KHOSLA A, JAYADEVAPRAKASH N, YAO Bangpeng, et al. Novel dataset for fine-grained image categorization: Stanford dogs[C]. CVPR Workshop on Fine-Grained Visual Categorization, 2011.
    [51] DENG Jia, DONG Wei, SOCHER R, et al. ImageNet: A large-scale hierarchical image database[C]. 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, USA, 2009: 248–255.
    [52] CHAO Weilun, CHANGPINYO S, GONG Boqing, et al. An empirical study and analysis of generalized zero-shot learning for object recognition in the wild[C]. The 14th European Conference on Computer Vision, Amsterdam, The Netherlands, 2016: 52–68.
    [53] SONG Jie, SHEN Chengchao, YANG Yezhou, et al. Transductive unbiased embedding for zero-shot learning[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 1024–1033.
    [54] 李亚南. 零样本学习关键技术研究[D]. [博士论文], 浙江大学, 2018: 40–43.
    LI Yanan. Research on key technologies for zero-shot learning[D]. [Ph.D. dissertation], Zhejiang University, 2018: 40–43.
    [55] FU Yanwei, HOSPEDALES T M, XIANG Tao, et al. Transductive multi-view zero-shot learning[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(11): 2332–2345. doi: 10.1109/TPAMI.2015.2408354
    [56] KODIROV E, XIANG Tao, and GONG Shaogang. Semantic autoencoder for zero-shot learning[C]. The IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 4447–4456.
    [57] STOCK M, PAHIKKALA T, AIROLA A, et al. A comparative study of pairwise learning methods based on kernel ridge regression[J]. Neural Computation, 2018, 30(8): 2245–2283. doi: 10.1162/neco_a_01096
    [58] ANNADANI Y and BISWAS S. Preserving semantic relations for zero-shot learning[J]. arXiv preprint arXiv:1803.03049, 2018.
    [59] LI Yanan, WANG Donghui, HU Huanhang, et al. Zero-shot recognition using dual visual-semantic mapping paths[C]. The IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 5207–5215.
    [60] CHEN Long, ZHANG Hanwang, XIAO Jun, et al. Zero-shot visual recognition using semantics-preserving adversarial embedding networks[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 1043–1052.


  • Fig. 1  Structure of zero-shot learning techniques

    Fig. 2  Illustration of zero-shot learning

    Fig. 3  Classic inductive zero-shot model[7]

    Fig. 4  Class-attribute matrix of AwA[7]

    Fig. 5  Three types of visual-semantic mapping

    Fig. 6  Example of domain shift[55]

    Fig. 7  Example of the semantic gap

    Table 1  Comparison of machine learning paradigms

    | Paradigm | Training set $\{ \cal{X},\cal{Y}\} $ | Test set $\{ \cal{X},\cal{Z}\} $ | Relation $R$ between training classes $\cal{Y}$ and test classes $\cal{Z}$ | Final classifier $C$ |
    | Unsupervised learning | large amount of unlabeled images | images of seen classes | $\cal{Y} = \cal{Z}$ | $C:\cal{X} \to \cal{Y}$ |
    | Supervised learning | large amount of labeled images | images of seen classes | $\cal{Y} = \cal{Z}$ | $C:\cal{X} \to \cal{Y}$ |
    | Semi-supervised learning | few labeled and many unlabeled images | images of seen classes | $\cal{Y} = \cal{Z}$ | $C:\cal{X} \to \cal{Y}$ |
    | Few-shot learning | very few labeled and many unlabeled images | images of seen classes | $\cal{Y} = \cal{Z}$ | $C:\cal{X} \to \cal{Y}$ |
    | Zero-shot learning | large amount of labeled images | images of unseen classes | ${\cal Y} \cap {\cal Z} = \varnothing$ | $C:\cal{X} \to \cal{Z}$ |

    Table 2  Usage of deep convolutional backbones in zero-shot learning papers

    | Network | Number of papers |
    | VGG | 50 |
    | GoogleNet | 27 |
    | ResNet | 39 |

    Table 3  Zero-shot learning performance comparison (%)

    Left block: conventional ZSL (SS = standard split, PS = proposed split [13]). Right block: generalized ZSL, where U→T and S→T are the accuracies on unseen and seen classes when classifying over all classes, and H is their harmonic mean. "–" marks results not reported.

    | Method | AwA SS | AwA PS | CUB SS | CUB PS | SUN SS | SUN PS | AwA U→T | AwA S→T | AwA H | CUB U→T | CUB S→T | CUB H | SUN U→T | SUN S→T | SUN H |
    | IAP | 46.9 | 35.9 | 27.1 | 24.0 | 17.4 | 19.4 | 0.9 | 87.6 | 1.8 | 0.2 | 72.8 | 0.4 | 1.0 | 37.8 | 1.8 |
    | DAP | 58.7 | 46.1 | 37.5 | 40.0 | 38.9 | 39.9 | 0.0 | 84.7 | 0.0 | 1.7 | 67.9 | 3.3 | 4.2 | 25.1 | 7.2 |
    | DeViSE | 68.6 | 59.7 | 53.2 | 52.0 | 57.5 | 56.5 | 17.1 | 74.7 | 27.8 | 23.8 | 53.0 | 32.8 | 16.9 | 27.4 | 20.9 |
    | ConSE | 67.9 | 44.5 | 36.7 | 34.3 | 44.2 | 38.8 | 0.5 | 90.6 | 1.0 | 1.6 | 72.2 | 3.1 | 6.8 | 39.9 | 11.6 |
    | SJE | 69.5 | 61.9 | 55.3 | 53.9 | 57.1 | 53.7 | 8.0 | 73.9 | 14.4 | 23.5 | 59.2 | 33.6 | 14.7 | 30.5 | 19.8 |
    | SAE | 80.7 | 54.1 | 33.4 | 33.3 | 42.4 | 40.3 | 1.1 | 82.2 | 2.2 | 7.8 | 54.0 | 13.6 | 8.8 | 18.0 | 11.8 |
    | SYNC | 71.2 | 46.6 | 54.1 | 55.6 | 59.1 | 56.3 | 10.0 | 90.5 | 18.0 | 11.5 | 70.9 | 19.8 | 7.9 | 43.3 | 13.4 |
    | LDF | 83.4 | – | 70.4 | – | – | – | – | – | – | – | – | – | – | – | – |
    | SP-AEN | – | 58.5 | – | 55.4 | – | 59.2 | 23.3 | 90.9 | 37.1 | 34.7 | 70.6 | 46.6 | 24.9 | 38.6 | 30.3 |
    | QFSL | 84.8 | 79.7 | 69.7 | 72.1 | 61.7 | 58.3 | 66.2 | 93.1 | 77.4 | 71.5 | 74.9 | 73.2 | 51.3 | 31.2 | 38.8 |
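The generalized zero-shot H column in Table 3 is the harmonic mean of the unseen-class accuracy (U→T) and seen-class accuracy (S→T), which penalizes methods that score well only on seen classes. Assuming the standard definition H = 2us/(u + s), a quick check against the QFSL row on AwA (u = 66.2, s = 93.1):

```python
def harmonic_mean(u: float, s: float) -> float:
    """Harmonic mean of the unseen (U->T) and seen (S->T) GZSL accuracies, in %."""
    return 2 * u * s / (u + s) if (u + s) > 0 else 0.0

print(round(harmonic_mean(66.2, 93.1), 1))  # 77.4, matching the QFSL AwA H column
```

Note how the harmonic mean collapses when either accuracy is near zero, e.g. DAP on AwA (u = 0.0, s = 84.7 gives H = 0.0), which is why biased-to-seen methods fare poorly in the GZSL columns.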
Article history
  • Received: 2019-07-01
  • Accepted: 2019-11-03
  • Published online: 2019-11-13
  • Issue date: 2020-05-01