ABSTRACT
Despite advances in machine learning and deep neural networks, a large gap remains between machine and human image understanding. One cause is the annotation process used to label training images. In most image categorization tasks, some image categories are fundamentally ambiguous, and the underlying class probability varies from obvious cases to ambiguous ones. However, current machine learning systems and applications usually rely on discrete annotation processes, so the training labels do not reflect this ambiguity. To address this issue, we propose a new image annotation framework in which labeling incorporates human gaze behavior. In this framework, gaze behavior is used to predict how difficult each image is to label. The image classifier is then trained with sample weights defined by the predicted difficulty. We demonstrate the effectiveness of our approach on four-class image classification tasks.
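The weighting step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how predicted difficulty is turned into a weight, so the linear mapping, the choice to down-weight difficult images, the `floor` parameter, and the function names `difficulty_to_weight` and `weighted_cross_entropy` are all assumptions made for illustration.

```python
import math


def difficulty_to_weight(difficulty, floor=0.1):
    """Map a predicted labeling difficulty in [0, 1] to a sample weight.

    Assumption: harder (more ambiguous) images get smaller weights,
    and no weight drops below `floor`. The paper may use another mapping.
    """
    d = min(max(difficulty, 0.0), 1.0)  # clamp to [0, 1]
    return floor + (1.0 - floor) * (1.0 - d)


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of per-sample negative log-likelihoods.

    probs   : list of per-class probability lists (one list per image)
    labels  : list of integer class indices
    weights : list of non-negative sample weights
    """
    total = 0.0
    for p, y, w in zip(probs, labels, weights):
        total += w * -math.log(max(p[y], 1e-12))  # guard against log(0)
    return total / sum(weights)


# Two images in a four-class task: one obvious, one ambiguous.
probs = [[0.7, 0.1, 0.1, 0.1], [0.25, 0.25, 0.25, 0.25]]
labels = [0, 1]
difficulties = [0.0, 1.0]  # hypothetical gaze-based difficulty predictions

weights = [difficulty_to_weight(d) for d in difficulties]
loss_weighted = weighted_cross_entropy(probs, labels, weights)
loss_uniform = weighted_cross_entropy(probs, labels, [1.0, 1.0])
```

Under these assumptions the ambiguous image contributes less to the training loss than the obvious one, so `loss_weighted` is smaller than `loss_uniform` in this example.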
Index Terms
Gaze-guided Image Classification for Reflecting Perceptual Class Ambiguity