Selfie sign language recognition with convolutional neural networks

Authors: P.V.V. Kishore, G. Anantha Rao, E. Kiran Kumar, M. Teja Kiran Kumar, D. Anil Kumar

Journal: International Journal of Intelligent Systems and Applications @ijisa

Published in: issue 10, vol. 10, 2018.

Free access

Extracting complex head and hand movements, along with their constantly changing shapes, for sign language recognition is considered a difficult problem in computer vision. This paper proposes recognizing Indian sign language gestures with convolutional neural networks (CNNs). The capture method used in this work is continuous sign language video recorded in selfie mode, which lets a hearing-impaired person operate the sign language recognition (SLR) mobile application independently. Because no datasets of mobile selfie sign language were available, we created one with five subjects performing 200 signs at 5 viewing angles under various background environments. Each sign occupies 60 frames (images) of a video. CNN training is performed with 3 different sample sizes, each consisting of multiple sets of subjects and viewing angles. The remaining 2 samples are used for testing the trained CNN. Different CNN architectures were designed and tested on our selfie sign language data to improve recognition accuracy. We achieved a recognition rate of 92.88%, which we compare against other classifier models reported on the same dataset.
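As a rough sanity check on the dataset scale described above, the following sketch multiplies out the figures given in the abstract. It assumes that every subject performs every sign exactly once at every viewing angle, which the abstract does not state explicitly; the function and constant names are illustrative only.

```python
# Figures taken from the abstract.
SUBJECTS = 5          # signers
SIGNS = 200           # distinct signs
VIEW_ANGLES = 5       # camera viewing angles
FRAMES_PER_SIGN = 60  # frames (images) per sign video


def dataset_size(subjects=SUBJECTS, signs=SIGNS,
                 angles=VIEW_ANGLES, frames=FRAMES_PER_SIGN):
    """Return (number of sign videos, total number of frames).

    Assumption (not confirmed by the source): one recording per
    subject, sign, and viewing-angle combination.
    """
    videos = subjects * signs * angles
    return videos, videos * frames


if __name__ == "__main__":
    videos, frames = dataset_size()
    print(f"sign videos: {videos}, frames: {frames}")
```

Under that assumption the dataset would hold 5,000 sign videos and 300,000 frames; the actual count depends on how many repetitions and backgrounds were recorded.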


Keywords: Selfie sign language, Convolutional neural networks (CNN), Stochastic pooling, Sign language recognition (SLR), Deep learning

Short URL: https://sciup.org/15016536

IDR: 15016536   |   DOI: 10.5815/ijisa.2018.10.07

Research article