Applying deep learning to C# call sequence synthesis

Authors: Chebykin A.E., Kirilenko I.A.

Journal: Proceedings of the Institute for System Programming of the RAS (Trudy ISP RAN)

Published in: vol. 30, issue 3, 2018.

Many common programming tasks, such as connecting to a database, drawing an image, or reading from a file, have long been implemented in various frameworks and are available through the corresponding Application Programming Interfaces (APIs). To use them, however, a software engineer must first learn that they exist and then learn how to use them correctly. Currently, the Internet seems to be the best and most common way to gather such information. Recently, a deep-learning-based solution was proposed in the form of the DeepAPI tool: given an English description of the desired functionality, it generates a sequence of Java function calls. In this paper, we show how to apply this approach to a different programming language (C# instead of Java) that has a smaller open-source code base. We describe the techniques used to achieve results close to the original, as well as the techniques that failed to have an impact. Finally, we release our dataset, code, and trained model to facilitate further research.

Keywords: API, deep learning, code search, RNN, transfer learning

Short URL: https://sciup.org/14916551

IDR: 14916551 | DOI: 10.15514/ISPRAS-2018-30(3)-5


Research article