On usage of machine learning for natural language processing tasks as illustrated by educational content mining

Authors: Melnikov A.V., Botov D.S., Klenin J.D.

Journal: Онтология проектирования (Ontology of Designing)

Section: Methods and technologies of decision making

Issue: no. 1 (23), vol. 7, 2017.

Free access

In this paper, we review the most popular approaches to a variety of natural language processing (NLP) tasks, primarily those that involve machine learning, from classic methods to state-of-the-art technologies. Most modern approaches fall into three rough categories: those based on the distributional hypothesis, those that extract information from graph-like structures (such as ontologies), and those that look for lexico-syntactic patterns in text documents. We focus mainly on the first of the three. An important step in the preparation stage of NLP, before analysis can begin, is representing words and documents as numeric vectors. Approaches range from the simple Bag-of-Words model to sophisticated machine learning methods such as word embedding. Today, the best information retrieval quality for both English and Russian is achieved by approaches that combine word embedding algorithms, trained on carefully selected text corpora, with deep syntactic and semantic analysis using various deep neural networks. A wide variety of machine learning algorithms is applied to NLP tasks such as part-of-speech tagging, text summarization, named entity recognition, document classification, topic and relation extraction, and natural language question answering. We also review the possibilities of applying these approaches and methods to educational content analysis, and propose a novel approach to using NLP and machine learning to analyze and synthesize educational content in the form of a decision support system.
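As a minimal illustration of the Bag-of-Words representation mentioned in the abstract, the sketch below turns two short texts into word-count vectors and compares them with cosine similarity. It is not taken from the paper: the function names `bag_of_words` and `cosine_similarity`, the whitespace tokenization, and the sample sentences are all assumptions made for this example.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Return a sparse word-count vector for `text`.

    Assumption: tokens are lower-cased and split on whitespace only;
    a real pipeline would also handle punctuation, stemming, etc.
    """
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    # Only words present in both vectors contribute to the dot product.
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical sample documents, chosen only for illustration.
doc_a = bag_of_words("machine learning for text analysis")
doc_b = bag_of_words("deep learning for text classification")
similarity = cosine_similarity(doc_a, doc_b)
```

Because Bag-of-Words only counts surface forms, two documents that use different words for the same concept score as dissimilar; this is the kind of limitation that word embedding methods aim to address.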

Keywords: machine learning, natural language processing, educational data mining, semantic similarity, deep learning, neural networks

Short URL: https://sciup.org/170178741

IDR: 170178741 | DOI: 10.18287/2223-9537-2017-7-1-34-47

Research article