Analysis of the state of aircraft crew member by the speech using Gaussian models of mixtures

Authors: Andriyanov N.A., Dementiev V.E.

Journal: Izvestia of the Samara Scientific Center of the Russian Academy of Sciences (Известия Самарского научного центра Российской академии наук) @izvestiya-ssc

Section: Informatics, Computer Engineering and Control

Issue: No. 1, Vol. 23, 2021.

Free access

This work studies how effectively Gaussian mixture models recognize abnormal deviations in a speaker's speech. As a practical application, the developed algorithms are proposed for identifying the emotional state of a crew member from a phrase that crew member utters. The spectral characteristics of the speech signal serve as the main distinguishing criterion in the Gaussian mixture model. Because the frequency sampling step is rather small, the signal spectrum contains 255 frequency components, so it is proposed to compress the spectrum to 10 components. This reduces the number of key parameters in the Gaussian model to 10, which in turn simplifies the analysis when constructing multivariate distributions. To assess the quality of the proposed algorithm, test phrases were recorded while the speaker imitated various psychological states. Both simple unregulated speech structures and messages regulated by the Federal Aviation Rules for radio exchange in civil aviation on the territory of the Russian Federation were used. Given the limited prior knowledge available to the model and the clustering by spectral characteristics, all recordings were made by a single speaker. Three classes of the speaker's emotional state were considered: the recognition system labeled each message as calm, tired, or stressed. These states were artificially simulated during data preparation. On a test sample of 48 messages, a Gaussian model with 3 components and 10 parameters, without preliminary training, achieved a recognition rate of about 65%, whereas the a priori probability of picking the correct class among 3 equally likely classes is 33%.
As further research, it is proposed to apply preliminary training using neural networks or correlation algorithms. This would allow clustering at a deeper level, where, for example, the speaker's gender is determined first, then the typical radio exchange message is identified, and only then the speaker's emotional state is revealed.
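
The pipeline described in the abstract (compress a 255-component spectrum to 10 components, then cluster with a 3-component Gaussian mixture) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how the spectrum is compressed or which covariance structure is used, so the band averaging in `compress_spectrum`, the diagonal covariances, the plain EM loop in `fit_gmm`, and the synthetic spectra in the demo are all assumptions made for illustration.

```python
import math
import random


def compress_spectrum(spectrum, n_bands=10):
    """Average adjacent bins into n_bands bands (assumed compression scheme)."""
    n = len(spectrum)
    bands = []
    for b in range(n_bands):
        lo = b * n // n_bands
        hi = (b + 1) * n // n_bands
        bands.append(sum(spectrum[lo:hi]) / (hi - lo))
    return bands


def log_gauss_diag(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance (an assumption)."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(x, weights, means, variances):
    """Posterior probability of each mixture component given feature vector x."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def fit_gmm(data, k=3, n_iter=50, seed=0, min_var=1e-6):
    """Fit a k-component diagonal GMM with plain EM, using no class labels."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    means = [list(p) for p in rng.sample(data, k)]
    g_mean = [sum(x[j] for x in data) / n for j in range(d)]
    g_var = [
        max(sum((x[j] - g_mean[j]) ** 2 for x in data) / n, min_var)
        for j in range(d)
    ]
    variances = [list(g_var) for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibilities of every component for every sample
        resp = [posteriors(x, weights, means, variances) for x in data]
        # M-step: re-estimate weights, means and per-dimension variances
        for c in range(k):
            nc = sum(r[c] for r in resp) + 1e-12
            weights[c] = nc / n
            means[c] = [
                sum(r[c] * x[j] for r, x in zip(resp, data)) / nc
                for j in range(d)
            ]
            variances[c] = [
                max(
                    sum(r[c] * (x[j] - means[c][j]) ** 2
                        for r, x in zip(resp, data)) / nc,
                    min_var,
                )
                for j in range(d)
            ]
    return weights, means, variances


def classify(x, weights, means, variances):
    """Index of the most probable mixture component for x."""
    post = posteriors(x, weights, means, variances)
    return post.index(max(post))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical stand-in data: 48 synthetic 255-bin spectra, not real speech
    spectra = []
    for i in range(48):
        centre = (40, 120, 200)[i % 3]
        spectra.append([
            math.exp(-((f - centre) / 30.0) ** 2) + 0.05 * rng.random()
            for f in range(255)
        ])
    features = [compress_spectrum(s) for s in spectra]
    weights, means, variances = fit_gmm(features)
    print([classify(x, weights, means, variances) for x in features])
```

The component indices returned by `classify` are arbitrary cluster labels. Mapping them to the calm, tired, and stressed states would require a separate step, for example inspecting a few messages whose state is known.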

Keywords: spectral analysis, speech emotional state recognition, data mining, flight safety, Gaussian mixture model

Short URL: https://sciup.org/148312708

IDR: 148312708 | DOI: 10.37313/1990-5378-2021-23-1-97-102

Scientific article