A text similarity measure as a tool for assessing intertextuality in large document collections

A method is proposed for detecting intertextual relations by finding implicit links between documents using linguistic and statistical methods. Intertextuality is the presence in one text of elements and ideas from other texts. It is demonstrated that cross-language migration of terms and ideas can be identified and used to forecast and trace ideological trajectories. A new text similarity measure is proposed and tested on a collection of scientific documents. The measure was refined by maximizing the correlation between explicit and implicit links. A method for clustering documents according to this similarity measure is also proposed. Finally, a possible application of the measure to the analysis of extremist texts on the Internet is suggested.
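
The abstract does not define the proposed measure. The sketch below therefore substitutes plain cosine similarity over term-frequency vectors. This is an assumption, not the authors' measure. It only illustrates the evaluation idea: score every document pair, mark pairs that share an explicit link, and compute the Pearson correlation between the two. The helper names (`cosine`, `pearson`, `link_similarity_correlation`) and the input shapes are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-frequency vectors (stand-in measure, an assumption)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def link_similarity_correlation(docs: dict[str, list[str]],
                                explicit_links: set[frozenset]) -> float:
    """Correlate pairwise similarity (implicit links) with explicit links.

    docs: document id -> list of tokens (hypothetical input format).
    explicit_links: unordered document-id pairs that are explicitly linked.
    """
    vectors = {name: Counter(tokens) for name, tokens in docs.items()}
    sims, links = [], []
    for a, b in combinations(sorted(vectors), 2):
        sims.append(cosine(vectors[a], vectors[b]))
        links.append(1.0 if frozenset((a, b)) in explicit_links else 0.0)
    return pearson(sims, links)


docs = {
    "A": "text similarity measure intertextuality".split(),
    "B": "text similarity measure clustering".split(),
    "C": "extremist texts internet".split(),
}
r = link_similarity_correlation(docs, {frozenset(("A", "B"))})
print(round(r, 6))  # 1.0: only the explicitly linked pair is textually similar
```

Refining a parameterized measure in this spirit would mean searching over its parameters for the value that maximizes `link_similarity_correlation`. Any such parameterization would be an assumption beyond what the abstract states.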


Intertextuality, migration of ideas, similarity measure, text clustering, implicit links

