Article Type

Original Study

Abstract

The rapid expansion of the Internet has revolutionized access to information, particularly unstructured data, most of which is textual. While instant access to information brings many advantages, it has also given rise to a prevalent problem: plagiarism. Copying and reusing material without proper permission poses a significant threat to academic integrity. Rates of plagiarism, especially in academic and scientific publications, have risen with the advent of the Internet, reaching alarming levels, such as 60% in student projects. This study examines a proposed model that computes the similarity between training and test texts using cosine similarity, Euclidean similarity, and Jaccard similarity, providing several metrics for comparison and analysis. The model's sequential steps combine automated analysis with human interpretation, which improves the effectiveness and accuracy of the plagiarism checker and makes it easier to use across many fields and applications. The results show that the similarity between texts can be determined accurately.
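
As an illustration of the three measures named in the abstract, the following is a minimal sketch and not the study's implementation. It assumes lowercase word tokenization and raw word-count vectors, and it assumes that Euclidean distance d is converted to a similarity as 1 / (1 + d); the abstract specifies none of these details, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphanumeric word tokens; the study may preprocess differently.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    # Cosine of the angle between the two word-count vectors.
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def euclidean_similarity(a, b):
    # Assumption: distance d is mapped to a similarity in (0, 1] as 1 / (1 + d).
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dist = math.sqrt(sum((va[t] - vb[t]) ** 2 for t in va.keys() | vb.keys()))
    return 1.0 / (1.0 + dist)


def jaccard_similarity(a, b):
    # Size of the shared vocabulary divided by the size of the combined vocabulary.
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


if __name__ == "__main__":
    source = "The cat sat on the mat"
    suspect = "The cat lay on the mat"
    for measure in (cosine_similarity, euclidean_similarity, jaccard_similarity):
        print(measure.__name__, round(measure(source, suspect), 3))
```

The three scores behave differently: Jaccard ignores how often a word occurs, cosine is insensitive to document length, and the Euclidean-based score drops as raw counts diverge, which is one reason to report them side by side.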

Keywords

NLP, Cosine, Euclidean similarity, Jaccard, Text similarity
