Article Type
Original Study
Abstract
The rapid expansion of the Internet has revolutionized access to information, especially unstructured data, most of which is textual. While instant access to information brings many advantages, it has also given rise to a widespread problem: plagiarism. Copying and reusing material without proper permission poses a significant threat to academic integrity. Plagiarism rates, especially in academic and scientific publications, have risen with the advent of the Internet and have reached alarming levels, such as 60% in student projects. This study proposes a model that computes the similarity between training and test texts using cosine similarity, Euclidean similarity, and Jaccard similarity, providing a variety of metrics for comparison and analysis. The model's sequential steps combine automated analysis with human interpretation, improving the effectiveness and accuracy of the plagiarism checker and making it easier to apply across many fields and applications. The results showed that the similarity between texts can be determined accurately.
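The three measures named above can be illustrated with a minimal Python sketch. It is not the author's implementation: the abstract does not specify preprocessing or weighting. The sketch therefore assumes lowercase word tokens extracted with a regular expression, raw term-frequency vectors for cosine and Euclidean similarity, token sets for Jaccard similarity, and the common mapping 1 / (1 + d) to turn a Euclidean distance d into a similarity. The function names (`tokenize`, `cosine_similarity`, `euclidean_similarity`, `jaccard_similarity`, `compare`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase alphanumeric word tokens; the paper's preprocessing is unspecified.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: Counter, b: Counter) -> float:
    # Dot product of the term-frequency vectors divided by the product of their norms.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_similarity(a: Counter, b: Counter) -> float:
    # Assumption: distance d is mapped to 1 / (1 + d), so identical
    # vectors score 1.0 and the score shrinks as the distance grows.
    vocab = a.keys() | b.keys()
    dist = math.sqrt(sum((a[t] - b[t]) ** 2 for t in vocab))
    return 1.0 / (1.0 + dist)


def jaccard_similarity(a: set, b: set) -> float:
    # Size of the intersection divided by size of the union of the token sets.
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def compare(train_text: str, test_text: str) -> dict[str, float]:
    # Hypothetical helper: compute all three scores for one training/test pair.
    train_tokens = tokenize(train_text)
    test_tokens = tokenize(test_text)
    train_tf, test_tf = Counter(train_tokens), Counter(test_tokens)
    return {
        "cosine": cosine_similarity(train_tf, test_tf),
        "euclidean": euclidean_similarity(train_tf, test_tf),
        "jaccard": jaccard_similarity(set(train_tokens), set(test_tokens)),
    }


if __name__ == "__main__":
    scores = compare("the cat sat on the mat", "the cat sat on a mat")
    print({name: round(value, 3) for name, value in scores.items()})
```

For the two example sentences, cosine similarity is high because the word counts nearly coincide. Jaccard similarity ignores repetition of "the" and only counts shared vocabulary. The Euclidean score is the most sensitive of the three to the raw count differences. Reporting all three gives the complementary views the abstract refers to.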
Keywords
NLP, Cosine, Euclidean similarity, Jaccard, Text similarity
Recommended Citation
Jaafar, Noor Abdulmuttaleb (2024) "A Study on Improving the Accuracy and Effectiveness of Similarity Detection Processes in Text Files Using NLP Techniques," Al-Esraa University College Journal for Engineering Sciences: Vol. 6: Iss. 9, Article 1.
DOI: https://doi.org/10.70080/2790-7732.1001