Designing an Efficient Deduplication Algorithm for Audio Files in Cloud Storage

Ammar Zakzouk, Al-Esraa University – Baghdad – Iraq AND Homs University – Homs – Syria
Alaa Al Sebae, Warwick University – Coventry – United Kingdom
Hasan Hasan, Homs University – Homs – Syria

Article Type

Article

Abstract

Data duplication is a significant challenge in large-scale data storage systems: it consumes storage space and complicates data organization, management, and processing. An efficient storage system should make effective use of the available space. To address this problem, hash algorithms are employed to generate a hash key for each file, so that identical files have the same hash key. However, two different files may also produce the same hash key, which is referred to as a collision. The probability of a collision depends on the length of the hash key: as the key length increases, the probability of a collision decreases. When a file is uploaded to the cloud storage system, its hash key is compared with the keys already stored in the system. However, as the amount of data stored in the cloud grows, the time required for searching and matching also increases. In this paper, we introduce a file-level deduplication technique for audio data in cloud storage systems. The proposed technique aims to reduce the search time for hash values by creating a table with multiple indexes. These indexes are categorized by audio file format: uncompressed formats, formats with lossy compression, and formats with lossless compression. Each table contains multiple hash-table indexes, each designed for a particular audio file format. To reduce the probability of collision, the Message Digest 6 (MD6) algorithm is used, which generates a 512-bit hash key.
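
The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `AudioDedupStore`, the function `hash_key`, and the extension-to-category mapping are hypothetical names chosen for illustration. Because MD6 is not available in Python's standard library, SHA-512 is swapped in here as a stand-in that also produces a 512-bit key.

```python
import hashlib
from pathlib import PurePath

# Hypothetical mapping from file extension to format category (an assumption,
# not taken from the paper).
FORMAT_CATEGORY = {
    ".wav": "uncompressed",
    ".aiff": "uncompressed",
    ".mp3": "lossy",
    ".aac": "lossy",
    ".flac": "lossless",
}


def hash_key(data: bytes) -> str:
    """Return a 512-bit key as 128 hex characters.

    Stand-in: SHA-512 is used because MD6 is not in the standard library.
    """
    return hashlib.sha512(data).hexdigest()


class AudioDedupStore:
    """File-level deduplication with one hash index per audio format."""

    def __init__(self):
        # category -> extension -> {hash key: name of the stored file}
        self.indexes = {}

    def upload(self, filename: str, data: bytes):
        """Store the file unless an identical one exists in the same format.

        Returns (stored, name): stored is False when a duplicate was found,
        and name is then the name of the file already held.
        """
        ext = PurePath(filename).suffix.lower()
        category = FORMAT_CATEGORY.get(ext, "other")
        # Only the index for this format is searched, not the whole store.
        index = self.indexes.setdefault(category, {}).setdefault(ext, {})
        key = hash_key(data)
        if key in index:
            return False, index[key]
        index[key] = filename
        return True, filename


store = AudioDedupStore()
print(store.upload("take1.wav", b"raw audio bytes"))
print(store.upload("copy.wav", b"raw audio bytes"))
```

In this sketch the second upload is reported as a duplicate of the first, because both files have the same content and the same format. A consequence of partitioning by format is that identical bytes uploaded under a different extension are looked up in a different index and are not matched.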

Keywords

Deduplication, Hash table, MD6, Audio files, Cloud storage

Recommended Citation

Zakzouk, Ammar; Sebae, Alaa Al; and Hasan, Hasan (2025) "Designing an Efficient Deduplication Algorithm for Audio Files in Cloud Storage," Al-Esraa University College Journal for Engineering Sciences: Vol. 7: Iss. 11, Article 1.
DOI: https://doi.org/10.70080/2790-7732.1056

Included in

Biomedical Engineering and Bioengineering Commons, Chemical Engineering Commons, Civil and Environmental Engineering Commons, Computer Engineering Commons, Materials Science and Engineering Commons, Mechanical Engineering Commons
