Methodology for image retrieval based on binary space partitioning and perceptual image hashing
DOI:
https://doi.org/10.15276/aait.05.2022.10
Keywords:
Content-based image retrieval, binary space partitioning, perceptual hashing, vantage-point tree, discrete cosine transform, discrete wavelet transform
Abstract
The paper focuses on building content-based image retrieval systems. The main challenges in the construction of such systems are considered, their components are reviewed, and a brief overview of the main methods and techniques that
have been used in this area to implement the main components of image search systems is given. As one of the options for solving
such a problem, an image retrieval methodology based on the binary space partitioning method and the perceptual hashing method is
proposed. Binary space partitioning trees are data structures obtained as follows: the space is partitioned by a hyperplane into two half-spaces, and then each half-space is recursively partitioned until each node contains only a trivial part of the input features. Perceptual
hashing algorithms make it possible to represent an image as a 64-bit hash value, with similar images represented by similar hash
values. The Hamming distance, which counts the number of bit positions in which two hash values differ, is used as the metric for determining the distance between hash values. To organize the database of hash values, a vp-tree is used, which is an implementation of the binary space partitioning structure. For the experimental study of the methodology, the Caltech-256 data set was used, which contains 30607 images divided into
256 categories. The Difference Hash, P-Hash and Wavelet Hash algorithms were used as perceptual hashing algorithms, and the study was
carried out in the Google Colab environment. As part of the experimental study, the robustness of the hashing algorithms to modification,
compression, blurring, noise, and image rotation was examined. In addition, the processes of building a vp-tree
and of searching for images in the tree were studied. The experiments showed that each of the hashing algorithms has
its own advantages and disadvantages. The Difference Hash algorithm, based on the difference between adjacent pixel values in the image, turned
out to be the fastest, but it was not very robust to image modification and rotation. The P-Hash algorithm, based on the
use of the discrete cosine transform, showed better resistance to image blurring, but turned out to be sensitive to image compression.
The W-Hash algorithm based on the Haar wavelet transform made it possible to construct the most efficient tree structure and proved
to be resistant to image modification and compression. The proposed technique is not recommended for use in general-purpose image
retrieval systems; however, it can be useful in searching for images in specialized databases. Possible ways to improve the methodology
include refining the vp-tree structure and searching for a more efficient method of image representation, in
addition to perceptual hashing.
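
The pipeline summarized in the abstract (a 64-bit perceptual hash, the Hamming distance as the metric, and a vp-tree as the index) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `dhash`, `hamming`, `VPNode`, `build_vp_tree` and `search` are hypothetical, and the sketch makes several assumptions: the difference hash takes an already resized 8x9 grayscale grid, the vantage point is chosen at random, and each node is split at the median distance.

```python
import random


def dhash(gray):
    """Difference hash of an 8-row x 9-column grayscale grid (list of lists).

    Each bit records whether a pixel is brighter than its right-hand
    neighbour, which gives 8 * 8 = 64 bits. Resizing a real image to
    this grid is assumed to have been done beforehand.
    """
    bits = 0
    for row in gray:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Number of bit positions in which two integer hashes differ."""
    return bin(a ^ b).count("1")


class VPNode:
    """One vp-tree node: a vantage point, a radius and two subtrees."""

    def __init__(self, point, radius, inside, outside):
        self.point = point      # hash stored at this node
        self.radius = radius    # median distance to the remaining hashes
        self.inside = inside    # hashes with distance <= radius
        self.outside = outside  # hashes with distance > radius


def build_vp_tree(points, rng=None):
    """Recursively build a vp-tree over integer hashes (assumed strategy:
    random vantage point, split at the median Hamming distance)."""
    rng = rng or random.Random(0)
    points = list(points)
    if not points:
        return None
    vp = points.pop(rng.randrange(len(points)))
    if not points:
        return VPNode(vp, 0, None, None)
    dists = sorted(hamming(vp, p) for p in points)
    radius = dists[len(dists) // 2]
    inside = [p for p in points if hamming(vp, p) <= radius]
    outside = [p for p in points if hamming(vp, p) > radius]
    return VPNode(vp, radius,
                  build_vp_tree(inside, rng),
                  build_vp_tree(outside, rng))


def search(node, query, max_dist, found=None):
    """Collect (distance, hash) pairs within max_dist of the query hash."""
    if found is None:
        found = []
    if node is None:
        return found
    d = hamming(query, node.point)
    if d <= max_dist:
        found.append((d, node.point))
    # The triangle inequality tells us which subtrees can hold matches.
    if d - max_dist <= node.radius:
        search(node.inside, query, max_dist, found)
    if d + max_dist > node.radius:
        search(node.outside, query, max_dist, found)
    return found


if __name__ == "__main__":
    rng = random.Random(42)
    database = [rng.getrandbits(64) for _ in range(1000)]
    tree = build_vp_tree(database)
    query = database[0] ^ 0b111  # a stored hash with three bits flipped
    print(sorted(search(tree, query, max_dist=3)))
```

The search visits a subtree only when the triangle inequality allows it to contain a hash within `max_dist` of the query, which is what lets the tree avoid comparing the query against every stored hash.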