Proc. of Int. Conf. on Recent Trends in Information, Telecommunication and Computing, ITC

A New Robust Watermarking Scheme for Document Images by Randomized Distribution of Watermark Segments

S Nirmala1, P Naghabhushan2 and Chetan K.R.3
1 JNN College of Engineering/Dept. of ISE, Shimoga, India. Email: [email protected]
2 Mysore University/Department of Studies in Computer Science, Mysore, India. Email: [email protected]_mysore.ac.in
3 JNN College of Engineering/Dept. of CSE, Shimoga, India. Email: [email protected]

Abstract: A new robust watermarking scheme for protecting document image contents using redundant watermark segments is proposed in this work. A wavelet-based scheme for embedding a logo has been developed for robust watermarking. At the sender side, a level-2 wavelet transform is applied to the source document image. The LL sub-band of level 2 of the transformed image is subdivided into blocks of uniform size. A logo watermark of the same size as a transformed image block is considered. The watermark is divided into a number of segments. A number of sets of transformed image blocks are formed pseudo-randomly, such that the total size of all the blocks in a set is equal to the size of the watermark. The watermark segments are embedded into the blocks of each set using a quantization technique. The amount of quantization is controlled by its strength. Thus, multiple copies of the watermark are available and each input image block need not include the entire watermark. At the receiver side, the segments extracted from each set of blocks are merged to obtain a single extracted watermark. Based on the quantization step size, the size of the logo and the level of the wavelet transform, the watermark is extracted without accessing the original image. The experimental results show that the proposed technique is highly robust. The performance evaluation results show that the proposed approach performs better than the existing method [1].
Index Terms: Document Image, Robust watermarking, Haar Wavelet, Quantization based Embedding, Watermark Extraction

DOI: 02.ITC.2014.5.93 © Association of Computer Electronics and Electrical Engineers, 2014

I. INTRODUCTION

Digitization of documents is essential in this digital age. People share files on digital platforms rather than as physical papers. This digitization also facilitates unauthorized use, misappropriation, and misrepresentation. Thus, there is great interest in developing technology that will help protect the integrity of a digital media element and the intellectual property rights of its owners. Digital watermarking is the art of protecting digital content by inserting a proprietary mark which may be easily retrieved by the owner to verify ownership or authenticity [1]. Generally, a watermarking algorithm consists of three parts: (i) the watermark, which is unique to the owner, (ii) the encoder for embedding the watermark into the data and (iii) the decoder for extraction and verification [1]. The main properties of a digital watermarking system to be addressed are: the data payload or capacity (amount of information that can be embedded within an image), robustness (watermark resistance against intentional and unintentional image processing operations) and fidelity (similarity between original and watermarked images). A gain in one of these properties usually comes at the expense of a loss in the others [1].

A variety of digital watermarking methods have been developed in the recent past for authentication and tamper detection of digital information [2-5]. The problem of detecting any sort of intentional manipulation inside digital images has been addressed in [6]. Digital watermarking techniques are broadly categorized into block-wise and pixel-wise techniques [6]. In block-wise watermarking, the host image is divided into non-overlapping blocks for tamper detection.
In pixel-wise watermarking techniques, each pixel is used for tamper detection.

Digital documents bring various challenges for copyright protection [7]. They have limited capacity for watermark embedding, since text lacks the redundancy found in images, audio, and video. In addition, any transformation of a digital document should preserve the meaning, fluency, syntactic structure, and order of its content. Preserving the writing style of the author is very important in some domains such as literary writing or editorial columns in e-news. The sensitive nature of some documents, such as legal documents, poetry, and quotes, does not allow semantic transformations, because even a simple transformation might destroy both the semantic connotation and the meaning of the text. Thus a more robust watermarking scheme is desired for digital documents.

II. RELATED WORK

Some works on robust watermarking have been reported in the literature. In [8], a novel feature-based watermarking method using scale-invariant keypoints is described. The feature points are extracted using the scale-invariant keypoint extractor and are then decomposed into a set of disjoint triangles. These triangles are watermarked additively in the spatial domain. Tang and Hang [9] proposed a watermarking algorithm based on image segmentation and the Discrete Cosine Transform (DCT). The image is segmented using the Expectation Maximization (EM) algorithm [10]. Wu and Liu [11] developed a novel method for image watermarking based on embedding multiple identical watermarks in the spatial and frequency domains of the image representation. In the spatial domain, the processing method uses a non-linear neural network segmentation to output the different zones of watermark embedding with respect to the image characteristics. In [12], a novel content-based watermarking approach that uses geometric warping to embed watermarks is presented.
This approach provides greater robustness to strong lossy compression [12]. Chareyron et al. [13] proposed a watermarking technique that is robust against geometric distortions. This technique is based on the modification of the two-dimensional color histogram. In [14], a localized image watermarking scheme for resisting geometric attacks is presented. The watermark synchronization scheme is based on local invariant regions, which can be extracted using scale normalization and image feature points. The extracted local regions are invariant to rotation, scaling and various signal processing attacks. Garg et al. [15] proposed a robust scaling-based multi-bit watermarking approach in the wavelet transform domain. The host image is segmented into blocks of smaller size, and blocks with higher entropy are selected for embedding. In [16], a watermarking scheme for binary document images involving the DCT and spatial domains is discussed. The watermark patterns are generated as DCT domain signals and then perceptually shaped by weighting their components in the spatial domain with perceptual masks. In [1], a wavelet-based logo watermarking scheme is presented that performs embedding into all sub-blocks of the LLn sub-band of the transformed host image using a quantization technique. For document images, the approach in [1] does not provide good PSNR or a high degree of robustness.

From the detailed literature survey, it is evident that some of the works resist common image processing attacks like rotation, translation, histogram equalization and noise. However, they are not efficient against document image attacks like semantic transformations of the text, text repositioning, and font style changes. This paper presents a robust and efficient watermarking scheme for document images. The remainder of the paper is organized as follows: Section III discusses the proposed watermarking system. Experimental results are presented in Section IV.
The comparative analysis is discussed in Section V. Conclusions are summarized in Section VI.

III. PROPOSED WATERMARKING SYSTEM

In this work, a new watermarking scheme is proposed in which the watermark is spread throughout the input document. The block diagram of the proposed watermarking system is shown in Fig. 1. It comprises two modules: (i) embedding of the watermark and (ii) extraction of the watermark. The embedding and extraction modules are shown in Figs. 2 and 3 respectively. The embedding and extraction techniques are discussed in detail in the subsequent subsections.

Figure 1. Proposed watermarking system (input image and watermark, embedding, watermarked image, extraction, extracted watermark)
Figure 2. Embedding Module
Figure 3. Extraction Module

A. Robust Watermark Embedding Technique

In the robust watermark embedding mechanism, the original image is transformed using Haar wavelets up to 2 levels. The use of wavelets allows signals to be decomposed into coarse and fine details, and the LL sub-band itself captures most of the energy present in the signal [17]. The transformed image is divided into blocks of equal size. We have conducted experiments by decomposing the transformed image into blocks of size 8 X 8, 16 X 16 and 32 X 32. From the experimental evaluation, the best balance between the quality of the watermarked image and robustness was found for blocks of size 16 X 16. Hence the LL-2 sub-band coefficients have been divided into blocks of size 16 X 16. A logo image is used as the watermark. The watermark is decomposed into a number of segments. Experiments are conducted by dividing the watermark into 2 segments up to a maximum of 5 segments.

A block set is formed by a set of pseudo-randomly selected blocks of the Haar wavelet transformed image. The number of blocks in the set is equal to the number of watermark segments. The watermark segments are embedded into the Haar wavelet transformed image blocks of the block set.
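The block-set formation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`haar_ll`, `ll2`, `block_origins`, `choose_block_set`) are hypothetical, the orthonormal Haar filter is assumed, and a shared integer seed is assumed as one possible way for sender and receiver to reproduce the same pseudo-random block set.

```python
import random


def haar_ll(image):
    """One level of the 2-D Haar transform, keeping only the LL sub-band.

    With orthonormal Haar filters, each LL coefficient is the sum of a
    2x2 pixel neighbourhood divided by 2 (assumption: orthonormal scaling).
    """
    rows, cols = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) / 2.0
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


def ll2(image):
    """LL sub-band after two levels of Haar decomposition."""
    return haar_ll(haar_ll(image))


def block_origins(ll, block_size=16):
    """Top-left corners of the non-overlapping blocks that fit in the sub-band."""
    rows, cols = len(ll), len(ll[0])
    return [
        (r, c)
        for r in range(0, rows - block_size + 1, block_size)
        for c in range(0, cols - block_size + 1, block_size)
    ]


def choose_block_set(origins, num_segments, seed):
    """Pseudo-randomly pick one distinct block per watermark segment.

    The seed is a hypothetical shared secret; the same seed reproduces the
    same block set at the receiver.
    """
    rng = random.Random(seed)
    return rng.sample(origins, num_segments)


if __name__ == "__main__":
    image = [[4] * 128 for _ in range(128)]  # toy constant image
    sub_band = ll2(image)                     # 32 x 32 LL-2 sub-band
    origins = block_origins(sub_band, 16)     # four 16 x 16 blocks
    print(choose_block_set(origins, 3, seed=7))
```

Selecting blocks with `sample` guarantees that the blocks in one set are distinct, which matches the requirement that each segment goes into its own block.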
The embedding process is illustrated in Fig. 4. A sample watermark logo has been divided into 5 segments. The input image is transformed using a 2-level wavelet transform, and the LL-2 sub-band is also shown in Fig. 4. The LL-2 sub-band has been divided into blocks of uniform size. Some of these blocks have been pseudo-randomly selected (highlighted blocks in Fig. 4) and they form the block set. The watermark segments are embedded into the blocks of this block set. The blocks in the block set are not adjacent to each other. Thus, the watermark segments are distributed.

Figure 4. Watermark Distribution

This distributed embedding has several advantages. The amount of embedding is reduced, and consequently the noise in the watermarked image is greatly reduced. The randomness used in embedding allows the watermark to be spread more widely, which also accounts for greater robustness. Further, security is provided, as the receiver needs to know the pseudo-random permutation of blocks for proper extraction and authentication. Having more watermark segments increases robustness during extraction and allows the watermarked image to sustain more attacks. Hence, even if the watermark is altered in a few regions, it can be extracted from other regions, and the scheme is thus robust to various attacks such as resizing, cropping and other geometrical attacks [1]. The embedding of a watermark bit into a wavelet coefficient is achieved through the following quantization-based spreading algorithm:

Algorithm
    If W(i,j) is 0 then
        rem = mod(⌊q_k(i,j)⌋, Q)
        if (rem > 0.75*Q)
            q_k'(i,j) = Q * ⌊q_k(i,j)/Q⌋ + 0.8 × Q
        end-if
    else
        rem = mod(⌊q_k(i,j)⌋, Q)
        if (rem <= 0.75*Q)
            q_k'(i,j) = Q * ⌊q_k(i,j)/Q⌋ + 0.8 × Q
        end-if
    end-if

where W(i,j) is the watermark bit, q_k(i,j) and q_k'(i,j) represent the wavelet coefficient of a block k before and after quantization respectively, and Q represents the quantization factor.
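As a rough illustration of quantization-based embedding of a single bit, the sketch below follows the threshold tests given in the algorithm (remainder compared against 0.75·Q, remainder moved to 0.8·Q for a 1 bit). The function name `embed_bit` and the parameter `zero_offset` are hypothetical. In particular, the value 0.25·Q used to rewrite the remainder for a 0 bit is an assumption of this sketch, chosen only so that a detector thresholding at 0.75·Q reads the bit back as 0; it is not taken from the paper.

```python
import math


def embed_bit(coeff, bit, Q, zero_offset=0.25):
    """Embed one watermark bit into one wavelet coefficient.

    coeff       -- wavelet coefficient q_k(i, j) before embedding
    bit         -- watermark bit W(i, j), 0 or 1
    Q           -- quantization factor
    zero_offset -- ASSUMPTION: fraction of Q used as the new remainder
                   when a 0 bit has to be forced below the threshold
    """
    rem = math.floor(coeff) % Q        # remainder of the coefficient modulo Q
    base = Q * math.floor(coeff / Q)   # largest multiple of Q not above coeff

    if bit == 0:
        if rem > 0.75 * Q:
            # Remainder is in the "1" zone; move it down (assumed offset).
            return base + zero_offset * Q
    else:
        if rem <= 0.75 * Q:
            # Remainder is not above the threshold; move it up to 0.8 * Q.
            return base + 0.8 * Q
    return coeff                       # already consistent with the bit


if __name__ == "__main__":
    Q = 16
    for coeff, bit in [(37.3, 1), (46.9, 0), (35.0, 0), (46.9, 1)]:
        print(coeff, bit, embed_bit(coeff, bit, Q))
```

Coefficients whose remainder already lies on the correct side of the threshold are left untouched, which is why distributing the segments changes only part of the sub-band.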
In this embedding technique, a coefficient is modified only when the remainder of the wavelet coefficient with respect to the quantization factor lies on the wrong side of the 75% threshold for the bit being embedded. This is because more than 75% of the quantization factor needs to be added to the existing wavelet coefficient if the coefficient has to sustain most of the common image processing and document image attacks.

B. Robust Watermark Extraction Technique

In the robust watermark extraction mechanism, the watermarked image is transformed using Haar wavelets up to two levels. The watermark is extracted from the wavelet coefficients of the level-2 LL sub-band. The LL-2 sub-band coefficients are divided into blocks of size 16 X 16. The block set comprising the pseudo-randomly selected blocks used in the watermark embedding is securely communicated to the receiver. The watermark segments are extracted from the blocks of the block set and are merged, as illustrated in Fig. 5. In this figure, the LL-2 sub-band of the watermarked image is shown along with the subdivided blocks. The LL-2 sub-band of the watermarked image is visually similar to the LL-2 sub-band of the original image. The same block set (shown as highlighted blocks in Fig. 5) used at the sender is used for extracting the watermark segments. The watermark segments are then merged to obtain the watermark logo, as shown in Fig. 5. The extraction mechanism is detailed in the following algorithm:

Algorithm
    rem = mod(⌊q_k(i,j)⌋, Q)
    if (rem >= 0.75*Q)
        wm(wgcnt,j,k) = 1;
    else
        wm(wgcnt,j,k) = 0;
    end

where q_k(i,j) represents the wavelet coefficient of a block k, Q represents the quantization factor, wm represents the watermark segment, and wgcnt gives the segment number of the watermark, which is used when merging the segments back into the watermark. In the extraction algorithm, the remainder after dividing the wavelet coefficient by the quantization factor is computed. If this remainder is at least 75% of the quantization factor, the extracted watermark bit is set to 1; otherwise it is set to 0.
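The extraction rule and the merging step can be sketched as follows. This is an illustrative sketch only: `extract_bit`, `extract_segment` and `merge_segments` are hypothetical names, and the layout of segments as horizontal bands stacked in segment-number order is an assumption, since the paper does not state how the logo is cut into segments.

```python
import math


def extract_bit(coeff, Q):
    """Recover one watermark bit from one wavelet coefficient.

    The bit is 1 when the remainder modulo Q is at least 0.75 * Q,
    otherwise 0, as in the extraction algorithm.
    """
    rem = math.floor(coeff) % Q
    return 1 if rem >= 0.75 * Q else 0


def extract_segment(block, Q):
    """Apply extract_bit to every coefficient of one block (list of rows)."""
    return [[extract_bit(coeff, Q) for coeff in row] for row in block]


def merge_segments(segments):
    """Reassemble the watermark from segments ordered by segment number.

    ASSUMPTION: each segment is a horizontal band of the logo, so merging
    is a simple vertical stack of the bands.
    """
    merged = []
    for segment in segments:
        merged.extend(segment)
    return merged


if __name__ == "__main__":
    Q = 16
    seg0 = extract_segment([[44.8, 36.0]], Q)   # remainders 12 and 4
    seg1 = extract_segment([[35.0, 47.2]], Q)   # remainders 3 and 15
    print(merge_segments([seg0, seg1]))
```

Because extraction needs only Q, the block size and the block set, the original image is not required at the receiver.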
This setting allows a great degree of robustness against most of the common image processing and document image attacks. In the absence of distortion, the extracted logo should be exactly the same as the embedded logo. In the case of distortions, however, the decision on the authentication of the image is made by selecting the watermark with the highest normalized correlation coefficient [18] between the original and extracted watermarks.

Figure 5. Merging of Watermark segments

IV. EXPERIMENTAL RESULTS

For the experimental study of the proposed watermarking system, we have created an image corpus. The corpus contains 60 images belonging to five different classes (Markscards, Certificates, ID-Cards, Cheques and Bills). The robustness of the proposed watermarking scheme is tested by applying various attacks such as horizontal cropping, vertical cropping, resizing, noise, JPEG compression and rotation. The degree of robustness obtained is evaluated in terms of the Normalized Correlation Coefficient (NCC), Eq. (2), computed between the original and the extracted watermark logos. Normalized correlation is one of the methods used for template matching, a process used for finding occurrences of a pattern or object within an image. It ranges from 0 to 1. Higher values of NCC are desired for robust watermarking under all types of incidental attacks. The embedding capacity and quality of the watermarked image are evaluated using the Peak Signal-to-Noise Ratio (PSNR) [18]. The formulae for Mean Square Error (MSE) and PSNR are as follows:

$$\mathrm{MSE} = \frac{1}{M \times N} \sum_{i=1}^{M} \sum_{j=1}^{N} \left( I(i,j) - W(i,j) \right)^{2} \tag{3}$$

$$\mathrm{PSNR} = 20 \log_{10}(MAX_I) - 10 \log_{10}(\mathrm{MSE}) \tag{4}$$

where I and W represent the pixels of the original and watermarked images of size M × N respectively, and MAX_I is the maximum possible pixel value of the image. Fig. 6(a) shows the original document image (e.g. a marks card), Fig. 6(b) the watermark logo and Fig. 6(c) the watermarked image.
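The evaluation metrics can be sketched as below. The MSE and PSNR functions follow Eqs. (3) and (4). The NCC function uses one common normalization (cross-correlation divided by the geometric mean of the two energies); this particular form is an assumption of the sketch, as are all function names, including `best_match`, which illustrates picking the extracted copy with the highest NCC.

```python
import math


def ncc(original, extracted):
    """Normalized correlation between two equally sized watermarks.

    ASSUMPTION: sum(W * W') / sqrt(sum(W^2) * sum(W'^2)), which lies in
    [0, 1] for non-negative (e.g. binary) watermarks.
    """
    num = sum(o * e for ro, re in zip(original, extracted) for o, e in zip(ro, re))
    energy_o = sum(o * o for row in original for o in row)
    energy_e = sum(e * e for row in extracted for e in row)
    den = math.sqrt(energy_o * energy_e)
    return num / den if den else 0.0


def mse(image, marked):
    """Mean square error between original and watermarked images, Eq. (3)."""
    m, n = len(image), len(image[0])
    total = sum((i - w) ** 2 for ri, rw in zip(image, marked) for i, w in zip(ri, rw))
    return total / (m * n)


def psnr(image, marked, max_i=255):
    """Peak signal-to-noise ratio in dB, Eq. (4); infinite for identical images."""
    err = mse(image, marked)
    if err == 0:
        return math.inf
    return 20 * math.log10(max_i) - 10 * math.log10(err)


def best_match(original, candidates):
    """Return the extracted watermark copy with the highest NCC."""
    return max(candidates, key=lambda cand: ncc(original, cand))


if __name__ == "__main__":
    logo = [[1, 0], [0, 1]]
    damaged = [[1, 0], [0, 0]]
    print(ncc(logo, logo), ncc(logo, damaged))
    print(psnr([[0, 0], [0, 0]], [[0, 0], [0, 2]]))
```

With this normalization an undistorted extraction gives an NCC of 1, and the value drops as extracted bits are lost or flipped.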
The robustness of a watermarked document image under various image processing attacks was analyzed by varying the number of segments of the watermark from 1 to 5. The values of NCC for different attack scenarios on a sample watermarked document image from the image corpus, for different numbers of segments, are tabulated in Table I. It can be observed from the results shown in Table I that the NCC values in all the cases are above 0.7. From the experimental evaluations, it is observed that the extracted watermark is clearly identifiable when the NCC is at least 0.5. Based on this, a threshold value of 0.5 is selected. All the values obtained are well above the chosen threshold, and the scheme is thus sufficiently robust. Further, it can be observed that the NCC values do not vary much as the number of segments of the watermark increases. The desired level of robustness can therefore also be achieved with a larger number of watermark segments. Consequently, there is a significant improvement in the quality of the watermarked image (as less watermark data is embedded) and less time is needed for embedding and extraction, without compromising the level of robustness achieved.

The measure of robustness (NCC) was also tested on the full image corpus, and the effect of increasing the number of segments was analyzed. Fig. 7 depicts the robustness achieved when all the images in the corpus were watermarked and subjected to various attacks. The number of watermark segments was varied from 1 to 5. It is evident from Fig. 7 that the average NCC values of all the images in the corpus for different attacks range from 0.76 to 0.86. Thus, even with a larger number of segments, the NCC values are not affected, and hence one can use a larger number of watermark segments. Fig. 8 shows the perceptual quality of the watermarked image for a varying number of watermark segments (1 to 5).
The advantage of a larger number of segments is the improved perceptual quality of the watermarked image and the reduced time taken for embedding and extraction. The perceptual quality of the watermarked image is measured in terms of PSNR. The PSNR values of the watermarked image for different numbers of segments are depicted in Fig. 8. It is clearly evident that higher PSNR values are possible with a larger number of segments. This quantifies our claim of better perceptual quality. However, the robustness level falls below 0.7 when the number of segments of the watermark is increased beyond 5. Hence the maximum number of segments to achieve good robustness is selected as 5.

Figure 6. (a) Original Document Image (b) Watermark logo and (c) Watermarked Image

TABLE I. ROBUSTNESS OF THE PROPOSED SCHEME FOR VARYING NUMBER OF SEGMENTS FOR A SAMPLE IMAGE IN THE CORPUS (VALUES OF NCC)

No. of segments   No attack   JPEG compression   Salt & pepper noise   Horizontal crop   Vertical crop   Resize   Rotate
1                 1           0.85               0.88                  1                 1               1        1
2                 1           0.789              0.85                  1                 0.98            1        1
3                 1           0.82               0.86                  1                 1               1        1
4                 1           0.74               0.80                  0.98              0.98            1        1
5                 1           0.75               0.85                  0.98              0.99            0.81     0.82

Figure 7. Average NCC values for different attacks
Figure 8. Average PSNR

V. COMPARATIVE ANALYSIS

For comparative analysis, we implemented an existing method [1]. In this method, the image is wavelet transformed up to 2 levels. The LL-2 sub-band is divided into blocks of size 16 X 16. The watermark is embedded into all image blocks using quantization-based embedding. The watermarked images from both the existing method [1] and the proposed method are subjected to several image processing attacks: horizontal and vertical cropping of segments of the document image, rotation of the core portion of the document image, JPEG compression (about 70%), addition of noise (salt and pepper, Gaussian noise) and resizing of the image.
The results of the proposed method (number of watermark segments = 3) and the existing method [1] for various attacks are shown in Fig. 9. It is evident from the results shown in Fig. 9 that the extracted watermark of the proposed approach is of good visual quality, and hence the watermarking scheme is robust. In the case of horizontal cropping, the central region of the watermarked image was cropped. The proposed method was able to recover a clearly visible watermark logo, whereas the logo recovered by the existing method [1] had some noise and some portions were not clearly visible. In the case of vertical cropping, the watermark logo extracted with the existing method [1] is only partly visible, whereas the proposed method extracts a properly visible watermark logo. Similarly, when some portions of the watermarked document image are rotated or resized, the watermark logo extracted by the existing method [1] has some noise in the top portion, whereas the proposed method extracts a clearly visible watermark logo. In the case of compression, the existing method [1] adds a lot of noise to the watermark logo, as the entire watermarked image is affected by JPEG compression. Even in this case, the watermark logo extracted by the proposed method is clearly perceivable compared to the existing method [1]. Noise in the form of salt and pepper also affects the watermarked image in its entirety. Since the watermark in the proposed method is randomly and distantly distributed, proper extraction of the watermark logo has been possible.

Figure 9 panels (columns: watermarked image with attacks; extracted watermarks for the existing method [1] and the proposed method): (a) Horizontal crop (b) Vertical crop (c) Rotation of a portion of the document image (e.g. name of the candidate in the marks card) (d) Resizing a portion of the document (e.g. increasing the size of the logo in the marks card) (e) JPEG compression (70%) (f) Salt and pepper noise
Figure 9.
Extracted Watermark from proposed and existing method [1] after various attacks

The robustness of the proposed watermarking scheme has been analyzed by computing NCC values for various attacks and for a varying number of watermark segments. The analysis was carried out on all images in the image corpus. First, the robustness of the existing method [1] and of the proposed method with 5 segments was compared. The NCC values of a watermarked document image for various attacks are plotted in Fig. 10. From the plot shown in Fig. 10, it is observed that the NCC values obtained with the proposed method are closer to 1. Hence the proposed method clearly outperforms the existing method [1].

Figure 10. Robustness of Existing [1] and Proposed methods

VI. CONCLUSIONS

In this paper, a highly robust watermarking scheme using wavelets and randomized distribution of segments of the watermark has been proposed. The robustness of the proposed work is justified by the higher NCC values (>0.75) for various image processing attacks on the watermarked document images. The use of a varying number of segments of the watermark has been analyzed. It was observed that a maximum of five segments results in good quality watermark extraction. Further, segmentation improves the visual clarity of the watermarked image. This was quantified using PSNR as the perceptual metric, and higher PSNR values were obtained as the number of segments was increased. A dynamic number of watermark segments, with each segment of variable size, is considered as a future enhancement of the current work.

REFERENCES

[1] W.-T. Huang, S.-Y. Tan, Y.-J. Chang and C.-H. Chen, "A robust watermarking technique for copyright protection using discrete wavelet transform", WSEAS Trans. on Computers, Vol. 9, No. 5, 2010, pp. 485-495.
[2] H. Mirza, H. Thai, and Z. Nakao, "Color image watermarking and self-recovery based on independent component analysis", Lecture Notes in Computer Science, 2008, Vol. 5097, pp. 839-849.
[3] M. S. Wang and W. C. Chen, "A majority-voting based watermarking scheme for color image tamper detection and recovery", Computer Standards & Interfaces, 2007, Vol. 29, pp. 561-571.
[4] R. Poli, J. Kennedy and T. Blackwell, "Particle swarm optimisation: an overview", Swarm Intelligence Journal, 2007, Vol. 1, No. 1, pp. 33-57.
[5] T.-Y. Lee and S. D. Lin, "Dual watermark for tamper detection and recovery", Pattern Recognition, 2008, Vol. 41, No. 11, pp. 3497-3506.
[6] Xiaochuan Gao, Chun Qi and Haitao Zhou, "An Adaptive Compressed-DCT-Domain Watermarking", 8th International Conference on Signal Processing, 2006, Vol. 4, pp. 1-4.
[7] J. T. Brassil, S. Low, and N. F. Maxemchuk, "Copyright Protection for the Electronic Distribution of Text Documents", Proceedings of the IEEE, Vol. 87, No. 7, 1999, pp. 1181-1196.
[8] Hae-yeoun Lee, Choong-hoon Lee, Heung-kyu Lee and Jeho Nam, "Feature-based image watermarking method using scale-invariant keypoints", Lecture Notes in Computer Science, 2005, pp. 312-324.
[9] C. W. Tang and H. M. Hang, "A feature-based robust digital image watermarking scheme", IEEE Trans. Signal Process., 2003, Vol. 51, No. 4, pp. 950-959.
[10] Tim-kun Lin and Chung-Lin Huang, "Digital Image Forensics Using EM Algorithm", Advances in Multimedia Information Processing, Lecture Notes in Computer Science, 2009, Vol. 5879, pp. 994-998.
[11] M. Wu and B. Liu, "Data hiding in binary images for authentication and annotation", IEEE Trans. Multimedia, 2004, Vol. 6, No. 4, pp. 528-538.
[12] D. Pröfrock, M. Schlauweg and E. Müller, "Video Watermarking by Using Geometric Warping Without Visible Artifacts", Lecture Notes in Computer Science - Information Hiding, 2007, Vol. 4437, pp. 78-92.
[13] G. Chareyron, B. Macq and A. Tremeau, "Watermarking of color images based on segmentation of the XYZ color space", Second Eur. Conf. on Color in Graphics, Imaging and Vision, Aachen, Germany, April 2004, pp. 178-182.
[14] M. Alghoniemy and A. H. Tewfik, "Geometric invariance in image watermarking", IEEE Trans. on Image Processing, 2004, Vol. 13, No. 2, pp. 145-153.
[15] T. M. Ng and H. K. Garg, "Maximum-likelihood detection in DWT domain image watermarking using Laplacian modeling", IEEE Signal Process. Lett., 2005, Vol. 12, No. 4, pp. 345-348.
[16] Xinshan Zhu, Lei Wen, and Yanming Chen, "Novel Binary Document Image Watermarking Exploiting the Features of Double Domains", International Journal of Computer and Electrical Engineering, 2012, Vol. 4, No. 1, pp. 87-92.
[17] M. R. Soheili, "A Robust Digital Image Water marking Scheme Based on DWT", Journal of Computer Engineering, 2009, Vol. 1, pp. 3-11.
[18] Jaejin Lee and Chee Sun Won, "A Watermarking Sequence Using Parities of Error Control Coding For Image Authentication And Correction", IEEE Transactions on Consumer Electronics, Vol. 46, No. 2, 2000, pp. 313-317.