Multi-digital watermarking method for GIS vector data
Technical Field
The invention belongs to the field of geographic information copyright protection, and particularly relates to a multi-digital watermarking method for embedding and extracting watermarks in a comprehensive mode aiming at GIS vector data so as to improve algorithm robustness.
Background
In recent years, scholars at home and abroad pay attention to the design of Geographic Information System (GIS) vector data digital watermarking algorithms, and various GIS vector data digital watermarking algorithms are provided. With the continuous deepening of the digital watermarking technology in the application of GIS vector data products, the single watermark has obvious targeting property in the aspect of robustness and limited attack resistance, is easy to be mastered or damaged by attackers, and cannot meet the requirements of people, so that the multiple digital watermarking technology is generated. The multiple watermarks are a technology for embedding multiple watermarks in the same carrier in multiple ways, can be used for transmission tracking, multiple authentication and the like of digital products, and have practical application values in the aspects of digital product copyright protection, ownership authentication and the like. At present, the research on multiple watermarks at home and abroad mainly focuses on the multimedia fields such as images and audios, and achieves some achievements. For example, ma qiang (computer application and software, vol.24, No.8, 186-. And the research on multiple digital watermarks of GIS vector data is relatively less. Wherein, Min's connecting right and so on (computer application and software, Vol.24, No.1, 146 and 148, 174, 2007) designs a digital map watermarking algorithm based on DCT, firstly extracting the feature points of the map data to form a feature image; then, discrete cosine transform is carried out on the characteristic image, and watermark information is embedded on the medium and low frequency coefficients. The algorithm embeds watermarks on medium and low frequency coefficients at the same time, the embedding capacity is improved to a certain extent, but the coefficient characteristics of medium frequency and low frequency are not considered. In addition, this can be regarded as a multiple digital watermarking algorithm in a non-strict sense.
Disclosure of Invention
The invention aims to: aiming at the problems that the robustness of the existing single watermark algorithm is not strong and GIS vector data cannot be effectively protected, the method for embedding and extracting the multiple digital watermarks aiming at the GIS vector data features is provided, so that the method has better robustness in the aspects of resisting coordinate system transformation attack, cutting attack, compression attack, noise attack, editing attack, data attack increase and the like.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
a multiple digital watermarking method aiming at GIS vector data mainly comprises the following processes:
(1) and (3) watermark embedding process:
the method comprises the following steps: reading and processing data, reading GIS vector data, configuring parameters of an embedded algorithm, and checking the legality of all input data and configuration parameters;
step two: the parity embedding algorithm is characterized in that an embedding object is the whole source data, the parity of the number of element nodes is used for representing watermark information of '1' or '0', one element is embedded with one bit of watermark information, and the parity of the current element is changed by adding a redundant point in a node sequence;
step three: based on the embedding position and the embedding mode of the traditional watermark, after a node sequence of an element is compressed by using a Douglas pock method, the watermark is respectively embedded in a spatial domain of a horizontal coordinate and a vertical coordinate, a discrete wavelet transform domain and a discrete cosine transform domain by using a low-order additive method and a least significant bit substitution method, and a watermark embedding object is a specific geographic spatial element;
step four: the zero watermark algorithm constructs watermark information by using the important characteristics of carrier data, and the step is to divide the regions according to the range, then count the number of nodes in each region, construct the zero watermark according to the result and store the zero watermark;
step five: storing the data after embedding the watermark;
(2) the watermark extraction process is the inverse process of the embedding process (1):
the method comprises the following steps: reading and processing data, namely reading GIS vector data of the watermark to be extracted, reading a configuration parameter file, and checking the legality of input data and configuration parameters;
step two: extracting watermark information by a zero watermark algorithm;
step three: extracting watermark information by using an odd-even method;
step four: after compression by a Douglas pock compression method, extracting watermarks in a space domain of an abscissa and an ordinate, a discrete wavelet transform domain and a discrete cosine transform domain respectively by a low-order additive method and a least significant bit substitution method;
step five: comprehensively comparing and analyzing the watermark information extracted in the steps to obtain an optimal extraction result.
The embedding steps of the multiple digital watermarking method for GIS vector data are as follows:
a first link: data preparation prior to embedding the watermark.
Reading GIS vector data, and then selecting polygons embedded with watermarks by adopting an interval polygon method and organizing the polygons into a custom structure S0; copyright marking watermark information W1 is input and converted into watermark W2 to be embedded in the algorithm; configuring parameters of an embedding algorithm; all input data and configuration parameters are checked for validity.
And a second link: the parity method embeds watermark information. The method belongs to a customized watermarking algorithm, is only applicable to line and plane data, and utilizes the parity of the total number of nodes of an element in a single element to represent a watermarking bit, namely, even numbers represent 0 and odd numbers represent 1. In which the parity of the current element is changed by adding a redundant point to the node sequence, since an element has a large number of nodes, adding a point (which may be the midpoint between two adjacent nodes) has no effect on the element.
The algorithm processes as follows:
1) the embedding times are calculated according to the following formula:
where c is the number of embedding times, n is the number of elements, l is the length of the watermark,
is a rounded symbol. If c is less than 1, returning.
2) Sequentially taking one-bit watermark information Wi (0 & lti & gt < 1), and an element node set S0j (0 & ltj & ltn); if the watermark information is '0' and the number of the element nodes is odd, adding a point in the middle of the point sequence, and taking the coordinate of the point as the average value of the two points before and after the point; if the watermark information is '1' and the number of element nodes is even, adding a point in the middle of the point sequence, and taking the average value of the two points before and after the point coordinate;
and a third step: s0j (0 < ═ j < n) is compressed by the Douglas-k compression method, and the simplified node sequence R0j (0 < ═ j < n) is obtained as input data of the next stage, and a look-up table T is embedded.
And a fourth step of: the abscissa DWT low frequency additive algorithm embeds the watermark. Firstly, selecting vertexes in R0j (0 & ltj & lt n) according to a group of 8 bits to form a data sequence, carrying out Discrete Wavelet Transform (DWT) on the data sequence to separate low, medium and high frequency coefficients, then embedding watermark information on the low frequency coefficients according to the following formula, and finally outputting watermark information embedded data R1j (0 & ltj & lt n) through DWT inverse transformation.
xw=x+w (2)
Wherein,
is a water-containing printing carrier, x ═ x
iI is 0 ≦ N and w ═ w
iI is more than or equal to 0 and less than N, which are respectively the original carrier and the watermark.
And a fifth step: and embedding the watermark by using a least significant bit replacement algorithm in an abscissa airspace. In other words, within a range allowed by the map data accuracy, the watermark information is directly embedded in the abscissa of the input data (the output data R1j (0 ≦ j ≦ n) in link four) by using the least significant bit replacement algorithm, and the output data R2j (0 ≦ j ≦ n) is obtained.
And a sixth link: and embedding watermark information by using a vertical coordinate DCT low-frequency additive algorithm. In the same way, except that the carrier data (R2j (0 ≦ j < n)) is subjected to discrete cosine transform of the ordinate, and then watermark information is embedded into the low-frequency coefficients, resulting in output data R3j (0 ≦ j < n).
The concrete embedded formula is the same as the fourth link.
And a seventh link: and embedding watermark information by an intermediate frequency additive algorithm in the DCT of the ordinate. In the same way, except that the carrier data (R3j (0 ≦ j < n)) of this link is subjected to discrete cosine transform of the ordinate, and then watermark information is embedded into the intermediate frequency coefficients, resulting in output data R4j (0 ≦ j < n).
The concrete embedded formula is the same as the fourth link.
And eighth step: de-compressing by the douglas method. According to the embedding comparison table T generated in item three, the carrier data with the watermark (R4j (0 ≦ j < n)) is integrated into the uncompressed total data S0, and the uncompressed data with the watermark S1 is obtained.
And a ninth link: and (4) zero watermark algorithm. The watermark information is constructed by using the important characteristics of the input data S1, and any information of S1 is not modified in the construction process. The specific process is as follows:
1) traversing all nodes in the S1 to obtain the maximum spatial position distribution range D;
2) obtaining a statistical interval number C (C is H multiplied by Z) according to the set horizontal block number H and the vertical block number Z;
3) averagely dividing D in the step 1) into C sub-intervals, and storing the boundary range of each sub-interval;
4) traversing all nodes, judging which subinterval the node belongs to, and increasing the total number Ccount of the node in the subinterval by 1;
5) and constructing a zero watermark according to the total number of the nodes, the number of the statistical intervals and the number of the nodes in each interval, and storing the zero watermark as a physical file.
And a tenth step: and saving the file after embedding the watermark. The specific process is as follows:
1) reassembling the nodes embedded with the watermark information back to the data elements;
2) and saving the data elements to the hard disk.
The invention aims at the extraction of a GIS vector data multiple digital watermarking method, which is an embedded inverse process, and comprises the following specific steps:
a first link: and (4) preparing data before extracting the watermark.
Reading GIS vector data of the watermark to be extracted, and organizing the GIS vector data into a user-defined structure S0; reading a configuration parameter file; and checking the legality of the input data and the configuration parameters.
And a second link: and extracting a zero watermark algorithm. The watermark information is constructed by using the important characteristics of the input data S0, and any information of S0 is not modified in the construction process. The specific process is as follows:
1) traversing all nodes in the S0 to obtain the maximum spatial position distribution range D;
2) obtaining a statistical interval number C (C is H multiplied by Z) according to the number H of the horizontal blocks and the number Z of the vertical blocks in the configuration parameters;
3) averagely dividing D in the step 1) into C sub-intervals, and storing the boundary range of each sub-interval;
4) traversing all nodes, judging which subinterval the node belongs to, and increasing the total number Ccount of the node in the subinterval by 1;
5) and constructing a zero watermark W0 according to the total number of the nodes, the number of the statistical intervals and the number of the nodes in each interval, and storing the zero watermark W0 as a physical file.
And a third step: and extracting watermark information by using the parity method.
The algorithm processes as follows:
1) sequentially taking an element node set S0j (j is 0 ═ n), and obtaining M by summing the number of nodes, and adding M to a watermark character string W1;
2) and analyzing the W1 according to the rules of a previous system to obtain the copyright watermark W1.
And a fourth step of: s0j (0 < ═ j < n) is compressed by the Douglas-k compression method, and the simplified node sequence R0j (0 < ═ j < n) is obtained as input data of the next stage.
And a fifth step: and extracting the watermark by an abscissa DWT low-frequency additive algorithm. Firstly, selecting vertexes in R0j (j is 0 & ltn & gt) according to a group of 8 bits to form a data sequence, separating three frequency coefficients of low frequency, medium frequency and high frequency by performing Discrete Wavelet Transform (DWT) on the data sequence, and then extracting watermark information on the low frequency coefficients according to the following formula to obtain watermark information W2.
w=xw-x (3)
Wherein w ═ { w ═ w
iI is more than or equal to 0 and less than N is extracted watermark information,
x={x
iand i is more than or equal to 0 and less than N is the data of the carrier containing the watermark and the data of the original carrier.
And a sixth link: and extracting the watermark by using a least significant bit substitution algorithm in an abscissa airspace. And extracting watermark information by using a least significant bit substitution algorithm directly in an abscissa sequence of R0j (0 < ═ j < n) according to the read parameters, and analyzing to obtain copyright mark information W3.
And a seventh link: and extracting watermark information by a vertical coordinate DCT low-frequency additive algorithm. In the same way, except that the link performs discrete cosine transform of vertical coordinates on the carrier data (R0j (0 < ═ j < n)), and then extracts watermark information from low-frequency coefficients and analyzes the watermark information to obtain copyright marking information W4.
The concrete extraction formula is the same as the fifth step.
And eighth step: and extracting watermark information by an intermediate frequency additive algorithm in the vertical coordinate DCT. In the same link four, except that the carrier data (R0j (0 ═ j < n)) of the link is subjected to discrete cosine transform of vertical coordinates, and then watermark information is extracted from the intermediate frequency coefficient and is analyzed to obtain copyright marking information W5.
The concrete extraction formula is the same as the fifth step.
And a ninth link: and comprehensively comparing and analyzing W0, W1, W2, W3, W4 and W5 to obtain an optimal extraction result W.
The invention relates to a multi-digital watermarking method which is specially designed according to the characteristics of GIS vector data and a form which is easy to attack. The method overcomes the defects that the traditional single watermark algorithm is low in robustness and cannot effectively protect the limitation of the GIS vector data copyright, and can provide effective guarantee for the production, the transmission and the application of the GIS vector data.
Drawings
Fig. 1 is a flow chart of watermark embedding according to the present invention.
Fig. 2 is a flow chart of watermark extraction according to the present invention.
Fig. 3 is vector data of administrative divisions at chinese county level.
Fig. 4 is a diagram of the results of the cropping attack after embedding the watermark of fig. 3.
Detailed Description
The following is a more detailed description of the embodiments with reference to the drawings.
Example 1
The present embodiment selects a typical GIS vector data, and provides an embodiment of the present invention for the whole process of data reading, preprocessing, watermark embedding, result saving, and watermark extraction, and further describes the present invention in detail. In this embodiment, vector data (as shown in fig. 3) of the chinese county-level administrative division is selected as experimental data, the data format is shp, the number of data records is 3407, and the surface layer is shown. The watermark marking content is text information of Nanjing university, and the corresponding binary watermark sequence W is 11000100110011111 … … 100111 with the length of 96.
1. A multiple digital watermark embedding method for GIS vector data.
Fig. 1 is an embedded workflow diagram of the GIS vector data multiple digital watermarking method proposed by the present invention. The whole process is divided into the following parts:
the method comprises the following steps: reading and preprocessing carrier data.
1) The corresponding carrier data information is read by using the read function of the MapWinGIS open source plug-in, and the data selected in the embodiment is as shown in FIG. 3. It is the surface data in the GIS vector data.
2) The data preprocessing comprises two steps: firstly, reading a node sequence in surface data, and organizing the data into a List < IFMEOPoint > [ ] structure, wherein each List object is a node sequence in a polygon element; secondly, a strategy of embedding the watermark into the interval polygons is adopted to screen out the polygons with the embedded watermark, so that the watermark is only embedded once on the common edge of any two polygons. The number of embedded polygons after screening was 1057.
Step two: the parity method embeds the watermark.
1) And (4) calculating the embedding times. Obtaining a formula (1) of a link II in the embedding step
2) Taking embedding a watermark as an example, a bit of information of a binary watermark is sequentially obtained and set as wiI is more than or equal to 0 and less than l, and l is the watermark length. Taking the number n (n ═ 3) of nodes of the current embedded element, if wiNot equal to n% 2,% is a remainder symbol, add a point p at position 2 in the node sequence2,p2The horizontal and vertical coordinates of the point are the average values of the front and back points;
3) repeat c-1 times 2);
step three: and performing node compression on the embedded polygon according to a Douglas Pock method to obtain a List < IFMEOPoint > [ ] type feature point data List CompressedPoint, and a List < int > [ ] type compression corresponding List CompressedTable. The douglas pock method is a well-established vector data compression method, and is not described herein.
Step four: and the horizontal coordinate DWT low-frequency additive way is used for embedding the watermark. Taking the node sequence list of each element as a unit, performing discrete wavelet transform on 8 groups, and not performing transform when the number of the node sequence lists is less than 8; then, adding the watermark information and the embedding position information through a formula (2) to embed the watermark information;
step five: the watermark is embedded instead of the abscissa least significant bit method. According to the set embedding point position (value is 4) and the embedding length (value is 4), the watermark information directly replaces the corresponding numerical value in the abscissa so as to embed the watermark;
step six: and respectively embedding the watermarks in the middle frequency and the low frequency of the ordinate DCT in an additive mode. The concrete process is the same as the fourth step;
step seven: and (3) inverse Dow Larsuck compression, integrating the node sequence embedded with the watermark into a whole node list according to a parameter CompressedTable generated during compression.
Step eight: and extracting the zero watermark. Determining a distribution range according to the data size of all nodes, where the distribution range in this embodiment is Xmax equals 134.3835199, Xmin equals 74.848011010179221, Ymax equals 52.90461345, Ymin equals 18.3902364, dividing the distribution range into specific statistical intervals according to the set number of blocks (6 × 6 equals 36 blocks), then traversing all nodes to determine which interval it is located in, recording the total number of nodes in the interval, and finally storing information such as the total number of nodes, interval information, and the number of nodes in the interval as a character string in a text form, that is, "zero watermark", 002200606000000000000000008000028000021000000000000000036000022000007000028000004000038000140000099000055000002000000000144000302000314000196000012000000000037000218000258000280000071000018000000000000000000000003000085000026.
Step nine: the data after embedding the watermark is saved. And writing the data elements into a new data file according to the parameter information adopted during data reading to obtain a vector data file containing watermark information.
2. A multiple digital watermark extraction method for GIS vector data.
And after the embedding is finished, extracting and verifying the watermark information. The essence of this part is the inverse of the embedding process, as fig. 2 is the workflow of the extraction method, unlike the embedding process, which does not require "common edge removal" processing during data preprocessing. The method comprises the following specific steps:
step 1: reading and preprocessing carrier data.
1) And reading the corresponding watermark-containing carrier data by using a reading function of the MapWinGIS open source plug-in, namely the data processed by the embedding process in the figure 3. It is the surface data in the GIS vector data.
2) Data preprocessing: reading the node sequence in the surface data, and organizing the data into a self-defined structure of List < IFMEOPoint > [ ], which is denoted as S0, wherein each List object is a node sequence in a polygon element.
Step 2: and extracting the zero watermark.
This section constructs watermark information using important features of the input data S0. The specific process is as follows:
1) traversing all nodes in the S0 to obtain the maximum spatial position distribution range D;
2) obtaining a statistical interval number C (C is 6 multiplied by 6) according to the number H of the horizontal blocks and the number Z of the vertical blocks in the configuration parameters;
3) averagely dividing D in the step 1) into C sub-intervals, and storing the boundary range of each sub-interval;
4) traversing all nodes, judging which subinterval the node belongs to, and increasing the total number Ccount of the node in the subinterval by 1;
5) and constructing a zero watermark W0 according to the total number of the nodes, the number of the statistical intervals and the number of the nodes in each interval, and storing the zero watermark W0 as a physical file.
And step 3: and extracting the watermark by using the parity method.
The algorithm processes as follows:
1) sequentially taking an element node set S0j (j is 0 ═ n), and obtaining M by summing the number of nodes, and adding M to a watermark character string W1;
2) and analyzing the W1 according to the rules of a previous system to obtain the copyright watermark W1.
And 4, step 4: s0j (0 < ═ j < n) is compressed by the Douglas-k compression method, and the simplified node sequence R0j (0 < ═ j < n) is obtained as input data of the next stage.
And 5: and extracting the watermark by an abscissa DWT low-frequency additive algorithm. Firstly, selecting vertexes in R0j (j is 0 & ltn & gt) according to a group of 8 bits to form a data sequence, separating three frequency coefficients of low frequency, medium frequency and high frequency by performing Discrete Wavelet Transform (DWT) on the data sequence, and then extracting watermark information on the low frequency coefficients to obtain watermark information W2.
Step 6: and extracting the watermark by using a least significant bit substitution algorithm in an abscissa airspace. And extracting watermark information by using a least significant bit substitution algorithm directly in an abscissa sequence of R0j (0 < ═ j < n) according to the read parameters, and analyzing to obtain copyright mark information W3.
And 7: and extracting watermark information by a vertical coordinate DCT low-frequency additive algorithm. In the same step 5, the carrier data (R0j (0 ≦ j < n)) is subjected to discrete cosine transform of the ordinate, and then watermark information is extracted from the low-frequency coefficient and analyzed to obtain copyright marking information W4.
And 8: and extracting watermark information by an intermediate frequency additive algorithm in the vertical coordinate DCT. In the same step 5, the carrier data (R0j (0 ≦ j < n)) in this step is subjected to discrete cosine transform of the ordinate, and then watermark information is extracted from the intermediate frequency coefficient and is parsed to obtain copyright marking information W5.
And step 9: and calculating the similarity between W0, W1, W2, W3, W4 and W5 in pairs by using the editing distance, and selecting to obtain an optimal extraction result W (11000100110011111 … … 100111), wherein the original copyright information is 'Nanjing university of teachers' after reduction.
3. And (6) testing and analyzing.
The method provided by the invention is algorithm integration, and a GIS vector data multiple digital watermarking system can be developed and realized by adopting the method.
(1) Coordinate system transformation attack
The data containing the watermark is subjected to coordinate conversion, and the coordinate of the point of the component element is slightly changed due to the coordinate system conversion, so the watermark cannot be extracted by the algorithm of embedding the watermark by changing the coordinate, but the watermark extraction by the odd-even method and the zero watermark algorithm cannot be influenced by the attack. Experiments show that the extraction rates of the parity method and the zero-watermark algorithm reach 100 percent, so that the method can resist the coordinate system transformation attack.
(2) Cutting attack
The effect of performing a cropping attack on watermark-containing data is shown in fig. 4. As can be seen from the analysis of the algorithm principle, the embedded carrier of the watermark is all polygons conforming to the spacing policy, so that the embedded polygons are uniformly distributed in all polygons. Therefore, even if a large-range shearing attack is received, the method can extract the watermark as long as one polygon belongs to the embedded polygon, and the extraction rate of the watermark in the experiment is 100 percent like the general method in the invention. However, the parity method and the zero watermark are based on global data to embed and extract the watermark, so that the clipping attack cannot be resisted.
(3) Compression attack
Analysis of the work flow chart 1 of the method shows that the carrier data is compressed by the way of the grassland method before the watermark is embedded in a general way, and then the embedding position of the watermark is only a certain characteristic point in the element. The method of the present invention is resistant to compression attacks as long as the compression attacks are not malicious, i.e., the compression approach still retains the basic form of the elements. Experiments of compression attacks based on the generalized compression command of ArcGIS 9.2 show that 100% of watermarks can still be extracted when the compression ratio reaches 30%.
(4) Noise adding attack, editing attack, data adding attack
The attack is carried out on nodes in the carrier data, and once the attack is used for the carrier data in a large scale, the use value of the map data is greatly reduced, and the attack is determined to be a local attack. Analyzing the principle of the present invention, it can be known that the target objects embedded with watermarks in a general way are all global element objects, that is, the watermarks are uniformly distributed in the carrier data, so that such local attacks cannot affect the normal extraction of the watermarks. The experimental results show that the comprehensive extraction results of the three attacks reach 100%.
In conclusion, the embedding and extracting methods in the invention can not cause mutual interference and conflict, and on the contrary, the methods can complement the advantages and effectively improve the robustness, thereby effectively protecting GIS vector data in production, transmission and use.