CN110070483B - Portrait cartoonization method based on a generative adversarial network - Google Patents
Portrait cartoonization method based on a generative adversarial network
- Publication number
- CN110070483B (application CN201910235651.1A)
- Authority
- CN
- China
- Prior art keywords
- cartoon
- network
- face
- image
- mask
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/04—Context-preserving transformations, e.g. by using an importance map
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a portrait cartoonization method based on a generative adversarial network. A generative network separates the face from the background, converts the face into a cartoon face and the background into a cartoon background, and composites the two into a cartoon image; a discriminative network then judges the cartoon image. The generative and discriminative networks are trained through loss functions, and finally the face image to be processed is input into the trained generative network to generate the corresponding cartoon image. The invention makes it possible to generate a portrait cartoon fully automatically from an input face photo, or to provide a recommended cartoon scheme based on the photo a user inputs, so that the user can select or modify the scheme, saving the time the user would otherwise spend selecting and splicing materials.
Description
Technical Field
The invention relates to the field of computer vision, and in particular to a portrait cartoonization method based on a generative adversarial network.
Background
Face cartoonization takes a face photo as input and produces a corresponding cartoon face with the person's characteristics. The technology has broad application scenarios and significance, and in recent years many popular applications such as "face lovely" and "magic diffusion camera" have emerged. However, face cartoonization is a complex problem spanning multiple image fields, and quickly and automatically generating a cartoon with the portrait's characteristics is challenging.
The difficulty of face cartoonization is as follows: many cartoon styles are designed to preserve the hairstyle, face shape and the like while changing structures such as the eyes and nose, removing details such as hair strands and facial skin texture, and keeping the overall color. That is, face cartoonization involves changing part of the structure, removing detail texture and retaining color. This differs greatly from traditional style transfer, so some classical style-transfer algorithms cannot be used on this problem.
In the portrait cartoonization field, commercial products mainly adopt material splicing. Taking "face lovely" as an example, a large number of cartoon material pictures must be drawn in advance by artists, the placement positions of the materials are fixed, and the user must manually select and splice materials to form a cartoon. Such commercial software therefore cannot generate portrait cartoons automatically, and the effect is limited by the fixed materials, so the generated cartoon often does not resemble the input face.
In the research field, a large share of the fully automatic cartoon-generation methods can be classified as component-based: they cut out structures such as the facial features and hair, match the closest materials in a material library, and fuse the materials into a cartoon avatar through deformation and other processing. Such methods first require preparing a large amount of hair, face and facial-feature material, which takes considerable time and expertise. The resulting cartoon avatars look stiff, and the methods require the structures of the cartoon-face parts in the material library to be similar to those of the input face.
In recent years, generative adversarial networks (GANs) have gained great attention in cross-domain image transformation and provide a new idea for image cartoonization. At present there is little research on face cartoonization with GANs, and the existing problems are: (1) studies that cartoonize with GANs often use standard CycleGAN or its variants; CycleGAN works well in scenes such as converting horses into zebra pictures or daytime into night pictures, but performs poorly when the structural difference between the two domains is large, as when a face becomes a cartoon; (2) many GAN-based face-cartoonization efforts can only generate something that merely resembles a cartoon avatar: personal characteristics are lost, and the hairstyle, face shape and colors are poorly preserved.
One existing method is the Domain Transfer Network (DTN). For images of the two domains (face photos and cartoon face images), the generative model G contains two networks: a pre-trained encoder f and a decoder g. A face photo x is fed into f, which extracts a code f(x) carrying the high-level semantic information of the input image, such as hairstyle and color. The decoder g is responsible for decoding this code into a cartoon. The discriminative model D judges whether an input picture is a cartoon sample from the dataset or one produced by the generative model. Unlike a conventional GAN, the network also feeds cartoon avatar pictures y into the generative model, producing a generated cartoon G(y), and a loss function constrains G(y) to be consistent with the input cartoon y. In addition, a loss function constrains f(G(x)) to be consistent with f(x), ensuring that the cartoon G(x) generated from the face photo x keeps the high-level semantic features of x.
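For concreteness, the two DTN consistency terms described above can be written compactly. The following is a minimal sketch, assuming f and G are callable networks (e.g. PyTorch modules) and using mean-squared error as the distance; the distance choice is an illustrative assumption, not a statement of DTN's exact formulation:

```python
import torch.nn.functional as F

def dtn_consistency_losses(f, G, x, y):
    # Constancy term: the code of the generated cartoon should match
    # the code of the input face, f(G(x)) ≈ f(x).
    l_const = F.mse_loss(f(G(x)), f(x))
    # Identity term: a real cartoon should be (nearly) a fixed point
    # of the generator, G(y) ≈ y.
    l_tid = F.mse_loss(G(y), y)
    return l_const, l_tid
```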
The disadvantages of this method are as follows:
As can be seen from the DTN design, the same encoder f extracts the high-level semantic information of images from both domains, which requires that the two domains not differ too much. Experiments show that when the images of the two domains differ greatly, i.e. the cartoon style deviates far from the photo style, the method cannot achieve a satisfactory effect.
In addition, the generated image is only constrained to agree with the input face image in high-level semantic information, so the supervision on the generated image is very weak: its color, face shape, hairstyle and the like cannot always be kept consistent with the input face image, i.e. the personalization effect is not strong.
Disclosure of Invention
The invention aims to overcome the shortcomings of existing methods and provides a portrait cartoonization method based on a generative adversarial network. The problems solved by the invention are mainly two. First, maintaining only high-level semantic consistency makes the generated cartoon avatar lose personalized features, failing to preserve lower-level semantic information such as hairstyle, face shape, hair color and skin color. Second, existing work handles the three subtasks of segmenting the face, converting the face into a cartoon face, and converting the background into the background of the cartoon image with the same network, so the quality of the generated cartoon is poor.
In order to solve the above problems, the present invention provides a portrait cartoonization method based on a generative adversarial network, the method comprising:
step one, acquiring a face data training set and a cartoon data training set;
step two, preprocessing the face data training set to obtain a hair mask, a face mask and a facial-features mask;
step three, constructing a generative network, converting face images in the face data training set into cartoon images and converting cartoon images in the cartoon data training set into face images;
step four, constructing a discriminative network to judge the converted cartoon images and converted face images respectively;
step five, calculating loss function values and optimizing the generative network and the discriminative network according to the masks generated in step two, the face and cartoon images generated in step three and the discrimination results obtained in step four;
step six, repeating step three to step five, iterating for a number of rounds to obtain a trained cartoon-image generative network;
and step seven, inputting the face image to be processed into the finally obtained cartoon generative network to obtain a corresponding cartoon image with personal characteristics.
Preferably, converting the face images in the face data training set into cartoon images specifically comprises:
acquiring a foreground mask using a segmentation network;
encoding the face image using an encoding network;
decoding the face-image code using a background decoding network to obtain the background;
decoding the face-image code using a foreground decoding network to obtain the foreground;
and combining the foreground mask, the background and the foreground to obtain the generated cartoon image, as sketched below.
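A minimal PyTorch-style sketch of this composition is given below. The module names follow the description (G1attn, e1, d1cnt, d1bg), but their internal architectures are placeholders rather than the layers prescribed by the invention:

```python
import torch
from torch import nn

def generate_cartoon(g1attn: nn.Module, e1: nn.Module,
                     d1cnt: nn.Module, d1bg: nn.Module,
                     x: torch.Tensor) -> torch.Tensor:
    """Blend the foreground and background decoder outputs with the
    soft foreground mask: G1(x) = m*d1cnt(e1(x)) + (1-m)*d1bg(e1(x))."""
    m = g1attn(x)           # (N, 1, H, W) foreground mask in [0, 1]
    code = e1(x)            # high-level semantic code e1(x)
    return m * d1cnt(code) + (1.0 - m) * d1bg(code)
```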
Preferably, judging the converted cartoon image specifically comprises:
feeding the generated cartoon image, together with cartoon images sampled from the cartoon data training set, into a cartoon original-image discrimination network, which judges whether the input image is a real (non-generated) cartoon sample;
and obtaining edge maps of the generated cartoon image and of the cartoon images sampled from the cartoon data training set, and judging the edge maps with an edge discrimination network.
The invention provides a portrait cartoonization method based on a generative adversarial network, which solves two problems common in prior GAN-based face-cartoonization work: low-level semantic information is not preserved, and the generated cartoon avatars are of poor quality. The method makes it possible to generate a portrait cartoon automatically from an input face photo, or to provide a recommended cartoon scheme based on the photo a user inputs, so that the user can select or modify the scheme, saving the time the user would otherwise spend selecting and splicing materials.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the invention, and other drawings can be obtained from them by a person of ordinary skill in the art without inventive effort.
FIG. 1 is a general flow chart of the portrait cartoonization method according to an embodiment of the present invention;
FIG. 2 is a block diagram of the generative network of an embodiment of the present invention;
FIG. 3 is a block diagram of the discriminative network of an embodiment of the present invention.
Detailed Description
The following describes the technical solutions in the embodiments of the present invention clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
Fig. 1 is a general flow chart of the portrait cartoonization method according to an embodiment of the present invention. As shown in fig. 1, the method includes:
S1, acquiring a face data training set and a cartoon data training set;
S2, preprocessing the face data training set to obtain a hair mask, a face mask and a facial-features mask;
S3, constructing a generative network, converting face images in the face data training set into cartoon images and converting cartoon images in the cartoon data training set into face images;
S4, constructing a discriminative network to judge the converted cartoon images and converted face images respectively;
S5, calculating loss function values and optimizing the generative network and the discriminative network according to the masks generated in S2, the face and cartoon images generated in S3 and the discrimination results obtained in S4;
S6, repeating S3 to S5, iterating for a number of rounds to obtain a trained cartoon-image generative network;
S7, inputting the face image to be processed into the finally obtained cartoon generative network to obtain a corresponding cartoon image with personal characteristics.
Step S1 is specifically as follows:
Aligned face data are obtained from the public aligned face dataset CelebA to construct the face training set, and cartoon avatar pictures obtained by a web crawler or from a public dataset serve as the cartoon training set; this embodiment uses the cartoon dataset released by Google.
Step S2 is specifically as follows:
S2-1, the input face picture is fed into a pre-trained semantic segmentation network to obtain rough hair and face masks. The pre-trained network in this embodiment uses a semantic segmentation model proposed by Google, with training data from a public human-part parsing dataset;
S2-2, facial feature points of the input face are extracted with an active shape model (Active Shape Model, ASM), and the convex hulls of the eyebrow, eye, nose and mouth feature points are computed respectively as the masks of those features;
S2-3, because many cartoon styles deform the eyes, nose, mouth and similar regions considerably relative to a real face, the convex hull of the eye, nose and mouth feature points is computed to obtain the face-structure-change area, denoted Aface1; the area of the rough face mask from S2-1 with Aface1 removed is denoted Aface2, which is usually the part whose structure need not change but whose color should be preserved; finally, the hair area obtained in S2-1 is denoted Ahair. A sketch of this mask construction follows.
Step S3 is specifically as follows:
The generative model contains two symmetric network structures: a generative network that converts the face into a personalized cartoon image, and a generative network that converts the cartoon image into a face. The following mainly describes the generation flow from a face image to a cartoon image; the network structure is shown in fig. 2. Obtaining the output personalized cartoon image from the input face image x requires the following steps:
S3-1, the input face image is denoted x, and the cartoon image used when training the GAN is denoted y. x is the input of a segmentation network G1attn that segments the foreground to obtain a foreground mask. The output G1attn(x) is a mask whose size matches the input picture, with 1 channel; each pixel takes a value between 0 and 1, where values closer to 0 indicate a higher likelihood that the pixel is background, and values closer to 1 a higher likelihood that it is face or hair. The background mask can therefore be represented as 1-G1attn(x);
S3-2, meanwhile, the input face image x is also fed into a face-feature encoding network (denoted e1) to obtain a high-level semantic information code vector e1(x);
S3-3, the high-level semantic code obtained in S3-2 is used as the input of a cartoon background decoding network (denoted d1bg); d1bg is intended to focus on generating a background similar to the backgrounds in the cartoon dataset, not on generating cartoon faces;
S3-4, the high-level semantic code obtained in S3-2 is used as the input of a cartoon face decoding network (denoted d1cnt); d1cnt is intended to focus on generating the faces in the cartoon dataset, not its backgrounds;
S3-5, the finally generated cartoon image is obtained from the face-and-hair mask and background mask of S3-1 together with the cartoon background and cartoon face generated in S3-3 and S3-4:
G1(x)=G1attn(x)⊙d1cnt(e1(x))+(1-G1attn(x))⊙d1bg(e1(x))
where ⊙ denotes pixel-wise multiplication;
s3-6, the structure of the discrimination model from cartoon to face is similar, and the generated model G2 comprises a segmentation network G2attn, a coding network e2, a background decoding network d2bg and a face decoding network d2cnt, which are not described in detail herein.
Step S4, as shown in fig. 3, is specifically as follows:
S4-1, the generated cartoon image obtained in S3-5 and cartoon images sampled from the cartoon data training set are fed into a cartoon original-image discrimination network D1, which judges whether the input image is a real (non-generated) cartoon sample.
S4-2, in addition, the colors of the generated cartoons are constrained by the color distribution of the cartoon dataset: because the discrimination network classifies pictures partly according to that color distribution, the colors produced by the generative network are pushed toward those of the cartoon dataset, and colors absent from it are hard to generate. Therefore, during training, the influence weight of the cartoon original-image discriminator is reduced and a new cartoon discrimination network D1Edg is added, which takes as input the cartoon edge map produced by an edge-extraction network (EdgeExtraNet). In this embodiment, the EdgeExtraNet network consists of two convolution layers, where the first convolution layer obtains a grayscale map and the other performs edge extraction; its convolution kernel is an edge operator from the image-processing field, with the following parameters:
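The kernel values themselves did not survive the document conversion, so the sketch below substitutes assumed parameters: fixed luma weights for the grayscale layer and a standard 3x3 Laplacian as the edge operator. These are stand-ins, not the patent's published values:

```python
import torch
from torch import nn

class EdgeExtraNet(nn.Module):
    """Two fixed (non-trainable) conv layers: RGB -> grayscale, then an
    edge operator. Kernel values are assumptions (luma weights and a
    3x3 Laplacian), not the parameters given in the specification."""
    def __init__(self):
        super().__init__()
        self.to_gray = nn.Conv2d(3, 1, kernel_size=1, bias=False)
        self.to_gray.weight.data = torch.tensor(
            [[[[0.299]], [[0.587]], [[0.114]]]])
        self.edge = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)
        self.edge.weight.data = torch.tensor(
            [[[[0., -1., 0.], [-1., 4., -1.], [0., -1., 0.]]]])
        for p in self.parameters():
            p.requires_grad_(False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.edge(self.to_gray(x))
```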
s4-3, the face image discriminator network is composed of an original image discriminating network, and unlike the cartoon image discriminator, an edge discriminating network is not needed.
Step S5 is specifically as follows:
S5-1, calculate the color loss. When a person is converted into a cartoon, in order to keep low-level semantic information such as hair color and skin color, the input image x and the generated cartoon obtained in S3-5 are fed into a smoothing network (SmoothNet). In the invention, the smoothing network is simply built from 20 convolution layers with identical parameters, the convolution kernel parameters being as follows:
From the data preprocessing module of S2 we have obtained the rough hair mask Ahair and the face part Aface2 with the nose, eye and mouth areas removed; these two areas need little change in structure or color, so the loss is computed over them (a sketch follows):
L_color = ||SmoothNet(x) ⊙ (Ahair + Aface2) - SmoothNet(G1(x)) ⊙ (Ahair + Aface2)||_1
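In the sketch below, SmoothNet is stood in for by 20 applications of a depthwise box blur (the patent's actual kernel is not reproduced here), the masks are assumed to be {0, 1} tensors broadcastable over the channel dimension, and mean-reduced L1 replaces the strict ||·||_1 norm, the scale difference being absorbed into the loss weight:

```python
import torch
import torch.nn.functional as F

def smooth(x: torch.Tensor, iters: int = 20, k: int = 3) -> torch.Tensor:
    """Stand-in for SmoothNet: 20 identical smoothing convolutions,
    here a k x k depthwise box blur (assumed kernel)."""
    c = x.shape[1]
    kernel = torch.full((c, 1, k, k), 1.0 / (k * k), device=x.device)
    for _ in range(iters):
        x = F.conv2d(x, kernel, padding=k // 2, groups=c)
    return x

def color_loss(x, g1x, ahair, aface2):
    """L_color over the color-preserving regions Ahair + Aface2."""
    m = ahair + aface2
    return F.l1_loss(smooth(x) * m, smooth(g1x) * m)
```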
s5-2, calculating the cycle consistency loss, referring to the CycleGAN, hopefully generating a cartoon graph G1 (x) through G1, and recovering the cartoon graph to x after the cartoon graph G2 is blocked to a human body. In the application of changing a human face into a cartoon, when the human face is changed into a cartoon, that is, x is converted into G1 (x), usually a lot of detail information, especially a human face photo with relatively more hairs and relatively larger background parts, is lost, and when the human face photo is converted into a cartoon picture, the texture of the hair is lost, and a lot of information is lost after the background of the human face photo is converted. Therefore, when the face photo is restored by using G2 (x), the information is almost impossible and unnecessary to be completely restored, and for this purpose, in the cycle that the person becomes cartoon and is restored back to the person, the cycle consistency loss expression of the invention is as follows:
L cyc1 =||x⊙(Aface1+Aface2)-G2(G1(x))⊙(Aface1+Aface2)|| 1 +||SmoothNet(x)⊙Ahair-SmoothNet(G2(G1(x)))⊙Ahair|| 1
for cartoon input diagram y, we pay more attention to the restoration of human face and hair color in this cycle of the person changing to cartoon and then reverting back to person. In the cycle of converting the cartoon into the human and then recovering the cartoon, as the cartoon is usually converted into the human, a certain amount of background information, hair textures and the like are added, and the human face recovery is realized only by deleting the added information, the cycle consistency loss of the CycleGAN can be used:
L cyc2 =||y-G1(G2(y))|| 1
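Both cycle terms in the same sketch style; smooth is the SmoothNet stand-in from the color-loss sketch above, passed in explicitly so the function stays self-contained:

```python
import torch.nn.functional as F

def cycle_losses(x, y, G1, G2, aface1, aface2, ahair, smooth):
    """L_cyc1 (masked person->cartoon->person) and L_cyc2 (plain
    CycleGAN cartoon->person->cartoon)."""
    face = aface1 + aface2
    rec_x = G2(G1(x))                  # person -> cartoon -> person
    l_cyc1 = (F.l1_loss(x * face, rec_x * face)
              + F.l1_loss(smooth(x) * ahair, smooth(rec_x) * ahair))
    l_cyc2 = F.l1_loss(G1(G2(y)), y)   # cartoon -> person -> cartoon
    return l_cyc1, l_cyc2
```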
s5-3, calculating mask circulation loss, wherein the foreground region of the face photo is consistent with the foreground region of the cartoon in the cartoon process, so that the mask circulation loss is defined as follows:
L cycattn1 =||G1attn(x)-G2attn(G1(x))|| 1
mask cycle consistency L during cartoon human change cycattn2 And L is equal to cycattn1 The definition is similar.
S5-4, calculate the mask supervision loss. The rough masks obtained by the preprocessing module of S2 provide a degree of supervision for the predicted mask (both mask losses are sketched below):
L_msksup1 = ||G1attn(x) - (Ahair + Aface2 + Aface1)||_1
Similarly, if a method is available to semantically segment the foreground and background of the cartoon dataset, a mask supervision loss can also be added on the cartoon side, which helps the segmentation network generate correct masks.
S5-5, calculate the GAN adversarial loss. The generative model should produce pictures that can fool the discriminative model, while the discriminative model should correctly distinguish which images were generated. Let X be the face-image distribution and Y the cartoon-image distribution; the optimization target is:
min_{θG1} max_{θD1} E_{y~Y}[log D1(y)] + E_{x~X}[log(1 - D1(G1(x)))]
where θG1 represents the parameters of the G1 model and θD1 the parameters of the D1 model; the corresponding GAN adversarial loss from cartoon to face is similar.
S5-6, calculate the GAN edge-map (gradient-map) adversarial loss. In the person-to-cartoon direction, a cartoon edge-map discriminator is added so that the generator model focuses more on the structure of the cartoon than on its colors; the optimization target is (a sketch of both adversarial losses follows):
min_{θG1} max_{θD1Edg} E_{y~Y}[log D1Edg(EdgeExtraNet(y))] + E_{x~X}[log(1 - D1Edg(EdgeExtraNet(G1(x))))]
where θD1Edg denotes the parameters of the discrimination model D1Edg.
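A sketch of both adversarial objectives in the non-saturating binary-cross-entropy form; the specification states the minimax targets abstractly, so BCE-on-logits and the detach-based split between discriminator and generator updates are implementation assumptions. Passing edge_net=EdgeExtraNet() turns the same function into the edge-map loss for D1Edg:

```python
import torch
import torch.nn.functional as F

def gan_losses(D, G1, x, y, edge_net=None):
    """Returns (d_loss, g_loss) for one discriminator D against G1.
    With edge_net set, real and fake images are first mapped to edge
    maps, as for the D1Edg discriminator."""
    fake = G1(x)
    if edge_net is not None:
        y, fake = edge_net(y), edge_net(fake)
    real_logit = D(y)
    fake_logit = D(fake.detach())      # no generator gradients here
    d_loss = (F.binary_cross_entropy_with_logits(
                  real_logit, torch.ones_like(real_logit))
              + F.binary_cross_entropy_with_logits(
                  fake_logit, torch.zeros_like(fake_logit)))
    gen_logit = D(fake)                # generator tries to fool D
    g_loss = F.binary_cross_entropy_with_logits(
        gen_logit, torch.ones_like(gen_logit))
    return d_loss, g_loss
```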
S5-7, the final loss function value is a linear combination of the results of S5-1 to S5-6. The generative network is first fixed while the discriminator networks are optimized by back-propagation; then the discriminator networks are fixed while the generative network is optimized. A sketch of one such alternating update follows.
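One such alternating update in sketch form; losses(x, y) is assumed to return the two linear combinations (L_G, L_D) built from the terms of S5-1 to S5-6, and the optimizers (e.g. Adam over the generator-side and discriminator-side parameters respectively) are likewise assumptions:

```python
def train_step(x, y, losses, opt_g, opt_d):
    # 1) generators fixed in effect (fakes are detached in the
    #    discriminator loss): update only the discriminator parameters.
    _, l_d = losses(x, y)
    opt_d.zero_grad()
    l_d.backward()
    opt_d.step()
    # 2) discriminators fixed in effect (opt_g holds only generator
    #    parameters): update the generative networks.
    l_g, _ = losses(x, y)
    opt_g.zero_grad()
    l_g.backward()
    opt_g.step()
```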
Step S6 is specifically as follows:
S3 to S5 are repeated; in this embodiment, 200 rounds of loop iteration yield the trained cartoon-image generative network.
The portrait cartoonization method based on a generative adversarial network of the invention solves two problems common in prior GAN-based face-cartoonization work: low-level semantic information is not preserved, and the generated cartoon avatars are of poor quality. The method makes it possible to generate a portrait cartoon automatically from an input face photo, or to provide a recommended cartoon scheme based on the photo a user inputs, so that the user can select or modify the scheme, saving the time the user would otherwise spend selecting and splicing materials.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments may be implemented by a program instructing related hardware. The program may be stored in a computer-readable storage medium, and the storage medium may include: Read-Only Memory (ROM), Random Access Memory (RAM), magnetic disk, optical disk, and the like.
The foregoing describes in detail a portrait cartoonization method based on a generative adversarial network. Specific examples are used herein to illustrate the principles and embodiments of the present invention, and the description of the above embodiments is only intended to help understand the method and its core ideas. Meanwhile, those skilled in the art may vary the specific embodiments and the application scope according to the ideas of the present invention. In summary, the contents of this description should not be construed as limiting the present invention.
Claims (1)
1. A portrait cartoonization method based on a generative adversarial network, characterized in that the method comprises the following steps:
step one, acquiring a face data training set and a cartoon data training set;
step two, preprocessing the face data training set to obtain a hair mask, a face mask and a facial-features mask;
step three, constructing a generative network, converting face images in the face data training set into cartoon images and converting cartoon images in the cartoon data training set into face images;
step four, constructing a discriminative network to judge the converted cartoon images and converted face images respectively;
step five, calculating loss function values and optimizing the generative network and the discriminative network according to the masks generated in step two, the face and cartoon images generated in step three and the discrimination results obtained in step four;
step six, repeating step three to step five, iterating for a number of rounds to obtain a trained cartoon-image generative network;
step seven, inputting the face image to be processed into the finally obtained cartoon generative network to obtain a corresponding cartoon image with personal characteristics;
the constructing of the generative network to convert face images in the face data training set into cartoon images is specifically as follows:
the input face image is denoted x, and the cartoon image used in training the GAN is denoted y; x is the input of a segmentation network G1attn used to segment the foreground and obtain a foreground mask; the output G1attn(x) is a mask whose size is consistent with the size of the input picture, with 1 channel, each pixel taking a value between 0 and 1, where values closer to 0 indicate a higher probability that the pixel is background and values closer to 1 a higher probability that it is face or hair; the background mask is thus denoted 1-G1attn(x);
meanwhile, the face image x is also input into a face-feature encoding network e1 to obtain a high-level semantic information code vector e1(x);
the high-level semantic information code vector e1(x) is used as the input of a cartoon background decoding network d1bg, where d1bg focuses on generating a background similar to the backgrounds in the cartoon dataset rather than on generating cartoon faces;
the high-level semantic information code vector e1(x) is used as the input of a cartoon face decoding network d1cnt, where d1cnt focuses on generating the faces in the cartoon dataset rather than its backgrounds;
the finally generated cartoon image is obtained from the face-and-hair mask and background mask given by G1attn(x) together with the cartoon background and cartoon face generated by d1bg and d1cnt:
G1(x)=G1attn(x)⊙d1cnt(e1(x))+(1-G1attn(x))⊙d1bg(e1(x));
where ⊙ denotes pixel-wise multiplication;
the constructing of the discriminative network to respectively judge the converted cartoon image and the converted face image is specifically as follows:
the generated cartoon image and cartoon images sampled from the cartoon data training set are fed into a cartoon original-image discrimination network D1, which judges whether the input image is a real (non-generated) cartoon sample;
during training, the influence weight of the cartoon original-image discrimination model is reduced and a new cartoon discrimination network D1Edg is added, which takes as input the cartoon edge map produced by the edge-extraction network EdgeExtraNet; the EdgeExtraNet network consists of two convolution layers, where the first convolution layer obtains a grayscale map and the other performs edge extraction, its convolution kernel being an edge operator from the image-processing field with the following parameters:
the face-image discriminator consists of an original-image discrimination network only; unlike the cartoon-image discriminator, no edge discrimination network is needed;
the calculating of loss function values and the optimizing of the generative network and the discriminative network are specifically as follows:
calculating the color loss: when a person is converted into a cartoon, in order to keep low-level semantic information, the input image x and the generated cartoon are fed into a smoothing network SmoothNet, built from 20 convolution layers with identical parameters, the convolution kernel parameters being as follows:
from the rough hair mask Ahair obtained by the preprocessing and the face part Aface2 with the nose, eye and mouth regions removed, two areas whose structure and color need little change, the loss is computed:
L_color = ||SmoothNet(x) ⊙ (Ahair + Aface2) - SmoothNet(G1(x)) ⊙ (Ahair + Aface2)||_1;
calculating the cycle-consistency loss: in the person-to-cartoon-to-person cycle, the cycle-consistency loss expression is as follows:
L_cyc1 = ||x ⊙ (Aface1 + Aface2) - G2(G1(x)) ⊙ (Aface1 + Aface2)||_1 + ||SmoothNet(x) ⊙ Ahair - SmoothNet(G2(G1(x))) ⊙ Ahair||_1
for a cartoon input image y, in the cartoon-to-person-to-cartoon cycle, the CycleGAN cycle-consistency loss is used:
L_cyc2 = ||y - G1(G2(y))||_1;
calculating the mask cycle loss: to keep the foreground region of the face photo consistent with the foreground region of the generated cartoon, the mask cycle loss is defined as follows:
L_cycattn1 = ||G1attn(x) - G2attn(G1(x))||_1;
the mask cycle-consistency loss L_cycattn2 for the cartoon-to-person cycle is defined similarly to L_cycattn1;
calculating the mask supervision loss:
L_msksup1 = ||G1attn(x) - (Ahair + Aface2 + Aface1)||_1;
calculating the GAN adversarial loss: the generative model should generate pictures that can fool the discriminative model, while the discriminative model should correctly distinguish which images were generated; let X be the face-image distribution and Y the cartoon-image distribution; the optimization target is:
min_{θG1} max_{θD1} E_{y~Y}[log D1(y)] + E_{x~X}[log(1 - D1(G1(x)))]
where θG1 represents the parameters of the G1 model and θD1 the parameters of the D1 model, and the corresponding GAN adversarial loss from cartoon to face is similar;
calculating the GAN edge-map (gradient-map) adversarial loss: in the person-to-cartoon direction, a cartoon edge-map discriminator is added so that the generator model focuses more on the structure of the cartoon than on its colors; the optimization target is:
min_{θG1} max_{θD1Edg} E_{y~Y}[log D1Edg(EdgeExtraNet(y))] + E_{x~X}[log(1 - D1Edg(EdgeExtraNet(G1(x))))]
where θD1Edg denotes the parameters of the discrimination model D1Edg;
the final loss function value is a linear combination of all the loss results; the generative network is first fixed and the discriminator networks are optimized by back-propagation, and then the discriminator networks are fixed and the generative network is optimized.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910235651.1A CN110070483B (en) | 2019-03-26 | 2019-03-26 | Portrait cartoonization method based on a generative adversarial network
Publications (2)
Publication Number | Publication Date |
---|---|
CN110070483A CN110070483A (en) | 2019-07-30 |
CN110070483B true CN110070483B (en) | 2023-10-20 |
Family
ID=67366782
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910235651.1A Active CN110070483B (en) | 2019-03-26 | 2019-03-26 | Portrait cartoon method based on generation type countermeasure network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110070483B (en) |
Families Citing this family (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112419328B (en) * | 2019-08-22 | 2023-08-04 | 北京市商汤科技开发有限公司 | Image processing method and device, electronic equipment and storage medium |
CN110517200B (en) * | 2019-08-28 | 2022-04-12 | 厦门美图之家科技有限公司 | Method, device and equipment for obtaining facial sketch and storage medium |
CN110796593A (en) * | 2019-10-15 | 2020-02-14 | 腾讯科技(深圳)有限公司 | Image processing method, device, medium and electronic equipment based on artificial intelligence |
CN110852942B (en) * | 2019-11-19 | 2020-12-18 | 腾讯科技(深圳)有限公司 | Model training method, and media information synthesis method and device |
CN111046763B (en) * | 2019-11-29 | 2024-03-29 | 广州久邦世纪科技有限公司 | Portrait cartoon method and device |
CN111275784B (en) * | 2020-01-20 | 2023-06-13 | 北京百度网讯科技有限公司 | Method and device for generating image |
CN111260545B (en) * | 2020-01-20 | 2023-06-20 | 北京百度网讯科技有限公司 | Method and device for generating image |
CN111368796B (en) * | 2020-03-20 | 2024-03-08 | 北京达佳互联信息技术有限公司 | Face image processing method and device, electronic equipment and storage medium |
CN111833240B (en) * | 2020-06-03 | 2023-07-25 | 北京百度网讯科技有限公司 | Face image conversion method and device, electronic equipment and storage medium |
CN112001939B (en) | 2020-08-10 | 2021-03-16 | 浙江大学 | Image foreground segmentation algorithm based on edge knowledge conversion |
CN112102153B (en) * | 2020-08-20 | 2023-08-01 | 北京百度网讯科技有限公司 | Image cartoon processing method and device, electronic equipment and storage medium |
CN112381709B (en) * | 2020-11-13 | 2022-06-21 | 北京字节跳动网络技术有限公司 | Image processing method, model training method, device, equipment and medium |
CN112508991B (en) * | 2020-11-23 | 2022-05-10 | 电子科技大学 | Panda photo cartoon method with separated foreground and background |
WO2022116161A1 (en) * | 2020-12-04 | 2022-06-09 | 深圳市优必选科技股份有限公司 | Portrait cartooning method, robot, and storage medium |
CN112529978B (en) * | 2020-12-07 | 2022-10-14 | 四川大学 | Man-machine interactive abstract picture generation method |
CN112581358B (en) * | 2020-12-17 | 2023-09-26 | 北京达佳互联信息技术有限公司 | Training method of image processing model, image processing method and device |
CN112561786A (en) * | 2020-12-22 | 2021-03-26 | 作业帮教育科技(北京)有限公司 | Online live broadcast method and device based on image cartoonization and electronic equipment |
CN112308770B (en) * | 2020-12-29 | 2021-03-30 | 北京世纪好未来教育科技有限公司 | Portrait conversion model generation method and portrait conversion method |
CN112907708B (en) * | 2021-02-05 | 2023-09-19 | 深圳瀚维智能医疗科技有限公司 | Face cartoon method, equipment and computer storage medium |
CN113222058B (en) * | 2021-05-28 | 2024-05-10 | 芯算一体(深圳)科技有限公司 | Image classification method, device, electronic equipment and storage medium |
CN113570689B (en) * | 2021-07-28 | 2024-03-01 | 杭州网易云音乐科技有限公司 | Portrait cartoon method, device, medium and computing equipment |
CN113838159B (en) * | 2021-09-14 | 2023-08-04 | 上海任意门科技有限公司 | Method, computing device and storage medium for generating cartoon images |
CN114170065B (en) * | 2021-10-21 | 2024-08-02 | 河南科技大学 | Cartoon loss-based cartoon-like method for generating countermeasure network |
CN113822798B (en) * | 2021-11-25 | 2022-02-18 | 北京市商汤科技开发有限公司 | Method and device for training generation countermeasure network, electronic equipment and storage medium |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH08251404A (en) * | 1995-03-13 | 1996-09-27 | Minolta Co Ltd | Method and device for discriminating attribute of image area |
CN1458791A (en) * | 2002-04-25 | 2003-11-26 | 微软公司 | Sectioned layered image system |
WO2015014131A1 (en) * | 2013-08-02 | 2015-02-05 | 成都品果科技有限公司 | Method for converting picture into cartoon |
CN107577985A (en) * | 2017-07-18 | 2018-01-12 | 南京邮电大学 | The implementation method of the face head portrait cartooning of confrontation network is generated based on circulation |
CN108596839A (en) * | 2018-03-22 | 2018-09-28 | 中山大学 | A kind of human-face cartoon generation method and its device based on deep learning |
CN109377448A (en) * | 2018-05-20 | 2019-02-22 | 北京工业大学 | A kind of facial image restorative procedure based on generation confrontation network |
CN109376582A (en) * | 2018-09-04 | 2019-02-22 | 电子科技大学 | A kind of interactive human face cartoon method based on generation confrontation network |
Non-Patent Citations (1)
Title |
---|
Auto-painter: Cartoon image generation from sketch by using conditional Wasserstein generative adversarial networks; Yifan Liu et al.; Neurocomputing; 22 May 2018; vol. 311; full text *
Also Published As
Publication number | Publication date |
---|---|
CN110070483A (en) | 2019-07-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110070483B (en) | Portrait cartoonization method based on a generative adversarial network | |
CN111489287B (en) | Image conversion method, device, computer equipment and storage medium | |
Dolhansky et al. | Eye in-painting with exemplar generative adversarial networks | |
Wang et al. | Nerf-art: Text-driven neural radiance fields stylization | |
Shi et al. | Warpgan: Automatic caricature generation | |
Hou et al. | Improving variational autoencoder with deep feature consistent and generative adversarial training | |
Upchurch et al. | Deep feature interpolation for image content changes | |
WO2022078041A1 (en) | Occlusion detection model training method and facial image beautification method | |
CN110852941B (en) | Neural network-based two-dimensional virtual fitting method | |
CN111862294B (en) | Hand-painted 3D building automatic coloring network device and method based on ArcGAN network | |
CN111127309B (en) | Portrait style migration model training method, portrait style migration method and device | |
CN113705290A (en) | Image processing method, image processing device, computer equipment and storage medium | |
WO2024109374A1 (en) | Training method and apparatus for face swapping model, and device, storage medium and program product | |
Yuan et al. | Line art colorization with concatenated spatial attention | |
Di et al. | Facial synthesis from visual attributes via sketch using multiscale generators | |
Huang et al. | Real-world automatic makeup via identity preservation makeup net | |
Li et al. | Detailed 3D human body reconstruction from multi-view images combining voxel super-resolution and learned implicit representation | |
CN111612687B (en) | Automatic makeup method for face image | |
CN112825188A (en) | Occlusion face completion algorithm for generating confrontation network based on deep convolution | |
Peng et al. | Difffacesketch: High-fidelity face image synthesis with sketch-guided latent diffusion model | |
Li et al. | High-quality face sketch synthesis via geometric normalization and regularization | |
CN112862672B (en) | Bangs (hair fringe) generation method, device, computer equipment and storage medium | |
Kim et al. | Game effect sprite generation with minimal data via conditional GAN | |
CN113947520A (en) | Method for realizing face makeup conversion based on generation of confrontation network | |
CN118015110A (en) | Face image generation method and device, computer readable storage medium and terminal |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |