CN107454336B

CN107454336B - Image processing method and apparatus, electronic apparatus, and computer-readable storage medium

Info

Publication number: CN107454336B
Application number: CN201710813698.2A
Authority: CN
Inventors: 张学勇
Original assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Current assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date: 2017-09-11
Filing date: 2017-09-11
Publication date: 2021-04-30
Anticipated expiration: 2037-09-11
Also published as: CN107454336A

Abstract

The invention discloses an image processing method for processing a merged image. The merged image is formed by fusing a predetermined two-dimensional background image with a predetermined foreground image that can follow the current user's motion. The predetermined foreground image is rendered according to motion information from the person region image of the current user in a scene image. The image processing method comprises: determining whether the position change between the two person region images corresponding to two consecutive frames of merged images is greater than a predetermined threshold; and, when the position change is greater than the predetermined threshold, taking the previous frame merged image or the predetermined two-dimensional background image as the current frame merged image. The invention also discloses an image processing apparatus, an electronic device, and a computer-readable storage medium. With the image processing method and apparatus, the electronic device, and the computer-readable storage medium, when the person region images of two consecutive frames associated with two consecutive frames of the predetermined foreground image are unstable, a stable merged image is substituted, so that the merged image remains stable throughout and the user experience is improved.

Description

Image processing method and apparatus, electronic apparatus, and computer-readable storage medium

Technical Field

The present invention relates to the field of image processing technologies, and in particular, to an image processing method, an image processing apparatus, an electronic apparatus, and a computer-readable storage medium.

Background

When an electronic device is used for video, an accidental drop or a similar event may cause the current user to leave the camera's field of view momentarily. The current user's motion then cannot be captured to render the predetermined foreground image in the merged image, so the foreground image disappears from the displayed merged image and the user experience suffers.

Disclosure of Invention

Embodiments of the present invention provide an image processing method, an image processing apparatus, an electronic apparatus, and a computer-readable storage medium.

The image processing method of the embodiments of the invention is used to process a merged image. The merged image is formed by fusing a predetermined two-dimensional background image with a predetermined foreground image that can follow the current user's motion. The predetermined foreground image is rendered according to motion information from the person region image of the current user in a scene image. The image processing method comprises the following steps:

determining whether the position change between the two person region images corresponding to two consecutive frames of merged images is greater than a predetermined threshold, where the two consecutive frames comprise a previous frame merged image and a current frame merged image; and

when the position change is greater than the predetermined threshold, taking the previous frame merged image or the predetermined two-dimensional background image as the current frame merged image.

The image processing apparatus of the embodiments of the invention is used to process a merged image. The merged image is formed by fusing a predetermined two-dimensional background image with a predetermined foreground image that can follow the current user's motion, and the predetermined foreground image is rendered according to motion information from the person region image of the current user in a scene image. The image processing apparatus comprises a processor configured to determine whether the position change between the two person region images corresponding to two consecutive frames of merged images is greater than a predetermined threshold, where the two consecutive frames comprise a previous frame merged image and a current frame merged image, and, when the position change is greater than the predetermined threshold, to take the previous frame merged image or the predetermined two-dimensional background image as the current frame merged image.

The electronic device of the embodiments of the invention includes one or more processors, a memory, and one or more programs. The one or more programs are stored in the memory, are configured to be executed by the one or more processors, and include instructions for performing the image processing method described above.

The computer-readable storage medium of an embodiment of the present invention includes a computer program for use in conjunction with an electronic device capable of image capture, the computer program being executable by a processor to perform the image processing method described above.

With the image processing method, the image processing apparatus, the electronic device, and the computer-readable storage medium of the embodiments of the invention, when the merged image of the rendered predetermined foreground image and the predetermined two-dimensional background image is processed, the stability of the portrait is judged from the position change between the person region images of the two consecutive frames associated with two consecutive frames of the predetermined foreground image. When the portrait is unstable, a stable merged image is substituted, so the merged image remains stable throughout and the user experience is improved.

Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a flow diagram illustrating an image processing method according to some embodiments of the invention.

FIG. 2 is a schematic diagram of an image processing apparatus according to some embodiments of the invention.

FIG. 3 is a schematic structural diagram of an electronic device according to some embodiments of the invention.

FIG. 4 is a flow diagram illustrating an image processing method according to some embodiments of the invention.

FIG. 5 is a flow chart illustrating an image processing method according to some embodiments of the present invention.

FIG. 6 is a flow chart illustrating an image processing method according to some embodiments of the invention.

FIGS. 7(a) to 7(e) are schematic views of scenes of structured light measurement according to an embodiment of the present invention.

FIGS. 8(a) and 8(b) are schematic views of a scene for structured light measurement according to one embodiment of the present invention.

FIG. 9 is a flow chart illustrating an image processing method according to some embodiments of the invention.

FIG. 10 is a flow chart illustrating an image processing method according to some embodiments of the invention.

FIG. 11 is a flow chart illustrating an image processing method according to some embodiments of the invention.

FIG. 12 is a schematic diagram of an image processing apparatus according to some embodiments of the invention.

FIG. 13 is a schematic view of an electronic device according to some embodiments of the invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative, are intended to explain the invention, and are not to be construed as limiting it.

Referring to FIG. 1, the image processing method of an embodiment of the invention is used to process a merged image. The merged image is formed by fusing a predetermined two-dimensional background image with a predetermined foreground image that can follow the current user's motion, and the predetermined foreground image is rendered according to motion information from the person region image of the current user in a scene image. The image processing method comprises the following steps:

02: determining whether the position change between the two person region images corresponding to two consecutive frames of merged images is greater than a predetermined threshold, where the two consecutive frames comprise a previous frame merged image and a current frame merged image; and

04: when the position change is greater than the predetermined threshold, taking the previous frame merged image or the predetermined two-dimensional background image as the current frame merged image.

Referring to FIGS. 2 and 3, the image processing method of the embodiments of the invention can be implemented by the image processing apparatus 100 of the embodiments of the invention. The image processing apparatus 100 is used to process a merged image. The merged image is formed by fusing a predetermined two-dimensional background image with a predetermined foreground image that can follow the current user's motion, and the predetermined foreground image is rendered according to motion information from the person region image of the current user in a scene image. The image processing apparatus 100 includes a processor 20, and both step 02 and step 04 can be implemented by the processor 20.

That is, the processor 20 may be configured to determine whether the position change between the two person region images corresponding to two consecutive frames of merged images is greater than a predetermined threshold, where the two consecutive frames comprise a previous frame merged image and a current frame merged image, and, when the position change is greater than the predetermined threshold, to take the previous frame merged image or the predetermined two-dimensional background image as the current frame merged image.
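The text does not say how the "position change" between two person region images is measured. The following is a minimal sketch of steps 02 and 04, assuming each person region image is available as a binary mask and that the position change is the Euclidean displacement of the mask centroid in pixels; the function names, the threshold value, and the `use_background` switch are all hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

# Assumption: a person region image is a binary mask, 1 = person pixel.
Mask = Sequence[Sequence[int]]


def centroid(mask: Mask) -> Optional[Tuple[float, float]]:
    """Return the (row, col) centroid of the person pixels, or None if there are none."""
    count, sum_r, sum_c = 0, 0.0, 0.0
    for r, row in enumerate(mask):
        for c, value in enumerate(row):
            if value:
                count += 1
                sum_r += r
                sum_c += c
    if count == 0:
        return None
    return (sum_r / count, sum_c / count)


def position_change(prev_mask: Mask, curr_mask: Mask) -> float:
    """Centroid displacement in pixels; infinite when either frame has no person region."""
    p, q = centroid(prev_mask), centroid(curr_mask)
    if p is None or q is None:
        return math.inf
    return math.hypot(q[0] - p[0], q[1] - p[1])


def select_merged_image(prev_merged, curr_merged, background,
                        prev_mask: Mask, curr_mask: Mask,
                        threshold: float, use_background: bool = False):
    """Steps 02/04: keep the current merged image unless the person region jumped too far."""
    if position_change(prev_mask, curr_mask) > threshold:
        # Unstable frame: fall back to the previous merged image or the 2D background.
        return background if use_background else prev_merged
    return curr_merged
```

In a video loop, the caller would keep the previous frame's mask and merged image and call `select_merged_image` for each new frame. Treating a missing person region as an infinite change is one way to cover the case in the Background where the user suddenly leaves the field of view.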

In some embodiments, the predetermined foreground image comprises a two-dimensional and/or three-dimensional predetermined foreground image. The predetermined foreground image includes at least one of a virtual character, a real person, and an animal or plant, where the real person is someone other than the current user. The virtual character may be an animated character such as Mario, Conan, Big Head Son, or Crayon Shin-chan; the real person may be a well-known figure such as Audrey Hepburn, Mr. Bean, or Harry Potter; and the animal or plant may be an animated one such as Mickey Mouse, Donald Duck, or the Peashooter.

The predetermined foreground image can follow and imitate the current user's motion. The motion information includes at least one of the current user's facial expression and limb movement. That is, the motion information may include only the expression, only the limb movement, or both.

In some embodiments, the predetermined two-dimensional background image can be randomly selected or selected by the current user.

The image processing apparatus 100 according to the embodiment of the present invention can be applied to the electronic apparatus 1000 according to the embodiment of the present invention. That is, the electronic apparatus 1000 according to the embodiment of the present invention includes the image processing apparatus 100 according to the embodiment of the present invention.

In some embodiments, the electronic device 1000 may be a mobile phone, a tablet computer, a notebook computer, a smart band, a smart watch, a smart helmet, smart glasses, or the like.

When the image processing method, the image processing apparatus 100, and the electronic device 1000 of the embodiments of the invention process the merged image of the rendered predetermined foreground image and the predetermined two-dimensional background image, they judge the stability of the portrait from the position change between the person region images of the two consecutive frames associated with two consecutive frames of the predetermined foreground image. When the portrait is unstable, a stable merged image is substituted, so the merged image remains stable throughout and the user experience is improved.

Referring to fig. 4, in some embodiments, the image processing method according to the embodiments of the present invention further includes:

011: capturing multiple frames of scene images of the current user at a predetermined frequency;

012: capturing multiple frames of depth images of the current user at the predetermined frequency;

013: processing each frame of scene image and each frame of depth image to extract the motion information of the current user;

014: rendering the predetermined foreground image according to the motion information so that each frame of the predetermined foreground image follows the current user's motion; and

015: fusing each frame of the rendered predetermined foreground image with the predetermined two-dimensional background image to obtain multiple frames of merged images and output a video image.
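Step 015 does not specify how the two images are fused. One simple possibility, shown only as a sketch, is per-pixel alpha compositing. It assumes the rendered predetermined foreground image carries an alpha channel (0 = transparent, 255 = opaque) and has the same size as the background; `fuse_frame` is a hypothetical name.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def fuse_frame(foreground: List[List[RGBA]], background: List[List[RGB]]) -> List[List[RGB]]:
    """Alpha-composite an RGBA foreground frame over an RGB background of the same size."""
    if len(foreground) != len(background) or any(
            len(f_row) != len(b_row) for f_row, b_row in zip(foreground, background)):
        raise ValueError("foreground and background must have the same size")
    merged = []
    for f_row, b_row in zip(foreground, background):
        out_row = []
        for (fr, fg, fb, fa), (br, bg, bb) in zip(f_row, b_row):
            alpha = fa / 255.0  # foreground opacity at this pixel
            out_row.append(tuple(round(alpha * f + (1.0 - alpha) * b)
                                 for f, b in ((fr, br), (fg, bg), (fb, bb))))
        merged.append(out_row)
    return merged
```

Calling `fuse_frame` once per frame would produce the sequence of merged images that forms the output video.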

Referring again to FIG. 3, the image processing apparatus 100 further includes a visible light camera 11 and a depth image acquisition assembly 12. Step 011 can be implemented by the visible light camera 11, step 012 by the depth image acquisition assembly 12, and steps 013, 014, and 015 by the processor 20.

That is, the visible light camera 11 may be configured to capture multiple frames of scene images of the current user at a predetermined frequency; the depth image acquisition assembly 12 may be configured to capture multiple frames of depth images of the current user at the predetermined frequency; and the processor 20 may be configured to process each frame of scene image and each frame of depth image to extract the current user's motion information, render the predetermined foreground image according to the motion information so that each frame follows the current user's motion, and fuse each frame of the rendered predetermined foreground image with the predetermined two-dimensional background image to obtain multiple frames of merged images and output a video image.

The predetermined frequency is the frame rate, in frames per second, at which the visible light camera 11 and the depth image acquisition assembly 12 capture images; it may be, for example, 30, 60, or 120 frames per second. The higher the frame rate, the smoother the video. The scene image captured by the visible light camera 11 is a grayscale or color image, and the depth image captured by the depth image acquisition assembly 12 represents the depth of each person or object in the current user's scene. In specific embodiments of the invention, the visible light camera 11 and the depth image acquisition assembly 12 capture images at the same predetermined frequency, so the frames of scene images correspond one-to-one with the frames of depth images. The motion information that the processor 20 obtains from each scene image and its corresponding depth image can then render one corresponding frame of the predetermined foreground image, which facilitates fusing each frame of the predetermined foreground image with the predetermined two-dimensional background image in step 015. In addition, the scene range of the scene image matches that of the depth image, so every pixel in the scene image has corresponding depth information in the depth image.

When each frame of the predetermined foreground image rendered from the current user's motion information is merged with the corresponding predetermined two-dimensional background image, the user's portrait may not be extractable from some frame of scene image because the electronic device 1000 shakes severely. The processor 20 then cannot render the predetermined foreground image for that frame from the current user's motion information, which disrupts merging that frame of the predetermined foreground image with the predetermined two-dimensional background image. The current user's motion information could, however, be extracted from the previous frame of scene image, so the previous frame of the predetermined foreground image was rendered and its merged image obtained. The previous frame's merged image can therefore be displayed as the current frame's merged image. Alternatively, the predetermined two-dimensional background image can be displayed directly. Either way, the user is spared the poor experience of an image that shakes and changes abruptly.

Referring to FIG. 5, in some embodiments, step 012 of capturing multiple frames of depth images of the current user at the predetermined frequency includes:

0121: projecting structured light onto the current user;

0122: capturing, at the predetermined frequency, multiple frames of structured light images modulated by the current user; and

0123: demodulating the phase information corresponding to each pixel of each frame of structured light image to obtain multiple frames of depth images.

Referring back to FIG. 2, in some embodiments, the depth image acquisition assembly 12 includes a structured light projector 121 and a structured light camera 122. Step 0121 may be implemented by the structured light projector 121, and steps 0122 and 0123 may both be implemented by the structured light camera 122.

That is, the structured light projector 121 may be used to project structured light onto the current user, and the structured light camera 122 may be configured to capture, at the predetermined frequency, multiple frames of structured light images modulated by the current user and to demodulate the phase information corresponding to each pixel of each frame to obtain multiple frames of depth images.

Specifically, after the structured light projector 121 projects structured light of a certain pattern onto the current user's face and body, a structured light image modulated by the current user forms on the surface of the face and body. The structured light camera 122 captures multiple frames of the modulated structured light image at the predetermined frame rate and demodulates each frame to obtain the corresponding depth image, so demodulating multiple frames of structured light images yields multiple frames of depth images. The structured light pattern may be laser stripes, Gray codes, sinusoidal fringes, non-uniform speckles, or the like.

Referring to FIG. 6, in some embodiments, step 0123 of demodulating the phase information corresponding to each pixel of each frame of structured light image to obtain multiple frames of depth images includes:

01231: demodulating the phase information corresponding to each pixel in each frame of structured light image;

01232: converting the phase information into depth information; and

01233: generating a depth image from the depth information.

Referring back to FIG. 3, in some embodiments, steps 01231, 01232, and 01233 can all be implemented by the structured light camera 122.

That is, the structured light camera 122 may be further configured to demodulate phase information corresponding to each pixel in each frame of the structured light image, convert the phase information into depth information, and generate a depth image according to the depth information.

Specifically, the phase of the modulated structured light has changed compared with the unmodulated structured light, so the structured light shown in the structured light image is distorted, and the change in phase represents the depth of the object. The structured light camera 122 therefore first demodulates the phase information corresponding to each pixel in each frame of structured light image and then calculates depth information from it, obtaining the depth image corresponding to that frame.

To make the process of acquiring depth images of the current user's face and body with structured light clearer to those skilled in the art, the widely used grating projection technique (fringe projection technique) is taken as an example to illustrate the principle. In a broad sense, grating projection belongs to surface structured light.

As shown in FIG. 7(a), when surface structured light is used for projection, sinusoidal fringes are first generated by computer programming and projected onto the measured object by the structured light projector 121. The structured light camera 122 then captures the fringes as bent by the object's modulation, the bent fringes are demodulated to obtain the phase, and the phase is converted into depth information to obtain a depth image. To avoid errors or error coupling, the depth image acquisition assembly 12 must be calibrated before structured light is used to capture depth information. The calibration includes calibration of geometric parameters (e.g., the relative position of the structured light camera 122 and the structured light projector 121), calibration of the intrinsic parameters of the structured light camera 122 and of the structured light projector 121, and so on.

Specifically, in the first step, sinusoidal fringes are generated by computer programming. Because the phase is to be recovered from the distorted fringes, for example with a four-step phase-shifting method, four fringe patterns with a mutual phase difference of π/2 are generated. The structured light projector 121 then projects the four patterns onto the measured object (the mask shown in FIG. 7(a)) in time sequence, and the structured light camera 122 captures the images on the left side of FIG. 7(b) and reads the fringes on the reference plane shown on the right side of FIG. 7(b).

In the second step, phase recovery is performed. The structured light camera 122 calculates the modulated phase from the four captured modulated fringe patterns (i.e., structured light images); the resulting phase map is a wrapped (truncated) phase map. Because the four-step phase-shifting algorithm computes the phase with the arctangent function, the modulated phase is limited to the interval [-π, π]: whenever the phase leaves this interval, it wraps around and starts again. The resulting principal phase values are shown in FIG. 7(c).
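A minimal sketch of the four-step phase-shift calculation at a single pixel, assuming the ideal fringe model I_k = a + b·cos(φ + kπ/2) for k = 0..3, where `a` (ambient level) and `b` (fringe modulation) are hypothetical parameters. Under this model I3 − I1 = 2b·sin φ and I0 − I2 = 2b·cos φ, so both `a` and `b` cancel and `atan2` returns the wrapped phase described above.

```python
import math
from typing import List


def fringe_intensities(phi: float, a: float = 0.5, b: float = 0.4) -> List[float]:
    """Simulated intensities I_k = a + b*cos(phi + k*pi/2), k = 0..3, at one pixel."""
    return [a + b * math.cos(phi + k * math.pi / 2) for k in range(4)]


def wrapped_phase(i0: float, i1: float, i2: float, i3: float) -> float:
    """Four-step phase-shift estimate; atan2 wraps the result into (-pi, pi]."""
    # i3 - i1 = 2b*sin(phi) and i0 - i2 = 2b*cos(phi), so a and b cancel out.
    return math.atan2(i3 - i1, i0 - i2)
```

For a true phase of 4.0 rad, `wrapped_phase(*fringe_intensities(4.0))` returns about 4.0 − 2π, which is exactly the truncation that the next step has to undo.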

The phase recovery process requires phase unwrapping, that is, restoring the wrapped phase to a continuous phase. As shown in FIG. 7(d), the modulated continuous phase map is on the left and the reference continuous phase map is on the right.
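The simplest form of unwrapping works along one row of the wrapped phase map. It adds or subtracts 2π whenever two neighboring samples jump by more than π. This sketch assumes noise-free data in which the true phase changes by less than π between neighbors. Real two-dimensional maps with noise or shadows need more robust unwrapping algorithms, and the patent does not say which one is used. `unwrap_row` is a hypothetical name.

```python
import math
from typing import List, Sequence


def unwrap_row(wrapped: Sequence[float]) -> List[float]:
    """1D phase unwrapping: remove 2*pi jumps between neighbouring samples."""
    if not wrapped:
        return []
    unwrapped = [wrapped[0]]
    offset = 0.0
    for prev, curr in zip(wrapped, wrapped[1:]):
        jump = curr - prev
        if jump > math.pi:       # true phase went below -pi and wrapped to near +pi
            offset -= 2 * math.pi
        elif jump < -math.pi:    # true phase went above +pi and wrapped to near -pi
            offset += 2 * math.pi
        unwrapped.append(curr + offset)
    return unwrapped
```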

In the third step, the modulated continuous phase is subtracted from the reference continuous phase to obtain the phase difference (i.e., the phase information), which represents the depth of the measured object relative to the reference plane. Substituting the phase difference into the phase-to-depth conversion formula (whose parameters have been calibrated) yields the three-dimensional model of the measured object shown in FIG. 7(e).
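The patent does not give the phase-to-depth formula. One common model, assumed here, is the crossed-optical-axes geometry: the projector and camera sit at distance L from the reference plane with their centers a distance d apart, and f0 is the fringe frequency on the reference plane. A surface point of height h shifts the observed fringe by a distance s on the reference plane. Similar triangles give s/d = h/(L − h), and Δφ = 2π·f0·s, so h = L·Δφ / (Δφ + 2π·f0·d). The sketch below applies this with Δφ taken as a magnitude, because its sign depends on the geometry; all names and parameter values are hypothetical.

```python
import math
from typing import List, Sequence


def phase_to_height(delta_phi: float, L: float, d: float, f0: float) -> float:
    """Height above the reference plane, h = L*dphi / (dphi + 2*pi*f0*d).

    L: distance from the projector/camera baseline to the reference plane
    d: distance between the projector and camera centres
    f0: fringe frequency on the reference plane (cycles per unit length)
    """
    shift = abs(delta_phi)  # magnitude only; the sign convention depends on the setup
    return L * shift / (shift + 2 * math.pi * f0 * d)


def height_row(reference: Sequence[float], modulated: Sequence[float],
               L: float, d: float, f0: float) -> List[float]:
    """Third step for one row: reference minus modulated continuous phase, then convert."""
    return [phase_to_height(r - m, L, d, f0) for r, m in zip(reference, modulated)]
```

With L = 1000 mm, d = 100 mm, and f0 = 0.1 cycles/mm, a point 50 mm above the reference plane shifts the fringe by s = 100·50/950 mm. `phase_to_height(2 * math.pi * 0.1 * s, 1000, 100, 0.1)` recovers 50 mm.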

It should be understood that, in practical applications, the structured light used in the embodiments of the present invention may use patterns other than gratings, depending on the application scenario.

As a possible implementation, the invention can also use speckle structured light to capture the current user's depth information.

Specifically, acquiring depth information with speckle structured light uses a substantially flat diffraction element that has a relief diffraction structure with a specific phase distribution; its cross section is a stepped relief structure with two or more levels of concave and convex parts. The substrate of the diffraction element is approximately 1 micron thick, and the step heights are non-uniform, in the range of 0.7 to 0.9 micron. FIG. 8(a) shows part of the diffraction structure of the collimating beam splitting element of this embodiment, and FIG. 8(b) is a cross-sectional side view along section A-A, with both the abscissa and the ordinate in microns. The speckle pattern generated by speckle structured light is highly random, and the pattern changes with distance. Therefore, before depth information is obtained with speckle structured light, the speckle patterns in space must first be calibrated. For example, within a range of 0 to 4 m from the structured light camera 122, a reference plane is taken every 1 cm, so 400 speckle images are saved once calibration is complete; the smaller the calibration interval, the higher the accuracy of the resulting depth information. The structured light projector 121 then projects the speckle structured light onto the measured object (i.e., the current user), and the height differences of the object's surface change the speckle pattern projected onto it. After the structured light camera 122 captures the speckle pattern projected onto the measured object (i.e., the structured light image), this pattern is cross-correlated one by one with the 400 speckle images saved during calibration, yielding 400 correlation images. The position of the measured object in space appears as peaks in the correlation images; these peaks are superimposed and interpolated to obtain the depth information of the measured object.
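The matching step can be sketched for one image window as follows. The captured window is compared against the same window of every calibrated reference plane with zero-mean normalized cross-correlation. The best plane is taken as the peak, and a parabola through the peak and its two neighbors refines the depth between planes. This is a simplification made for illustration: practical systems also search over pixel displacement. The random "calibration" data and all function names are hypothetical stand-ins.

```python
import math
import random
from typing import List, Sequence


def ncc(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-mean normalised cross-correlation of two equal-length windows."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den > 0 else 0.0


def estimate_depth(window: Sequence[float], references: List[Sequence[float]],
                   depths: Sequence[float]) -> float:
    """Best-matching reference plane, refined by parabolic interpolation.

    Assumes `depths` is uniformly spaced (e.g. every 10 mm) and aligned with `references`.
    """
    scores = [ncc(window, ref) for ref in references]
    k = max(range(len(scores)), key=scores.__getitem__)
    if 0 < k < len(scores) - 1:
        s0, s1, s2 = scores[k - 1], scores[k], scores[k + 1]
        curvature = s0 - 2 * s1 + s2
        if curvature < 0:  # a proper peak: vertex of the parabola through three scores
            offset = 0.5 * (s0 - s2) / curvature  # lies within [-0.5, 0.5]
            return depths[k] + offset * (depths[k + 1] - depths[k])
    return depths[k]


def simulate_references(n_planes: int, size: int, seed: int = 0) -> List[List[float]]:
    """Stand-in for calibration: one random speckle window per reference plane."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(size)] for _ in range(n_planes)]
```

With the calibration described above, `references` would hold 400 windows and `depths` the values 0, 10, ..., 3990 mm.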

Because a common diffraction element diffracts a light beam into multiple diffracted beams, the intensities of these beams differ greatly and the risk of injury to human eyes is high. Even if the diffracted light is diffracted a second time, the uniformity of the resulting beams is low. Projecting onto the measured object with beams diffracted by an ordinary diffraction element therefore works poorly. This embodiment uses a collimating beam splitting element, which both collimates a non-collimated beam and splits it: the non-collimated light reflected by the mirror exits the element as multiple collimated beams at different angles, with approximately equal cross-sectional areas and approximately equal energy fluxes, so projecting with these beams works better. At the same time, the emitted laser light is distributed across the beams, further reducing the risk of eye injury. Compared with other uniformly arranged structured light, speckle structured light consumes less power for the same capture quality.

Referring to FIG. 9, in some embodiments, step 013 of processing each frame of scene image and each frame of depth image to extract the motion information of the current user includes:

0131: identifying the face region in each frame of scene image;

0132: acquiring the depth information corresponding to the face region from the depth image corresponding to that scene image;

0133: determining the depth range of the person region from the depth information of the face region;

0134: determining, according to that depth range, the person region that is connected to the face region and falls within the depth range, to obtain a person region image; and

0137: processing the person region image to acquire the motion information of the current user.

Referring back to fig. 3, in some embodiments, step 0131, step 0132, step 0133, step 0134, and step 0137 may be implemented by processor 20.

That is, the processor 20 may be further configured to identify the face region in each frame of scene image, acquire the depth information corresponding to the face region from the depth image corresponding to that scene image, determine the depth range of the person region from the depth information of the face region, determine the person region that is connected to the face region and falls within that depth range to obtain a person region image, and process the person region image to acquire the motion information of the current user.

Specifically, a trained deep learning model can first be used to recognize the face region in each frame of scene image, and the depth information of that face region can then be determined from the one-to-one correspondence between each frame of scene image and each frame of depth image. Because the face region includes features such as the nose, eyes, ears, and lips, these features have different depth values in the depth image. For example, when the face directly faces the depth image acquisition assembly 12, the depth value of the nose may be smaller and that of the ears larger. The depth information of the face region may therefore be a single value or a range of values. When it is a single value, it can be obtained by averaging the depth data of the face region or by taking their median.

Because the person region contains the face region, the two lie within a common depth range. After the processor 20 determines the depth information of the face region, it can set the depth range of the person region from that information. It then extracts the region that falls within this depth range and is connected to the face region, obtaining the person region image.
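A minimal sketch of steps 0132 to 0134, under several assumptions not fixed by the text. The face detector's output is given as a box (top, left, bottom, right, with bottom/right exclusive). The face depth is the median of its valid depths (the text also allows the mean). The depth range is that value plus or minus a hypothetical `margin`, and "connected" means 4-connected. A depth of 0 is treated as invalid.

```python
from collections import deque
from statistics import median
from typing import List, Sequence, Tuple


def extract_person_mask(depth: Sequence[Sequence[float]],
                        face_box: Tuple[int, int, int, int],
                        margin: float) -> List[List[int]]:
    """Binary mask of the region that is connected to the face and inside the depth range."""
    top, left, bottom, right = face_box
    face_values = [depth[r][c] for r in range(top, bottom)
                   for c in range(left, right) if depth[r][c] > 0]
    if not face_values:
        raise ValueError("no valid depth inside the face box")
    face_depth = median(face_values)
    low, high = face_depth - margin, face_depth + margin

    def in_range(value: float) -> bool:
        return value > 0 and low <= value <= high

    rows, cols = len(depth), len(depth[0])
    mask = [[0] * cols for _ in range(rows)]
    queue = deque()
    # Seed the flood fill with face pixels that fall inside the depth range.
    for r in range(top, bottom):
        for c in range(left, right):
            if in_range(depth[r][c]):
                mask[r][c] = 1
                queue.append((r, c))
    # Grow through 4-connected neighbours that are also inside the depth range.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols and not mask[nr][nc]
                    and in_range(depth[nr][nc])):
                mask[nr][nc] = 1
                queue.append((nr, nc))
    return mask
```

The connectivity requirement is what excludes an unrelated object that happens to sit at the same distance as the user but is separated from them by background.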

The processor 20 may process the person region image after calculating the person region image. Specifically, the processor 20 may first recognize a face region in the person region image, and then perform expression recognition on the face region; alternatively, the processor 20 directly processes the face area obtained in step 0131 to recognize the current user's expression. Subsequently, the processor 20 processes the image of the person region in each frame of the scene image to obtain the information of the current body motion of the user. Wherein, the information of the current user limb motion can be obtained by means of template matching. The processor 20 matches the person region in the person region image with a plurality of person templates. Firstly, matching the head of a character area; after the head matching is finished, matching the next limb of the rest of the character templates matched with the head, namely matching the upper body of the body; after the matching of the upper body is completed, the next limb matching is performed on the remaining plurality of character templates with the matched head and upper body, namely the matching of the upper limb and the lower limb, so that the information of the current user limb action is determined according to the template matching method. Then, the processor 20 renders the recognized expression and body movement of the current user to the predetermined foreground image, so that the characters or animals and plants in the predetermined foreground image can follow and imitate the expression and body movement of the current user. Finally, the processor 20 fuses the rendered predetermined foreground image with the predetermined two-dimensional background image to obtain a merged image.
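The coarse-to-fine template matching can be sketched as successive filtering: head, then upper body, then limbs. The sketch assumes each body part is summarized as a numeric feature vector (for example, joint coordinates) and compared by Euclidean distance against a hypothetical `tolerance`. Among templates that survive every stage, the closest overall is chosen. The feature representation, the template format, and `match_pose` are illustrative assumptions; `math.dist` requires Python 3.8 or later.

```python
import math
from typing import Dict, List, Optional, Sequence

# Coarse-to-fine order described in the text: head, then upper body, then limbs.
STAGES = ("head", "upper_body", "limbs")


def match_pose(observed: Dict[str, Sequence[float]],
               templates: List[dict], tolerance: float) -> Optional[str]:
    """Return the name of the best surviving template, or None if every template is rejected."""
    candidates = list(templates)
    for part in STAGES:
        # Keep only templates whose current part is close enough to the observed part.
        candidates = [t for t in candidates
                      if math.dist(observed[part], t["parts"][part]) <= tolerance]
        if not candidates:
            return None

    def total_distance(t: dict) -> float:
        return sum(math.dist(observed[p], t["parts"][p]) for p in STAGES)

    return min(candidates, key=total_distance)["name"]
```

Filtering by the head first keeps the later, finer comparisons limited to the templates that are still plausible.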

In this way, a predetermined foreground image that follows and imitates the current user's expression and limb movement can be obtained. Because the person region image is segmented from each frame of scene image according to depth information, which is not affected by environmental factors such as illumination and color temperature, the extracted person region image is more accurate. The expression and limb movement of the current user obtained by the processor 20 from the person region image are therefore also more accurate, so the processor 20 can render the predetermined foreground image with more accurate motion information and achieve a better following and imitation effect.

Referring to fig. 10, in some embodiments, step 013 of processing each frame of scene image and each frame of depth image to extract the motion information of the current user further includes:

0135: processing each frame of scene image to obtain a full-field edge image of each frame of scene image; and

0136: correcting the person region image corresponding to each frame of full-field edge image according to that full-field edge image.

Step 0137 of processing the person region image to acquire the motion information of the current user includes:

01371: processing the corrected person region image to acquire the motion information of the current user.

Referring back to fig. 2, in some embodiments, step 0135, step 0136 and step 01371 may be implemented by processor 20.

That is, the processor 20 is further configured to process each frame of scene image to obtain a full-field edge image of that frame, correct the person region image corresponding to each full-field edge image according to that full-field edge image, and process the corrected person region image to obtain the motion information of the current user.

The processor 20 first performs edge extraction on each frame of scene image to obtain a plurality of full-field edge images, in which the edge lines include those of the current user and of background objects in the scene where the current user is located. Specifically, edge extraction can be performed on each frame of scene image using the Canny operator. The core steps of Canny edge extraction are as follows: first, the scene image is convolved with a 2D Gaussian filter template to suppress noise; next, a differential operator is used to obtain the gray-level gradient of each pixel, the gradient direction of each pixel is calculated from its gradient components, and the neighboring pixels of each pixel along its gradient direction are located; then each pixel is traversed, and if its gradient magnitude is not the maximum compared with those of the two neighboring pixels before and after it along the gradient direction, the pixel is not regarded as an edge point. In this way, the pixels at edge positions in the scene image are determined, and the full-field edge image is obtained.
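The steps above (Gaussian smoothing, gradient computation, non-maximum suppression along the gradient direction) can be sketched in pure Python as below. This is a simplified sketch, not a full Canny implementation: the 3x3 Gaussian and Sobel kernels are assumed choices, border pixels are skipped, and a single hypothetical threshold `low` replaces the double-threshold hysteresis step that the full Canny operator also performs.

```python
import math
from typing import List

Image = List[List[float]]

GAUSS = [[1 / 16, 2 / 16, 1 / 16], [2 / 16, 4 / 16, 2 / 16], [1 / 16, 2 / 16, 1 / 16]]
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]  # positive when intensity grows downward


def filter2d(img: Image, kernel: Image) -> Image:
    """3x3 correlation with edge replication (sign conventions do not affect
    gradient magnitude or direction modulo 180 degrees)."""
    h, w, k = len(img), len(img[0]), len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for i, krow in enumerate(kernel):
                for j, kv in enumerate(krow):
                    rr = min(max(r + i - k, 0), h - 1)
                    cc = min(max(c + j - k, 0), w - 1)
                    acc += kv * img[rr][cc]
            out[r][c] = acc
    return out


def edge_map(gray: Image, low: float) -> List[List[int]]:
    """Return a binary edge image: 1 where the gradient magnitude is at least
    `low` and is a local maximum along the gradient direction."""
    smooth = filter2d(gray, GAUSS)                         # 1. suppress noise
    gx, gy = filter2d(smooth, SOBEL_X), filter2d(smooth, SOBEL_Y)  # 2. gradients
    h, w = len(gray), len(gray[0])
    mag = [[math.hypot(gx[r][c], gy[r][c]) for c in range(w)] for r in range(h)]
    edges = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):                              # 3. non-maximum suppression
        for c in range(1, w - 1):
            angle = math.degrees(math.atan2(gy[r][c], gx[r][c])) % 180
            if angle < 22.5 or angle >= 157.5:
                dr, dc = 0, 1    # gradient roughly horizontal: compare left/right
            elif angle < 67.5:
                dr, dc = 1, 1    # diagonal (down-right / up-left)
            elif angle < 112.5:
                dr, dc = 1, 0    # gradient roughly vertical: compare up/down
            else:
                dr, dc = 1, -1   # diagonal (down-left / up-right)
            m = mag[r][c]
            if m >= low and m >= mag[r + dr][c + dc] and m >= mag[r - dr][c - dc]:
                edges[r][c] = 1
    return edges
```

On a vertical step from 0 to 10, the smoothed gradient peaks on the two columns straddling the step, so both are marked; a production implementation (for example OpenCV's Canny) would additionally link and thin edges via hysteresis.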

Each frame of scene image corresponds to one frame of full-field edge image and likewise to one frame of person region image, so full-field edge images and person region images are in one-to-one correspondence. After acquiring a full-field edge image, the processor 20 corrects the corresponding person region image based on it. The person region is obtained by merging all pixels in the scene image that are connected to the face region and fall within the set depth range; in some scenes, however, other objects may also be connected to the face region and fall within that depth range, and would therefore be wrongly included in the person region. Correcting the person region image with the full-field edge image therefore yields a more accurate person region.
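The text does not specify how the full-field edge image is used to correct the person region. One possible approach, shown here purely as a hypothetical sketch, is to re-grow the region from the face pixels while refusing to step onto edge pixels, so that an object touching the user but separated by an edge line is dropped. The function name and data layout are assumptions.

```python
from collections import deque
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (row, column)


def correct_with_edges(person: Set[Pixel], edges: List[List[int]],
                       face_pixels: Set[Pixel]) -> Set[Pixel]:
    """Keep only person-region pixels reachable from the face without crossing
    an edge pixel (edges[r][c] == 1). Hypothetical correction rule."""
    kept = {p for p in face_pixels if p in person}
    queue = deque(kept)
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            n = (r + dr, c + dc)
            # n in person also guarantees n is inside the image bounds
            if n in person and n not in kept and not edges[n[0]][n[1]]:
                kept.add(n)
                queue.append(n)
    return kept
```

A side effect of this rule is that edge pixels on the user's own outline are excluded as well, which is one reason a subsequent enlargement step can be useful.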

Further, the processor 20 may perform a secondary correction on the corrected person region, for example by applying dilation to enlarge the person region slightly so as to retain its edge details. In this way, the person region image processed by the processor 20 is more accurate.
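Dilation, as mentioned above, adds every pixel whose neighborhood touches the region. A minimal sketch, assuming the region is a set of pixel coordinates and a square structuring element of hypothetical `radius` (the text does not specify the structuring element):

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]  # (row, column)


def dilate(region: Set[Pixel], rows: int, cols: int, radius: int = 1) -> Set[Pixel]:
    """Binary dilation with a (2*radius+1) x (2*radius+1) square, clipped to the image."""
    grown: Set[Pixel] = set()
    for r, c in region:
        for dr in range(-radius, radius + 1):
            for dc in range(-radius, radius + 1):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    grown.add((nr, nc))
    return grown
```

A single interior pixel grows into a 3x3 block with the default radius, while a corner pixel grows into a 2x2 block because of clipping.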

After obtaining multiple frames of merged images, the processor 20 arranges and stores them in sequence, and may store them in a video format to form a video image. When the video image is displayed on the display 50 (shown in fig. 12) of the electronic device 1000 at a certain frame rate, the user sees smooth video.

Referring to fig. 11, in some embodiments, the image processing method according to the embodiments of the present invention further includes:

05: when the position change is greater than the predetermined threshold, judging whether the duration of the position change is greater than a predetermined time; and

06: when the duration is less than the predetermined time, using the merged image of the previous frame or the predetermined two-dimensional background image as the current-frame merged image throughout the duration.

Referring back to fig. 3, in some embodiments, steps 05 and 06 may be implemented by the processor 20. That is, the processor 20 is further configured to judge, when the position change is greater than the predetermined threshold, whether the duration of the position change is greater than the predetermined time, and, when the duration is less than the predetermined time, to use the merged image of the previous frame or the predetermined two-dimensional background image as the merged image of the current frame throughout the duration.

Specifically, when the position of the person region in the scene image changes because of an accident such as the electronic device 1000 shaking or being dropped, the user will usually readjust the electronic device 1000 promptly, so the duration of the change is usually short. Setting a predetermined time therefore makes it possible to judge effectively whether a change in shooting angle reflects the user's real intention. When the duration is less than the predetermined time, the change is regarded as accidental, and the merged image of the previous frame or the predetermined two-dimensional background image is used as the merged image of the current frame throughout the duration, keeping the picture stable. Once the shooting angle returns to normal, merged images are again generated from the predetermined foreground image and the predetermined two-dimensional background image of the current frame.
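The text does not define how the position change between two person region images is measured. As a hypothetical sketch, the displacement of the region centroids between consecutive frames can be compared with a threshold; following claim 1, a frame in which no person region can be extracted is treated as exceeding the threshold. The centroid metric and the pixel-unit `threshold` are assumptions.

```python
import math
from typing import Optional, Set, Tuple

Pixel = Tuple[int, int]  # (row, column)


def centroid(region: Set[Pixel]) -> Optional[Tuple[float, float]]:
    """Mean (row, column) of the region, or None if the region is empty."""
    if not region:
        return None
    n = len(region)
    return (sum(r for r, _ in region) / n, sum(c for _, c in region) / n)


def position_change_exceeds(prev: Set[Pixel], curr: Set[Pixel], threshold: float) -> bool:
    """Hypothetical test: centroid shift larger than `threshold` pixels."""
    a, b = centroid(prev), centroid(curr)
    if a is None or b is None:
        return True  # person cannot be extracted: treat as a large change
    return math.dist(a, b) > threshold
```

`math.dist` requires Python 3.8 or later.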

Referring to fig. 11, in some embodiments, the image processing method of the present invention further includes:

07: when the duration is greater than the predetermined time, using the merged image of the previous frame or the predetermined two-dimensional background image as the current-frame merged image within the predetermined time, and, after the predetermined time, fusing the predetermined foreground image of the current frame with the predetermined two-dimensional background image of the current frame to form the current-frame merged image.

Referring back to fig. 3, in some embodiments, step 07 may be implemented by the processor 20. That is, the processor 20 is further configured to, when the duration is greater than the predetermined time, use the merged image of the previous frame or the predetermined two-dimensional background image as the current-frame merged image within the predetermined time, and, after the predetermined time, fuse the predetermined foreground image of the current frame with the predetermined two-dimensional background image of the current frame to form the current-frame merged image.

When the duration is greater than the predetermined time, the change in shooting angle can be regarded as the user's real intention. Within the predetermined time, the stable merged image or the predetermined two-dimensional background image is used as the current-frame merged image, which preserves the continuity of the merged images to a certain extent. After the predetermined time, the predetermined foreground image of the current frame is fused with the predetermined two-dimensional background image of the current frame to form the current-frame merged image; at this point, the scene image used for rendering the current frame's predetermined foreground image may or may not contain a person region image. If it does, the predetermined foreground image of the corresponding frame is rendered using that person region image, and the rendered predetermined foreground image is fused with the corresponding predetermined two-dimensional background image to form the current-frame merged image. If it does not, the unrendered predetermined foreground image is fused directly with the corresponding predetermined two-dimensional background image to form the current-frame merged image, so that the merged image, with the predetermined foreground image added, still has rich content.
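Steps 05 to 07 can be combined into one per-frame decision, sketched below as a small state machine. Assumptions: the predetermined time is expressed as a hypothetical number of frames `hold_frames`, the caller supplies a boolean saying whether the position change exceeds the threshold, and the caller has already produced the fused frame for the current frame (rendered or unrendered foreground over the background, as described above). Frames are opaque objects here.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class MergedFrameSelector:
    """Hypothetical per-frame selector for steps 05-07."""
    hold_frames: int                  # predetermined time, counted in frames
    use_background: bool = False      # hold the 2D background instead of the last stable frame
    background: Any = None            # predetermined 2D background image
    _last_stable: Any = field(default=None, init=False)
    _elapsed: int = field(default=0, init=False)

    def select(self, change_exceeds: bool, fused_frame: Any) -> Any:
        if not change_exceeds:
            # Normal case: output the freshly fused frame and remember it.
            self._elapsed = 0
            self._last_stable = fused_frame
            return fused_frame
        self._elapsed += 1
        if self._elapsed <= self.hold_frames:
            # Within the predetermined time: hold a stable image (steps 06/07).
            held = self.background if self.use_background else self._last_stable
            return held if held is not None else fused_frame
        # Change lasted longer than the predetermined time: treat it as
        # intentional and resume outputting fused frames (step 07).
        return fused_frame
```

If the change ends before `hold_frames` frames have elapsed, every frame of the disturbance is replaced (step 06); if it lasts longer, only the first `hold_frames` frames are replaced (step 07). For example, with `hold_frames=2`, the input sequence (no change, "A"), (change, "B"), (change, "C"), (change, "D"), (no change, "E") yields "A", "A", "A", "D", "E".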

Referring to fig. 3 and 12, an electronic device 1000 is further provided in the present embodiment. The electronic device 1000 includes the image processing device 100. The image processing apparatus 100 may be implemented using hardware and/or software. The image processing apparatus 100 includes an imaging device 10 and a processor 20.

The imaging device 10 includes a visible light camera 11 and a depth image acquisition assembly 12.

Specifically, the visible light camera 11 includes an image sensor 111 and a lens 112, and can be used to capture color information of the current user to obtain multiple frames of scene images. The image sensor 111 includes a color filter array (e.g., a Bayer filter array), and there may be one or more lenses 112. While the visible light camera 11 acquires each frame of scene image, each imaging pixel in the image sensor 111 senses light intensity and wavelength information from the shooting scene to generate a set of raw image data; the image sensor 111 sends this raw image data to the processor 20, which performs operations such as denoising and interpolation on it to obtain a color scene image. The processor 20 may process each image pixel in the raw image data one by one in a variety of formats; for example, each image pixel may have a bit depth of 8, 10, 12, or 14 bits, and the processor 20 may process the image pixels at the same or different bit depths.

The depth image acquisition assembly 12 includes a structured light projector 121 and a structured light camera 122, and can be used to capture depth information of the current user to obtain a depth image. The structured light projector 121 projects structured light onto the current user; the structured light pattern may be laser stripes, Gray code, sinusoidal stripes, a randomly arranged speckle pattern, or the like. The structured light camera 122 includes an image sensor 1221 and lenses 1222, and there may be one or more lenses 1222. The image sensor 1221 captures multiple frames of structured light images projected onto the current user by the structured light projector 121. Each frame of structured light image can be sent by the depth image acquisition assembly 12 to the processor 20 for demodulation, phase recovery, phase information calculation, and the like, to obtain the depth information of the current user.
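For sinusoidal-stripe patterns, phase demodulation is commonly done with phase shifting. The sketch below assumes a four-step scheme in which the four captured intensities follow I_n = A + B·cos(φ + n·π/2) for n = 0..3, so that I3 − I1 = 2B·sin φ and I0 − I2 = 2B·cos φ. The text does not specify the scheme, and the linear phase-to-depth conversion with hypothetical calibration constants `z_ref`, `phi_ref`, and `k` is a placeholder for a real triangulation calibration; phase unwrapping is also omitted.

```python
import math
from typing import List

Image = List[List[float]]


def wrapped_phase(i0: float, i1: float, i2: float, i3: float) -> float:
    """Four-step phase shifting: phi = atan2(I3 - I1, I0 - I2), in (-pi, pi]."""
    return math.atan2(i3 - i1, i0 - i2)


def depth_from_phase(phi: float, phi_ref: float, z_ref: float, k: float) -> float:
    """Hypothetical linear calibration: depth = z_ref + k * (phi - phi_ref),
    with the phase difference wrapped back into (-pi, pi]."""
    delta = math.atan2(math.sin(phi - phi_ref), math.cos(phi - phi_ref))
    return z_ref + k * delta


def depth_image(frames: List[Image], phi_ref: float, z_ref: float, k: float) -> Image:
    """Apply demodulation and conversion pixel by pixel to four phase-shifted frames."""
    f0, f1, f2, f3 = frames
    return [[depth_from_phase(wrapped_phase(a, b, c, d), phi_ref, z_ref, k)
             for a, b, c, d in zip(r0, r1, r2, r3)]
            for r0, r1, r2, r3 in zip(f0, f1, f2, f3)]
```

The background term A and modulation B cancel out in the arctangent, which is why phase shifting is robust to the user's surface reflectance.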

In some embodiments, the functions of the visible light camera 11 and the structured light camera 122 can be implemented by one camera, that is, the imaging device 10 includes only one camera and one structured light projector 121, and the camera can capture not only the scene image but also the structured light image.

In addition to acquiring a depth image by using structured light, a depth image of a current user can be acquired by a binocular vision method, a Time of Flight (TOF) based depth image acquisition method, and the like.

Further, the image processing apparatus 100 includes a memory 30. The memory 30 may be embedded in the electronic device 1000 or may be a memory independent of the electronic device 1000, and may include a Direct Memory Access (DMA) feature. The raw image data collected by the visible light camera 11 or the structured light image data collected by the depth image acquisition assembly 12 can be transferred to the memory 30 for storage or buffering. The processor 20 may read raw image data from the memory 30 and process it to obtain a scene image, and may read structured light image data from the memory 30 and process it to obtain a depth image. In addition, the scene images and depth images may also be stored in the memory 30 so that the processor 20 can retrieve them for processing at any time. For example, the processor 20 retrieves multiple frames of scene images and depth images to extract the current user's motion information, fuses the predetermined foreground image rendered with that motion information with the corresponding predetermined two-dimensional background image to obtain multiple frames of merged images, and arranges or stores the merged images in sequence to form a video image. The predetermined foreground image, the predetermined two-dimensional background image, the merged images, and the video image may also be stored in the memory 30.

The image processing apparatus 100 may also include a display 50. The display 50 may obtain video images directly from the processor 20 or from the memory 30, and displays them for viewing by the user or for further processing by a Graphics Processing Unit (GPU). The image processing apparatus 100 further includes an encoder/decoder 60, which may encode and decode image data such as scene images, depth images, predetermined foreground images, predetermined two-dimensional background images, merged images, and video images. The encoded image data may be stored in the memory 30 and decompressed by the decoder before being displayed on the display 50. The encoder/decoder 60 may be implemented by any one or more of a Central Processing Unit (CPU), a GPU, and a coprocessor.

The image processing apparatus 100 further includes control logic 40. When the imaging device 10 is capturing images, the processor 20 may analyze the data acquired by the imaging device 10 to determine image statistics used for one or more control parameters of the imaging device 10 (e.g., exposure time). The processor 20 sends the image statistics to the control logic 40, which determines the control parameters and controls the imaging device 10 to capture images accordingly. The control logic 40 may include a processor and/or microcontroller that executes one or more routines (e.g., firmware), and these routines may determine the control parameters of the imaging device 10 based on the received image statistics.
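As a hypothetical illustration of the statistics-to-control-parameter loop described above, the sketch below uses mean luminance as the image statistic and scales the exposure time toward a target brightness, clamped to a sensor range. The target value, the exposure limits, and the proportional update rule are all assumptions; the text only names exposure time as an example parameter.

```python
from typing import List


def mean_luma(gray: List[List[float]]) -> float:
    """Image statistic: average gray level of a frame."""
    total = sum(sum(row) for row in gray)
    return total / (len(gray) * len(gray[0]))


def next_exposure(exposure_us: float, stat_mean: float, target: float = 118.0,
                  lo: float = 100.0, hi: float = 33000.0) -> float:
    """Hypothetical control routine: scale exposure (microseconds) so the mean
    gray level approaches `target`, clamped to [lo, hi]."""
    if stat_mean <= 0:
        return hi  # completely dark frame: use the longest allowed exposure
    proposed = exposure_us * target / stat_mean
    return max(lo, min(hi, proposed))
```

For example, a frame with mean gray level 59 captured at 1000 µs would lead to a proposed exposure of 2000 µs, assuming roughly linear sensor response.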

Referring to fig. 13, an electronic device 1000 according to an embodiment of the invention includes one or more processors 20, a memory 30, and one or more programs 31. The one or more programs 31 are stored in the memory 30 and configured to be executed by the one or more processors 20. The programs 31 include instructions for executing the image processing method of any one of the above embodiments.

For example, the program 31 includes instructions for executing an image processing method described in the following steps:

For another example, the program 31 further includes instructions for executing an image processing method described in the following steps:

0131: identifying a face area in each frame of scene image;

The computer-readable storage medium of an embodiment of the present invention includes a computer program for use in conjunction with the electronic device 1000 capable of capturing images. The computer program may be executed by the processor 20 to perform the image processing method of any of the above embodiments.

As another example, the computer program may also be executable by the processor 20 to perform an image processing method as described in the following steps:

0131: identifying a face area in each frame of scene image;

In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," "some examples," and the like mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine different embodiments or examples, and features of different embodiments or examples, described in this specification, provided they do not contradict each other.

Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.

Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.

The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.

It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.

It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be implemented by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.

In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims

1. An image processing method for processing a merged image, wherein the merged image is formed by fusing a predetermined two-dimensional background image and a predetermined foreground image which can follow the action of a current user, and the predetermined foreground image is rendered according to the action information of a character area image of the current user in a scene image, the image processing method comprising:

when the position change is greater than a preset threshold, judging whether the duration of the position change is greater than a preset time, wherein a position change greater than the preset threshold indicates that the user's person image cannot be extracted from the current-frame merged image; and

when the duration is less than the preset time, taking the merged image of the previous frame or the predetermined two-dimensional background image as the current-frame merged image throughout the duration, so as to keep the current-frame merged image consistent with the merged images of the whole process, and outputting a video image using the multiple frames of merged images.

2. The image processing method according to claim 1, characterized in that the image processing method further comprises:

acquiring multiple frames of scene images of the current user at a preset frequency;

collecting a plurality of frames of depth images of the current user at the preset frequency;

processing each frame of the scene image and each frame of the depth image to extract motion information of the current user;

rendering the predetermined foreground image according to the motion information so that each frame of the predetermined foreground image follows the motion of the current user; and

fusing each rendered frame of the predetermined foreground image with the predetermined two-dimensional background image to obtain a plurality of frames of merged images so as to output a video image.

3. The image processing method according to claim 1, characterized in that the image processing method further comprises:

when the duration is greater than the preset time, taking the merged image of the previous frame or the predetermined two-dimensional background image as the current-frame merged image within the preset time, and, after the preset time, fusing the predetermined foreground image of the current frame with the predetermined two-dimensional background image of the current frame to form the current-frame merged image.

4. The image processing method according to claim 2, wherein the step of acquiring the plurality of frames of depth images of the current user at the preset frequency comprises:

projecting structured light towards the current user;

shooting a plurality of frames of structured light images modulated by the current user at the preset frequency; and

demodulating phase information corresponding to each pixel of each frame of the structured light image to obtain a plurality of frames of the depth image.

5. The method according to claim 4, wherein the step of demodulating the phase information corresponding to each pixel of each frame of the structured-light image to obtain a plurality of frames of the depth image comprises:

demodulating phase information corresponding to each pixel in each frame of the structured light image;

converting the phase information into depth information; and

generating the depth image according to the depth information.

6. The image processing method according to claim 1, wherein the predetermined foreground image includes a predetermined foreground image in two dimensions and/or three dimensions, the predetermined foreground image including at least one of a virtual character, a real character, an animal and a plant, the real character excluding the current user itself;

the predetermined two-dimensional background image can be randomly selected or selected by the current user.

7. The image processing method according to claim 1, wherein the motion information includes at least one of an expression and a limb motion of the current user.

8. An image processing apparatus for processing a merged image, the merged image being formed by fusing a predetermined two-dimensional background image with a predetermined foreground image that can follow a motion of a current user, the predetermined foreground image being rendered based on motion information of a person region image of the current user in a scene image, the image processing apparatus comprising a processor configured to:

when the duration is less than the preset time, take the merged image of the previous frame or the predetermined two-dimensional background image as the current-frame merged image throughout the duration, so as to keep the current-frame merged image consistent with the merged images of the whole process.

9. The image processing apparatus according to claim 8, characterized by further comprising:

a visible light camera for collecting multiple frames of scene images of the current user at a preset frequency;

a depth image acquisition assembly for acquiring a plurality of frames of depth images of the current user at the preset frequency;

the processor is further configured to:

10. The image processing apparatus of claim 8, wherein the processor is further configured to:

11. The image processing apparatus of claim 9, wherein the depth image acquisition assembly comprises a structured light projector and a structured light camera, the structured light projector for projecting structured light to the current user;

the structured light camera is configured to:

12. The image processing apparatus of claim 11, wherein the structured light camera is further configured to:

converting the phase information into depth information; and

generating the depth image according to the depth information.

13. The apparatus according to claim 8, wherein the predetermined foreground image includes a predetermined foreground image in two dimensions and/or three dimensions, the predetermined foreground image including at least one of a virtual character, a real character, an animal or plant, the real character excluding the current user itself;

14. The apparatus according to claim 8, wherein the motion information includes at least one of an expression and a limb motion of the current user.

15. An electronic device, comprising:

one or more processors;

a memory; and

one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing the image processing method of any of claims 1 to 7.

16. A computer-readable storage medium comprising a computer program for use in conjunction with an electronic device capable of capturing images, the computer program being executable by a processor to perform the image processing method of any one of claims 1 to 7.