
CN111105489A - Data synthesis method and apparatus, storage medium, and electronic apparatus - Google Patents


Info

Publication number
CN111105489A
Authority
CN
China
Prior art keywords: information, dimensional object, model, object model, dimensional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911344786.8A
Other languages
Chinese (zh)
Inventor
刘思阳
Current Assignee
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201911344786.8A priority Critical patent/CN111105489A/en
Publication of CN111105489A publication Critical patent/CN111105489A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 3D [Three Dimensional] image rendering
    • G06T 15/005 General purpose rendering architectures
    • G06T 15/04 Texture mapping

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application provides a data synthesis method and apparatus, a storage medium, and an electronic apparatus. The method includes: constructing pose information for a three-dimensional object model of a target object, wherein the pose information is the axis-angle information of the joint points in the three-dimensional object model; determining model parameter information of the three-dimensional object model according to the pose information, shape information of the three-dimensional object model, and lens information corresponding to the three-dimensional object model, wherein the shape information is used to control the body type of the three-dimensional object model and the lens information is used to control its size; constructing an initial model of the three-dimensional object model according to the model parameter information; rendering the initial model to obtain the three-dimensional object model; and synthesizing the three-dimensional object model and a background image into image data of the target object. The method and apparatus solve the problem in the related art that training data for three-dimensional object reconstruction must be collected with high-precision optical capture equipment, which makes acquisition costly.

Description

Data synthesis method and apparatus, storage medium, and electronic apparatus
Technical Field
The present application relates to the field of computers, and in particular, to a data synthesis method and apparatus, a storage medium, and an electronic apparatus.
Background
3D (three-dimensional) object reconstruction (e.g., human body reconstruction) is a computer vision task: reconstructing or restoring a 3D model of an object's pose from a single picture or video. It can be applied in a variety of fields, such as avatars and interactive games.
At present, 3D object reconstruction algorithms based on deep learning models depend heavily on training data; a well-performing model can be trained only with a large amount of data. However, the training data must be collected with high-precision optical capture equipment, so the acquisition cost is very high.
Therefore, three-dimensional object reconstruction in the related art suffers from a high training-data acquisition cost, because the training data must be collected with high-precision optical capture equipment.
Disclosure of Invention
The embodiments of the present application provide a data synthesis method and apparatus, a storage medium, and an electronic apparatus, which are used to solve the problem in the related art that the acquisition cost of training data for three-dimensional object reconstruction is high because the training data must be collected with high-precision optical capture equipment.
According to an aspect of the embodiments of the present application, there is provided a data synthesis method, including: constructing pose information for a three-dimensional object model of a target object, wherein the pose information is the axis-angle information of the joint points in the three-dimensional object model; determining model parameter information of the three-dimensional object model according to the pose information, shape information of the three-dimensional object model, and lens information corresponding to the three-dimensional object model, wherein the shape information is used to control the body type of the three-dimensional object model and the lens information is used to control the size of the three-dimensional object model; constructing an initial model of the three-dimensional object model according to the model parameter information; rendering the initial model to obtain the three-dimensional object model; and synthesizing the three-dimensional object model and a background image into image data of the target object.
According to another aspect of the embodiments of the present application, there is provided a data synthesis apparatus, including: a construction unit configured to construct pose information for a three-dimensional object model of a target object, wherein the pose information is the axis-angle information of the joint points in the three-dimensional object model; a determining unit configured to determine model parameter information of the three-dimensional object model according to the pose information, shape information of the three-dimensional object model, and lens information corresponding to the three-dimensional object model, wherein the shape information is used to control the body type of the three-dimensional object model and the lens information is used to control the size of the three-dimensional object model; a building unit configured to construct an initial model of the three-dimensional object model according to the model parameter information; a rendering unit configured to render the initial model to obtain the three-dimensional object model; and a synthesis unit configured to synthesize the three-dimensional object model and a background image into image data of the target object.
Optionally, the apparatus further comprises a generating unit. The construction unit includes a generating module configured to randomly generate pose information for constructing the three-dimensional object model of the target object; and the generating unit is configured to randomly generate the shape information of the three-dimensional object model and the lens information corresponding to the three-dimensional object model.
Optionally, the determining unit includes an input module configured to input the pose information, the shape information, and the lens information into a Skinned Multi-Person Linear (SMPL) model to obtain the model parameter information output by the SMPL model, wherein the model parameter information includes: two-dimensional coordinate information of the joint points, three-dimensional coordinate information of the joint points, rotation information of the joint points, and three-dimensional coordinate information of the triangle vertices, the triangle vertices being used to determine a mesh for constructing the three-dimensional object model.
Optionally, the determining unit includes: a first determining module configured to determine joint point information of the joint points according to the pose information, the shape information, and the lens information, wherein the joint point information includes two-dimensional coordinate information of the joint points, three-dimensional coordinate information of the joint points, and rotation information of the joint points; and a second determining module configured to determine three-dimensional coordinate information of the triangle vertices of the three-dimensional object model according to the pose information, the shape information, and the lens information, wherein the model parameter information includes the two-dimensional coordinate information of the joint points, the three-dimensional coordinate information of the joint points, the rotation information of the joint points, and the three-dimensional coordinate information of the triangle vertices, the triangle vertices being used to construct the mesh of the three-dimensional object model.
Optionally, the construction unit includes: a third determining module configured to determine a model mesh of the three-dimensional object model according to the three-dimensional coordinate information of the triangle vertices, wherein the model parameter information includes the three-dimensional coordinate information of the triangle vertices and the model mesh consists of triangular regions formed by the triangle vertices; and a filling module configured to perform three-dimensional filling on the model mesh to obtain the initial model.
Optionally, the rendering unit includes a rendering module configured to render the initial model with a UV map to obtain the three-dimensional object model.
Optionally, the synthesis unit includes: an acquisition module configured to acquire a background image matrix and a background depth matrix of the background image; a fourth determining module configured to determine an image origin of the background image according to the background depth matrix, wherein the image origin is a point in the background image whose depth value differs from the average depth of the background depth matrix by no more than a depth distance threshold; and a merging module configured to set the image origin as the point on the background image corresponding to the model origin of the three-dimensional object model and to merge the three-dimensional object model and the background image to obtain the image data of the target object.
Optionally, the apparatus further comprises an acquiring unit, a setting unit, and a binarization unit, and the merging module includes a merging submodule. The acquiring unit is configured to acquire an initial depth matrix of the three-dimensional object model before the three-dimensional object model and the background image are merged into the image data of the target object. The setting unit is configured to set first position points, corresponding to an object area, in the initial depth matrix to a target value and to set second position points, corresponding to a non-object area, to a fixed value, thereby obtaining a model depth matrix of the three-dimensional object model, wherein the object area is the area of the projection plane onto which the three-dimensional object model is projected, the non-object area is the remaining area of the projection plane, and the target value for a first position point is the distance to the projection plane of the closest of the one or more model points that project onto that position point. The binarization unit is configured to binarize the model depth matrix to obtain a mask matrix of the three-dimensional object model on the projection plane. The merging submodule is configured to merge the three-dimensional object model and the background image according to the mask matrix to obtain the image data of the target object.
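The depth binarization and masked merging performed by the setting, binarization, and merging units described above can be sketched as follows. This is a minimal NumPy illustration; the function name, array layout, and the choice of fixed value are assumptions for illustration, not details from this application.

```python
import numpy as np

def composite_with_mask(model_rgb, model_depth, background_rgb, fixed_value=1e6):
    """Merge a rendered model over a background using a depth-derived mask.

    `model_depth` holds the nearest projected distance at object pixels and
    `fixed_value` at non-object pixels; binarizing it yields the mask matrix.
    """
    # Binarization: 1 where the model is present, 0 elsewhere.
    mask = (model_depth < fixed_value).astype(model_rgb.dtype)[..., None]
    # Masked merge: model pixels where mask == 1, background pixels elsewhere.
    return mask * model_rgb + (1.0 - mask) * background_rgb

# Tiny 2x2 example: the left column belongs to the object.
model = np.full((2, 2, 3), 0.8)
depth = np.array([[1.0, 1e6], [2.0, 1e6]])
background = np.zeros((2, 2, 3))
out = composite_with_mask(model, depth, background)
```

The binarized mask keeps the model's own pixels and lets the background show through everywhere else, which matches the mask-matrix merging described above.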
According to a further embodiment of the present invention, a computer-readable storage medium is also provided, in which a computer program is stored, wherein the computer program is configured to carry out the steps of any of the above-described method embodiments when executed.
According to yet another embodiment of the present invention, there is also provided an electronic device, including a memory in which a computer program is stored and a processor configured to execute the computer program to perform the steps in any of the above method embodiments.
According to the above embodiments, pose information for a three-dimensional object model of a target object is constructed, wherein the pose information is the axis-angle information of the joint points in the three-dimensional object model; model parameter information of the three-dimensional object model is determined according to the pose information, shape information of the three-dimensional object model, and lens information corresponding to the three-dimensional object model, wherein the shape information is used to control the body type of the three-dimensional object model and the lens information is used to control its size; an initial model of the three-dimensional object model is constructed according to the model parameter information; the initial model is rendered to obtain the three-dimensional object model; and the three-dimensional object model and a background image are synthesized into image data of the target object. By constructing the pose information and building the three-dimensional object model from the pose, shape, and lens information, a large amount of training data can be generated without optical capture equipment, achieving the technical effect of reducing the acquisition cost of training data and solving the problem in the related art that training data must be collected with high-precision optical capture equipment at high cost.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
To illustrate the technical solutions in the embodiments of the present invention or the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below; it will be apparent to those skilled in the art that other drawings can be derived from these drawings without inventive effort.
FIG. 1 is a block diagram of an alternative server hardware configuration according to an embodiment of the present application;
FIG. 2 is a flow diagram of an alternative data synthesis method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an alternative data synthesis method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of another alternative data synthesis method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of yet another alternative data synthesis method according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an alternative initial model according to an embodiment of the present application;
FIG. 7 is a schematic diagram of an alternative three-dimensional object model according to an embodiment of the present application; and
fig. 8 is a block diagram of an alternative data synthesis apparatus according to an embodiment of the present application.
Detailed Description
The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
According to an aspect of the embodiments of the present application, a data synthesis method is provided. Optionally, the method may be performed on a server, a user terminal, or a similar computing device. Taking an application running on a server as an example, fig. 1 is a block diagram of the hardware structure of an optional server according to an embodiment of the present application. As shown in fig. 1, the server 10 may include one or more processors 102 (only one is shown in fig. 1), where the processors 102 may include, but are not limited to, a processing device such as an MCU (Microcontroller Unit) or an FPGA (Field-Programmable Gate Array), and a memory 104 for storing data; optionally, the server may further include a transmission device 106 for communication functions and an input/output device 108. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and does not limit the structure of the server. For example, the server 10 may include more or fewer components than shown in fig. 1, or have a different configuration.
The memory 104 may be used to store a computer program, for example, a software program of an application software and a module, such as a computer program corresponding to the data synthesis method in the embodiment of the present application, and the processor 102 executes the computer program stored in the memory 104 to execute various functional applications and data processing, i.e., to implement the method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, memory 104 may further include memory located remotely from processor 102, which may be connected to server 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the server 10. In one example, the transmission device 106 includes a NIC (Network Interface Controller) that can be connected to other Network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be an RF (Radio Frequency) module, which is used for communicating with the internet in a wireless manner.
In this embodiment, a data synthesis method running on the server is provided, and fig. 2 is a flowchart of an alternative data synthesis method according to an embodiment of the present application, and as shown in fig. 2, the flowchart includes the following steps:
step S202, constructing attitude information of a three-dimensional object model for constructing a target object, wherein the attitude information is shaft angle information of a joint point in the three-dimensional object model;
step S204, determining model parameter information of the three-dimensional object model according to the posture information, the shape information of the three-dimensional object model and lens information corresponding to the three-dimensional object model, wherein the shape information is used for controlling the body type of the three-dimensional object model, and the lens information is used for controlling the size of the three-dimensional object model;
step S206, constructing an initial model of the three-dimensional object model according to the model parameter information;
step S208, rendering the initial model to obtain a three-dimensional object model;
in step S210, the three-dimensional object model and the background image are synthesized into image data of the target object.
Alternatively, the executing subject of the above steps may be a server, a user terminal, and the like, but is not limited thereto, and other devices capable of performing data synthesis may be used to execute the method in the embodiment of the present application.
Alternatively, the data synthesis method in the embodiment of the present application may be applied to, but not limited to, an integration algorithm in an AR (Augmented Reality) solution at a mobile end, a basic algorithm of an avatar, or a motion capture scheme in animation production and movie production, and the like.
With the above method, pose information for a three-dimensional object model of the target object is constructed, and the three-dimensional object model is built according to the pose information, shape information, and lens information. In this way a large amount of training data can be generated without optical capture equipment, which solves the problem in the related art that training data must be collected with high-precision optical capture equipment at high cost and reduces the acquisition cost of training data.
The data synthesis method in the embodiment of the present application will be described below with reference to fig. 2.
In step S202, pose information for constructing a three-dimensional object model of the target object is constructed, wherein the pose information is the axis-angle information of a joint point in the three-dimensional object model.
The constructed three-dimensional object model may correspond to a target object. The target object may be a human-shaped object, an animal object, or other object having a joint. The target object may contain one or more joints, and the joint points in the three-dimensional object model may correspond to one or more of the one or more joints.
For example, the human body may have L joints, of which M are movable, and the number of joint points of the 3D humanoid model may be N, where N ≤ M < L.
In order to construct a three-dimensional object model of a target object, pose information for constructing the three-dimensional object model of the target object may be first constructed, wherein the pose information is axis angle information of a joint point in the three-dimensional object model. The three-dimensional object model may have a plurality of joint points, which may be represented by a Pose matrix.
The pose information may be constructed in various ways, for example, a candidate axis angle may be extracted from a predetermined set of axis angles as the axis angle of the joint point. The corresponding sets of axis angles may be the same or different for different joint points.
In addition to the pose information, shape information of the three-dimensional object model, which may be used to control the body type (e.g., height, slimness, arm thickness, leg thickness, etc.) of the three-dimensional object model, and lens information, which is used to control the size (e.g., zoom factor, x-axis displacement, and y-axis displacement) of the three-dimensional object model, may be acquired.
The shape information and lens information of the three-dimensional object model may be fixed; alternatively, a candidate shape extracted from a shape set may be used as the shape of the three-dimensional object model, and/or a candidate lens extracted from a lens set may be used as the lens of the three-dimensional object model.
As an alternative embodiment, pose information for constructing a three-dimensional object model of the target object is randomly generated, and the shape information of the three-dimensional object model and the lens information corresponding to the three-dimensional object model are randomly generated.
The three-dimensional object model has three model parameters: pose information, shape information, and lens information. These may be randomly generated so that models with various actions can be produced. Random generation may mean that the pose information, shape information, and lens information of the three-dimensional object model are randomly generated within predetermined ranges.
For example, as shown in fig. 3, the overall input of the data synthesis flow is the left-hand column, which may include:
(1) Pose information (Pose matrix)
The input Pose matrix P (24 × 3) represents the local axis-angle information (rotation angles with respect to the three coordinate axes) of the 24 joint points; it can be understood that the various motions of the 3D model are controlled by the pose information.
(2) Shape information (Shape matrix)
The input Shape matrix S (1 × 10) represents a shape vector reduced in dimensionality by PCA (Principal Component Analysis). The shape matrix used to control the three-dimensional object model may be high-dimensional; it can be reduced by PCA to obtain a reduced-dimensionality shape matrix (a 10-dimensional shape vector).
(3) Lens information (Cams)
The input Cams (1 × 3) represents the size of the 3D model and its offsets in the x, y coordinate system, including:
1) the human-body scaling factor s_b;
2) the human-body x-axis displacement o_x, i.e., the displacement along the x-axis relative to the origin;
3) the human-body y-axis displacement o_y, i.e., the displacement along the y-axis relative to the origin.
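The random generation of the three inputs above (Pose matrix P, Shape matrix S, and Cams) can be sketched as follows. The sampling ranges are illustrative assumptions, not values specified in this application: axis-angle components are drawn from [-pi, pi], shape coefficients from a standard normal, and camera parameters from small plausible intervals.

```python
import numpy as np

def random_model_params(seed=None):
    """Randomly generate the three model inputs: Pose, Shape, Cams."""
    rng = np.random.default_rng(seed)
    pose = rng.uniform(-np.pi, np.pi, size=(24, 3))    # P: axis-angle per joint
    shape = rng.standard_normal((1, 10))               # S: PCA shape coefficients
    cams = np.array([[rng.uniform(0.5, 1.5),           # s_b: scaling factor
                      rng.uniform(-0.1, 0.1),          # o_x: x-axis displacement
                      rng.uniform(-0.1, 0.1)]])        # o_y: y-axis displacement
    return pose, shape, cams

pose, shape, cams = random_model_params(seed=0)
```

Drawing each parameter independently within its range is what yields the diverse poses and body types discussed below.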
Through this embodiment, the pose information, shape information, and lens information of the three-dimensional object model are randomly generated. By using randomly generated Pose data, 3D object models in a wide variety of poses can be generated, which solves the problem that existing data lacks pose diversity and improves the diversity of the pose information.
In step S204, model parameter information of the three-dimensional object model is determined according to the pose information, the shape information of the three-dimensional object model, and the lens information corresponding to the three-dimensional object model.
According to the attitude information, the shape information and the lens information of the three-dimensional object model, the model parameter information of the three-dimensional object model can be determined. The model parameters are parameters required for constructing the three-dimensional object model, and may include, but are not limited to, at least one of the following: shape information, lens information, joint point coordinate information, joint point rotation information, and triangle vertex information.
As shown in fig. 4, the human body is formed by triangles composed of a plurality of triangle vertices; the circled points are the joint points, and the information of each joint point can be seen in fig. 5.
As an alternative embodiment, determining the model parameter information of the three-dimensional object model according to the pose information, the shape information of the three-dimensional object model, and the lens information corresponding to the three-dimensional object model includes: determining joint point information of the joint points according to the pose information, the shape information, and the lens information, wherein the joint point information includes two-dimensional coordinate information of the joint points, three-dimensional coordinate information of the joint points, and rotation information of the joint points; and determining three-dimensional coordinate information of the triangle vertices of the three-dimensional object model according to the pose information, the shape information, and the lens information, wherein the model parameter information includes the two-dimensional coordinate information of the joint points, the three-dimensional coordinate information of the joint points, the rotation information of the joint points, and the three-dimensional coordinate information of the triangle vertices, the triangle vertices being used to construct the mesh of the three-dimensional object model.
From the pose information, shape information, and lens information, the unit information of the structural units of the three-dimensional object model (for example, the coordinate and rotation information of the joint points and the triangle-vertex information) can be determined, and the three-dimensional object model can thereby be determined.
The method may first determine the joint point information of a joint point according to the pose information, shape information, and lens information of the three-dimensional object model, the joint point information including 2D coordinate information of the joint point, 3D coordinate information of the joint point, and rotation information of the joint point; then, the three-dimensional coordinate information of the triangle vertices of the three-dimensional object model is determined according to the pose information, shape information, and lens information of the three-dimensional object model.
The triangle vertices may be used to construct a mesh of the three-dimensional object model, i.e., triangles of the three-dimensional object model.
According to the embodiment, the joint point information of the joint point is determined according to the posture information, the shape information and the lens information, and then the three-dimensional coordinate information of the triangular vertex is determined, so that the accuracy of determining the model parameter information of the three-dimensional object model can be ensured.
As an alternative embodiment, determining the model parameter information of the three-dimensional object model according to the pose information, the shape information of the three-dimensional object model, and the lens information corresponding to the three-dimensional object model includes: inputting the pose information, the shape information, and the lens information into a Skinned Multi-Person Linear (SMPL) model to obtain the model parameter information output by the SMPL model, wherein the model parameter information includes two-dimensional coordinate information of the joint points, three-dimensional coordinate information of the joint points, rotation information of the joint points, and three-dimensional coordinate information of the triangle vertices, the triangle vertices being used to determine a mesh for constructing the three-dimensional object model.
In order to determine the model parameter information of the three-dimensional object model, an existing algorithm model, for example, a SMPL (Skinned Multi-Person Linear) model, may be used to obtain the model parameter information of the three-dimensional object model.
The obtained pose information, shape information, and lens information of the three-dimensional object model may be input to the SMPL model to obtain 2D/3D articulated point coordinate information, triangle vertex information, and rotation information of the articulated point.
Internally, the SMPL model performs matrix operations: the 2D/3D coordinates of the triangle vertices and of the joint points are each obtained through matrix operations. The two computations are relatively independent, and the link between the triangle-vertex coordinate information and the joint-point coordinate information is stored in a pre-trained SMPL matrix.
For example, as shown in fig. 3, by inputting the posture information (Pose matrix P (24 × 3)), the shape information (Shape matrix S (1 × 10)), and the lens information (Cams (1 × 3)) into the SMPL model, the SMPL model outputs: joint coordinate information Joints, including 2D joint coordinate information (24 × 2) and 3D joint coordinate information (24 × 3); triangle vertex information, Verts matrix V (6890 × 3); and rotation information, rotation matrix R (24 × 3 × 3), where 6890 is the number of triangle vertices, and the 6890 triangle vertices can construct 10000 triangles.
By the embodiment, the SMPL model is used for determining the model parameter information of the three-dimensional object model, so that the efficiency of determining the model parameter information can be improved.
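The tensor shapes described in this embodiment can be sketched as follows. The `smpl_forward` stub below is a hypothetical stand-in that only checks the input shapes and produces outputs of the shapes named above; it does not perform the real SMPL blend-skinning computation.

```python
import numpy as np

def smpl_forward(pose, shape, cams):
    """Hypothetical stand-in for an SMPL forward pass.

    Only illustrates the tensor shapes described in the embodiment;
    the real model computes these outputs via pre-trained matrices.
    """
    assert pose.shape == (24, 3)    # axis-angle per joint (Pose matrix P)
    assert shape.shape == (1, 10)   # body-shape coefficients (Shape matrix S)
    assert cams.shape == (1, 3)     # lens/camera parameters (Cams)
    n_joints, n_verts = 24, 6890
    return {
        "joints_2d": np.zeros((n_joints, 2)),     # 2D joint coordinates
        "joints_3d": np.zeros((n_joints, 3)),     # 3D joint coordinates
        "rotations": np.zeros((n_joints, 3, 3)),  # per-joint rotation matrices R
        "verts":     np.zeros((n_verts, 3)),      # triangle-vertex coordinates V
    }

out = smpl_forward(np.zeros((24, 3)), np.zeros((1, 10)), np.zeros((1, 3)))
```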
In step S206, an initial model of the three-dimensional object model is constructed based on the model parameter information.
The model parameter information includes three-dimensional coordinate information of the triangle vertices, and the model mesh of the three-dimensional object model is a triangular region composed of the triangle vertices. From the determined model parameter information, an initial model of the three-dimensional object model (initial object model) may be constructed. The initial object model is an object model that is not rendered.
As an alternative embodiment, constructing an initial model of the three-dimensional object model according to the model parameter information includes: determining a model mesh of the three-dimensional object model according to three-dimensional coordinate information of triangular vertexes of the three-dimensional object model, wherein the model parameter information comprises the three-dimensional coordinate information of the triangular vertexes, and the model mesh is a triangular area formed by the triangular vertexes; and carrying out three-dimensional filling on the model mesh to obtain an initial model.
According to the three-dimensional coordinate information of the triangle vertex, a model mesh (triangle) of the three-dimensional object model can be determined, and the initial model of the three-dimensional object model can be obtained by three-dimensionally filling the model mesh.
In addition to three-dimensional filling of the model mesh, a depth matrix and a mask matrix of the three-dimensional object model may be obtained for subsequent model processing, e.g., merging between the three-dimensional object model and a background image, etc.
As an alternative embodiment, obtaining the model depth matrix of the three-dimensional object model includes: acquiring an initial depth matrix of the three-dimensional object model; setting first position points, corresponding to an object area, in the initial depth matrix to target values, and setting second position points, corresponding to a non-object area, in the initial depth matrix to a fixed value, to obtain the model depth matrix of the three-dimensional object model, where the object area is the area of the projection plane onto which the three-dimensional object model is projected, the non-object area is the area of the projection plane other than the object area, and the target value of a first position point is the distance from the projection plane to the point, among the one or more points on the three-dimensional object model that project onto that first position point, that is closest to the projection plane.
The model depth matrix of the three-dimensional object model corresponds to the depth matrix of the three-dimensional object model when it is viewed from a predetermined direction (e.g., along the z-axis, or opposite to the z-axis); in this case the projection plane may be the plane in which the x-axis and the y-axis lie. The size of the depth matrix is w3d × h3d × 1, and the value of each element point (pixel point) is related to the difference between the maximum and minimum z-axis values of the points of the three-dimensional object model whose x-axis and y-axis coordinates project onto that element.
For example, each triangle can be filled in three dimensions, the filled value being the z-axis coordinate, to obtain a w3d × h3d × 1 matrix Fv. A w3d × h3d × 1 matrix D may then be initialized. A position d(x,y) on D (column x, row y) corresponds, at the same position in Fv, to a (zmax − zmin)-dimensional vector V(x,y). When V(x,y) is all zeros, the position is a non-human region and d(x,y) = 255; when V(x,y) is not all zeros, multiple surfaces may overlap there, and the maximum value is chosen as the value of d(x,y) (the value of the foremost point). After every point of the matrix D has been processed, the matrix D is the depth matrix of the 3D object model.
After the depth matrix D is obtained, binarization operation may be performed on the depth matrix D to obtain a Mask matrix M of the projection surface.
For example, a Mask matrix M of the projection plane can be obtained by setting a point of the depth matrix D other than 255 to 1 and a point of 255 to 0.
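The depth-matrix filling and its binarization into a mask can be sketched as follows. The `z_layers` structure (per-pixel lists of z values of model points projecting onto that pixel) and the tiny 4 × 4 resolution are illustrative assumptions, not part of the described method.

```python
import numpy as np

# z_layers[y][x] holds the z values of all model points that project
# onto pixel (x, y); faked here with a tiny 4x4 example.
H, W = 4, 4
z_layers = [[[] for _ in range(W)] for _ in range(H)]
z_layers[1][1] = [0.2, 0.5]   # two surfaces overlap at this pixel
z_layers[2][2] = [0.8]

D = np.full((H, W), 255.0)    # fixed value 255 marks the non-object area
for y in range(H):
    for x in range(W):
        if z_layers[y][x]:                  # object area: keep the
            D[y, x] = max(z_layers[y][x])   # foremost (maximum-z) point

# Binarize: points other than 255 become 1, points equal to 255 become 0.
M = (D != 255).astype(np.uint8)
```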
According to the embodiment, the model mesh of the three-dimensional object model is determined according to the three-dimensional coordinate information of the triangular vertex, the model mesh is filled in a three-dimensional mode to obtain the initial model, and the capability of the three-dimensional object model for representing the target object can be improved.
In step S208, the initial model is rendered to obtain a three-dimensional object model.
The resulting initial model is achromatic, e.g., black and white or grayscale (as shown in fig. 6). To improve the realism of the constructed three-dimensional object model, the initial model may be rendered to obtain the three-dimensional object model.
As an alternative embodiment, rendering the initial model to obtain the three-dimensional object model includes: and rendering the initial model through the UV map to obtain a three-dimensional object model.
After all parameters of the 3D object model are obtained, a UVMap (UV map) may be rendered onto the initial model using Unity, resulting in the 3D object model.
For example, as shown in fig. 6 and 7, fig. 6 is an initial model without UVmap, and fig. 7 is a 3D object model with UVmap.
According to the embodiment, the initial model is rendered through the UV map to obtain the three-dimensional object model, and the visual reality of the obtained three-dimensional object model can be improved.
In step S210, the three-dimensional object model and the background image are synthesized into image data of the target object.
After the three-dimensional object model is obtained, the three-dimensional object model and the background image may be merged to obtain image data of the target object.
As an alternative embodiment, synthesizing the three-dimensional object model and the background image into image data of the target object includes: acquiring a background image matrix and a background depth matrix of a background image; determining an image origin of the background image according to the background depth matrix, wherein the image origin is a point in the background image, and the distance between the depth value and the average depth of the background depth matrix is less than or equal to a depth distance threshold value; and setting the image origin as a point on the background image corresponding to the model origin of the three-dimensional object model, and combining the three-dimensional object model and the background image to obtain the image data of the target object.
For the background image, a background image matrix of the background image and a background depth matrix of the background image may be obtained. The background image may be a real background picture or a background picture obtained by picture processing. The background picture may be selected, randomly or by other means, from a picture database, which may be an existing depth data set including background pictures with RGB images and depth information.
For example, the Image matrix of the background picture is Ib (wi × hi × 3), and the background Depth matrix is Db (wi × hi).
After the background depth matrix of the background image is acquired, the image origin of the background image can be determined. The image origin P of the background image can be calculated from the depth map: calculate the average depth of the background Depth matrix Db, and find the point having the smallest distance from that average depth as the origin P.
After the image origin P is obtained, the origin of the three-dimensional object model may be placed on the origin P, and the merged image data is obtained. The origin of the three-dimensional object model may be an origin of a person, which may be one of 24 joint points, for example, the Pelvis (Pelvis).
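The computation of the image origin P described above can be sketched as follows; the 2 × 2 background depth matrix is a made-up example, not real data.

```python
import numpy as np

# Background depth matrix Db (illustrative 2x2 example).
Db = np.array([[1.0, 4.0],
               [2.0, 9.0]])

mean_depth = Db.mean()                    # average depth of Db (here 4.0)
flat = np.abs(Db - mean_depth).argmin()   # pixel closest to the average depth
P = np.unravel_index(flat, Db.shape)      # (row, col) of the image origin P
```

The model origin (e.g., the pelvis joint) would then be placed at P when merging.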
When the three-dimensional object model and the background image are combined, the combination can be performed according to the mask matrix of the three-dimensional object model.
As an alternative embodiment, the merging the three-dimensional object model and the background image to obtain the image data of the target object includes: and combining the three-dimensional object model and the background image according to the mask matrix to obtain the image data of the target object.
The RGB image of the background and the depth image of the three-dimensional object model may each be merged only within the Mask region: the portion of the RGB image inside the Mask region is replaced with the information of the 3D object model, while the portion outside the Mask region retains the original information.
By the embodiment, the three-dimensional object model and the background image are merged according to the mask matrix, merging efficiency can be improved, and authenticity of merged image data is guaranteed.
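A minimal sketch of the mask-based merge, assuming tiny 2 × 2 images: inside the mask region the background pixels are replaced by the rendered model's pixels, and outside it the background is kept unchanged.

```python
import numpy as np

bg   = np.zeros((2, 2, 3), dtype=np.uint8)       # background RGB image
fg   = np.full((2, 2, 3), 200, dtype=np.uint8)   # rendered 3D-model image
mask = np.array([[1, 0],
                 [0, 0]], dtype=bool)            # mask matrix M

# Replace background pixels with model pixels only inside the mask region.
composite = np.where(mask[..., None], fg, bg)
```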
The combined image data may be used as training data to perform a training process of a 3D object reconstruction algorithm, and may also be applied to other scenes, which is not specifically limited in this embodiment.
The above-described data synthesis method is explained below with reference to an alternative example. Training data for a deep-learning 3D human body reconstruction algorithm must be acquired with high-precision optical capture equipment, which makes acquisition costly; moreover, existing data sets contain only a narrow range of actions and lack samples with large-amplitude, high-difficulty motions, which greatly affects the precision of model training.
As shown in fig. 3, the inputs to the data set generation flow are:
(1) attitude information, a Pose matrix P (24 × 3);
(2) shape information, Shape matrix S (1 × 10);
(3) shot information, Cams (1 × 3);
(4) background picture, Image matrix Ib (wi × hi × 3);
(5) background depth, Depth matrix Db (wi × hi × 3).
The output of the dataset generation process is:
(1) lens information Cams (1 × 3);
(2) shape information, Shape matrix S (1 × 10);
(3) joint point coordinate information Joints, including:
2D joint point coordinate information (24 × 2);
3D joint coordinate information (24 × 3).
(4) Triangle vertex information, Verts matrix V (6890 × 3);
(5) rotation information, rotation matrix R (24 × 3 × 3);
(6) mask matrix of the projection plane, Mask matrix M (w3d × h3d);
(7) depth matrix of the projection plane, Depth matrix D (w3d × h3d);
(8) composite picture matrix, Image matrix I (wi × hi × 3);
(9) composite depth matrix, Depth matrix D (wi × hi).
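The nine outputs above can be collected into one training-sample record. The dictionary below is a hypothetical layout: the key names, dtypes, and the example resolutions (wi = 640, hi = 480, w3d = h3d = 256) are assumptions for illustration, not fixed by the method.

```python
import numpy as np

# Hypothetical record for one generated training sample, mirroring the
# nine outputs of the data set generation flow.
wi, hi, w3d, h3d = 640, 480, 256, 256
sample = {
    "cams":        np.zeros((1, 3)),                      # lens information
    "shape":       np.zeros((1, 10)),                     # Shape matrix S
    "joints_2d":   np.zeros((24, 2)),                     # 2D joint coordinates
    "joints_3d":   np.zeros((24, 3)),                     # 3D joint coordinates
    "verts":       np.zeros((6890, 3)),                   # Verts matrix V
    "rotations":   np.zeros((24, 3, 3)),                  # rotation matrix R
    "mask":        np.zeros((h3d, w3d), dtype=np.uint8),  # Mask matrix M
    "depth":       np.zeros((h3d, w3d)),                  # projection-plane depth
    "image":       np.zeros((hi, wi, 3), dtype=np.uint8), # composite picture I
    "image_depth": np.zeros((hi, wi)),                    # composite depth
}
```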
The data synthesis method in the present example is a human body reconstruction training data synthesis method, which generates a 3D human body model with arbitrary motion in a large scale through an SMPL model and renders a normal 3D model through UV mapping; through the 3D model, 2D/3D joint point coordinate information, 3D triangle vertex information, joint point rotation information, mask information of the 3D human body model on a projection surface and depth information of the 3D human body model on the projection surface can be obtained by setting a lens angle; and then, the data set with the existing depth information is fused to obtain a human body reconstruction data set with a real scene as a background.
By the example, a large amount of training data can be generated instead of the optical capturing device, and the cost is reduced; training samples of any action can be generated, and the requirement of high-difficulty action is met.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
According to another aspect of the embodiments of the present application, there is provided a data synthesis apparatus for implementing the above data synthesis method. Optionally, the apparatus is used to implement the above embodiments and preferred embodiments, and details are not repeated for what has been described. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 8 is a block diagram of an alternative data synthesis apparatus according to an embodiment of the present application, and as shown in fig. 8, the apparatus includes:
(1) a constructing unit 802, configured to construct pose information of a three-dimensional object model used for constructing a target object, where the pose information is axial angle information of a joint point in the three-dimensional object model;
(2) a determining unit 804, connected to the constructing unit 802, for determining model parameter information of the three-dimensional object model according to the posture information, shape information of the three-dimensional object model, and lens information corresponding to the three-dimensional object model, where the shape information is used for controlling the body type of the three-dimensional object model, and the lens information is used for controlling the size of the three-dimensional object model;
(3) the building unit 806 is connected with the determining unit 804 and used for building an initial model of the three-dimensional object model according to the model parameter information;
(4) the rendering unit 808 is connected to the constructing unit 806 and is configured to render the initial model to obtain a three-dimensional object model;
(5) and a synthesizing unit 810 connected to the rendering unit 808, for synthesizing the three-dimensional object model and the background image into image data of the target object.
Alternatively, the constructing unit 802 may be used in step S202 in the foregoing embodiment, the determining unit 804 may be used in step S204 in the foregoing embodiment, the constructing unit 806 may be used to execute step S206 in the foregoing embodiment, the rendering unit 808 may be used to execute step S208 in the foregoing embodiment, and the synthesizing unit 810 may be used to execute step S210 in the foregoing embodiment.
With the above apparatus, the posture information of the three-dimensional object model used for constructing the target object is constructed, and the three-dimensional object model is constructed according to the posture information, the shape information, and the lens information of the three-dimensional object model. A large amount of training data can thus be generated without an optical capture device, which solves the problem in the related art that training data must be acquired by high-precision optical capture equipment at high cost, and reduces the acquisition cost of the training data.
As an alternative embodiment, the apparatus further includes a generating unit, and the constructing unit 802 includes a generating module, wherein:
(1) the generating module is used for randomly generating attitude information of a three-dimensional object model for constructing a target object;
(2) the generating unit is used for randomly generating shape information of the three-dimensional object model and lens information corresponding to the three-dimensional object model.
As an alternative embodiment, the determining unit 804 includes:
(1) the input module is used for inputting the posture information, the shape information and the lens information into the SMPL model to obtain model parameter information output by the SMPL model, wherein the model parameter information comprises: the three-dimensional object model comprises two-dimensional coordinate information of joint points, three-dimensional coordinate information of the joint points, rotation information of the joint points and three-dimensional coordinate information of triangular vertexes, wherein the triangular vertexes are used for determining a mesh for constructing the three-dimensional object model.
As an alternative embodiment, the determining unit 804 includes:
(1) the first determining module is used for determining joint point information of joint points according to the posture information, the shape information and the lens information, wherein the joint point information of the joint points comprises: two-dimensional coordinate information of the joint point, three-dimensional coordinate information of the joint point and rotation information of the joint point;
(2) the second determining module is used for determining three-dimensional coordinate information of a triangular vertex of the three-dimensional object model according to the posture information, the shape information and the lens information, wherein the model parameter information comprises: the three-dimensional object model comprises two-dimensional coordinate information of joint points, three-dimensional coordinate information of the joint points, rotation information of the joint points and three-dimensional coordinate information of triangular vertexes, wherein the triangular vertexes are used for constructing a mesh of the three-dimensional object model.
As an alternative embodiment, the building unit 806 includes:
(1) the third determining module is used for determining a model mesh of the three-dimensional object model according to the three-dimensional coordinate information of the triangular vertex of the three-dimensional object model, wherein the model parameter information comprises the three-dimensional coordinate information of the triangular vertex, and the model mesh is a triangular area formed by the triangular vertex;
(2) and the filling module is used for carrying out three-dimensional filling on the model mesh to obtain an initial model.
As an alternative embodiment, the rendering unit 808 includes:
(1) and the rendering module is used for rendering the initial model through the UV map to obtain a three-dimensional object model.
As an alternative embodiment, the synthesis unit 810 includes:
(1) the acquisition module is used for acquiring a background image matrix and a background depth matrix of a background image;
(2) the fourth determining module is used for determining an image origin of the background image according to the background depth matrix, wherein the image origin is a point in the background image, and the distance between the depth value and the average depth of the background depth matrix is smaller than or equal to the depth distance threshold;
(3) and the merging module is used for setting the image origin as a point on the background image corresponding to the model origin of the three-dimensional object model, merging the three-dimensional object model and the background image and obtaining the image data of the target object.
As an alternative embodiment, the apparatus further includes an acquiring unit, a setting unit, and a binarization unit, and the merging module includes a merging submodule, wherein:
(1) the acquiring unit is used for acquiring an initial depth matrix of the three-dimensional object model before the three-dimensional object model and the background image are combined to obtain the image data of the target object;
(2) the setting unit is used for setting first position points, corresponding to an object area, in the initial depth matrix to target values, and setting second position points, corresponding to a non-object area, in the initial depth matrix to a fixed value, to obtain a model depth matrix of the three-dimensional object model, where the object area is the area of the projection plane onto which the three-dimensional object model is projected, the non-object area is the area of the projection plane other than the object area, and the target value of a first position point is the distance from the projection plane to the point, among the one or more points on the three-dimensional object model that project onto that first position point, that is closest to the projection plane;
(3) the binarization unit is used for carrying out binarization operation on the model depth matrix to obtain a mask matrix of the three-dimensional object model on the projection surface;
(4) and the merging submodule is used for merging the three-dimensional object model and the background image according to the mask matrix to obtain the image data of the target object.
It should be noted that, the above modules may be implemented by software or hardware, and for the latter, the following may be implemented, but not limited to: the modules are all positioned in the same processor; alternatively, the modules are respectively located in different processors in any combination.
According to yet another aspect of embodiments herein, there is provided a computer-readable storage medium. Optionally, the storage medium has a computer program stored therein, where the computer program is configured to execute the steps in any one of the methods provided in the embodiments of the present application when the computer program is executed.
Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:
s1, constructing attitude information of a three-dimensional object model for constructing the target object, wherein the attitude information is the axial angle information of the joint points in the three-dimensional object model;
s2, determining model parameter information of the three-dimensional object model according to the posture information, the shape information of the three-dimensional object model and the lens information corresponding to the three-dimensional object model, wherein the shape information is used for controlling the body type of the three-dimensional object model, and the lens information is used for controlling the size of the three-dimensional object model;
s3, constructing an initial model of the three-dimensional object model according to the model parameter information;
s4, rendering the initial model to obtain a three-dimensional object model;
s5, the three-dimensional object model and the background image are synthesized into image data of the target object.
Optionally, in this embodiment, the storage medium may include, but is not limited to: a variety of media that can store computer programs, such as a usb disk, a ROM (Read-only Memory), a RAM (Random Access Memory), a removable hard disk, a magnetic disk, or an optical disk.
According to still another aspect of an embodiment of the present application, there is provided an electronic apparatus including: a processor (which may be the processor 102 in fig. 1) and a memory (which may be the memory 104 in fig. 1) having a computer program stored therein, the processor being configured to execute the computer program to perform the steps of any of the above methods provided in embodiments of the present application.
Optionally, the electronic apparatus may further include a transmission device (the transmission device may be the transmission device 106 in fig. 1) and an input/output device (the input/output device may be the input/output device 108 in fig. 1), wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
s1, constructing attitude information of a three-dimensional object model for constructing the target object, wherein the attitude information is the axial angle information of the joint points in the three-dimensional object model;
s2, determining model parameter information of the three-dimensional object model according to the posture information, the shape information of the three-dimensional object model and the lens information corresponding to the three-dimensional object model, wherein the shape information is used for controlling the body type of the three-dimensional object model, and the lens information is used for controlling the size of the three-dimensional object model;
s3, constructing an initial model of the three-dimensional object model according to the model parameter information;
s4, rendering the initial model to obtain a three-dimensional object model;
s5, the three-dimensional object model and the background image are synthesized into image data of the target object.
Optionally, for an optional example in this embodiment, reference may be made to the examples described in the above embodiment and optional implementation, and this embodiment is not described herein again.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the principle of the present invention should be included in the protection scope of the present invention.

Claims (11)

1. A method of synthesizing data, comprising:
constructing attitude information of a three-dimensional object model for constructing a target object, wherein the attitude information is shaft angle information of joint points in the three-dimensional object model;
determining model parameter information of the three-dimensional object model according to the posture information, shape information of the three-dimensional object model and lens information corresponding to the three-dimensional object model, wherein the shape information is used for controlling the body type of the three-dimensional object model, and the lens information is used for controlling the size of the three-dimensional object model;
constructing an initial model of the three-dimensional object model according to the model parameter information;
rendering the initial model to obtain the three-dimensional object model;
and synthesizing the three-dimensional object model and the background image into image data of the target object.
2. The method of claim 1,
constructing the pose information for constructing the three-dimensional object model of the target object comprises: randomly generating the pose information for constructing the three-dimensional object model of the target object;
the method further comprises the following steps: randomly generating the shape information in the three-dimensional object model and the lens information corresponding to the three-dimensional object model.
3. The method of claim 1, wherein determining the model parameter information for the three-dimensional object model from the pose information, the shape information for the three-dimensional object model, and the lens information corresponding to the three-dimensional object model comprises:
inputting the posture information, the shape information and the lens information into a Skinned Multi-Person Linear (SMPL) model to obtain the model parameter information output by the SMPL model, wherein the model parameter information comprises: the two-dimensional coordinate information of the joint points, the three-dimensional coordinate information of the joint points, the rotation information of the joint points and the three-dimensional coordinate information of the triangle vertices, wherein the triangle vertices are used for determining a mesh for constructing the three-dimensional object model.
4. The method of claim 1, wherein determining the model parameter information for the three-dimensional object model from the pose information, the shape information for the three-dimensional object model, and the lens information corresponding to the three-dimensional object model comprises:
determining joint point information of the joint point according to the posture information, the shape information and the lens information, wherein the joint point information of the joint point comprises: two-dimensional coordinate information of the joint point, three-dimensional coordinate information of the joint point and rotation information of the joint point;
determining three-dimensional coordinate information of a triangular vertex of the three-dimensional object model according to the posture information, the shape information and the lens information, wherein the model parameter information comprises: the two-dimensional coordinate information of the joint point, the three-dimensional coordinate information of the joint point, the rotation information of the joint point and the three-dimensional coordinate information of the triangular vertex, and the triangular vertex is used for constructing a mesh of the three-dimensional object model.
5. The method of claim 1, wherein constructing an initial model of the three-dimensional object model based on the model parameter information comprises:
determining a model mesh of the three-dimensional object model according to three-dimensional coordinate information of triangle vertices of the three-dimensional object model, wherein the model parameter information comprises the three-dimensional coordinate information of the triangle vertices, and the model mesh is formed of triangular areas bounded by the triangle vertices;
and carrying out three-dimensional filling on the model mesh to obtain the initial model.
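Claim 5's mesh step can be sketched as gathering triangle corners from the vertex coordinates. The face index triples below are an assumption for illustration (the claim stores only vertex coordinates), and the per-triangle area serves as a simple sanity check before the mesh is filled into a solid model.

```python
import numpy as np

def build_mesh(vertices, faces):
    """Gather the triangular areas of the model mesh from vertex
    coordinates (N, 3) and face index triples (F, 3), and compute
    each triangle's area via the cross product of two edges."""
    tris = vertices[faces]                  # (F, 3, 3): three corners per face
    edge1 = tris[:, 1] - tris[:, 0]
    edge2 = tris[:, 2] - tris[:, 0]
    areas = 0.5 * np.linalg.norm(np.cross(edge1, edge2), axis=1)
    return tris, areas
```

Degenerate faces show up as zero-area triangles, which is a cheap way to validate the mesh before the three-dimensional filling step.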
6. The method of claim 1, wherein rendering the initial model, resulting in the three-dimensional object model, comprises:
and rendering the initial model through a UV map to obtain the three-dimensional object model.
7. The method according to any one of claims 1 to 6, wherein synthesizing the three-dimensional object model and the background image into the image data of the target object comprises:
acquiring a background image matrix and a background depth matrix of the background image;
determining an image origin of the background image according to the background depth matrix, wherein the image origin is a point in the background image whose depth value differs from the average depth of the background depth matrix by no more than a depth distance threshold;
and taking the image origin as the point on the background image corresponding to the model origin of the three-dimensional object model, and combining the three-dimensional object model and the background image to obtain the image data of the target object.
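Claim 7 anchors the model at a background pixel whose depth is near the image's average depth. A minimal sketch of that selection, assuming the tie-break "pick the pixel closest to the mean" (the claim only requires the depth distance to be within the threshold):

```python
import numpy as np

def image_origin(background_depth, depth_threshold):
    """Return the (row, col) of the pixel whose depth is closest to the
    mean of the background depth matrix, provided it lies within the
    depth distance threshold required by claim 7."""
    mean_depth = background_depth.mean()
    distance = np.abs(background_depth - mean_depth)
    row, col = np.unravel_index(np.argmin(distance), background_depth.shape)
    if distance[row, col] > depth_threshold:
        raise ValueError("no pixel within the depth distance threshold")
    return row, col
```

The model origin of the rendered model is then placed at this pixel before compositing.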
8. The method of claim 7,
before combining the three-dimensional object model and the background image to obtain the image data of the target object, the method further comprises:
acquiring an initial depth matrix of the three-dimensional object model;
setting a first position point corresponding to an object area in the initial depth matrix to a target value, and setting a second position point corresponding to a non-object area in the initial depth matrix to a fixed value, so as to obtain a model depth matrix of the three-dimensional object model, wherein the object area is the area of a projection plane onto which the three-dimensional object model is projected, the non-object area is the area of the projection plane other than the object area, and the target value is the distance between the projection plane and the point, among the one or more points of the three-dimensional object model that project onto the first position point, that is closest to the projection plane;
and carrying out a binarization operation on the model depth matrix to obtain a mask matrix of the three-dimensional object model on the projection plane;
combining the three-dimensional object model and the background image to obtain the image data of the target object comprises: combining the three-dimensional object model and the background image according to the mask matrix to obtain the image data of the target object.
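Claim 8's mask-based combination can be sketched as follows. The fixed value marking non-object pixels is assumed here to be infinity, and the three-channel color layout is an illustration choice; the claim itself only specifies binarizing the model depth matrix and merging according to the resulting mask.

```python
import numpy as np

def composite(model_rgb, model_depth, background_rgb, fixed_value=np.inf):
    """Binarize the model depth matrix into a mask (1 = object area,
    0 = non-object area) and merge the model over the background."""
    mask = (model_depth != fixed_value).astype(model_rgb.dtype)
    mask = mask[..., None]                   # broadcast over color channels
    return mask * model_rgb + (1.0 - mask) * background_rgb
```

Object pixels take the model's color, everything else falls through to the background, which is exactly the mask-guided merge the claim describes.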
9. A data synthesis apparatus, comprising:
a construction unit, configured to construct pose information of a three-dimensional object model for constructing a target object, wherein the pose information is axis-angle information of joint points in the three-dimensional object model;
a determination unit configured to determine model parameter information of the three-dimensional object model according to the pose information, shape information of the three-dimensional object model, and lens information corresponding to the three-dimensional object model, wherein the shape information is used to control a body type of the three-dimensional object model, and the lens information is used to control a size of the three-dimensional object model;
a building unit, configured to build an initial model of the three-dimensional object model according to the model parameter information;
a rendering unit, configured to render the initial model to obtain the three-dimensional object model;
and a synthesis unit, configured to synthesize the three-dimensional object model and the background image into the image data of the target object.
10. A computer-readable storage medium having a computer program stored therein, wherein the computer program, when executed, is configured to carry out the method of any one of claims 1 to 8.
11. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of claims 1 to 8 by means of the computer program.
CN201911344786.8A 2019-12-23 2019-12-23 Data synthesis method and apparatus, storage medium, and electronic apparatus Pending CN111105489A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911344786.8A CN111105489A (en) 2019-12-23 2019-12-23 Data synthesis method and apparatus, storage medium, and electronic apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911344786.8A CN111105489A (en) 2019-12-23 2019-12-23 Data synthesis method and apparatus, storage medium, and electronic apparatus

Publications (1)

Publication Number Publication Date
CN111105489A true CN111105489A (en) 2020-05-05

Family

ID=70423508

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911344786.8A Pending CN111105489A (en) 2019-12-23 2019-12-23 Data synthesis method and apparatus, storage medium, and electronic apparatus

Country Status (1)

Country Link
CN (1) CN111105489A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111462274A * 2020-05-18 2020-07-28 南京大学 Human body image synthesis method and system based on SMPL model
WO2021109764A1 (en) * 2019-12-05 2021-06-10 中兴通讯股份有限公司 Image or video generation method and apparatus, computing device and computer-readable medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109285215A * 2018-08-28 2019-01-29 Tencent Technology (Shenzhen) Co., Ltd. Human body three-dimensional model reconstruction method, apparatus and storage medium
CN109448090A * 2018-11-01 2019-03-08 Beijing Kuangshi Technology Co., Ltd. Image processing method, device, electronic equipment and storage medium
CN109697688A * 2017-10-20 2019-04-30 ArcSoft Corporation Limited Method and apparatus for image processing
CN109859296A * 2019-02-01 2019-06-07 Tencent Technology (Shenzhen) Co., Ltd. Training method, server and storage medium for an SMPL parameter prediction model
CN110197190A * 2018-02-27 2019-09-03 Beijing Orion Star Technology Co., Ltd. Model training and object localization method and device


Similar Documents

Publication Publication Date Title
CN109636831B (en) Method for estimating three-dimensional human body posture and hand information
Achenbach et al. Fast generation of realistic virtual humans
US9818217B2 (en) Data driven design and animation of animatronics
US10679046B1 (en) Machine learning systems and methods of estimating body shape from images
US20220036636A1 (en) Three-dimensional expression base generation method and apparatus, speech interaction method and apparatus, and medium
CN113012282B (en) Three-dimensional human body reconstruction method, device, equipment and storage medium
CN113706699B (en) Data processing method and device, electronic equipment and computer readable storage medium
US8624901B2 (en) Apparatus and method for generating facial animation
KR101560508B1 (en) Method and arrangement for 3-dimensional image model adaptation
US20130127827A1 (en) Multiview Face Content Creation
WO2011075082A1 (en) Method and system for single view image 3 d face synthesis
CN113298858A (en) Method, device, terminal and storage medium for generating action of virtual image
CN112950769A (en) Three-dimensional human body reconstruction method, device, equipment and storage medium
CN111739167B (en) 3D human head reconstruction method, device, equipment and medium
JP2018129007A (en) Learning data generation apparatus, learning apparatus, estimation apparatus, learning data generation method, and computer program
CN115298708A (en) Multi-view neural human body rendering
KR20230078777A (en) 3D reconstruction methods, devices and systems, media and computer equipment
CN111145338A (en) Chair model reconstruction method and system based on single-view RGB image
CN116168076A (en) Image processing method, device, equipment and storage medium
CN111105489A (en) Data synthesis method and apparatus, storage medium, and electronic apparatus
CN115272608A (en) Human hand reconstruction method and equipment
JP2006107145A (en) Face shape modeling system and face shape modeling method
EP3980975B1 (en) Method of inferring microdetail on skin animation
CN107464278B (en) Full-view sphere light field rendering method
US20230306686A1 (en) Three-dimensional mesh generator based on two-dimensional image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination