US20190057519A1 - Generating Synthetic Image Data - Google Patents
Generating Synthetic Image Data
- Publication number
- US20190057519A1 (U.S. application Ser. No. 15/727,108)
- Authority
- US
- United States
- Prior art keywords
- image
- bag
- overlaid
- gan
- synthesized
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G06T11/003—Reconstruction from projections, e.g. tomography
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01V—GEOPHYSICS; GRAVITATIONAL MEASUREMENTS; DETECTING MASSES OR OBJECTS; TAGS
- G01V5/00—Prospecting or detecting by the use of ionising radiation, e.g. of natural or induced radioactivity
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/50—Lighting effects
- G06T15/503—Blending, e.g. for anti-aliasing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/30—Polynomial surface description
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/12—Edge-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
- G06T9/002—Image coding using neural networks
Definitions
- the disclosure is generally related to screening security systems for use in public or private applications and, more particularly, to methods, systems, devices, and other elements directed to screening an object.
- a screener's job is to check baggage for security threats prior to boarding a plane.
- the baggage is run through a detection device, such as a scanner, and with the aid of the scanner, the screener flags suspicious pieces of baggage that appear to contain an object that is a security threat. If the baggage is flagged as suspicious, the screener searches the contents of the piece of baggage by hand to determine whether an object that is a security threat is present in the piece of baggage.
- a screener's job may be both difficult and monotonous. This difficulty and monotonousness may increase the chance that a piece of baggage that is a security threat gets through the screening process without being detected.
- FIG. 1 is a conceptual diagram illustrating an example computer system for generating synthetic data of objects.
- FIG. 2 is a conceptual diagram illustrating a projection of an object from a 3D representation to a 2D representation.
- FIG. 3 is another example of a conceptual diagram illustrating a projection of an object from a 3D representation to a 2D representation.
- FIG. 4 is a conceptual diagram illustrating an example operation of a generative adversarial network in accordance with implementations of this disclosure.
- FIG. 5 is a conceptual diagram illustrating an example encoder-decoder generator architecture.
- FIG. 6 is a flow diagram illustrating an example technique for generating synthetic image data.
- Synapse Technology Corporation (“Synapse”), a technology company currently based out of Palo Alto and the assignee of the present application, created a system that greatly modernizes the image intelligence industry with proprietary deep learning and computer vision technology.
- the system is described in U.S. provisional patent applications 62/532,865 and 62/532,821, both filed on Jul. 14, 2017.
- Synapse-created technology may be used in the airport security screening process to, among other things, increase accuracy significantly, while avoiding false positives and increasing the throughput of passengers at the checkpoint.
- the Synapse technology may be used in many other industries, such as those that relate to public and private security and defense.
- One technology area where Synapse's technology may improve throughput relates to the scanning of pieces of baggage, e.g. at an airport security checkpoint or another screening location.
- using a system at an airport referred to as a Threat Image Projection System, management personnel may determine whether the screener is paying attention.
- images of objects captured by a scanner may have an altered appearance relative to a standard camera image of those objects. For instance, an object that passes through an X-Ray scanner may have a green translucent appearance. This altered appearance of images generated by the scanner may make objects more difficult to identify or detect.
- the Threat Image Projection System may be utilized to better-familiarize a screener with the appearance of images of certain objects (e.g. objects that are a threat) captured by a scanner so that the screener may be more able to identify those objects.
- images of knives may be presented at a display of a baggage scanner to better familiarize screeners with the appearance of knives.
- the Threat Image Projection system may provide useful image data, but at great cost and in a time-inefficient manner.
- security personnel may manually place objects into a piece of baggage, and may use a scanner to capture images of those objects. The captured images may then be presented to other security personnel, such as screeners, as part of the Threat Image Projection System.
- the process of manually placing objects into baggage, scanning the baggage, and capturing image data is laborious, slow, and expensive.
- the process is expensive in part because (1) only personnel with a certain security clearance may perform the image capturing process, (2) a scanner has to be rented during the capturing process, and (3) each captured image must be manually labeled with a description of that object.
- this disclosure is directed to techniques for generating images of objects that have the appearance of being captured by a scanner, such that the generated images can be presented at a scanner's display as part of the Threat Image Projection System described above. Additionally, the generated images may be input to a neural network as part of a set of training data that may be used to train the neural network to classify various objects based on images captured by a scanner.
- this disclosure describes techniques that may programmatically generate images of objects having the appearance of being scanned by any of various detection devices. Programmatically generating images of objects may bypass the need for manually-intensive scanning of objects to generate captured images of the objects, and labeling the captured images.
- the techniques of this disclosure may allow scalable generation of images in such volume and variety that the generated images may be able to train deep learning and other machine learning algorithms to recognize various objects.
- the programmatic image generation techniques of this disclosure may generate image data with a lower latency than previously-employed techniques, i.e. the time necessary to programmatically generate image data may be less than previously-employed techniques for generating image data. Reducing the image generating latency may enable a system as configured herein to rapidly generate simulated image data for objects constituting new threats that appear, e.g. a new 3D printed gun or explosive caught at a checkpoint.
- the techniques of this disclosure may provide various other benefits as well.
- an “object” as used herein broadly describes any material entity. Examples of an “object,” for illustration purposes, might include a bag or luggage, purse or backpack, briefcase, box, container or cargo container, wallet, watch, laptop, tablet computer, mobile phone, stroller or wheelchair, a person, and/or any combination thereof. It is also understood that an “object” as used herein may refer to any material entity that can be part of, or within, another object. As an example, a shoe, gun, or laptop may all be items located within an object, such as a piece of luggage.
- a computer system configured in accordance with this disclosure may comprise a database of 3D representations of various objects for which generating synthetic representations of these objects may be desirable.
- synthetic data may refer to a visual representation, such as an image, of a given object that comprises a combination of (1) “real” (e.g. captured) image data obtained from a device, and (2) a simulated image, which may be generated based on non-captured image data, e.g. a simulated image derived from an object model such as a 3D object model.
- a computer system as described herein may generate two-dimensional (2D) simulated images of an object based on 3D representations of those objects.
- the computer system may generate such simulated images of the objects, and may combine a simulated image of a given object with a background image (e.g. of a piece of baggage) to form a synthetic image.
- the computer system may generate synthesized images that appear as though the images were generated by a detection device, such as an MRI scanner, CT scanner, millimeter wave scanner, X-ray machine, or any other type of scanning system developed in the future.
- a synthesized image appears as though it has been generated by a detection device despite the object of the simulated image never having been captured by such a detection device.
- a synthetic image that contains a simulated image of an object may approximate a captured image with high enough accuracy to be used for various purposes as described herein.
- a representation of an object may comprise a 3D representation such as a polygonal model or a set of 2D slices that combine to form a digital representation of the object.
- the computer system may store the representations in a datastore, such as a database, which may be referred to as an object database.
- a projection unit of the computer system may access a representation of a given object from the database, and may generate one or more 2D images, referred to as projections, of the given object based on the accessed representation.
- the projection unit may determine a position for a virtual camera at a given point in a 3D space centered around the representation of the given object.
- the projection unit may use a projection technique to convert the 3D representation to a 2D representation at the given point.
- the projection unit may perform various image processing techniques on the generated 2D representation to generate a projection.
- One such image processing technique that the projection unit may perform is edgemapping.
- Edgemapping is a process that takes an inputted image of an object that contains detail such as color and texture, and outputs an edgemapped image consisting solely of the edges of that object and lacking detail such as texture and color.
- the projection unit may output the projection along with any associated metadata to a datastore such as a database, which may be referred to as a projection database.
- the database may store projection images along with any associated metadata, which may contain information such as the name or type of the object which is represented by the projection.
- Other projections can be made from 3-dimensional data through one of many techniques that can reduce the higher-dimensional data to a 2-dimensional spatial model.
- a “slice” of the 3D model can be taken to generate a 2-dimensional plane within the 3D model.
- multiple slices may be combined to generate a 2D model that represents both an item's external shape and the content within.
- a 2D representation of a retractable utility knife could contain the shape of the razor blade within, superimposed on the projection of the casing.
- an object representation from which a projection is formed may comprise purely 2D forms, such as one or more photos or other images of an object.
- the object representation may take various other forms as well.
- Such a projection is not limited to two spatial dimensions, but can also contain additional data that may take various forms.
- per-pixel data may include spectral band data and/or material characteristics (z-effective value, atomic number, density).
- the additional data may take various other forms as well.
- After storing the projection image (also referred to as the “target” image) in the projection database, a simulation unit inputs the projection image into a generative adversarial network (GAN) to generate a simulated output image.
- the GAN is a type of neural network that generates a simulated output image based on an input image.
- the GAN may generate an image representation of the given object, referred to as a simulated image.
- the simulated image may simulate the appearance of the object if the object had been scanned by a given detection device. More particularly, the GAN of the simulation unit adds color and texture to the edgemapped projection image to generate the simulated image of the object having the appearance of being scanned by the given detection device.
- the synthesis module outputs a simulated image to an overlay generator after the simulated image is generated.
- the simulated image may be generated based on the application of various variations to the projection image.
- variations may take various forms such as changing rotation, changing lighting, and/or obscuring part of the projection image.
- the variations to the simulated image may take various other forms as well.
- the overlay generator inputs a simulated scanned image from the simulation unit and a real image (also referred to as a “background image”) captured by a detection device, and combines the real image and the simulated image of the object to form a combined image that includes the background image and the simulated image of the given object.
- the overlay generator may store the combined image to a synthetic image database.
- the background image may comprise a background image of a bag that was previously scanned.
- the background image may itself be simulated in part or in whole.
- the simulation of the entire background image may encompass generating a 3-dimensional model of each item inside of an object and manipulating it in a manner similar to the manipulation of the target object.
- Further transformations to the synthetic image may be applied in the context of the background image, including manipulating the synthetic image to better match the background image, overlapping the synthetic image with certain objects in the real image, etc.
- the method may comprise: generating a 2D projection from a 3D representation of an object, generating, based on the 2D projection, a simulated image of the object, wherein the simulated image appears as though the object has been scanned by a detection device, combining the simulated image of the object with a background image to form a synthesized image, wherein the background image was captured by a detection device, and outputting the synthesized image.
- Another aspect of this disclosure involves a system comprising a memory and at least one processor, the at least one processor to: generate a 2D projection from a 3D representation of an object, generate, based on the 2D projection, a simulated image of the object, wherein the simulated image appears as though the object has been scanned by a detection device, combine the simulated object with a background image to form a synthesized image, wherein the background image was captured by a detection device, and output the synthesized image.
- Yet another aspect of this disclosure involves a non-transitory computer-readable storage medium comprising instructions stored thereon that, when executed, cause at least one processor to: generate a 2D projection from a 3D representation of an object, generate, based on the 2D projection, a simulated image of the object, wherein the simulated image appears as though the object has been scanned by a detection device, combine the simulated object with a background image to form a synthesized image, wherein the background image was captured by a detection device, and output the synthesized image.
- FIG. 1 is a conceptual diagram illustrating an example computer system for generating synthetic image data of objects.
- FIG. 1 illustrates a computer system 100 .
- Computer system 100 may comprise an object database 102 , a projection unit 104 , a simulation unit 106 , a synthesis unit 108 , a projection database 110 , a background image database 112 , and a synthetic image database 114 .
- object database 102 may comprise a database that stores visual representations of various objects.
- Object database 102 may take a query as input, may process the query, and may output an object representation based on the inputted query. More particularly, object database 102 may be indexed using a key identifier for a given object within the database.
- object database 102 may be queried using a query language such as JSON, SQL or various other query languages.
- Object database 102 may be queried in various other manners as well.
- object database 102 may comprise a data store other than a database such as a flat file. Object database 102 may take various other forms as well.
- Object database 102 may store a visual representation of an object in various formats.
- a visual representation may comprise a 3D model, a set of 2D slices of an object or various other forms as well.
- 3D modeling formats may comprise: OBJ, STL, DXF, and Blender formats, as some non-limiting examples.
- An object may be represented in various formats as well.
- a representation of a given object may be comprised of sets of connected vertices of various polygons. Each vertex may be defined by a set of coordinates such as Cartesian coordinates, polar coordinates, spherical coordinates, or the like. These sets of connected vertices may be combined to form higher-level surfaces, which contain additional detail.
- a given object representation may have associated texture data. The texture data defines color information (e.g. in pixel format), and/or transparency for a given set of polygons of the visual representation.
- a given object representation may be represented as a series of voxels or other multi-dimensional graphical elements.
- an object representation may be formed from 3-dimensional data through one of many techniques that can reduce the higher-dimensional data to a 2-dimensional spatial model.
- a “slice” of the 3D model can be taken to generate a 2-dimensional plane within the 3D model.
- multiple slices may be combined to generate a 2D model that represents both an item's external shape and the content within.
- a 2D representation of a retractable utility knife could contain the shape of the razor blade within, superimposed on the projection of the casing.
- an object representation from which a projection is formed may comprise purely 2D forms, such as one or more photos or other images of an object.
- the object representation may take various other forms as well.
- object representation data may be obtained from a government agency, such as the TSA or from a manufacturer, such as a laptop manufacturer.
- object representation data may be obtained from scanning devices, such as 3D scanners.
- the object representations may be obtained from publicly available sources such as Internet sources, e.g. by using a crawling program or various other techniques. The object representations may be obtained using various other techniques as well.
- Projection unit 104 may comprise software, hardware, firmware, an FPGA (field-programmable gate array), CPU, GPU, ASIC, or any combination thereof that may be configured to generate a 2D projection image based on an object representation. To generate a 2D projection image of a given object, projection unit 104 may load a given object representation from object database 102 and may generate a 3D space containing the representation of the given object.
- projection unit 104 may generate a representation of a given object in a 3D space through the perspective of a virtual camera.
- Projection unit 104 may allow control over the location of the camera along various degrees of freedom, e.g. yaw, pitch, translation, etc.
- Projection unit 104 may also generate and apply various effects such as lighting effects, which may include positioning of lighting sources within the 3D space containing the representation of the object.
- projection unit 104 may use various software rendering libraries to generate the 3D space and to position the virtual camera.
- the python-stl library is one such example library.
- Projection unit 104 may use various other libraries as well. These rendering libraries may also enable various other interactions with, and application of effects to an object representation as well.
- Responsive to receiving an inputted object representation, projection unit 104 generates a 3D coordinate space containing the object representation. Projection unit 104 then determines a centroid of the object representation.
- the centroid as described herein may be defined as a point around which the 3D coordinate space for a given object representation is centered, as a fixed center point around which one or more camera viewpoints are positioned, or as a center point of a given object representation.
- a centroid may take various other forms as well.
- projection unit 104 may circumscribe a sphere around the object representation, and may define the centroid as the center of the sphere.
- projection unit 104 may define the centroid by: (1) finding a longest span of a given object representation, wherein the span comprises a line segment, (2) determining the center of the line segment, and (3) defining the center of the line segment as the centroid of the given object.
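- A minimal NumPy sketch of the longest-span centroid described above; the function name, the brute-force pairwise search, and treating the object representation as an (N, 3) vertex array are assumptions, not details taken from the disclosure.

```python
import numpy as np

def centroid_from_longest_span(vertices: np.ndarray):
    """Return (centroid, half_span) for an (N, 3) array of mesh vertices.

    Implements the longest-span approach described above: find the two
    vertices farthest apart, take the midpoint of that segment as the
    centroid, and keep half the span as a minimum camera radius.
    """
    # Brute-force pairwise distances; fine as a sketch, but a convex hull
    # would be preferable for large meshes.
    diffs = vertices[:, None, :] - vertices[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    i, j = np.unravel_index(np.argmax(dists), dists.shape)
    centroid = (vertices[i] + vertices[j]) / 2.0
    half_span = dists[i, j] / 2.0
    return centroid, half_span
```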
- Projection unit 104 may define the positions of the centroid and the camera viewpoint as respective sets of coordinates in a 3D coordinate space, such as a spherical coordinate space, as one example.
- a spherical coordinate space may be such a coordinate space in one example.
- Such a spherical coordinate space is defined by the values (r, θ, φ), where r is a radial distance, θ is a polar angle, and φ is an azimuth angle.
- the camera viewpoints may be defined as a set of points at a uniform distance around the centroid of the given object representation, wherein each point of the set of points is defined by a set of spherical coordinates.
- the camera viewpoints may be defined at a minimum distance (i.e. a minimum radius) away from the centroid in the 3D space.
- the minimum distance of the camera viewpoints away from the centroid may be defined as half the length of the segment defined by the longest span of the object representation.
- the camera viewpoints may be defined at various other distances and in various other manners as well.
- projection unit 104 may determine a 2D projection of the object representation. Responsive to determining the centroid's coordinates, projection unit 104 may convert the spherical coordinates of the camera viewpoint to Cartesian coordinates. To convert the viewpoint's spherical coordinates to Cartesian coordinates, projection unit 104 may utilize the following conversion equations to generate Cartesian coordinates denoted as x, y, and z:
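- The conversion equations themselves do not survive in the extracted text; the standard spherical-to-Cartesian relations are x = r sin θ cos φ, y = r sin θ sin φ, z = r cos θ, sketched below in Python (treating θ as the polar angle from the z-axis and φ as the azimuth, consistent with the coordinate definitions above).

```python
import math

def spherical_to_cartesian(r: float, theta: float, phi: float):
    """Convert spherical coordinates (r, theta, phi) to Cartesian (x, y, z),
    with theta the polar angle from the z-axis and phi the azimuth angle."""
    x = r * math.sin(theta) * math.cos(phi)
    y = r * math.sin(theta) * math.sin(phi)
    z = r * math.cos(theta)
    return x, y, z
```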
- projection unit 104 may utilize various optimization techniques to speed the computation of the conversion between spherical and Cartesian coordinates. Examples of such optimizations may involve vector processing, transformation matrices, lookup tables or various other optimization techniques.
- projection unit 104 may convert the three-dimensional Cartesian coordinates to a set of projected 2D Cartesian (i.e. x, y) coordinates. To convert the three-dimensional coordinates to a set of 2D Cartesian coordinates, projection unit 104 may apply a projective equation at each 3D coordinate of the object. In an implementation, projection unit 104 may use a perspective projection matrix that performs the projection equations on a matrix representation of a 3D object.
- projection unit 104 may convert the 2D Cartesian coordinates to a set of pixel coordinates represented by a pair of variables u and v, where u is a horizontal distance relative to an origin in the top-left corner of an image, and v is a downward vertical distance from the origin, wherein u and v are restricted to the set of non-negative integers.
- projection unit 104 may shift the origin and apply an affine transformation (e.g. a translation) to go from x, y coordinates to u, v coordinates.
- the result of projecting the camera viewpoint to the pixel coordinate space is a 2D pixel representation of the given object at the given camera viewpoint.
- projection unit 104 may determine a set of camera viewpoints, and may iteratively generate a set of 2D projections for each viewpoint in the set of viewpoints.
- Projection unit 104 may determine the spherical coordinates for each of the different camera viewpoints from a predefined list, which contains spherical coordinates for each of the camera viewpoints.
- the list of viewpoint coordinates may be the same for every object.
- the camera viewpoints may be different for each object.
- the list of camera viewpoints may differ such that the camera viewpoints capture certain areas of interest of various objects, as one example.
- for an object such as a handgun, the camera viewpoints may be generated such that certain components, e.g. the stock, barrel, etc., are visible in 2D projections generated from at least some of the camera viewpoints.
- the list of spherical coordinates may comprise a set of viewpoints that include certain angles, e.g. non-standard angles of a given object such as a down-the-barrel-viewpoint of a gun or along a narrow dimension of a laptop.
- the list of camera viewpoint coordinates may be defined and may differ in various other manners as well.
- projection unit 104 may iteratively: (1) select a given viewpoint from the list, (2) convert the coordinates of the given viewpoint from a spherical coordinate system to a Cartesian 3D coordinate system, and (3) generate a 2D projection of the object representation at that viewpoint. Projection unit 104 may continue to generate 2D projections until projection unit 104 has generated a 2D image of the object at the point defined by each set of spherical coordinates in the list.
- the list of camera viewpoints may comprise n sets of spherical coordinates, where n is any number, and each set of spherical coordinates corresponds to a camera viewpoint. In an implementation, n may be 256.
- the list of camera viewpoints may comprise various other numbers of camera viewpoints as well.
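- As a sketch of the iterate-convert-project loop described above (building on the conversion sketch earlier); the function names and the callable project_to_pixels are assumptions standing in for the projection steps described below.

```python
def generate_projections(object_representation, viewpoints, project_to_pixels):
    """Iterate a predefined list of (r, theta, phi) camera viewpoints and
    produce one 2D projection per viewpoint (steps (1)-(3) above).

    project_to_pixels is any callable implementing the perspective
    projection and pixel mapping described below.
    """
    projections = []
    for r, theta, phi in viewpoints:
        cam_xyz = spherical_to_cartesian(r, theta, phi)                         # step (2)
        projections.append(project_to_pixels(object_representation, cam_xyz))   # step (3)
    return projections
```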
- Referring to FIG. 2, an example conceptual diagram of projecting an object from a 3D representation to a 2D representation is illustrated.
- a 3D model 202 of an object is illustrated.
- 3D model 202 is positioned within a spherical coordinate space.
- the spherical coordinate space is represented by the tuple of numbers (r, θ, φ), where r is a distance from the origin, θ is the polar angle, and φ is an azimuth angle.
- the center of the spherical coordinate space (e.g. corresponding to coordinates (0, 0, 0)) may be a centroid of 3D model 202 in various implementations.
- Camera viewpoint 204 is positioned at some point (r, θ, φ).
- projection unit 104 may convert the camera viewpoint's spherical coordinates to 3D Cartesian coordinates (x, y, z), may perform projection to convert an image of 3D model 202 viewed from camera viewpoint 204 to a set of projected 2D (x, y) coordinates, and may further convert the set of 2D Cartesian coordinates to a set of 2D pixel coordinates (u, v). Additional details regarding the process of projecting a 3D image to a 2D image will now be described with respect to FIG. 3 .
- Referring to FIG. 3, a conceptual diagram 300 illustrates a projection of an object from a 3D representation to a 2D representation.
- a 3D model 302 is illustrated.
- 3D model 302 consists of a set of points, one of which is point 304, which is located at a point (x, y, z) in a 3D Cartesian space.
- a projection unit may attempt to project point 304 onto some 2D point 306 of a 2D surface 308 .
- 2D surface 308 may be located at some distance r from 3D point 304 , and distance z-r from an origin 310 of the 3D space.
- projection unit 104 may use the following equations to project point 304 to point 306 , i.e. to determine the x-coordinates on 2D surface 308 :
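- The projection equations referenced above do not survive in the extracted text. A standard pinhole-style projection consistent with the FIG. 3 geometry (image plane at depth z − r from the origin), together with the x, y to u, v pixel mapping described earlier, might look like the following sketch; the function names, centering convention, and scale factor are assumptions.

```python
def project_point(x: float, y: float, z: float, r: float):
    """Project the 3D point (x, y, z) onto the 2D surface located a distance r
    from the point (i.e. at depth z - r from the origin), via similar triangles."""
    scale = (z - r) / z
    return x * scale, y * scale

def to_pixel_coords(x: float, y: float, width: int, height: int, pixels_per_unit: float):
    """Map projected (x, y) coordinates to (u, v) pixel coordinates with the
    origin shifted to the top-left corner and the v axis pointing downward."""
    u = int(round(width / 2 + x * pixels_per_unit))
    v = int(round(height / 2 - y * pixels_per_unit))
    return u, v
```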
- projection unit 104 may use an edgemapping technique to generate an edgemapped image of the given object based on the 2D image projection.
- Projection unit 104 may perform the edgemapping in various manners.
- projection unit 104 may apply a Sobel filter to a given 2D image projection to generate an edgemapped image of the given object.
- Projection unit 104 may use various other edge detection techniques (e.g. algorithms) to generate an edgemapped image as well.
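- A minimal sketch of Sobel-based edgemapping, assuming OpenCV as the image-processing library (the disclosure names the Sobel filter but not a specific implementation):

```python
import cv2
import numpy as np

def edgemap(projection: np.ndarray) -> np.ndarray:
    """Produce an edgemapped image from a grayscale 2D projection using a
    Sobel filter, keeping only edge structure (no color or texture)."""
    gx = cv2.Sobel(projection, cv2.CV_64F, 1, 0, ksize=3)  # horizontal gradient
    gy = cv2.Sobel(projection, cv2.CV_64F, 0, 1, ksize=3)  # vertical gradient
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    return cv2.convertScaleAbs(magnitude)                   # scale back to 8-bit
```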
- Projection unit 104 may also generate associated metadata for each given projection image.
- metadata may comprise a name or class of an object represented in the projection or edgemapped projection as an example.
- the metadata may comprise a dictionary comprising a set of attributes.
- the metadata may take various other forms as well.
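- For illustration, a metadata dictionary stored alongside a projection might look like the following sketch; every attribute name and value is hypothetical, not taken from the disclosure.

```python
# Hypothetical metadata entry stored alongside an edgemapped projection.
projection_metadata = {
    "object_name": "retractable utility knife",
    "object_class": "sharps",
    "viewpoint": {"r": 1.0, "theta": 0.785, "phi": 1.571},
    "source_model": "utility_knife_01.stl",
}
```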
- projection unit 104 may output the edgemapped projection image.
- projection unit 104 may output each edgemapped image, and associated metadata to a projection database 110 .
- Projection database 110 may comprise a database or datastore that may store projection images (e.g. edgemapped projection images) and any associated metadata.
- projection database 110 may be queried using various query languages and indexed using a key identifier.
- projection database 110 may be queried using a query language such as JSON, SQL or various other query languages.
- projection database 110 may output the corresponding edgemapped projection image and any associated metadata.
- Projection database 110 may comprise various other data stores, such as flat files, as well.
- Simulation unit 106 may generate a simulated image based on an inputted edgemapped projection image.
- Simulation unit 106 may access projection database 110 to obtain a given edgemapped projection image, and may input the given edgemapped projection image into a neural network to generate a simulated image of a given object that may appear as though the given object has passed through a given detection device.
- a synthesized image of an object may appear as though the object had passed through and had been scanned by a detection device, such as an X-ray scanner, CT scanner, or the like.
- simulation unit 106 may receive an edgemapped projection image of a knife as input. Simulation unit 106 may apply various operations on the edgemapped projection image to generate a synthetic image of the knife having the appearance of being scanned by a detection device, such as a multi-spectral X-ray scanner.
- simulation unit 106 may comprise a set of neural networks that may generate a synthesized image from an input image, e.g. an edgemapped projection image. Responsive to receiving an input edgemapped projection image of a given object, simulation unit 106 may determine an object associated with the edgemapped projection image, e.g. based on the edgemapped projection image's associated metadata. Based on the determined object, simulation unit 106 may select a given neural network to generate a synthetic image of the given object.
- simulation unit 106 may receive an edgemapped image of an object, which may be categorized as a “sharps” object. Such sharps objects may comprise knives or other objects that have sharp edges. Based on the determined sharps object, simulation unit 106 may select a neural network to use to generate a simulated image of the sharps object.
- simulation unit 106 may select the neural network that is likely to generate the most accurate simulated image for the given input image. For instance, for an input edgemapped projection image of a sharps object, simulation unit 106 may select a neural network that may be configured to generate synthesized images of sharps objects, knives, or objects having blades, etc.
- simulation unit 106 may resize (e.g. downsample) the inputted edgemapped image to dimensions 256×256×1, where 256×256 are the length and width, respectively, and where “1” is the number of color channels. After resizing the edgemapped image, simulation unit 106 may input the downsampled edgemapped image into the selected neural network.
- the selected neural network comprises a series of layers.
- the selected neural network inputs the downsampled edgemapped image into the series of layers, each of which applies mathematical operations to the output of the previous layer, and finally outputs a simulated image of the given object.
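- A minimal sketch of the 256×256×1 resizing step described above, assuming OpenCV for the downsampling; the scaling to the [0, 1] range is an assumption, not taken from the disclosure.

```python
import cv2
import numpy as np

def prepare_input(edgemapped: np.ndarray) -> np.ndarray:
    """Downsample an edgemapped image to the 256x256x1 shape described above
    (single grayscale channel) before feeding it to the selected network."""
    resized = cv2.resize(edgemapped, (256, 256), interpolation=cv2.INTER_AREA)
    return resized.reshape(256, 256, 1).astype(np.float32) / 255.0  # [0, 1] scaling is an assumption
```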
- the selected neural network may comprise a Generative Adversarial Network (GAN).
- Referring to FIG. 4, a system 400 is illustrated comprising an example generative adversarial network.
- System 400 comprises a generator 402 , and a discriminator 404 .
- the GAN of FIG. 4 is a neural network comprised of two neural networks: (1) a discriminator neural network 404 , and (2) a generator neural network 402 .
- Discriminator 404 is a neural network that attempts to determine whether images that are input to the discriminator neural network are “real” or “fake.” Discriminator 404 is trained based on pairs of training data images that are specified as either “real” (real image pairs 408 ) or “fake” (fake image pairs 408 ). Discriminator 404 generates a determination of whether the discriminator neural network thinks the pair of images is real or fake (real/fake determination 406 ). Based on each image pair, discriminator 404 “learns” features that distinguish real and fake images.
- discriminator 404 may update the discriminator neural network's parameters to minimize an error function.
- the error function may be based on whether the discriminator neural network correctly identifies a given image as being real or fake.
- the discriminator neural network may update the discriminator neural network's parameters using techniques such as gradient ascent/descent, which is an iterative optimization technique.
- the discriminator neural network may utilize various other error minimization techniques as well.
- Generator 402 comprises a network that attempts to generate images that the discriminator believes are real. At a high level, generator 402 inputs an image, and attempts to generate a simulated output image that resembles a given class of output with high enough accuracy to fool the discriminator into thinking that the simulated output image is real when in fact it was generated artificially, i.e. the output image is not a captured image.
- Generator 402 and discriminator 404 are trained in an adversarial and alternating fashion.
- the discriminator inputs training images comprising real or fake pairs of edgemapped images of a given object, and based on whether or not discriminator neural network 404 accurately predicts whether the given pair of images is real or fake, updates the parameters of discriminator 404 .
- discriminator 404 uses backpropagation to convey gradient information to generator 402 so that generator 402 can use the gradient information to update its parameters to be better able to fool discriminator 404 .
- Generator 402 takes in an input image and adds in some random noise, and generates a fake simulated output image.
- the random noise may comprise Gaussian noise, referred to as “z,” so that generator 402 generates slightly different output even when the input is the same, i.e. to ensure generator 402 operates in a non-deterministic manner.
- other techniques may be used to ensure that generator 402 operates in a non-deterministic fashion (i.e. to ensure stochasticity) other than conditioning generator 402 on Gaussian noise z.
- a dropout layer can be used, which randomly selects nodes to be perturbed within generator 402 's neural networks.
- generator 402 updates the parameters of its neural network to generate simulated output that more closely approximates real output, thereby fooling the discriminator more frequently.
- Generator 402 and discriminator 404 may be trained in an alternating fashion in which the discriminator attempts to distinguish real images from fake images and backpropagates gradient information, and then the generator generates a new fake image that attempts to fool the discriminator. This alternating training process may continue until the generator generates images that fool the discriminator with a given frequency. Once generator 402 and discriminator 404 have been sufficiently trained, the generator can be used to generate simulated images based on inputted images, and the discriminator is no longer used.
- a GAN comprises two convolutional neural network models, a discriminator and a generator, which are trained on opposing objective functions.
- the discriminator is given paired images, referred to as X and Y or Y′.
- X can comprise an edgemapped image
- Y may comprise a corresponding scanned image (such as an X-ray) image of that edgemap
- Y′ may comprise an image generated by some generator G.
- the discriminator (referred to as “D”) is trained to discriminate between real image data (referred to as Y) and synthesized image data (referred to as Y′).
- the generator G is trained to synthesize data (Y′) given an input image X and some gaussian noise z.
- Gaussian noise may be added so that G (which can be thought of as a function that maps (X, z)→Y′) is non-deterministic. That is, the mapping will be conditioned on random noise z so that it produces a slightly different Y′ every time. Producing slightly different output images may be useful because there may be multiple ways to synthesize data while still preserving certain semantics, and because generating varied output images may be desirable for various reasons. It should also be noted that there may be other ways to ensure that G is non-deterministic (to ‘ensure stochasticity’) other than conditioning on Gaussian noise z. For example, noise can be introduced in a dropout layer, in which nodes are randomly selected to be perturbed within the neural network of the generator, G.
- the objective of generator G is to fool the discriminator D, by trying to synthesize image data that is as realistic as possible to ensure the discriminator does not do well in satisfying its objective.
- an L1 loss (least absolute deviations) may be added to the generator to ensure that the L1 distance of the generator's synthesized output to the ground truth output is minimized.
- the objective of the discriminator, D is to minimize the log-loss of the mistakes made in differentiating real and synthetic image data.
- both the generator and discriminator get progressively better. After sufficient training, the generator is able to generate realistic simulated images conditioned on an input.
- θ_D and θ_G are the weights of the discriminator and generator neural networks, respectively.
- the discriminator may be trained or may “learn” based on the gradient, with respect to θ_D, of the objective log D(Y) + log(1 − D(G(X, z))).
- the generator may be trained or may learn based on the gradient, with respect to θ_G, of log(1 − D(G(X, z))), together with the L1 term noted above.
- the optimal generator, denoted as G*, can be determined according to the following equation:
- G* = arg min_G max_D [ log D(Y) + log(1 − D(G(X, z))) ]
- System 400 may attempt to determine the optimal generator G* using gradient ascent and/or descent, as some examples.
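- A minimal sketch of one alternating training update for the objective above, assuming PyTorch, a discriminator D that accepts the paired input (X, Y), dropout inside G for stochasticity (an alternative the disclosure explicitly allows), and a pix2pix-style λ = 100 weight on the L1 term; none of these specific choices are taken from the disclosure.

```python
import torch
import torch.nn.functional as F

def train_step(G, D, opt_G, opt_D, X, Y, lambda_l1=100.0):
    """One alternating update: X are edgemapped inputs, Y are real scanned images."""
    # --- Discriminator update: maximize log D(X, Y) + log(1 - D(X, G(X))) ---
    opt_D.zero_grad()
    Y_fake = G(X).detach()                      # freeze G while updating D
    d_real = D(X, Y)
    d_fake = D(X, Y_fake)
    loss_D = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    loss_D.backward()
    opt_D.step()

    # --- Generator update: fool D, plus an L1 term toward the ground truth ---
    opt_G.zero_grad()
    Y_fake = G(X)
    d_fake = D(X, Y_fake)
    loss_G = (F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
              + lambda_l1 * F.l1_loss(Y_fake, Y))
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()
```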
- the neural network of generator 402 may comprise an encoder-decoder architecture.
- An encoder-decoder architecture is a neural network comprising encoding layers and decoding layers. Encoding layers attempt to reduce the dimensionality of an input image, thereby generating a lower-dimensional representation of the input image. Decoding layers perform an inverse process, and attempt to construct a higher-dimensional representation that maps the lower-dimensional representation generated by the encoder back to an image-dimensional representation having sufficient dimension to be displayed as an image.
- the encoding layers comprise a series of convolutional layers and pooling layers that perform convolution and downsampling, respectively.
- an encoder may reduce the dimensionality of an input image, thereby generating a lower-dimensional feature map of the input image.
- a decoder may take the lower-dimensional representation of an image generated by an encoder, and may perform convolution and upsampling to generate a higher-dimensional representation of the image.
- a given convolutional layer may receive a set of neighboring input values (e.g. a feature map or a set of pixels) for processing, and may apply a set of matrices referred to as “kernels” to the set of input values to generate a representation of the features identified from that set of input values, referred to as a feature map.
- Each convolutional layer may have a different associated set of kernels.
- a convolutional layer performs a technique referred to as convolution, which takes a set of neighboring input values, which may comprise neighboring pixels or neighboring values of a feature map, and expresses a given value from the set as a weighted sum of the value and its neighboring values in which the weights for each input value are defined by the elements of the kernel matrices.
- the output of a convolutional layer is referred to as a “feature map” because the output contains information about features detected by the convolutional layer.
- a pooling layer may input a set of neighboring values, and may selectively downsample the input values, e.g. pixels or values of a feature map. More particularly, the pooling layer may determine a set of regions and may apply a pooling function, such as a max-pool function, to each region.
- the max-pool function may identify a maximum value from a given region, retain the maximum value, and may discard all other values in the region.
- a pooling layer may apply various other functions to input values as well.
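- As a concrete illustration of the max-pool function described above, a minimal NumPy sketch (the function name and the 2×2 region size are assumptions):

```python
import numpy as np

def max_pool_2x2(feature_map: np.ndarray) -> np.ndarray:
    """2x2 max-pooling: keep the largest value in each 2x2 region and discard
    the rest, halving each spatial dimension."""
    h, w = feature_map.shape
    h, w = h - h % 2, w - w % 2                       # trim odd edges
    blocks = feature_map[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))
```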
- Referring to FIG. 5, a conceptual diagram of an encoder-decoder architecture is illustrated.
- the example of FIG. 5 illustrates an encoder-decoder architecture for a given class of object, e.g. a “sharps” object.
- the encoder-decoder network 500 illustrated in FIG. 5 may comprise a neural network of a generator, e.g. generator 402 of FIG. 4 as an example.
- network 500 takes an edgemapped image 506 as input, passes the input image 506 through a series of encoder layers 502 A-N, and a set of decoder layers 504 A- 504 N to generate output simulated image 508 .
- input image 506 is an edgemapped image of a knife
- output image 508 is a simulated image of that knife which appears as though it has passed through a detection device, such as an X-ray scanner.
- encoder layers 502 A- 502 N may perform convolution and/or pooling operations.
- Each encoder layer, which may be comprised of multiple kernels (e.g. convolution, dropout, upsampling, etc.), may output a feature map to a subsequent convolution layer.
- each encoder layer may output a feature map to a corresponding decoder layer, which is a technique referred to as a “skip layer” due to the fact that the output from the encoder layer skips intermediary encoder and decoder layers.
- convolution layer 502 A outputs to both encoder layer 502 B and to decoder layer 504 B (an example of such a skip layer).
- Each encoder layer generates a feature map, until encoder layer 502 N generates a feature map which is outputted to decoder layer 504 N.
- Each of the decoder layers 504 A- 504 N performs generally reciprocal operations relative to the encoder layers. Such reciprocal operations may comprise deconvolution, upsampling, and various other operations as well.
- the decoder layers output in reverse order relative to the encoder layers, e.g. decoder layer 504 B outputs to decoder layer 504 A, etc.
- the decoder layers may also receive input from corresponding encoder layers as part of a skipping process.
- neural networks may also have higher-level hyperparameters, which may define the operation of these neural networks.
- hyperparameters may define learning rate, numbers of layers, numbers of feature maps, etc.
- the hyperparameters for an encoder-decoder architecture may be defined in various manners.
- the architecture hyperparameters for an encoder as described in this disclosure may be: 64-128-256-512-512-512-512, where 64 is a number of feature maps for the first encoder layer, and 512 is a number of feature maps for the last-in-sequence encoder layer.
- the architecture hyperparameters for the decoder may be 512-1024-1024-1024-1024-512-256-128, where 512 is a number of feature maps for the first-in-sequence decoder layer, and 128 is a number of feature maps for a last-in-sequence decoder layer.
- the hyperparameters may be defined using various other values as well.
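- As an illustration of the skip-connected encoder-decoder generator of FIG. 5, the reduced PyTorch sketch below uses only three encoder and three decoder stages; the full feature-map schedules listed above (64-128-256-512-... and 512-1024-...) would extend the same pattern with more stages. PyTorch, the specific layer choices (stride-2 convolutions, transposed convolutions, LeakyReLU/Tanh), and the class name are assumptions, not taken from the disclosure.

```python
import torch
import torch.nn as nn

class SmallEncoderDecoder(nn.Module):
    """A reduced encoder-decoder generator with skip connections, in the spirit of FIG. 5."""

    def __init__(self, in_ch=1, out_ch=1):
        super().__init__()
        # Encoder: stride-2 convolutions halve the spatial resolution at each stage.
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, 64, 4, 2, 1), nn.LeakyReLU(0.2))
        self.enc2 = nn.Sequential(nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2))
        self.enc3 = nn.Sequential(nn.Conv2d(128, 256, 4, 2, 1), nn.LeakyReLU(0.2))
        # Decoder: transposed convolutions upsample; inputs are concatenated
        # with the corresponding encoder outputs (the "skip layers" of FIG. 5).
        self.dec3 = nn.Sequential(nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.ReLU())
        self.dec2 = nn.Sequential(nn.ConvTranspose2d(128 + 128, 64, 4, 2, 1), nn.ReLU())
        self.dec1 = nn.Sequential(nn.ConvTranspose2d(64 + 64, out_ch, 4, 2, 1), nn.Tanh())

    def forward(self, x):
        e1 = self.enc1(x)                                 # 256 -> 128
        e2 = self.enc2(e1)                                # 128 -> 64
        e3 = self.enc3(e2)                                # 64  -> 32
        d3 = self.dec3(e3)                                # 32  -> 64
        d2 = self.dec2(torch.cat([d3, e2], dim=1))        # 64  -> 128, skip from enc2
        d1 = self.dec1(torch.cat([d2, e1], dim=1))        # 128 -> 256, skip from enc1
        return d1
```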
- background image database 112 comprises a database of images scanned by one or more detection devices.
- Detection devices may comprise one or more detection devices that capture images of objects such as baggage items, clothing, human beings, and the like.
- Example detection devices may comprise x-ray scanners, MRI scanners, CT scanners, spectral band scanners, millimeter wave scanners, or any other scanning device now or later devised. Other manners of detection devices are also possible.
- the images captured by detection devices may represent the captured data using various representations.
- the background images may be represented using pixels, voxels, polygons, or other elements that may generally be used to construct image data.
- the background images stored in background image database 112 captured by detection devices may be captured in a DICOS (“Digital Imaging and Communication for Security” standard published by the National Electrical Manufacturers Association) format.
- the background images may contain metadata, which may comprise information related to material density, geometric dimensions, and/or atomic numbers of various regions or graphical elements in a background image, as some non-limiting examples. It should be understood that image data may be captured in other formats and the metadata may take other various forms as well.
- the background images stored in background image database 112 may comprise a background image of a bag that was previously scanned.
- the background image may itself be simulated in part or in whole.
- the simulation of an entire background image may encompass generating a 3-dimensional model of each item inside of an object and manipulating it in a manner similar to the manipulation of the target object.
- Synthesis unit 108 may comprise software, hardware (e.g. CPUs or GPUs), firmware, or any combination thereof that may generate a synthesized image that combines an inputted simulated image and an inputted scanned image. Synthesis unit 108 may obtain the scanned (background) image from background image database 112 , and the simulated image from simulation unit 106 .
- synthesis unit 108 may combine the pixel data of the simulated image with the pixel data of the background image to generate a combined, synthetic image. More particularly, synthesis unit 108 first selects a location of the background image at which to insert the simulated image. Synthesis unit 108 may select this location in various manners.
- synthesis unit 108 may combine the simulated image and the background image to generate the synthetic image. Once combined, synthesis unit 108 may add further variation to the combined image.
- synthesis unit 108 may apply various variations to the combined image.
- the variations may take various forms. As examples, such variations may comprise changes in: intensity, obscuration, noise, magnification, rotation, and Z-effective encoding.
- synthesis unit 108 may select a bound based on the variation parameters, and may randomly sample a parameter from this bound.
- the parameter bounds for rotation may be in a range of 1 to 360 degrees, and an angle may be randomly sampled from this bound.
- synthesis unit 108 may learn the bounds of variation conditioned on a relevant image and the variation type, and may then randomly sample from the predicted bounds.
- the location of the background image at which the simulated image is injected may also be abstracted as a variation of type ‘translation’.
- bounds of the translation variation may be predefined and then randomly sampled from those bounds or the bounds can be predicted from a machine learning model and then randomly sampled from those bounds.
- synthesis unit 108 may apply variations to the combined image that may comprise overlaying certain elements of the background image over the simulated image, and adapting the z-effective and density of the simulated image to match the background image. For example, if clothes or other low-density organic clutter are in the background of a background image, such small “noise” may be overlaid onto the simulated image. If the background image contains high-density objects that make the background appear “black” to the user, then the part of the simulated image that overlaps said “black” portion may take on the same dark appearance due to the compounding effect of density on the ultimate X-ray penetrative ability at that point in space.
- synthesis unit 108 may attempt to make the applied variations appear more realistic by parameterizing the variations.
- synthesis unit 108 may learn parameters of these variation using an adversarial framework.
- a generator model may learn parameters of the variations, and a discriminator model may learn to distinguish between images having an injected image of a simulated object and images without a synthesized object injected.
- Synthesis unit 108 may apply various other variations and in various other manners as well.
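- A rough sketch of how random rotation and translation variations might be sampled from bounds and composited, assuming grayscale OpenCV/NumPy images in which brighter pixels mean more X-ray transmission; the multiplicative blend is a simplification of the density-compounding behavior described above, and the function name, bound values, and the assumption that the background is larger than the simulated image are not taken from the disclosure.

```python
import random
import cv2
import numpy as np

def overlay_with_variation(background: np.ndarray, simulated: np.ndarray) -> np.ndarray:
    """Apply a randomly sampled rotation and translation to a simulated image
    and composite it onto a (larger) grayscale background scan."""
    # Rotation sampled from the 1-360 degree bound described above.
    angle = random.uniform(1.0, 360.0)
    h, w = simulated.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    rotated = cv2.warpAffine(simulated, M, (w, h), borderValue=255)

    # Insertion location treated as a 'translation' variation with uniform bounds.
    bh, bw = background.shape[:2]
    y0 = random.randint(0, bh - h)
    x0 = random.randint(0, bw - w)

    # Composite: treat pixel intensity as transmitted X-ray intensity, so
    # overlapping regions multiply (denser stacks appear darker).
    out = background.astype(np.float32) / 255.0
    patch = rotated.astype(np.float32) / 255.0
    out[y0:y0 + h, x0:x0 + w] *= patch
    return (out * 255).astype(np.uint8)
```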
- Synthetic image database 114 may comprise a database that stores synthetic images.
- Synthetic image database 114 may take a query as input, may process the query, and may output a synthetic image and any associated metadata for the outputted synthetic image based on the inputted query. More particularly, synthetic image database 114 may be indexed using a key identifier for a given image type or object type. In various examples, synthetic image database 114 may be queried using a query language such as JSON, SQL or various other query languages.
- Synthetic image database 114 may be queried in various other manners as well.
- synthetic image database 114 may comprise a data store other than a database such as a flat file.
- Synthetic image database 114 may take various other forms as well.
- FIG. 6 is a flow diagram illustrating an example method for generating synthetic images in accordance with techniques of this disclosure.
- FIG. 6 illustrates a method 600 .
- Method 600 may be implemented as a method which may be executed by at least one processor.
- each block may represent a module or portion of program code that includes instructions that are executable by a processor to implement specific logical functions or steps in a process.
- the program code may be stored on any type of computer-readable medium, such as non-transitory computer-readable media.
- each block may represent circuitry that is wired to perform specific logical functions or steps in a process.
- the blocks shown in the flow diagrams may be rearranged into different orders, combined into fewer blocks, separated into additional blocks, and/or removed based upon the particular embodiment.
- method 600 of FIG. 6 may be executed by system 100 of FIG. 1 .
- System 100 may be implemented in hardware, software, microcode, firmware, on an application-specific integrated circuit (ASIC), read-only memory (ROM), field-programmable gate arrays (FPGAs) or any combination thereof.
- the method of FIG. 6 may be implemented in various other forms as well.
- projection unit 104 may generate a 2D projection from a 3D representation of an object.
- simulation unit 106 may generate, based on the projection, a simulated image of the object, wherein the simulated image appears as though the object has been scanned by a detection device.
- synthesis unit 108 may combine the simulated object with a background image to form a synthesized image, wherein the background image was captured by a detection device, and at block 608 , synthesis unit 108 may output the synthesized image.
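- Tying the steps of method 600 together, a hypothetical end-to-end driver might look like the sketch below; render_projection, edgemap_image, and output_store are assumed placeholder names standing in for the units described above, while prepare_input and overlay_with_variation refer to the earlier sketches.

```python
def generate_synthetic_image(object_representation, viewpoint, generator,
                             background, output_store):
    """End-to-end sketch of method 600: project, simulate, combine, output."""
    projection = render_projection(object_representation, viewpoint)   # projection unit 104
    edge = edgemap_image(projection)
    simulated = generator(prepare_input(edge))                          # simulation unit 106
    synthesized = overlay_with_variation(background, simulated)         # synthesis unit 108
    output_store.save(synthesized)                                      # output (block 608)
    return synthesized
```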
- references herein to “embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one example embodiment of an invention.
- the appearances of this phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
- the embodiments described herein, explicitly and implicitly understood by one skilled in the art can be combined with other embodiments.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Graphics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Multimedia (AREA)
- Mathematical Analysis (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Mathematical Optimization (AREA)
- Mathematical Physics (AREA)
- Pure & Applied Mathematics (AREA)
- Geometry (AREA)
- Algebra (AREA)
- High Energy & Nuclear Physics (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Life Sciences & Earth Sciences (AREA)
- Geophysics (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
- Processing Or Creating Images (AREA)
Abstract
Description
- This application claims priority to U.S. Provisional Application No. 62/547,626, filed on Aug. 18, 2017, the entirety of which is incorporated herein by reference.
- The disclosure is generally related to screening security systems for use in public or private applications and, more particularly, to methods, systems, devices, and other elements directed to screening an object.
- Today, the Transportation Security Administration (TSA) employs tens of thousands of airport screeners. A screener's job is to check baggage for security threats prior to boarding a plane. To check whether a piece of baggage is a security threat, the baggage is run through a detection device, such as a scanner, and with the aid of the scanner, the screener flags suspicious pieces of baggage that appear to contain an object that is a security threat. If the baggage is flagged as suspicious, the screener searches the contents of the piece of baggage by hand to determine whether an object that is a security threat is present in the piece of baggage.
- There are a number of issues with the two-level approach of (1) using a baggage scanner to flag suspicious pieces of baggage, and then (2) having a screener search the flagged pieces of baggage. One issue is that currently-utilized scanners may falsely flag many pieces of baggage as suspicious, resulting in many false positives. False positives, in turn, may cause security screeners to waste time inspecting baggage incorrectly flagged as suspicious, which may in turn result in a significant waste of money. In addition to falsely flagging many pieces of baggage as suspicious, the scanner may also fail to flag significant numbers of pieces of baggage as suspicious when those bags do, in fact, contain objects that are a security risk.
- Even more, a screener's job may be both difficult and monotonous. This difficulty and monotony may increase the chance that a piece of baggage that is a security threat gets through the screening process without being detected.
- Features, aspects, and advantages of the presently disclosed technology may be better understood with respect to the following description, appended claims, and accompanying drawings where:
-
FIG. 1 is a conceptual diagram illustrating an example computer system for generating synthetic data of objects. -
FIG. 2 is a conceptual diagram illustrating a projection of an object from a 3D representation to a 2D representation. -
FIG. 3 is another example of a conceptual diagram illustrating a projection of an object from a 3D representation to a 2D representation. -
FIG. 4 is a conceptual diagram illustrating an example operation of a generative adversarial network in accordance with implementations of this disclosure. -
FIG. 5 is a conceptual diagram illustrating an example encoder-decoder generator architecture. -
FIG. 6 is a flow diagram illustrating an example technique for generating synthetic image data. - The drawings are for illustrating example embodiments, and the inventions are not limited to the arrangements and instrumentality shown in the drawings.
- Synapse Technology Corporation (“Synapse”), a technology company currently based out of Palo Alto and the assignee of the present application, created a system that greatly modernizes the image intelligence industry with proprietary deep learning and computer vision technology. The system is described in U.S. provisional patent applications 62/532,865 and 62/532,821, both filed on Jul. 14, 2017. Synapse-created technology may be used in the airport security screening process to, among other things, increase accuracy significantly, while avoiding false positives and increasing the throughput of passengers at the checkpoint. The Synapse technology may be used in many other industries, such as those that relate to public and private security and defense.
- One technology area where Synapse's technology may improve throughput relates to the scanning of pieces of baggage, e.g. at an airport security checkpoint or another screening location. Currently, in an effort to determine whether a screener is paying attention when operating a baggage scanner, a system at an airport, referred to as a Threat Image Projection System, may periodically generate "threat" images of objects that are a security threat on the screen of a baggage scanner. Based on whether the screener identifies the object represented by the threat image, management personnel may determine whether the screener is paying attention.
- Additionally, images of objects captured by a scanner may have an altered appearance relative to a standard camera image of those objects. For instance, an object that passes through an X-Ray scanner may have a green translucent appearance. This altered appearance of images generated by the scanner may make objects more difficult to identify or detect. The Threat Image Projection System may be utilized to better-familiarize a screener with the appearance of images of certain objects (e.g. objects that are a threat) captured by a scanner so that the screener may be more able to identify those objects. As an example, images of knives may be presented at a display of a baggage scanner to better familiarize screeners with the appearance of knives.
- The Threat Image Projection System may provide useful image data, but at great cost and in a time-inefficient manner. Currently, security personnel may manually place objects into a piece of baggage, and may use a scanner to capture images of those objects. The captured images may then be presented to screeners as part of the Threat Image Projection System.
- The process of manually placing objects into baggage, scanning the baggage, and capturing image data is laborious, slow, and expensive. The process is expensive in part because (1) only personnel with a certain security clearance may perform the image capturing process, (2) a scanner has to be rented during the capturing process, and (3) each captured image must be manually labeled with a description of that object.
- It may be possible to classify objects from images of objects captured by a scanner using computer vision technology. However, computer vision technology is typically inaccurate unless it is trained with a sufficiently large volume and quality of training data. Moreover, there is very little, if any, usable image data available from the airport security sector for objects of interest within baggage, such as images of baggage containing objects that are security threats. Similarly, image data from other security and defense industry sectors is often not available and hard to produce.
- This disclosure is directed to techniques for generating images of objects that have the appearance of being captured by a scanner such that the generated images can be presented at a scanner's display as part of the Threat Image Projection System described above. Additionally, the generated images may be input to a neural network as part of a set of training data that may be used to train the neural network to classify various objects based on images captured by a scanner.
- More particularly, this disclosure describes techniques that may programmatically generate images of objects having the appearance of being scanned by any of various detection devices. Programmatically generating images of objects may bypass the need for manually-intensive scanning of objects to generate captured images of the objects, and labeling the captured images. In turn, the techniques of this disclosure may allow scalable generation of images in such volume and variety that the generated images may be able to train deep learning and other machine learning algorithms to recognize various objects.
- Additionally, the programmatic image generation techniques of this disclosure may generate image data with a lower latency than previously-employed techniques, i.e. the time necessary to programmatically generate image data may be less than previously-employed techniques for generating image data. Reducing the image generating latency may enable a system as configured herein to rapidly generate simulated image data for objects constituting new threats that appear, e.g. a new 3D printed gun or explosive caught at a checkpoint. The techniques of this disclosure may provide various other benefits as well.
- Various systems, devices, and methods disclosed herein significantly improve the generation of synthetic data representations of various objects. It is to be understood that an "object" as used herein is used to broadly describe any material entity. Examples of an "object," for illustration purposes, might include a bag or luggage, purse or backpack, briefcase, box, container or cargo container, wallet, watch, laptop, tablet computer, mobile phone, stroller or wheelchair, a person, and/or any combination thereof. It is also understood that an "object" as used herein may be used to describe any material entity that can be part of, or within, another object. As an example, a shoe, gun, or laptop may all be items located within an object, such as a piece of luggage.
- At a high level, a computer system configured in accordance with this disclosure may comprise a database of 3D representations of various objects for which generating synthetic representations of these objects may be desirable. As described herein, "synthetic data" may refer to a visual representation, such as an image, of a given object that comprises a combination of (1) "real" (e.g. captured) image data obtained from a device, and (2) a simulated image, which may be generated based on non-captured image data, e.g. a simulated image derived from an object model such as a 3D object model.
- According to some aspects of this disclosure, a computer system as described herein may generate two-dimensional (2D) simulated images of an object based on 3D representations of those objects. The computer system may generate such simulated images of the objects, and may combine a simulated image of a given object with a background image (e.g. of a piece of baggage) to form a synthetic image. The computer system may generate synthesized images that appear as though the images were generated by a detection device, such as an MRI scanner, CT scanner, millimeter wave scanner, X-ray machine, or any other type of scanning system developed in the future. A synthesized image appears as though it has been generated by a detection device despite the object of the simulated image never having been captured by such a detection device. A synthetic image containing a generated simulated image of an object may approximate a captured image with high enough accuracy to be used for various purposes as described herein.
- Additional details regarding the design and operation of the computer system of this disclosure will now be described. The computer system of this disclosure stores representations of various objects in a data store. A representation of an object may comprise a 3D representation such as a polygonal model or a set of 2D slices that combine to form a digital representation of the object. The computer system may store the representations in a datastore, such as a database, which may be referred to as an object database.
- A projection unit of the computer system may access a representation of a given object from the database, and may generate one or more 2D images, referred to as projections, of the given object based on the accessed representation. To generate a given projection, the projection unit may determine a position for a virtual camera at a given point in a 3D space centered around the representation of the given object. The projection unit may use a projection technique to convert the 3D representation to a 2D representation at the given point. Responsive to generating the 2D representation, the projection unit may perform various image processing techniques on the generated 2D representation to generate a projection. One such type of image processing technique that the projection unit may perform is edgemapping. Edgemapping is a process that takes an inputted image of an object that contains detail such as color and texture, and outputs an edgemapped image consisting solely of edges of that object and lacking detail such as texture, color, and the like.
- After generating the given projection, the projection unit may output the projection along with any associated metadata to a datastore such as a database, which may be referred to as a projection database. The database may store projection images along with any associated metadata, which may contain information such as the name or type of the object which is represented by the projection.
- Other projections can be made from 3-dimensional data through one of many techniques that can reduce the higher-dimensional data to a 2-dimensional spatial model. In one instance, a “slice” of the 3D model can be taken to generate a 2-dimensional plane within the 3D model. In another instance, multiple slices may be combined to generate a 2D model that represents both an item's external shape and the content within. For example, a 2D representation of a retractable utility knife could contain the shape of the razor blade within superimposed on the projection of the casing.
- In another implementation, an object representation from which a projection is formed may comprise purely 2D forms, such as one or more photos or other images of an object. The object representation may take various other forms as well.
- Such a projection is not limited to two spatial dimensions, but can also contain additional data that may take various forms. As one example, such per-pixel data may include spectral band data and/or material characteristics (z-effective value, atomic number, density). The additional data may take various other forms as well.
- After storing the projection image (also referred to as the "target" image) in the projection database, a simulation unit inputs the projection image into a generative adversarial network (GAN) to generate a simulated output image. The GAN is a type of neural network that generates a simulated output image based on a type of input image. In this disclosure, based on an input projection image of a given object, the GAN may generate an image representation of the given object, referred to as a simulated image. The simulated image may simulate the appearance of the object if the object had been scanned by a given detection device. More particularly, the GAN of the simulation unit adds color and texture to the edgemapped projection image to generate the simulated image of the object having the appearance of being scanned by the given detection device. The simulation unit outputs the simulated image to an overlay generator after the simulated image is generated.
- In another embodiment of this invention, the simulated image may be generated based on the application of various variations to the projection image. Such variations may take various forms such as changing rotation, changing lighting, and/or obscuring part of the projection image. The variations to the simulated image may take various other forms as well.
- The overlay generator inputs a simulated scanned image from the simulation unit and a real image (also referred to as a "background image") captured by a detection device, and combines the real image and the simulated image of the object to form a combined image that includes the background image and the simulated image of the given object. After the combined image has been generated, the overlay generator may store the combined image to a synthetic image database.
- In some implementations, the background image may comprise a background image of a bag that was previously scanned. In some implementations, the background image may itself be simulated in part or in whole. For example, the simulation of the entire background image may encompass generating a 3-dimensional model of each item inside of an object and manipulating it in a manner similar to the manipulation of the target object.
- Further transformations to the synthetic image may be applied in the context of the background image, including manipulating the synthetic image to better match the background image, overlapping the synthetic image with certain objects in the real image, etc.
- One aspect of this disclosure involves a method. The method may comprise: generating a 2D projection from a 3D representation of an object, generating, based on the 2D projection, a simulated image of the object, wherein the simulated image appears as though the object has been scanned by a detection device, combining the simulated image of the object with a background image to form a synthesized image, wherein the background image was captured by a detection device, and outputting the synthesized image.
- Another aspect of this disclosure involves a system comprising a memory and at least one processor, the at least one processor to: generate a 2D projection from a 3D representation of an object, generate, based on the 2D projection, a simulated image of the object, wherein the simulated image appears as though the object has been scanned by a detection device, combine the simulated object with a background image to form a synthesized image, wherein the background image was captured by a detection device, and output the synthesized image.
- Yet another aspect of this disclosure involves a non-transitory computer-readable storage medium comprising instructions stored thereon that, when executed, cause at least one processor to: generate a 2D projection from a 3D representation of an object, generate, based on the 2D projection, a simulated image of the object, wherein the simulated image appears as though the object has been scanned by a detection device, combine the simulated object with a background image to form a synthesized image, wherein the background image was captured by a detection device, and output the synthesized image.
- Various other aspects of this disclosure may take various other forms as well.
-
FIG. 1 is a conceptual diagram illustrating an example computer system for generating synthetic image data of objects. FIG. 1 illustrates a computer system 100. Computer system 100 may comprise an object database 102, a projection unit 104, a simulation unit 106, a synthesis unit 108, a projection database 110, a background image database 112, and a synthetic image database 114. - In an implementation,
object database 102 may comprise a database that stores visual representations of various objects. In an implementation, object database 102 may take a query as input, may process the query, and may output an object representation based on the inputted query. More particularly, object database 102 may be indexed using a key identifier for a given object within object database 102. In various examples, object database 102 may be queried using a query language such as SQL, with queries and results optionally structured in a format such as JSON. Object database 102 may be queried in various other manners as well. In another implementation, object database 102 may comprise a data store other than a database, such as a flat file. Object database 102 may take various other forms as well. -
Object database 102 may store a visual representation of an object in various formats. As examples, a visual representation may comprise a 3D model, a set of 2D slices of an object, or various other forms as well. Such 3D modeling formats may comprise OBJ, STL, DXF, and Blender formats, as some non-limiting examples. An object may be represented in various other formats as well. - In an implementation, a representation of a given object may comprise sets of connected vertices of various polygons. Each vertex may be defined by a set of coordinates such as Cartesian coordinates, polar coordinates, spherical coordinates, or the like. These sets of connected vertices may be combined to form higher-level surfaces, which contain additional detail. In some implementations, a given object representation may have associated texture data. The texture data defines color information (e.g. in pixel format), and/or transparency for a given set of polygons of the visual representation.
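- As an illustration of how a stored representation might be read in practice, the following sketch uses the numpy-stl package (a library in the same family as the python-stl library mentioned later in this disclosure) to load an STL file and collect its triangle vertices. The file name and the flattened vertex array are assumptions of this example, not requirements of the system described herein.

    # A minimal sketch, assuming an STL file on disk and the numpy-stl package.
    import numpy as np
    from stl import mesh  # provided by the numpy-stl package

    def load_object_vertices(path):
        """Load a 3D object representation and return its polygon vertices.

        STL stores triangles; `vectors` has shape (n_triangles, 3, 3).
        """
        model = mesh.Mesh.from_file(path)
        vertices = model.vectors.reshape(-1, 3)  # one row per vertex (x, y, z)
        return vertices

    # Hypothetical usage; "knife.stl" is a placeholder file name.
    # verts = load_object_vertices("knife.stl")
    # print(verts.shape)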
- In other implementations, a given object representation may be represented as a series of voxels or other multi-dimensional graphical elements.
- In still other implementations, an object representation may be formed from 3-dimensional data through one of many techniques that can reduce the higher-dimensional data to a 2-dimensional spatial model. In one instance, a “slice” of the 3D model can be taken to generate a 2-dimensional plane within the 3D model. In another instance, multiple slices may be combined to generate a 2D model that represents both an item's external shape and the content within. For example, a 2D representation of a retractable utility knife could contain the shape of the razor blade within superimposed on the projection of the casing.
- In yet another implementation, an object representation from which a projection is formed may comprise purely 2D forms, such as one or more photos or other images of an object. The object representation may take various other forms as well.
- The representations stored in
object database 102 may be obtained in various different manners. As one example, object representation data may be obtained from a government agency, such as the TSA or from a manufacturer, such as a laptop manufacturer. As another example, object representation data may be obtained from scanning devices, such as 3D scanners. As yet another example, the object representations may be obtained from publicly available sources such as Internet sources, e.g. by using a crawling program or various other techniques. The object representations may be obtained using various other techniques as well. -
Projection unit 104 may comprise software, hardware, firmware, an FPGA (field-programmable gate array), CPU, GPU, ASIC, or any combination thereof that may be configured to generate a 2D projection image based on an object representation. To generate a 2D projection image of a given object, projection unit 104 may load a given object representation from object database 102 and may generate a 3D space containing the representation of the given object. - More particularly,
projection unit 104 may generate a representation of a given object in a 3D space through the perspective of a virtual camera. Projection unit 104 may allow control over the location of the camera along various degrees of freedom, e.g. yaw, pitch, translation, etc. Projection unit 104 may also generate and apply various effects such as lighting effects, which may include positioning of lighting sources within the 3D space containing the representation of the object. In an implementation, projection unit 104 may use various software rendering libraries to generate the 3D space and to position the virtual camera. The python-stl library is one such example library. Projection unit 104 may use various other libraries as well. These rendering libraries may also enable various other interactions with, and application of effects to, an object representation as well. - Responsive to receiving an inputted object representation,
projection unit 104 generates a 3D coordinate space containing the object representation. Projection unit 104 then determines a centroid of the object representation. The centroid as described herein may be defined as a point around which the 3D coordinate space for a given object representation is centered, as a fixed center point around which one or more camera viewpoints are positioned, or as a center point of a given object representation. A centroid may take various other forms as well. - In one example,
projection unit 104 may circumscribe a sphere around the object representation, and may define the centroid as the center of the sphere. As another example, projection unit 104 may define the centroid by: (1) finding a longest span of a given object representation, wherein the span comprises a line segment, (2) determining the center of the line segment, and (3) defining the center of the line segment as the centroid of the given object.
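- A minimal sketch of the longest-span approach is shown below, assuming the object representation is available as an (N, 3) array of vertex coordinates; the brute-force pairwise search is used purely for illustration, and a convex-hull based search could be substituted for large meshes.

    import numpy as np

    def centroid_from_longest_span(vertices):
        """Return the midpoint of the longest segment between any two vertices."""
        diffs = vertices[:, None, :] - vertices[None, :, :]
        dists = np.linalg.norm(diffs, axis=-1)          # pairwise distances, O(N^2)
        i, j = np.unravel_index(np.argmax(dists), dists.shape)
        return (vertices[i] + vertices[j]) / 2.0        # center of the longest span

    # Example with a toy point set.
    pts = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
    print(centroid_from_longest_span(pts))  # -> [1. 0. 0.]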
Projection unit 104 may define the positions of the centroid and the camera viewpoint as respective sets of coordinates in a 3D coordinate space, such as a spherical coordinate space, as one example. A spherical coordinate space may be such a coordinate space in one example. Such a spherical coordinate space is defined by the values: (r,θ,φ), where r is a radial distance, θ is a polar angle, and φ is a zenith angle. - In one implementation, the camera viewpoints may be defined as a set of points at a uniform distance around the centroid of the given object representation, wherein each point of the set of points is defined by a set of spherical coordinates. In some examples, the camera viewpoints may be defined at a minimum distance (i.e. a minimum radius) away from the centroid in the 3D space. In these examples, the minimum distance of the camera viewpoints away from the centroid may be defined as half the length of the segment defined by the longest span of the object representation. The camera viewpoints may be defined at various other distances and in various other manners as well.
- Based on the coordinates of the camera viewpoint and the centroid,
projection unit 104 may determine a 2D projection of the object representation. Responsive to determining the centroid's coordinates,projection unit 104 may convert the spherical coordinates of the camera viewpoint to Cartesian coordinates. To convert the viewpoint's spherical coordinates to Cartesian coordinates,projection unit 104 may utilize the following conversion equations to generate Cartesian coordinates denoted as x, y, and z: -
x=r sin φ cos θ -
y=r sin φ sin θ -
z=r cos φ - In an
implementation projection unit 104 may utilize various optimization techniques to speed the computation of the conversion between spherical and Cartesian coordinates. Examples of such optimizations may involve vector processing, transformation matrices, lookup tables or various other optimization techniques. - After the viewpoint coordinates have been converted to three-dimensional Cartesian coordinates,
projection unit 104 may convert the three-dimensional Cartesian coordinates to a set of projected 2D Cartesian (i.e. x, y) coordinates. To convert the three-dimensional coordinates to a set of 2D Cartesian coordinates,projection unit 104 may apply a projective equation at each 3D coordinate of the object. In an implementation,projection unit 104 may use a perspective projection matrix that performs the projection equations on a matrix representation of a 3D object. - Responsive to generating the 2D Cartesian coordinates,
projection unit 104 may convert the 2D Cartesian coordinates to a set of pixel coordinates represented by a pair of variables u and v, where u is a horizontal distance relative to an origin in the top-left corner of an image, and v is a downward vertical distance from the origin, wherein u and v are restricted to the set of non-negative integers. Generally, projection unit 104 may shift the origin and apply an affine transformation (e.g. a translation, etc.) to go from x,y coordinates to u,v coordinates.
- In an implementation,
projection unit 104 may determine a set of camera viewpoints, and may iteratively generate a set of 2D projections for each viewpoint in the set of viewpoints.Projection unit 104 may determine the spherical coordinates for each of the different camera viewpoints from a predefined list, which contains spherical coordinates for each of the camera viewpoints. - In one implementation, the list of different viewpoints coordinates may be the same for every object. In another implementation, the camera viewpoints may be different for each object. As an example, the list of camera viewpoints may differ such that the camera viewpoints capture certain areas of interest of various objects, as one example. For instance, for a handgun object, the camera viewpoints may be generated such that certain components of the handgun, e.g. the stock, barrel, etc., are visible in 2D projections generated from at least some of the camera viewpoints. In another instance, the list of spherical coordinates may comprise a set of viewpoints that include certain angles, e.g. non-standard angles of a given object such as a down-the-barrel-viewpoint of a gun or along a narrow dimension of a laptop. The list of camera viewpoint coordinates may be defined and may differ in various other manners as well.
- In an implementation, responsive to selecting a list of camera viewpoints defined by corresponding sets of spherical coordinates,
projection unit 104 may iteratively: (1) select a given viewpoint from the list, (2) convert the coordinates of the given viewpoint from a spherical coordinate system to a Cartesian 3D coordinate system, and (3) generate a 2D protection of the object representation at that viewpoint.Projection unit 104 may continue to generate 2D viewpoints untilprojection unit 104 has generated a 2D image of the object at point defined by each set of spherical coordinates in the list. - In an implementation, the list of camera viewpoints may comprise n sets of spherical coordinates, where n is any number, and each set of spherical viewpoints corresponds to a camera viewpoint. In an implementation, n may be 256 entries. The list of camera viewpoints may comprise various other numbers of camera viewpoints as well.
- Turning now to
FIG. 2 , an example conceptual diagram of projecting an object from a 3D representation to a 2D representation is illustrated. In the example ofFIG. 2 , a3D model 202 of an object is illustrated.3D model 202 is positioned within a spherical coordinate space. The spherical coordinate space is represented by the tuple of numbers: (r,θ,φ), where r is a distance from the origin, θ is the polar angle, and φ is an azimuth angle. The center of the polar coordinate space (e.g. corresponding to coordinates (0,0,0) may be a centroid of3D model 202 in various implementations. - Also positioned within the polar coordinate space is a camera viewpoint 204. Camera viewpoint 204 is positioned at some point (r,θ,φ). As illustrated in
FIG. 2 , projection unit 204 may convert the camera viewpoint spherical coordinates to 3D cartesian coordinates (x, y, z), and may perform projection to convert the an image of3D model 202 viewed from camera viewpoint 204 to a set of projected 2D (x,y) coordinates, and may further convert the set of 2D Cartesian coordinates to a set of 2D pixel coordinates (u,v). Additional details regarding the process of projecting a 3D image to a 2D image will now be described with respect toFIG. 3 . - Turning now to
FIG. 3 , a conceptual diagram 300 illustrating a projection of an object from a 3D representation to a 2D representation. In the example ofFIG. 3 , a3D model 302 is illustrated.3D model 302 consists of a set of points, one of which ispoint 304, which is located at a point (x,y,z) located in a 3D cartesian space. - In this example of
FIG. 3 , a projection unit (e.g. projection unit 104 ofFIG. 1 ) may attempt to projectpoint 304 onto some2D point 306 of a2D surface 308.2D surface 308 may be located at some distance r from3D point 304, and distance z-r from anorigin 310 of the 3D space.projection unit 104 may use the following equations to projectpoint 304 to point 306, i.e. to determine the x-coordinates on 2D surface 308: -
x306 = x304·(z−r)/z and y306 = y304·(z−r)/z -
projection unit 104 may use an edgemapping technique to generate an edgemapped image of the given object based on the 2D image projection.Projection unit 104 may perform the edgemapping in various manners. In one example,projection unit 104 may apply a Sobel filter to a given 2D image projection to generate an edgemapped image of the given object.Projection unit 104 may use various other edge detection techniques (e.g. algorithms) to generate an edgemapped image as well. -
Projection unit 104 may also generate associated metadata for each given projection image. Examples of such metadata may comprise a name or class of an object represented in the projection or edgemapped projection. In some examples, the metadata may comprise a dictionary comprising a set of attributes. The metadata may take various other forms as well.
projection unit 104 may output the edgemapped projection image. - In an implementation,
projection unit 104 may output each edgemapped image, and associated metadata to aprojection database 110.Projection database 110 may comprise a database or datastore that may store projection images (e.g. edgemapped projection images) and any associated metadata. In an implementation,projection database 110 may be queried using various query languages and indexed using a key identifier. In various examples,projection database 110 may be queried using a query language such as JSON, SQL or various other query languages. - Responsive to receiving a query containing a key corresponding to a given edgemapped projection image,
projection database 110 may output the corresponding edegemapped projection image and any associated metadata.Projection database 110 may comprise various other data stores, such as flat files, as well. -
Simulation unit 106 may generate a simulated image based on an inputted edgemapped projection image. Simulation unit 106 may access projection database 110 to obtain a given edgemapped projection image, and may input the given edgemapped projection image into a neural network to generate a simulated image of a given object that may appear as though the given object has passed through a given detection device. As examples, a simulated image of an object may appear as though the object had passed through and had been scanned by a detection device, such as an X-ray scanner, CT scanner, or the like.
simulation unit 106 may receive an edgemapped projection image of a knife as input.Simulation unit 106 may apply various operations on the edgemapped projection image to generate a synthetic image of the knife having the appearance of being scanned by a detection device, such as a multi-spectral X-ray scanner. - In an implementation,
simulation unit 106 may comprise a set of neural networks that may generate a synthesized image from an input image, e.g. an edgemapped projection image. Responsive to receiving an input edgemapped projection image of a given object,simulation unit 106 may determine an object associated with the edgemapped projection image, e.g. based on the edgemapped projection image's associated metadata. Based on the determined object,simulation unit 106 may select a given neural network to generate a synthetic image of the given object. - As an example,
simulation unit 106 may receive an edgemapped image of an object, which may be categorized as a “sharps” object. Such sharps object may comprise knives or other objects that have sharp edges. Based on the determined sharps object,simulation unit 106 may select a neural network to use to generate a simulated image of the sharps object. - In an implementation,
simulation unit 106 may select the neural network that is likely to generate the most accurate simulated image for the given input image. For instance, for an input edgemapped projection image of a sharps object,simulation unit 106 may select a neural network that may be configured to generate synthesized images of sharps objects, knives, of objects having blades, etc. - In an implementation, before inputting the edgemapped image into the selected neural network,
simulation unit 106 may resize (e.g. downsample) the inputted edgemapped image to dimensions 256×256×1, where 256×256 are the length and width respectively, and where “1” is the number of color channels. After resizing the edgemapped image,simulation unit 106 may output the downsampled edgemapped image into the selected neural network. - At a high level, the selected neural network comprises a series of layers. The selected neural network inputs the downsampled edgemapped image the series of layers, each of which apply mathematical operations to the output of the previous layer, and finally outputs a simulated image of the given object.
- In an implementation, the selected neural network may comprise a Generative Adversarial Network (GAN). Turning now to
FIG. 4 , asystem 400 is illustrated comprising an example generative adversarial network.System 400 comprises agenerator 402, and adiscriminator 404. The GAN ofFIG. 4 is a neural network comprised of two neural networks: (1) a discriminatorneural network 404, and (2) a generatorneural network 404. -
Discriminator 404 is a neural network that attempts to determine whether images that are input to the discriminator neural network are "real" or "fake." Discriminator 404 is trained based on pairs of training data images that are specified as either "real" (real image pairs 408) or "fake" (fake image pairs 408). Discriminator 404 generates a determination of whether the discriminator neural network thinks the pair of images is real or fake (real/fake determination 406). Based on each image pair, discriminator 404 "learns" features that distinguish real and fake images.
discriminator 404 may update the discriminator neural network's parameters to minimize an error function. The error function may be based on whether the discriminator neural network correctly identifies a given image as being real or fake. The discriminator neural network may update the discriminator neural network's parameters using techniques such as gradient ascent/descent, which is an iterative optimization technique. The discriminator neural network may utilize various other error minimization techniques as well. -
Generator 402 comprises a network that attempts to generate images that the discriminator believes are real. At a high level,generator 402 inputs an image, and attempts to generate a simulated output image that resembles a given class of output with high enough accuracy to fool the discriminator into thinking that the simulated output image is real when in fact it was generated artificially, i.e. the output image is not a captured image. -
Generator 402 anddiscriminator 404 are trained in an adversarial and alternating fashion. The discriminator inputs training images comprising real or fake pairs of edgemapped images of a given object, and based on whether or not discriminatorneural network 404 accurately predicts whether the given pair of images is real or fake, updates the parameters ofdiscriminator 404. Additionally,discriminator 404 uses backpropagation to convey gradient information togenerator 402 so thatgenerator 402 can use the gradient information to update its parameters to be better able to fooldiscriminator 404. -
Generator 402 takes in an input image and adds in some random noise, and generates a fake simulated output image. In an implementation, the random noise may comprise Gaussian noise, referred to as "z," so that generator 402 generates slightly different output regardless of whether the input is the same, i.e. to ensure generator 402 operates in a non-deterministic manner. In an implementation, other techniques may be used to ensure that generator 402 operates in a non-deterministic fashion (i.e. to ensure stochasticity) other than conditioning generator 402 on Gaussian noise z. For example, a dropout layer can be used, which randomly selects nodes to be perturbed within generator 402's neural networks.
generator 402 updates the parameters of its neural network to generate simulated output that more closely approximates real output, thereby fooling the discriminator more frequently.Generator 402 anddiscriminator 404 may be trained in an alternating fashion in which the discriminator attempts to distinguish an image and backpropagates gradient information, and then the discriminator generates a new fake image that attempts to fool the discriminator. This alternating training process may continue until the generator generates images that fool the discriminator with a given frequency. Oncegenerator 402 anddiscriminator 404 have been sufficiently trained, the generator can be used to generate simulated images based on inputted images, and the discriminator is no longer issued. - A more mathematically-detailed description of the operation of a generator and a discriminator (
e.g. generator 402 and discriminator 404) will now be discussed. As discussed above, a GAN comprises two convolutional neural network models, a discriminator and a generator, which are trained on opposing objective functions. - The discriminator is given paired images, referred to as X and Y or Y′. In an implementation, X can may comprise an edgemapped image, Y may comprise a corresponding scanned image (such as an X-ray) image of that edgemap, and Y′ may comprise an image generated by some generator G. From the images X, Y, and Y′ the discriminator (referred to as “D”) is trained to discriminate between real image data (referred to as Y) and synthesized image data (referred to as Y′).
- The generator G is trained to synthesize data (Y′) given an input image X and some gaussian noise z. Gaussian noise is may be added so that G (which can be thought of as a function that maps (X,z)→Y′) is non-deterministic. That is, the mapping will be conditioned on random noise z so it produces it produces a slightly different Y′ every time. Producing slightly different output images may be useful because there may be multiple ways to synthesize data while still preserving certain semantics and because generating varied output images may be desirable for various reasons. It should also be noted that there may be other ways to ensure that G is non-deterministic (to ‘ensure stochasticity’) other than conditioning on a Gaussian noise z. For example, noise can be introduced noise in a dropout layer, in which nodes are randomly selected to be perturbed within the neural network of the generator, G.
- As described above, the objective of generator G is to fool the discriminator D, by trying to synthesize image data that is as realistic as possible to ensure the discriminator does not do well in satisfying its objective. In addition, an L1 loss (least absolute deviations) may be added to the generator to ensure that L1 distance of the generator's synthesized output to the ground truth output is minimized. As described above, the objective of the discriminator, D, is to minimize the log-loss of the mistakes made in differentiating real and synthetic image data.
- As the generator and discriminator are iteratively trained (using stochastic gradient descent) with opposing objectives, both the generator and discriminator get progressively better. After sufficient training, the generator is able to generate realistic simulated images conditioned on an input.
- The process of determining weights for the generator and discriminator neural networks will now be discussed in greater detail. If θD and θG are the weights of the discriminator and generator neural network, respectively, the discriminator may be trained or may “learn” based on the following gradient:
-
∇θD [log D(X,Y)+log(1−D(X,G(X,z)))], -
-
∇θG [log(1−D(X,G(X,z)))]. -
-
G*=arg minG maxD E[log D(X,Y)]+E[log(1−D(X,G(X,z)))]+λE[∥Y−G(X,z)∥1] -
System 400 may attempt to determine the optimal generator G* using stochastic gradient ascent and/or descent, as some examples.
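- As a concrete illustration of the alternating optimization described above, the following PyTorch sketch performs one discriminator update and one generator update, combining the adversarial log-loss with an L1 term weighted by λ. The module names, the optimizer objects, and the choice of concatenating X with Y or Y′ along the channel dimension before the discriminator are assumptions of this example, not requirements of the disclosed system; stochasticity is assumed to come from dropout layers inside G rather than an explicit noise input.

    import torch
    import torch.nn.functional as F

    def train_step(G, D, opt_G, opt_D, edgemap, scan, l1_weight=100.0):
        # Discriminator update: real (X, Y) pairs -> 1, synthesized (X, Y') pairs -> 0.
        fake = G(edgemap)
        d_real = D(torch.cat([edgemap, scan], dim=1))
        d_fake = D(torch.cat([edgemap, fake.detach()], dim=1))
        loss_D = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) +
                  F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
        opt_D.zero_grad()
        loss_D.backward()
        opt_D.step()

        # Generator update: fool D while staying close to ground truth (L1 term).
        d_fake_for_g = D(torch.cat([edgemap, fake], dim=1))
        adv = F.binary_cross_entropy_with_logits(d_fake_for_g, torch.ones_like(d_fake_for_g))
        loss_G = adv + l1_weight * F.l1_loss(fake, scan)
        opt_G.zero_grad()
        loss_G.backward()
        opt_G.step()
        return loss_D.item(), loss_G.item()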
generator 402 may comprise an encoder-decoder architecture. An encoder-decoder architecture is a neural network comprising encoding layers and decoding layers. Encoding layers attempt to reduce the dimensionality of an input image, thereby generating a lower-dimensional representation of the input image. Decoding layers perform an inverse process, and attempt to construct a higher-dimensional representation that maps the lower-dimensional representation generated by the encoder back to an image-dimensional representation having sufficient dimension to be displayed as an image. - More particularly, the encoding layers comprise a series of convolutional layers and pooling layer that perform convolution and downsampling, respectively. By performing convolution and downsampling, an encoder may reduce the dimensionality of an input image, thereby generating a lower-dimensional feature map of the input image. Conversely, a decoder may take the lower-dimensional representation of an image generated by an encoder, and may perform convolution and upsampling to generate a higher-dimensional representation of the image.
- A given convolutional layer may receive a set of neighboring input values (e.g. a feature map or a set of pixels) for processing, may apply a set of matrices referred to as “kernels” to the set of input values to generate a representation of the features identified form that set of input values, referred to as a feature map. Each convolutional layer may have a different associated set of kernels.
- To apply a given kernel, a convolutional layer performs a technique referred to as convolution, which takes a set of neighboring input values, which may comprise neighboring pixels or neighboring values of a feature map, and expresses a given value from the set as a weighted sum of the value and its neighboring values in which the weights for each input value are defined by the elements of the kernel matrices. The output of a convolutional layer is referred to as a “feature map” because the output contains information about features detected by the convolutional layer.
- A pooling layer may input a set of neighboring values, and selectively downsample the input values, e.g. pixels or values of a feature map. More particularly, the pooling layer may determine a set of regions and may apply a pooling function, each of the regions, and may apply a function, such as a max-pool function, to each region. The max-pool function may identify a maximum value from a given region, retain the maximum value, and may discard all other values in the region. A pooling layer may apply various other functions to input values as well.
- Turning now to
FIG. 5 , a conceptual diagram of an encoder-decoder architecture is illustrated. The example ofFIG. 5 illustrates an encoder-decoder architecture for a given class of object, e.g. a “sharps” object. The encoder-decoder network 500 illustrated inFIG. 5 may comprise a neural network of a generator,e.g. generator 402 ofFIG. 4 as an example. - In the example of
FIG. 5 ,network 500 takes anedgemapped image 506 as input, passes theinput image 506 through a series ofencoder layers 502A-N, and a set ofdecoder layers 504A-504N to generate outputsimulated image 508. In the example ofFIG. 5 ,input image 506 is an edgemapped image of a knife, andoutput image 506 is a simulated image of that knife which appears as though it has passed through a detection device, such as an X-ray scanner. - As described above, encoder layers 502A-502N may perform convolution and/or pooling operations. Each encoder layer, which may be comprised of multiple kernels (e.g. convolution, dropout, upsampling, etc.), may output a feature map to a subsequent convolution layer. Additionally, in an implementation, each encoder layer may output a feature map to a corresponding decoder layer, which is a technique referred to as a “skip layer” due to the fact that the output from the encoder layer skips intermediary encoder and decoder layers.
- In the example of
FIG. 5 ,convolution layer 502A outputs to bothencoder layer 502B and todecoder layer 504B (an example of such a skip layer). Each encoder layer generates feature map untilencoder layer 502N generates a feature map, which is outputted todecoder layer 504N. - Each of the decoder layers 504A-504N perform generally reciprocal operations relative to the encoder layers. Such reciprocal operations may comprise deconvolution, upsampling, and various other operations as well. The decoder layers output in reverse order relative to the encoder layers,
e.g. decoder layer 504B outputs todecoder layer 504A, etc. The decoder layers may also receive input from corresponding encoder layers as part of a skipping process. - In addition to parameters such as kernel weights, pooling functions, and various other parameters, neural networks may also have higher-level hyperparameters, which may define the operation of these neural networks. Such hyperparameters may define learning rate, numbers of layers, numbers of feature maps, etc. The hyperparameters for an encoder-decoder architecture may be defined in various manners.
- In one implementation, the architecture hyperparameters for an encoder as described in this disclosure (as described by the number of feature maps in each layer) may be: 64-128-256-512-512-512-512-512, where 64 is a number of features maps for a first encoder layer, and 512 is a number of feature maps for the last-in-sequence encoder layer. In another implementation, the architecture hyperparameters for the decoder may be 512-1024-1024-1024-1024-512-256-128, where 512 is a number of feature maps for the first-in-sequence decoder layer, and 128 is a number of feature maps for a last-in-sequence decoder layer. The hyperparameters may be defined using various other values as well.
- In an implementation,
background image database 112 comprises a database of images scanned by one or more detection devices. Detection devices may comprise one or more detection devices that capture images of objects such as baggage items, clothing, human beings, and the like. Example detection devices may comprise x-ray scanners, MRI scanners, CT scanners, spectral band scanners, millimeter wave scanners, or any other scanning device now or later devised. Other manners of detection devices are also possible. The images captured by detection devices may represent the captured data using various representations. As examples, the background images may be represented using pixels, voxels, polygons, or elements that may generally be used to construct image data. - In some examples, the background images stored in
background image database 112 may be captured in a DICOS ("Digital Imaging and Communication for Security" standard published by the National Electrical Manufacturers Association) format. The background images may contain metadata, which may comprise information related to material density, geometric dimensions, and/or atomic numbers of various regions or graphical elements in a background image, as some non-limiting examples. It should be understood that image data may be captured in other formats and the metadata may take other various forms as well. - In some implementations, the background images stored in
background image database 112 may comprise a background image of a bag that was previously scanned. In some implementations, the background image may itself be simulated in part or in whole. For example, the simulation of an entire background image may encompass generating a 3-dimensional model of each item inside of an object and manipulating it in a manner similar to the manipulation of the target object. -
Synthesis unit 108 may comprise software, hardware (e.g. CPUs or GPUs), firmware, or any combination thereof that may generate a synthesized image that combines an inputted simulated image and an inputted scanned image. Synthesis unit 108 may obtain the scanned image from background image database 112, and the simulated image from simulation unit 106.
synthesis unit 108 may combine the pixel data of the simulated image with the pixel data of the background image to generate a combined, synthetic image. More particularly, synthesis unit first selects a location of the background image at which to insert the simulated image.Synthesis unit 108 may select this location in various manners. - After selecting the insertion location,
synthesis unit 108 may combine the simulated image and the background image to generate the synthetic image. Once combined, synthesis unit may add further variation to the combined image - Once synthesis unit has combined the background and simulated images,
synthesis unit 108 may apply various variations to the combined image. The variations may take various forms. As examples, such variations may comprise changes in: intensity, obscuration, noise, magnification, rotation, and Z-effective encoding. For each type of variation,synthesis unit 108 may select a bound based on the variation parameters, and may randomly sample a parameter from this bound. As an example, the parameter bounds for rotation may be in a range of 1 to 360 degrees, and an angle may be randomly sampled from this bound. - In another implementation,
synthesis unit 108 may learn the bounds of variation conditioned on a relevant image and the variation type, and may then randomly sample from the predicted bounds. The location of the background image at which the simulated image is injected may also be abstracted as a variation of type ‘translation’. Similarly, bounds of the translation variation may be predefined and then randomly sampled from those bounds or the bounds can be predicted from a machine learning model and then randomly sampled from those bounds. - In another implementation,
synthesis unit 108 may apply variations to the combined image that may comprise overlaying certain elements of the background image over the simulated image, and adapting the z-effective and density of the simulated image to match the background image. For example, if clothes or other low-density organic clutter are in the background of a background image such small “noise” may be overlaid onto the simulated image. If the background image contains high-density objects so as to make the background appear “black” to the user, then the part of the background image that is over said “black” portion may take on the same background due to the compounding nature of density on the ultimate x-ray penetrative ability at that point in space. - In another implementation,
synthesis unit 108 may attempt to make the applied variations appear more realistic by parameterizing the variations. In an implementation,synthesis unit 108 may learn parameters of these variation using an adversarial framework. In this implementation, a generator model may learn parameters of the variation and a discriminator model may learn to distinguish between images having an injected image of a simulated object and images synthesized object injected. -
Synthesis unit 108 may apply various other variations and in various other manners as well. - Responsive to generating a synthetic image,
synthesis unit 108 may output the generated synthetic image and any associated metadata tosynthetic image database 114. In an implementation,synthetic image database 114 may comprise a database that stores synthetic images. In an implementation,Synthetic image database 114 may take a query as input, may process the query, and may output a synthetic image and any associated metadata for the outputted synthetic image on the inputted query. More particularly,synthetic image database 114 may be indexed using a key identifier for a given image type or object type. In various examples,synthetic image database 114 may be queried using a query language such as JSON, - SQL or various other query languages.
Synthetic image database 114 may be queried in various other manners as well. In another implementation, synthetic image database 114 may comprise a data store other than a database, such as a flat file. Synthetic image database 114 may take various other forms as well.
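For illustration, the sketch below models such an indexed store as a SQLite table keyed by object type; the schema, file names, and example metadata are assumptions and are not part of the disclosure.

```python
import sqlite3

# Minimal sketch of an indexed synthetic-image store (illustrative schema).
conn = sqlite3.connect("synthetic_images.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS synthetic_images (
        image_id    TEXT PRIMARY KEY,
        object_type TEXT,
        image_path  TEXT,
        metadata    TEXT   -- JSON-encoded metadata (bounding box, variation parameters, ...)
    )
""")
conn.execute("CREATE INDEX IF NOT EXISTS idx_object_type ON synthetic_images(object_type)")

conn.execute(
    "INSERT OR REPLACE INTO synthetic_images VALUES (?, ?, ?, ?)",
    ("img-0001", "knife", "/data/synthetic/img-0001.png",
     '{"bbox": [100, 150, 164, 214], "rotation_deg": 37.5}'),
)
conn.commit()

# Query by object type (a "key identifier" for a given object type).
rows = conn.execute(
    "SELECT image_path, metadata FROM synthetic_images WHERE object_type = ?",
    ("knife",),
).fetchall()
```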
FIG. 6 is a flow diagram illustrating an example method for generating synthetic images in accordance with techniques of this disclosure. FIG. 6 illustrates a method 600. Method 600 may be executed by at least one processor.

To help describe some of these operations, flow diagrams may be referenced to describe combinations of operations that may be performed. In some cases, each block may represent a module or portion of program code that includes instructions that are executable by a processor to implement specific logical functions or steps in a process. The program code may be stored on any type of computer-readable medium, such as non-transitory computer-readable media. In other cases, each block may represent circuitry that is wired to perform specific logical functions or steps in a process. Moreover, the blocks shown in the flow diagrams may be rearranged into different orders, combined into fewer blocks, separated into additional blocks, and/or removed based upon the particular embodiment.
By way of example, method 600 of FIG. 6 may be executed by system 100 of FIG. 1. System 100 may be implemented in hardware, software, microcode, firmware, an application-specific integrated circuit (ASIC), read-only memory (ROM), field-programmable gate arrays (FPGAs), or any combination thereof. The method of FIG. 6 may be implemented in various other forms as well.
At block 602, projection unit 102 may generate a 2D projection from a 3D representation of an object.
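As a loose illustration of block 602, the sketch below forms a simple parallel-beam projection by summing material density along one axis of a voxel grid; a production system could instead ray-trace the 3D representation, and all names and values here are assumptions.

```python
import numpy as np

def project_parallel(voxels: np.ndarray, axis: int = 2) -> np.ndarray:
    """Parallel-beam projection of a 3D voxel grid of material densities.

    Summing density along the chosen viewing axis gives a 2D map of material
    thickness traversed by each simulated ray.
    """
    return voxels.sum(axis=axis)

# Example: a 64^3 voxel grid containing a dense 16^3 cube.
voxels = np.zeros((64, 64, 64))
voxels[24:40, 24:40, 24:40] = 1.0
projection = project_parallel(voxels, axis=2)   # shape (64, 64)
```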
At block 604, simulation unit 106 may generate, based on the projection, a simulated image of the object, wherein the simulated image appears as though the object has been scanned by a detection device.
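As a loose illustration of block 604, the sketch below converts a projected thickness map into an x-ray-like transmission image using the Beer-Lambert law with a single, assumed attenuation coefficient; a fuller simulation would use material- and energy-dependent coefficients and detector effects.

```python
import numpy as np

def simulate_xray(projection: np.ndarray, mu: float = 0.15) -> np.ndarray:
    """Turn a projected thickness map into an x-ray-like transmission image.

    Per the Beer-Lambert law, intensity falls off exponentially with the amount
    of attenuating material along each ray: I = I0 * exp(-mu * thickness).
    """
    return np.exp(-mu * projection)

# Example thickness map: a 16-voxel-thick square region in an otherwise empty view.
projection = np.zeros((64, 64))
projection[24:40, 24:40] = 16.0
simulated = simulate_xray(projection)   # values near 1.0 where little material was traversed
```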
At block 606, synthesis unit 108 may combine the simulated image with a background image to form a synthesized image, wherein the background image was captured by a detection device, and at block 608, overlay unit 108 may output the synthesized image.

The description above discloses, among other things, various example systems, methods, apparatus, and articles of manufacture including, among other components, firmware and/or software executed on hardware. It is understood that such examples are merely illustrative and should not be considered as limiting. For example, it is contemplated that any or all of the firmware, hardware, and/or software aspects or components can be embodied exclusively in hardware, exclusively in software, exclusively in firmware, or in any combination of hardware, software, and/or firmware. Accordingly, the examples provided are not the only way(s) to implement such systems, methods, apparatus, and/or articles of manufacture.
Additionally, a reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one example embodiment of an invention. The appearances of this phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. As such, the embodiments described herein, explicitly and implicitly understood by one skilled in the art, can be combined with other embodiments.
The specification is presented largely in terms of illustrative environments, systems, procedures, steps, logic blocks, processing, and other symbolic representations that directly or indirectly resemble the operations of data processing devices coupled to networks. These process descriptions and representations are typically used by those skilled in the art to most effectively convey the substance of their work to others skilled in the art. Numerous specific details are set forth to provide a thorough understanding of the present disclosure. However, it is understood by those skilled in the art that certain embodiments of the present disclosure can be practiced without certain specific details. In other instances, well-known methods, procedures, components, and circuitry have not been described in detail to avoid unnecessarily obscuring aspects of the embodiments. Accordingly, the scope of the present disclosure is defined by the appended claims rather than the foregoing description of embodiments.
Claims (29)
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/727,108 US10210631B1 (en) | 2017-08-18 | 2017-10-06 | Generating synthetic image data |
US15/799,274 US10453223B2 (en) | 2017-08-18 | 2017-10-31 | Generating synthetic image data |
US16/658,513 US11423592B2 (en) | 2017-08-18 | 2019-10-21 | Object detection training based on artificially generated images |
US17/812,830 US11790575B2 (en) | 2017-08-18 | 2022-07-15 | Object detection training based on artificially generated images |
US18/467,340 US20240062437A1 (en) | 2017-08-18 | 2023-09-14 | Object Detection Training Based on Artificially Generated Images |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201762547626P | 2017-08-18 | 2017-08-18 | |
US15/727,108 US10210631B1 (en) | 2017-08-18 | 2017-10-06 | Generating synthetic image data |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/799,274 Continuation US10453223B2 (en) | 2017-08-18 | 2017-10-31 | Generating synthetic image data |
Publications (2)
Publication Number | Publication Date |
---|---|
US10210631B1 US10210631B1 (en) | 2019-02-19 |
US20190057519A1 true US20190057519A1 (en) | 2019-02-21 |
Family
ID=65322735
Family Applications (5)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/727,108 Active US10210631B1 (en) | 2017-08-18 | 2017-10-06 | Generating synthetic image data |
US15/799,274 Active US10453223B2 (en) | 2017-08-18 | 2017-10-31 | Generating synthetic image data |
US16/658,513 Active 2038-02-27 US11423592B2 (en) | 2017-08-18 | 2019-10-21 | Object detection training based on artificially generated images |
US17/812,830 Active US11790575B2 (en) | 2017-08-18 | 2022-07-15 | Object detection training based on artificially generated images |
US18/467,340 Pending US20240062437A1 (en) | 2017-08-18 | 2023-09-14 | Object Detection Training Based on Artificially Generated Images |
Family Applications After (4)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/799,274 Active US10453223B2 (en) | 2017-08-18 | 2017-10-31 | Generating synthetic image data |
US16/658,513 Active 2038-02-27 US11423592B2 (en) | 2017-08-18 | 2019-10-21 | Object detection training based on artificially generated images |
US17/812,830 Active US11790575B2 (en) | 2017-08-18 | 2022-07-15 | Object detection training based on artificially generated images |
US18/467,340 Pending US20240062437A1 (en) | 2017-08-18 | 2023-09-14 | Object Detection Training Based on Artificially Generated Images |
Country Status (1)
Country | Link |
---|---|
US (5) | US10210631B1 (en) |
Families Citing this family (43)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10534994B1 (en) * | 2015-11-11 | 2020-01-14 | Cadence Design Systems, Inc. | System and method for hyper-parameter analysis for multi-layer computational structures |
US10751879B2 (en) | 2017-06-05 | 2020-08-25 | Autodesk, Inc. | Adapting simulation data to real-world conditions encountered by physical processes |
US11188783B2 (en) * | 2017-10-19 | 2021-11-30 | Nokia Technologies Oy | Reverse neural network for object re-identification |
CN107767408B (en) * | 2017-11-09 | 2021-03-12 | 京东方科技集团股份有限公司 | Image processing method, processing device and processing equipment |
CN109784325A (en) * | 2017-11-10 | 2019-05-21 | 富士通株式会社 | Opener recognition methods and equipment and computer readable storage medium |
TWI682359B (en) * | 2018-01-29 | 2020-01-11 | 國立清華大學 | Image completion method |
US10552714B2 (en) * | 2018-03-16 | 2020-02-04 | Ebay Inc. | Generating a digital image using a generative adversarial network |
US10956635B1 (en) * | 2019-12-04 | 2021-03-23 | David Byron Douglas | Radiologist-assisted machine learning with interactive, volume subtending 3D cursor |
US11238197B1 (en) * | 2018-04-03 | 2022-02-01 | David Byron Douglas | Generating a 3D dataset containing a simulated surgical device |
US10872221B2 (en) * | 2018-06-21 | 2020-12-22 | Amazon Technologies, Inc | Non-contact biometric identification system |
KR102565279B1 (en) * | 2018-08-23 | 2023-08-09 | 삼성전자주식회사 | Object detection method, learning method for object detection, and devices thereof |
US10699458B2 (en) * | 2018-10-15 | 2020-06-30 | Shutterstock, Inc. | Image editor for merging images with generative adversarial networks |
US10997464B2 (en) * | 2018-11-09 | 2021-05-04 | Adobe Inc. | Digital image layout training using wireframe rendering within a generative adversarial network (GAN) system |
JP6915605B2 (en) * | 2018-11-29 | 2021-08-04 | オムロン株式会社 | Image generator, robot training system, image generation method, and image generation program |
US11010896B2 (en) * | 2018-12-17 | 2021-05-18 | Bodygram, Inc. | Methods and systems for generating 3D datasets to train deep learning networks for measurements estimation |
US11315021B2 (en) * | 2019-01-28 | 2022-04-26 | StradVision, Inc. | Method and device for on-device continual learning of a neural network which analyzes input data, and method and device for testing the neural network to be used for smartphones, drones, vessels, or military purpose |
US10325201B1 (en) * | 2019-01-31 | 2019-06-18 | StradVision, Inc. | Method and device for generating deceivable composite image by using GAN including generating neural network and discriminating neural network to allow surveillance system to recognize surroundings and detect rare event more accurately |
CN109934282B (en) * | 2019-03-08 | 2022-05-31 | 哈尔滨工程大学 | SAGAN sample expansion and auxiliary information-based SAR target classification method |
US10949635B2 (en) | 2019-04-11 | 2021-03-16 | Plus One Robotics, Inc. | Systems and methods for identifying package properties in an automated industrial robotics system |
CN110189351A (en) * | 2019-04-16 | 2019-08-30 | 浙江大学城市学院 | A kind of scratch image data amplification method based on production confrontation network |
CN110147794A (en) * | 2019-05-21 | 2019-08-20 | 东北大学 | A kind of unmanned vehicle outdoor scene real time method for segmenting based on deep learning |
CN110675353A (en) * | 2019-08-31 | 2020-01-10 | 电子科技大学 | Selective segmentation image synthesis method based on conditional generation countermeasure network |
WO2021087425A1 (en) * | 2019-10-31 | 2021-05-06 | Bodygram, Inc. | Methods and systems for generating 3d datasets to train deep learning networks for measurements estimation |
WO2021102030A1 (en) | 2019-11-18 | 2021-05-27 | Autodesk, Inc. | Synthetic data generation and building information model (bim) element extraction from floor plan drawings using machine learning |
US11019087B1 (en) * | 2019-11-19 | 2021-05-25 | Ehsan Adeli | Computer vision-based intelligent anomaly detection using synthetic and simulated data-system and method |
CN111091151B (en) * | 2019-12-17 | 2021-11-05 | 大连理工大学 | Construction method of generation countermeasure network for target detection data enhancement |
US11087172B2 (en) * | 2019-12-31 | 2021-08-10 | Plus One Robotics, Inc. | Systems and methods for creating training data |
IT202000000664A1 (en) * | 2020-01-15 | 2021-07-15 | Digital Design S R L | GENERATIVE SYSTEM FOR THE CREATION OF DIGITAL IMAGES FOR PRINTING ON DESIGN SURFACES |
US11272164B1 (en) * | 2020-01-17 | 2022-03-08 | Amazon Technologies, Inc. | Data synthesis using three-dimensional modeling |
KR102300796B1 (en) | 2020-02-03 | 2021-09-13 | 한국과학기술연구원 | Method for supporting x-ray image reading using image transform model and system performing the same |
CN111369468B (en) * | 2020-03-09 | 2022-02-01 | 北京字节跳动网络技术有限公司 | Image processing method, image processing device, electronic equipment and computer readable medium |
DE102020202964A1 (en) * | 2020-03-09 | 2021-09-09 | Continental Automotive Gmbh | The invention relates to a method for increasing the safety of driving functions. |
EP4150549A4 (en) * | 2020-05-14 | 2024-05-29 | Cignal LLC | Creating imagery for AI model training in security screening |
WO2021247026A1 (en) * | 2020-06-04 | 2021-12-09 | Google Llc | Visual asset development using a generative adversarial network |
US20220012568A1 (en) * | 2020-07-07 | 2022-01-13 | Nvidia Corporation | Image generation using one or more neural networks |
KR102378742B1 (en) | 2020-07-30 | 2022-03-28 | 한국과학기술연구원 | System and method for supporting user to read x-ray image |
US11720650B2 (en) | 2020-10-30 | 2023-08-08 | Tiliter Pty Ltd. | Methods and apparatus for training a classification model based on images of non-bagged produce or images of bagged produce generated by a generative model |
KR102631452B1 (en) * | 2021-06-30 | 2024-01-30 | 주식회사 에이리스 | Method, system and recording medium for generating training data for detection model based on artificial intelligence |
US11900534B2 (en) * | 2021-07-30 | 2024-02-13 | The Boeing Company | Systems and methods for synthetic image generation |
US11651554B2 (en) * | 2021-07-30 | 2023-05-16 | The Boeing Company | Systems and methods for synthetic image generation |
US20240020935A1 (en) * | 2022-07-15 | 2024-01-18 | The Boeing Company | Modeling system for 3d virtual model |
CN116703791B (en) * | 2022-10-20 | 2024-04-19 | 荣耀终端有限公司 | Image processing method, electronic device and readable medium |
CN116778128B (en) * | 2023-08-15 | 2023-11-17 | 武汉大学 | Anti-patch re-projection method and system based on three-dimensional reconstruction |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170316285A1 (en) * | 2016-04-28 | 2017-11-02 | International Business Machines Corporation | Detection of objects in images using region-based convolutional neural networks |
Family Cites Families (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0797394B2 (en) | 1989-09-19 | 1995-10-18 | 株式会社テレマティーク国際研究所 | Character cutout recognition device |
US5600303A (en) | 1993-01-15 | 1997-02-04 | Technology International Incorporated | Detection of concealed explosives and contraband |
US6128365A (en) | 1998-02-11 | 2000-10-03 | Analogic Corporation | Apparatus and method for combining related objects in computed tomography data |
DE19959081A1 (en) | 1999-09-09 | 2001-03-22 | Heimann Systems Gmbh & Co | Process for image management of X-ray images |
US6829384B2 (en) | 2001-02-28 | 2004-12-07 | Carnegie Mellon University | Object finder for photographic images |
USH2110H1 (en) | 2002-07-30 | 2004-10-05 | The United States Of America As Represented By The Secretary Of The Air Force | Automated security scanning process |
US7194114B2 (en) | 2002-10-07 | 2007-03-20 | Carnegie Mellon University | Object finder for two-dimensional images, and system for determining a set of sub-classifiers composing an object finder |
US8804899B2 (en) | 2003-04-25 | 2014-08-12 | Rapiscan Systems, Inc. | Imaging, data acquisition, data transmission, and data distribution methods and systems for high data rate tomographic X-ray scanners |
US7856081B2 (en) | 2003-09-15 | 2010-12-21 | Rapiscan Systems, Inc. | Methods and systems for rapid detection of concealed objects using fluorescence |
US7277577B2 (en) | 2004-04-26 | 2007-10-02 | Analogic Corporation | Method and system for detecting threat objects using computed tomography images |
US7848566B2 (en) | 2004-10-22 | 2010-12-07 | Carnegie Mellon University | Object recognizer and detector for two-dimensional images using bayesian network based classifier |
US8503800B2 (en) | 2007-03-05 | 2013-08-06 | DigitalOptics Corporation Europe Limited | Illumination detection using classifier chains |
CA2608119A1 (en) | 2005-05-11 | 2006-11-16 | Optosecurity Inc. | Method and system for screening luggage items, cargo containers or persons |
WO2007131328A1 (en) | 2006-05-11 | 2007-11-22 | Optosecurity Inc. | Apparatus, method and system for screening receptacles and persons, having image distortion correction functionality |
US8494210B2 (en) | 2007-03-30 | 2013-07-23 | Optosecurity Inc. | User interface for use in security screening providing image enhancement capabilities and apparatus for implementing same |
US8014493B2 (en) | 2007-10-01 | 2011-09-06 | Optosecurity Inc. | Method and devices for assessing the threat status of an article at a security check point |
US8867816B2 (en) | 2008-09-05 | 2014-10-21 | Optosecurity Inc. | Method and system for performing X-ray inspection of a liquid product at a security checkpoint |
CN102203801B (en) | 2008-10-30 | 2014-03-26 | 模拟逻辑有限公司 | Detecting concealed threats |
US9310323B2 (en) | 2009-05-16 | 2016-04-12 | Rapiscan Systems, Inc. | Systems and methods for high-Z threat alarm resolution |
EA022136B1 (en) | 2009-05-16 | 2015-11-30 | Рапискан Системз, Инк. | Systems and methods for automated, rapid detection of high-atomic-number materials |
US8713131B2 (en) | 2010-02-23 | 2014-04-29 | Rapiscan Systems, Inc. | Simultaneous image distribution and archiving |
EP3483635B1 (en) | 2010-04-21 | 2021-11-17 | Vanderlande APC Inc. | Method and system for use in performing security screening |
JP5716978B2 (en) | 2010-09-30 | 2015-05-13 | アナロジック コーポレイション | Object classification using two-dimensional projection |
GB201203883D0 (en) | 2012-03-05 | 2012-04-18 | King S College London | Method and system to assist 2D-3D image registration |
US8908919B2 (en) | 2012-05-29 | 2014-12-09 | The Johns Hopkins University | Tactical object finder |
US9697710B2 (en) | 2012-06-20 | 2017-07-04 | Apstec Systems Usa Llc | Multi-threat detection system |
US9767565B2 (en) | 2015-08-26 | 2017-09-19 | Digitalglobe, Inc. | Synthesizing training data for broad area geospatial object detection |
US9767381B2 (en) | 2015-09-22 | 2017-09-19 | Xerox Corporation | Similarity-based detection of prominent objects using deep CNN pooling layers as features |
US10572797B2 (en) | 2015-10-27 | 2020-02-25 | Pusan National University Industry—University Cooperation Foundation | Apparatus and method for classifying home appliances based on power consumption using deep learning |
US10475174B2 (en) * | 2017-04-06 | 2019-11-12 | General Electric Company | Visual anomaly detection system |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220114722A1 (en) * | 2019-01-17 | 2022-04-14 | Smiths Detection France S.A.S. | Classifier using data generation |
US11257272B2 (en) * | 2019-04-25 | 2022-02-22 | Lucid VR, Inc. | Generating synthetic image data for machine learning |
CN110378842A (en) * | 2019-07-25 | 2019-10-25 | 厦门大学 | A kind of image texture filtering method, terminal device and storage medium |
CN111104982A (en) * | 2019-12-20 | 2020-05-05 | 电子科技大学 | Label-independent cross-task confrontation sample generation method |
CN111193920A (en) * | 2019-12-31 | 2020-05-22 | 重庆特斯联智慧科技股份有限公司 | Video picture three-dimensional splicing method and system based on deep learning network |
US12124553B2 (en) | 2020-01-08 | 2024-10-22 | Disney Enterprises, Inc. | Content authentication based on intrinsic attributes |
US20210224596A1 (en) * | 2020-01-19 | 2021-07-22 | John McDevitt | System and method for improved fake digital content creation |
US11403369B2 (en) | 2020-01-21 | 2022-08-02 | Disney Enterprises, Inc. | Secure content processing pipeline |
WO2021150362A1 (en) * | 2020-01-21 | 2021-07-29 | Disney Enterprises, Inc. | Secure content processing pipeline |
US11425120B2 (en) | 2020-02-11 | 2022-08-23 | Disney Enterprises, Inc. | Systems for authenticating digital contents |
CN111415316A (en) * | 2020-03-18 | 2020-07-14 | 山西安数智能科技有限公司 | Defect data synthesis algorithm based on generation of countermeasure network |
CN112100908A (en) * | 2020-08-31 | 2020-12-18 | 西安工程大学 | Garment design method for generating confrontation network based on multi-condition deep convolution |
CN112102460A (en) * | 2020-09-17 | 2020-12-18 | 上海复志信息技术有限公司 | 3D printing slicing method, device, equipment and storage medium |
US11507057B2 (en) | 2020-09-17 | 2022-11-22 | Shanghai Fusion Tech Co., Ltd. | 3D printing slicing method, apparatus, device, and storage medium |
US11126162B1 (en) | 2020-09-17 | 2021-09-21 | Shanghai Fusion Tech Co., Ltd. | 3D printing slicing method, apparatus, device, and storage medium |
US20220148119A1 (en) * | 2020-11-11 | 2022-05-12 | Fujitsu Limited | Computer-readable recording medium storing operation control program, operation control method, and operation control apparatus |
Also Published As
Publication number | Publication date |
---|---|
US11790575B2 (en) | 2023-10-17 |
US10453223B2 (en) | 2019-10-22 |
US10210631B1 (en) | 2019-02-19 |
US11423592B2 (en) | 2022-08-23 |
US20200051291A1 (en) | 2020-02-13 |
US20240062437A1 (en) | 2024-02-22 |
US20230010382A1 (en) | 2023-01-12 |
US20190057520A1 (en) | 2019-02-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11790575B2 (en) | Object detection training based on artificially generated images | |
US11263499B2 (en) | Multi-perspective detection of objects | |
US11276213B2 (en) | Neural network based detection of items of interest and intelligent generation of visualizations thereof | |
Jain | An evaluation of deep learning based object detection strategies for threat object detection in baggage security imagery | |
Rogers et al. | Automated x-ray image analysis for cargo security: Critical review and future promise | |
KR101995294B1 (en) | Image analysis apparatus and method | |
Jaccard et al. | Tackling the X-ray cargo inspection challenge using machine learning | |
US11182637B2 (en) | X-ray image processing system and method, and program therefor | |
Andrews et al. | Representation-learning for anomaly detection in complex x-ray cargo imagery | |
Chouai et al. | CH-Net: Deep adversarial autoencoders for semantic segmentation in X-ray images of cabin baggage screening at airports | |
Liu et al. | A framework for the synthesis of x-ray security inspection images based on generative adversarial networks | |
Danso et al. | An optimal defect recognition security-based terahertz low resolution image system using deep learning network | |
Vukadinovic et al. | Automated detection of inorganic powders in X-ray images of airport luggage | |
Seyfi et al. | A literature review on deep learning algorithms for analysis of X-ray images | |
KR102158967B1 (en) | Image analysis apparatus, image analysis method and recording medium | |
US20220351517A1 (en) | Object identification system and method | |
US20200294259A1 (en) | Contour based image segmentation apparatus and method | |
Liang et al. | Geodesic spin contour for partial near-isometric matching | |
Dedring et al. | Synthesis and evaluation of seamless, large-scale, multispectral satellite images using Generative Adversarial Networks on land use and land cover and Sentinel-2 data | |
Wang et al. | SC-PCA: Shape Constraint Physical Camouflage Attack Against Vehicle Detection | |
Sebernegg | Benign object detection and distractor removal in 2d baggage scans | |
Ahmad | Object Recognition in 3D data using Capsules | |
CN118486015A (en) | NeRF and deep learning-based multi-view image contraband detection method | |
ISAAC-MEDINA et al. | On Deep Machine Learning for Multi-view Object Detection and Neural Scene Rendering | |
Akojwar et al. | Image copy-move forgery detection using block matching probabilities |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
AS | Assignment |
Owner name: SYNAPSE TECHNOLOGY CORPORATION, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CINNAMON, IAN;FAVIERO, BRUNO BRASIL FERRARI;GAUTAM, SIMANTA;REEL/FRAME:043943/0138 Effective date: 20171004 |
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
AS | Assignment |
Owner name: RAPISCAN LABORATORIES, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SYNAPSE TECHNOLOGY CORPORATION;REEL/FRAME:052322/0078 Effective date: 20200403 |
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2551); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY Year of fee payment: 4 |