CN109064489A - Method, apparatus, equipment and medium for face tracking - Google Patents
Method, apparatus, equipment and medium for face tracking
- Publication number: CN109064489A
- Application number: CN201810786024.2A
- Authority
- CN
- China
- Prior art keywords: trace regions, camera lens, lens frame, frame, region
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T7/246 — Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T5/70 — Denoising; Smoothing
- G06V20/40 — Scenes; Scene-specific elements in video content
- G06V40/161 — Human faces: Detection; Localisation; Normalisation
- G06V40/168 — Human faces: Feature extraction; Face representation
- G06V40/172 — Human faces: Classification, e.g. identification
- G06T2207/20104 — Interactive definition of region of interest [ROI]
- G06T2207/30201 — Subject of image: Face
Abstract
The present disclosure relates to a method, apparatus, equipment and medium for face tracking. According to one embodiment, a method for face tracking is provided. In the method, a target face is detected in a first video frame of a video sequence to determine a first face region; based on the first face region, a tracking region is calculated in a second video frame of the video sequence, the tracking region indicating a candidate region that contains the target face; a detection region is determined in the second video frame based on the tracking region, the detection region at least containing the tracking region; the target face is detected in the detection region to determine a second face region; and in response to the second face region matching the tracking region, it is determined that tracking succeeded.
Description
Technical field
The present disclosure relates to the field of image processing, and in particular to a method, apparatus, equipment and medium for tracking a face in a video.
Background
In a traditional live-streamed teaching classroom, it is difficult for a teacher to give personalized attention to individual students in the live room. Teacher-student interaction is typically limited to the teacher posing a question that all students then answer together. Such a live classroom makes it difficult to call on an individual student to answer, and harder still to present that call-and-answer exchange within the live stream.
Summary of the invention
In view of the above problems, according to example embodiments of the present disclosure, a scheme for tracking a face in real time is provided.
In a first aspect of the present disclosure, a method for tracking a face is provided. Specifically, the method includes: detecting a target face in a first video frame of a video sequence to determine a first face region; based on the first face region, calculating a tracking region in a second video frame of the video sequence, the tracking region indicating a candidate region that contains the target face; determining a detection region in the second video frame based on the tracking region, the detection region at least containing the tracking region; detecting the target face in the detection region to determine a second face region; and in response to the second face region matching the tracking region, determining that tracking succeeded.
By verifying the tracking region in real time, accumulated error in the tracking algorithm can be effectively eliminated, thereby avoiding loss of the tracked target and improving tracking precision.
In certain embodiments of the disclosure, determining that tracking succeeded includes: comparing the second face region with the tracking region to determine a first offset; and in response to the first offset being no higher than an inspection threshold, determining that tracking succeeded.
In certain embodiments of the disclosure, the method further comprises: in response to determining that tracking succeeded, calculating a new tracking region in a third video frame of the video sequence based on the tracking region, the new tracking region indicating a candidate region that contains the target face.
In certain embodiments of the disclosure, the method further includes: in response to the first offset being higher than the inspection threshold, determining that tracking failed; and in response to determining that tracking failed, calculating a new tracking region in the third video frame of the video sequence based on the second face region, the new tracking region indicating a candidate region that contains the target face.
In certain embodiments of the disclosure, the method further includes: setting a lens frame based on the tracking region, the lens frame being used to display the image region of the video sequence that contains the target face.
In certain embodiments of the disclosure, the tracking region is defined as B′track = (x, y, w, h), where x, y are the lower-left corner coordinates of the tracking region, w is the width of the tracking region, and h is the height of the tracking region; and setting the lens frame includes: setting the lens frame B′cam to B′cam = (x − x0, y − y0, W, H), where x0, y0 are predefined offset values, W is the predefined width of the lens frame, H is the predefined height of the lens frame, and W > w and H > h.
In certain embodiments of the disclosure, the method further includes: comparing the lens frame with a previous lens frame for a previous video frame in the video sequence to determine a second offset; in response to the second offset being no higher than a smoothing threshold, displaying the video sequence based on the lens frame; and in response to the second offset being higher than the smoothing threshold, smoothing the lens frame.
In certain embodiments of the disclosure, the smoothing includes: obtaining an interpolation between the lens frame and the previous lens frame; and setting the lens frame based on the interpolation.
In certain embodiments of the disclosure, the tracking region is defined as B′track = (x, y, w, h), where x, y are the coordinates of the lower-left corner of the tracking region, w is the width of the tracking region, and h is the height of the tracking region; and determining the detection region in the second video frame based on the tracking region includes: setting the detection region R′detect to R′detect = (x − w, y − h, 3*w, 3*h).
In certain embodiments of the disclosure, the coordinates of the center of the second face region are (X, Y) and the coordinates of the center of the tracking region are (X′, Y′), and determining the first offset includes: setting the first offset d to d = √((X − X′)² + (Y − Y′)²).
In certain embodiments of the disclosure, the lower-left corner coordinates of the lens frame are (X′, Y′) and the lower-left corner coordinates of the previous lens frame are (X, Y), and determining the second offset includes: setting the second offset s to s = √((X − X′)² + (Y − Y′)²).
In certain embodiments of the disclosure, obtaining the interpolation includes: setting the interpolation B″cam to B″cam = ((X + X′)/2, (Y + Y′)/2, W, H), where W is the predefined width of the lens frame and H is the predefined height of the lens frame.
In a second aspect of the present disclosure, an apparatus for tracking a face is provided. Specifically, the apparatus includes: a detection module configured to detect a target face in a first video frame of a video sequence to determine a first face region; a computing module configured to calculate, based on the first face region, a tracking region in a second video frame of the video sequence, the tracking region indicating a candidate region that contains the target face; an inspection module configured to: determine a detection region in the second video frame based on the tracking region, the detection region at least containing the tracking region, and detect the target face in the detection region to determine a second face region; and a determining module configured to determine, in response to the second face region matching the tracking region, that tracking succeeded.
In certain embodiments of the disclosure, determining that tracking succeeded includes: comparing the second face region with the tracking region to determine a first offset; and in response to the first offset being no higher than an inspection threshold, determining that tracking succeeded.
In certain embodiments of the disclosure, the apparatus further comprises: in response to determining that tracking succeeded, calculating a new tracking region in the third video frame of the video sequence based on the tracking region, the new tracking region indicating a candidate region that contains the target face.
In certain embodiments of the disclosure, the apparatus is further configured to: determine that tracking failed in response to the first offset being higher than the inspection threshold; and in response to determining that tracking failed, calculate a new tracking region in the third video frame of the video sequence based on the second face region, the new tracking region indicating a candidate region that contains the target face.
In certain embodiments of the disclosure, the apparatus further includes: a display module configured to set a lens frame based on the tracking region, the lens frame being used to display the image region of the video sequence that contains the target face.
In certain embodiments of the disclosure, the tracking region is defined as B′track = (x, y, w, h), where x, y are the lower-left corner coordinates of the tracking region, w is the width of the tracking region, and h is the height of the tracking region; and setting the lens frame includes: setting the lens frame B′cam to B′cam = (x − x0, y − y0, W, H), where x0, y0 are predefined offset values, W is the predefined width of the lens frame, H is the predefined height of the lens frame, and W > w and H > h.
In certain embodiments of the disclosure, the display module further includes a smoothing submodule configured to: compare the lens frame with the previous lens frame for the previous video frame in the video sequence to determine a second offset; in response to the second offset being no higher than a smoothing threshold, display the video sequence based on the lens frame; and in response to the second offset being higher than the smoothing threshold, smooth the lens frame.
In certain embodiments of the disclosure, the smoothing includes: obtaining an interpolation between the lens frame and the previous lens frame; and setting the lens frame based on the interpolation.
In certain embodiments of the disclosure, the tracking region is defined as B′track = (x, y, w, h), where x, y are the coordinates of the lower-left corner of the tracking region, w is the width of the tracking region, and h is the height of the tracking region; and determining the detection region in the second video frame based on the tracking region includes: setting the detection region R′detect to R′detect = (x − w, y − h, 3*w, 3*h).
In certain embodiments of the disclosure, the coordinates of the center of the second face region are (X, Y) and the coordinates of the center of the tracking region are (X′, Y′), and determining the first offset includes: setting the first offset d to d = √((X − X′)² + (Y − Y′)²).
In certain embodiments of the disclosure, the lower-left corner coordinates of the lens frame are (X′, Y′) and the lower-left corner coordinates of the previous lens frame are (X, Y), and determining the second offset includes: setting the second offset s to s = √((X − X′)² + (Y − Y′)²).
In certain embodiments of the disclosure, obtaining the interpolation includes: setting the interpolation B″cam to B″cam = ((X + X′)/2, (Y + Y′)/2, W, H), where W is the predefined width of the lens frame and H is the predefined height of the lens frame.
In a third aspect of the present disclosure, an equipment is provided, including one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method according to the first aspect of the present disclosure.
In a fourth aspect of the present disclosure, a computer-readable medium is provided, on which a computer program is stored; the program implements the method according to the first aspect of the present disclosure when executed by a processor.
The disclosure performs a quality inspection on each tracking step while the target is being tracked, and corrects larger-than-normal errors in time, thereby preventing error accumulation during tracking, avoiding target loss, and achieving high tracking precision over long periods. In addition, the disclosure uses only a lightweight computational model combined with an algorithmic strategy; while maintaining high tracking precision, it can output video at an average frame rate of around 15 FPS, achieving real-time tracking.
Compared with other methods that apply deep learning, the disclosure also has a significant advantage in deployment. It can be deployed quickly on a CPU, whereas most neural network models require a GPU for deployment and operation, so the deployment cost of the model is reduced. In addition, the disclosure can handle both high-definition and low-definition video streams, and can therefore be deployed quickly across application scenarios.
It should be appreciated that the content described in this Summary is not intended to limit the key or important features of embodiments of the present disclosure, nor to limit the scope of the disclosure. Other features of the present disclosure will be readily understood from the description below.
Brief description of the drawings
The above and other features, advantages and aspects of the embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. In the drawings, the same or similar reference numerals denote the same or similar elements, in which:
Fig. 1 schematically shows a block diagram of the processing flow of a method for face tracking according to an embodiment of the disclosure;
Fig. 2 schematically shows a block diagram of the processing flow of a method for face tracking according to another embodiment of the disclosure;
Fig. 3 schematically shows a block diagram of the smoothing process;
Fig. 4 shows a schematic diagram of a lens frame offset;
Fig. 5 schematically shows a block diagram of an apparatus for face tracking according to an embodiment of the disclosure;
Fig. 6 schematically shows a block diagram of an apparatus for face tracking according to another embodiment of the disclosure;
Fig. 7 shows a block diagram of a computing device capable of implementing various embodiments of the disclosure.
Detailed description
Embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure can be realized in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are exemplary only and are not intended to limit its protection scope.
In describing embodiments of the present disclosure, the term "includes" and similar terms should be understood as open-ended inclusion, i.e., "including but not limited to". The term "based on" should be understood as "based at least in part on". The terms "one embodiment" or "an embodiment" should be understood as "at least one embodiment". The terms "first", "second", etc. may refer to different or identical objects. Other explicit and implicit definitions may also be included below.
Face tracking technology in the field of image processing helps to solve the live-teaching problems described above. In recent years, with the development of deep learning, object tracking techniques based on correlation filtering (Correlation Filter) and deep neural networks (Deep Neural Network, DNN) have emerged one after another. These techniques extract target features from an image with a convolutional neural network (Convolutional Neural Network, CNN) or a recurrent neural network (Recurrent Neural Network, RNN), and then analyze them by correlation filtering to obtain the target's location. By extracting richer image features, such tracking methods achieve very high tracking precision. However, deep neural networks consume a large amount of computation and need considerable time to initialize and update the model; it is therefore difficult to track in real time in application scenarios, and the video must be processed offline.
In addition, the above computer-vision-based methods share a common problem: as the tracking time grows, tracking error accumulates continually, so that after a period of accurate tracking the tracked target is eventually lost. Moreover, because these methods lack a self-correction mechanism, a lost target is difficult to recover, so prolonged tracking cannot be guaranteed.
In traditional tracking methods, the tracking process needs to analyze the target's correlation; over prolonged tracking, error accumulates continually and eventually leads to target loss. In view of these shortcomings, the present disclosure proposes an improved face tracking method and apparatus. The disclosure provides a detection region that changes in real time with the tracking result; by detecting a face within the detection region and comparing the detected face with the tracking result, a verification function is realized, eliminating the error that may accumulate in each tracking computation. Without adding excessive computation, the disclosure can effectively avoid losing the tracked target and thereby achieve prolonged real-time tracking. In addition, the disclosure can smooth the lens that displays the tracking result, avoiding lens jitter during live streaming and improving the real-time tracking effect.
Embodiments of the present disclosure relate generally to real-time tracking of a face in a video. It should be understood, however, that the disclosure is not limited to tracking faces only; any target to be tracked within the spirit and scope defined by the appended claims is considered to fall within the protection scope of the disclosure.
Fig. 1 schematically shows a block diagram of the processing flow of a method 100 for face tracking according to an embodiment of the disclosure. As shown in Fig. 1, the processing flow of the method 100 may include:
At block 101, after face tracking is activated, a target face is detected in the current video frame of a video sequence to determine the face region in the current video frame.
Specifically, using a face recognition algorithm, the system can detect the positions of all faces in the current video frame, and then compare the detected faces with target face information collected in advance to obtain the position of the target face.
In this embodiment, the face detection algorithm can be an existing algorithm, such as a recognition algorithm based on facial landmarks, a recognition algorithm based on the whole face image, a template-based recognition algorithm, or an algorithm that uses a neural network for recognition.
The system can store target face information collected in advance. The stored target face information can be obtained from other pictures or videos, or by capturing images from the live video before tracking starts. Target face information can also be obtained from other non-image sources. A target face can correspond to an ID, so that obtaining the ID is sufficient to determine the corresponding target face information.
The position of the target face obtained by the face recognition algorithm is the face region. The parameters the system will use can be initialized according to this face region. For example, according to the detected face region, the tracking region in the tracking algorithm referred to below can be initialized, and the detection region and lens frame referred to below can also be initialized. Initialization can be completed in block 101, in a separate block, or within the respective block where each parameter is used.
In an embodiment of the disclosure, the regions defined by the face region, the tracking region, the face detection region and the lens frame can be rectangular, or any other suitable shape. The size of the face region can change according to the size of the detected face. The sizes of the tracking region, the face detection region and the lens frame can be fixed, or can change appropriately according to circumstances. The face box and tracking box mentioned below refer to the face region and the tracking region, respectively.
In an embodiment of the disclosure, initializing the tracking box may include: determining the initial tracking box according to the coordinates and size of the face box Iface; for example, the initial tracking box has the same coordinates and size as the face box. The tracking box can be defined as Btrack = (x, y, w, h), where x, y are the coordinates of the lower-left corner vertex of the box, w is the width of the box, and h is the height of the box; the lens frame and detection region below follow this definition.
Initializing the detection region and lens frame includes: according to the initialized tracking box Btrack = (x, y, w, h), determining the initial detection region Rdetect = (x − w, y − h, 3*w, 3*h), where the coordinates and size of the detection region are merely exemplary and could instead take other coordinate positions, widths and heights, such as a width of 4*w and a height of 4*h, or a width of 6*w and a height of 5*h; and according to the initialized tracking box Btrack = (x, y, w, h), determining the lens frame Bcam = (x − x0, y − y0, W, H), where x0, y0 are predefined offset values, W is the predefined width of the lens frame, and H is the predefined height of the lens frame, for example x0 = y0 = 200, W = 500, H = 600 (the numbers denote pixels). The width and height of the lens frame are usually both larger than the width and height of the tracking box.
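For illustration only, a minimal Python sketch of this initialization step is given below. The (x, y, w, h) box layout and the example values (offsets of 200, a 500×600 lens frame, a 3w×3h detection region) come from the description above; the function names themselves are hypothetical.

```python
# Boxes are (x, y, w, h): lower-left corner coordinates, width, height (pixels).

def init_tracking_box(face_box):
    # Block 101: the initial tracking box simply copies the detected face box.
    return tuple(face_box)

def init_detection_region(track_box):
    # Exemplary detection region: shifted by (w, h) and enlarged to 3w x 3h,
    # so that it always contains the tracking box.
    x, y, w, h = track_box
    return (x - w, y - h, 3 * w, 3 * h)

def init_lens_frame(track_box, x0=200, y0=200, W=500, H=600):
    # Lens frame of fixed size W x H, offset from the tracking box by the
    # predefined values (x0, y0); W > w and H > h is assumed.
    x, y, w, h = track_box
    return (x - x0, y - y0, W, H)
```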
At block 102, based on the face region detected in block 101, a tracking region is calculated in the second video frame of the video sequence, the tracking region indicating a candidate region that contains the target face.
Specifically, after the initial tracking box Btrack has been determined based on the coordinates and size of the face box Iface, the next video frame and the initial tracking box determined from the face box are input into the tracking algorithm, and the tracking result in the second video frame is calculated (i.e., the tracking region or tracking box, which indicates the candidate region containing the target face).
The above tracking algorithm can be an existing face tracking algorithm, such as the KCF (Kernelized Correlation Filter) algorithm.
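As a sketch only: OpenCV's contributed KCF tracker is one existing implementation that could fill this role. Depending on the OpenCV build, the factory function is cv2.TrackerKCF_create or cv2.TrackerKCF.create, and OpenCV boxes use a top-left rather than lower-left origin, so a coordinate conversion is assumed.

```python
import cv2  # requires opencv-contrib-python for the KCF tracker

def start_tracker(frame, box):
    # box: (x, y, w, h) in OpenCV's top-left-origin convention.
    tracker = cv2.TrackerKCF_create()
    tracker.init(frame, box)
    return tracker

def track_next_frame(tracker, frame):
    # Block 102: returns the tracking region in the next frame, or None if
    # the correlation filter loses the target.
    ok, box = tracker.update(frame)
    return tuple(int(v) for v in box) if ok else None
```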
At block 103, based on the tracking region calculated by the tracking algorithm, a detection region is determined in the above second video frame, the detection region at least containing the calculated tracking region.
For example, if the tracking region calculated by the tracking algorithm is B′track = (x′, y′, w, h), the detection region can be determined as R′detect = (x′ − w, y′ − h, 3*w, 3*h). The coordinate position, width and height of the detection region are merely exemplary; other suitable coordinate positions, widths and heights are possible. However, the determined detection region should usually contain the above tracking region.
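As a small illustration, cropping the frame to the determined detection region before running the lighter detector of block 104 might look as follows; the clamping to the frame bounds and the top-left image-array convention are assumptions for the sketch, not part of the disclosure.

```python
def crop_detection_region(frame, region):
    # frame: H x W x 3 image array (e.g. numpy); region: (x, y, w, h)
    # detection region in image-array (top-left origin) coordinates.
    x, y, w, h = region
    H, W = frame.shape[:2]
    x0, y0 = max(0, int(x)), max(0, int(y))
    x1, y1 = min(W, int(x + w)), min(H, int(y + h))
    return frame[y0:y1, x0:x1]
```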
At block 104, the target face is detected within the determined detection region to determine a new face region. Specifically, a face recognition algorithm is used to detect a new face region for the target face within the detection region determined in the previous block.
It will be understood that the face recognition algorithm used in this block can be the same as the one used in block 101. However, a different face recognition algorithm can also be used here. Because the image region to be examined in this block is smaller and may contain fewer faces (if the detection region is suitably restricted, only the target face may lie within it), the face recognition algorithm used in this block can be more lightweight than the algorithm used in block 101, reducing computation and saving time.
At blocks 105 and 107, it is judged whether the new face region detected in block 104 matches the tracking region from block 102, to determine whether tracking succeeded. If tracking succeeded, a new tracking region is calculated in the third video frame of the video sequence based on the tracking region from block 102; if tracking failed, a new tracking region is calculated in the third video frame of the video sequence based on the new face region from block 104.
Specifically, the system first obtains the center points of the regions to be matched; for example, the center coordinates of the face region detected in block 104 are (X, Y), and the coordinates of the center of the tracking region from block 102 are (X′, Y′). The system then calculates the offset between the two center coordinates, d = √((X − X′)² + (Y − Y′)²). The offset d is then compared with the system's preset inspection threshold: if the offset d is no higher than the inspection threshold, tracking can be determined to have succeeded; conversely, if the offset d is higher than the inspection threshold, tracking can be determined to have failed.
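A minimal sketch of this verification step, assuming the (x, y, w, h) box layout used above and a caller-supplied inspection threshold:

```python
import math

def box_center(box):
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

def tracking_succeeded(face_region, tracking_region, inspect_threshold):
    # Blocks 105/107: offset d between the two centers; tracking succeeds
    # when d does not exceed the preset inspection threshold.
    X, Y = box_center(face_region)
    Xp, Yp = box_center(tracking_region)
    d = math.hypot(X - Xp, Y - Yp)
    return d <= inspect_threshold
```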
If tracking succeeded, the next video frame (i.e., the third video frame) and the tracking region calculated in block 102 are input into the tracking algorithm of block 102 to continue tracking, after which blocks 103, 104 and 105 continue to be executed.
If tracking failed, the face region detected in the detection region in block 104 is taken as the initially detected face region, and the tracking box, lens frame and detection region are re-initialized according to this new face region, after which blocks 102, 103, 104 and 105 continue to be executed.
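Putting blocks 101 through 107 together, the overall per-frame loop might be sketched as below. It reuses the helper functions sketched earlier (start_tracker, track_next_frame, init_detection_region, tracking_succeeded); detect_target_face stands in for the face recognition step and is a hypothetical helper, and error handling (e.g. the tracker itself returning no box) is omitted.

```python
def track_video(frames, detect_target_face, inspect_threshold):
    # detect_target_face(frame, region) -> face box (x, y, w, h) or None.
    face_box = detect_target_face(frames[0], None)        # block 101
    tracker = start_tracker(frames[0], face_box)
    for frame in frames[1:]:
        track_box = track_next_frame(tracker, frame)      # block 102
        region = init_detection_region(track_box)         # block 103
        face_box = detect_target_face(frame, region)      # block 104
        if face_box is not None and not tracking_succeeded(
                face_box, track_box, inspect_threshold):  # blocks 105/107
            # Tracking failed: re-initialize from the newly detected face.
            tracker = start_tracker(frame, face_box)
            track_box = face_box
        yield track_box
```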
In addition to blocks 101-105, embodiments of the present disclosure can also include a block (not shown in Fig. 1) that outputs the tracking result for display. Specifically, a lens frame can be set based on the tracking box calculated in block 102; the way the lens frame is set from the tracking box here is the same as in the initialization described above, and is therefore not repeated. The determined lens frame is then mapped onto the video frame, the corresponding image is cropped, and the corresponding image is displayed.
This embodiment monitors tracking quality during the tracking process and corrects the tracking algorithm before tracking goes astray, ensuring that the target is not lost during tracking.
Fig. 2 shows a block diagram of the processing flow of a method 200 for face tracking according to another embodiment of the disclosure. The difference from the embodiment described above is that a block 206 for smoothing the displayed lens is added.
As shown in Fig. 2, block 206 is arranged between block 102 and block 103. In block 101, the lens frame is initialized, where the lens frame Bcam = (x − x0, y − y0, W, H) is determined according to the initialized tracking box Btrack = (x, y, w, h). The lens frame is used to display the image region of the video sequence that contains the target face. Block 206 is mainly used to smooth the movement of the lens frame.
The processing flow of block 206 is shown in Fig. 3:
At block 2061, the lens frame is compared with the previous lens frame for the previous video frame in the video sequence to determine the offset.
If the coordinates of the lower-left corner of the current lens frame are (X′, Y′) and the coordinates of the lower-left corner of the previous lens frame of the previous video frame are (X, Y), the offset can be determined as s = √((X − X′)² + (Y − Y′)²).
At blocks 2062 and 2064, the offset is compared with the system's preset smoothing threshold.
Fig. 4 shows a schematic diagram of a lens frame offset 400. As shown in Fig. 4, B′cam denotes the current lens frame, Bcam denotes the previous lens frame, and s denotes the offset from the previous lens frame to the current lens frame. If the offset s is no higher than the smoothing threshold, the video sequence is displayed based on the lens frame; conversely, if the offset s is higher than the smoothing threshold, the lens frame is smoothed. For example, the smoothing threshold can be 10 (pixels).
The smoothing includes obtaining an interpolation between the lens frame and the previous lens frame. For example, the interpolation can be B″cam = ((X + X′)/2, (Y + Y′)/2, W, H). The lens frame is then set based on the interpolation.
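A minimal sketch of this smoothing step, using lower-left-corner lens frames as above; the midpoint interpolation matches the reconstruction of the interpolation formula used in this description and should be read as an assumption:

```python
import math

def smooth_lens_frame(prev_frame, cur_frame, smooth_threshold=10):
    # prev_frame: previous lens frame Bcam = (X, Y, W, H);
    # cur_frame: current lens frame B'cam = (X', Y', W, H).
    X, Y, W, H = prev_frame
    Xp, Yp, _, _ = cur_frame
    s = math.hypot(X - Xp, Y - Yp)   # offset s between the two lens frames
    if s <= smooth_threshold:
        return cur_frame             # small movement: display as-is
    # Large movement: interpolate halfway between the two lens frames.
    return ((X + Xp) / 2.0, (Y + Yp) / 2.0, W, H)
```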
Returning to Fig. 3, at block 2063 the tracking result is output. Specifically, the final lens frame B″cam obtained after the processing of blocks 2062 and 2064 is mapped onto the video frame, and the corresponding image is cropped. The corresponding image is output to the front end, thereby displaying the result of tracking.
Conventional tracking methods do not deal with the relationship between consecutive frames, so the tracking process exhibits many jump points, i.e., the tracker jumps quickly from one point to another. In real-time tracking this problem manifests as fast lens jitter, which greatly degrades the real-time tracking effect. In another embodiment of the disclosure, in addition to tracking with the real-time tracking method, the tracking lens is also smoothed, so that the tracking process more closely resembles the tracking effect of a real human camera operator.
Fig. 5 schematically shows a block diagram of an apparatus 500 for face tracking according to an embodiment of the disclosure. Specifically, the apparatus 500 includes a detection module 501, a computing module 502, an inspection module 503, a determining module 504 and a display module 505. Each module is described in detail below.
In an embodiment of the disclosure, the detection module 501 is configured to: after face tracking is activated, detect a target face in the current video frame of a video sequence to determine the face region in the current video frame.
In an embodiment of the disclosure, the detection module can execute a face recognition algorithm so as to detect the positions of all faces in the current video frame, and then compare the detected faces with target face information collected in advance to obtain the position of the target face.
In an embodiment of the disclosure, the face detection algorithm can be an existing algorithm, such as a recognition algorithm based on facial landmarks, a recognition algorithm based on the whole face image, a template-based recognition algorithm, or an algorithm that uses a neural network for recognition.
In an embodiment of the disclosure, target face information collected in advance can be stored in the detection module or outside the detection module. The stored target face information can be obtained from other pictures or videos, or by capturing images from the live video before tracking starts. Target face information can also be obtained from other non-image sources. A target face can correspond to an ID, so that obtaining the ID is sufficient to determine the corresponding target face information.
In an embodiment of the disclosure, the position of the target face obtained by the detection module through the face recognition algorithm is the face region. The detection module can initialize the parameters the apparatus will use according to this face region. For example, according to the detected face region or face box, the detection module can initialize the tracking region in the tracking algorithm, and can also initialize the detection region and the lens frame.
In an embodiment of the disclosure, the regions defined by the face region, the tracking region, the face detection region and the lens frame can be rectangular, or any other suitable shape. The size of the face region can change according to the size of the detected face. The sizes of the tracking region, the face detection region and the lens frame can be fixed, or can change appropriately according to circumstances.
In an embodiment of the disclosure, initializing the tracking box may include: determining the initial tracking box according to the coordinates and size of the face box Iface; for example, the initial tracking box has the same coordinates and size as the face box. The tracking box can be defined as Btrack = (x, y, w, h), where x, y are the coordinates of the lower-left corner vertex of the box, w is the width of the box, and h is the height of the box; the lens frame and detection region below follow this definition.
In an embodiment of the disclosure, initializing the detection region and lens frame includes: according to the initialized tracking box Btrack = (x, y, w, h), determining the initial detection region Rdetect = (x − w, y − h, 3*w, 3*h), where the coordinates and size of the detection region are merely exemplary and could instead take other coordinate positions, widths and heights, such as a width of 4*w and a height of 4*h, or a width of 6*w and a height of 5*h; and according to the initialized tracking box Btrack = (x, y, w, h), determining the lens frame Bcam = (x − x0, y − y0, W, H), where x0, y0 are predefined offset values, W is the predefined width of the lens frame, and H is the predefined height of the lens frame, for example x0 = y0 = 200, W = 500, H = 600 (the numbers denote pixels). The width and height of the lens frame are usually both larger than the width and height of the tracking box.
In an embodiment of the disclosure, the computing module 502 is configured to: based on the face region detected by the detection module, calculate a tracking region in the second video frame of the video sequence, the tracking region indicating a candidate region that contains the target face.
In an embodiment of the disclosure, after the initial tracking box Btrack has been determined based on the coordinates and size of the face box Iface, the computing module inputs the next video frame and the initial tracking box determined from the face box into the tracking algorithm, and the tracking result in the second video frame is calculated (i.e., the tracking region or tracking box, indicating the candidate region containing the target face).
In an embodiment of the disclosure, the above tracking algorithm can be an existing face tracking algorithm, such as the KCF (Kernelized Correlation Filter) algorithm.
In an embodiment of the disclosure, the inspection module 503 is configured to: based on the tracking region calculated by the tracking algorithm, determine a detection region in the above second video frame, the detection region at least containing the calculated tracking region; and detect the target face within the determined detection region to determine a new face region.
In an embodiment of the disclosure, if the tracking region calculated by the tracking algorithm is B′track = (x′, y′, w, h), the inspection module can determine the detection region as R′detect = (x′ − w, y′ − h, 3*w, 3*h). The coordinate position, width and height of the detection region are merely exemplary; other suitable coordinate positions, widths and heights are possible. However, the determined detection region should usually contain the above tracking region.
In an embodiment of the disclosure, the inspection module uses a face recognition algorithm to detect a new face region for the target face within the determined detection region.
In an embodiment of the disclosure, the face recognition algorithm executed in the inspection module can be the same as the face recognition algorithm used in the detection module. However, the inspection module can also execute a different face recognition algorithm from the detection module. Because the image region examined in the inspection module is smaller and may contain fewer faces (if the detection region is suitably restricted, only the target face may lie within it), the face recognition algorithm used in the inspection module can be more lightweight than the algorithm used in the detection module, reducing computation and saving time.
In an embodiment of the disclosure, the determining module 504 is configured to: judge whether the new face region detected by the inspection module matches the tracking region calculated by the computing module, to determine whether tracking succeeded; if tracking succeeded, calculate a new tracking region in the third video frame of the video sequence based on the tracking region calculated by the computing module; and if tracking failed, calculate a new tracking region in the third video frame of the video sequence based on the new face region detected by the inspection module.
In an embodiment of the disclosure, the determining module first obtains the center points of the regions to be matched; for example, the center coordinates of the face region detected by the inspection module are (X, Y), and the coordinates of the center of the tracking region calculated by the computing module are (X′, Y′). The determining module then calculates the offset between the two center coordinates, d = √((X − X′)² + (Y − Y′)²). The determining module then compares the offset d with the preset inspection threshold: if the offset d is no higher than the inspection threshold, tracking can be determined to have succeeded; conversely, if the offset d is higher than the inspection threshold, tracking can be determined to have failed.
In an embodiment of the disclosure, if tracking succeeded, the computing module inputs the next video frame (i.e., the third video frame) and the tracking region calculated by the computing module into the tracking algorithm to continue tracking.
In an embodiment of the disclosure, if tracking failed, the face region detected by the inspection module is taken by the detection module as the initially detected face region, and the tracking box, lens frame and detection region are re-initialized according to this face region.
In an embodiment of the disclosure, the apparatus 500 can also include a display module 505, which can be configured to set the lens frame based on the tracking region calculated by the computing module. The way the lens frame is set from the tracking region here is the same as in the initialization described above, and is therefore not repeated. The display module 505 then maps the set lens frame onto the video frame, crops the corresponding image, and displays the corresponding image.
Fig. 6 shows a block diagram of an apparatus 600 for face tracking according to another embodiment. The difference from the apparatus shown in Fig. 5 is that a display module 605 capable of smoothing the movement of the lens frame is provided. Specifically, the display module 605 can have a smoothing submodule 6051 configured to: compare the lens frame with the previous lens frame for the previous video frame in the video sequence to determine the offset; compare the offset with the system's preset smoothing threshold; and output the tracking result.
In an embodiment of the disclosure, if the coordinates of the lower-left corner of the current lens frame are (X′, Y′) and the coordinates of the lower-left corner of the previous lens frame of the previous video frame are (X, Y), the smoothing submodule can determine the offset as s = √((X − X′)² + (Y − Y′)²).
In an embodiment of the disclosure, if the offset is no higher than the smoothing threshold, the smoothing submodule displays the video sequence based on the lens frame; conversely, if the offset is higher than the smoothing threshold, the smoothing submodule smooths the lens frame. For example, the smoothing threshold can be 10 (pixels).
The smoothing includes obtaining an interpolation between the lens frame and the previous lens frame. For example, the interpolation can be B″cam = ((X + X′)/2, (Y + Y′)/2, W, H). The smoothing submodule then sets the lens frame based on the interpolation.
The display module also maps the obtained final lens frame B″cam onto the video frame and crops the corresponding image. The corresponding image is output for display, thereby showing the result of tracking.
According to example implementations of the present disclosure, an equipment is provided, including one or more processors and a storage device for storing one or more programs. When the one or more programs are executed by the one or more processors, the one or more processors implement the method according to the present disclosure.
According to example implementations of the present disclosure, a computer-readable medium is provided, on which a computer program is stored; the program implements the method according to the present disclosure when executed by a processor.
Fig. 7 shows a block diagram of a computing device 700 capable of implementing various embodiments of the present disclosure. As shown, the device 700 includes a central processing unit (CPU) 701 that can perform various appropriate actions and processing according to computer program instructions stored in a read-only memory (ROM) 702 or loaded from a storage unit 708 into a random access memory (RAM) 703. The RAM 703 can also store various programs and data required for the operation of the device 700. The CPU 701, the ROM 702 and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
Multiple components in the device 700 are connected to the I/O interface 705, including: an input unit 706, such as a keyboard, a mouse, etc.; an output unit 707, such as various types of displays, loudspeakers, etc.; a storage unit 708, such as a magnetic disk, an optical disc, etc.; and a communication unit 709, such as a network card, a modem, a wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices through computer networks such as the Internet and/or various telecommunication networks.
The processing unit 701 executes the methods and processing described above, such as the method 100 and/or the method 200. For example, in some embodiments, the method 100 and/or the method 200 can be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program can be loaded into and/or installed on the device 700 via the ROM 702 and/or the communication unit 709. When the computer program is loaded into the RAM 703 and executed by the CPU 701, one or more blocks of the method 100 and/or the method 200 described above can be executed. Alternatively, in other embodiments, the CPU 701 can be configured in any other suitable manner (for example, by means of firmware) to execute the method 100 and/or the method 200.
The functions described herein can be executed at least in part by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that can be used include: field programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), application-specific standard products (ASSP), systems on chip (SOC), complex programmable logic devices (CPLD), and so on.
Program code for implementing the methods of the present disclosure can be written in any combination of one or more programming languages. The program code can be provided to a processor or controller of a general-purpose computer, special-purpose computer or other programmable data processing apparatus, so that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code can execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.
In the context of the present disclosure, a machine-readable medium can be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus or device. The machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium can include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In addition, although the operations are depicted in a particular order, this should not be understood as requiring that the operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desired results. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are contained in the discussion above, these should not be construed as limitations on the scope of the present disclosure. Certain features described in the context of separate embodiments can also be implemented in combination in a single implementation. Conversely, various features described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological logical acts, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely exemplary forms of implementing the claims.
Claims (26)
1. A method for face tracking, comprising:
detecting a target face in a first video frame of a video sequence to determine a first face region;
based on the first face region, calculating a tracking region in a second video frame of the video sequence, the tracking region indicating a candidate region that contains the target face;
determining a detection region in the second video frame based on the tracking region, the detection region at least containing the tracking region;
detecting the target face in the detection region to determine a second face region; and
in response to the second face region matching the tracking region, determining that tracking succeeded.
2. The method according to claim 1, wherein determining that tracking succeeded comprises:
comparing the second face region with the tracking region to determine a first offset; and
in response to the first offset being no higher than an inspection threshold, determining that tracking succeeded.
3. The method according to claim 2, further comprising: in response to determining that tracking succeeded,
calculating, based on the tracking region, a new tracking region in a third video frame of the video sequence, the new tracking region indicating a candidate region that contains the target face.
4. The method according to claim 2, further comprising:
in response to the first offset being higher than the inspection threshold, determining that tracking failed; and
in response to determining that tracking failed, calculating, based on the second face region, a new tracking region in the third video frame of the video sequence, the new tracking region indicating a candidate region that contains the target face.
5. The method according to claim 1, further comprising:
setting a lens frame based on the tracking region, the lens frame being used to display the image region of the video sequence that contains the target face.
6. The method according to claim 5, wherein the tracking region is defined as B′track = (x, y, w, h), where x, y are the lower-left corner coordinates of the tracking region, w is the width of the tracking region, and h is the height of the tracking region;
wherein setting the lens frame comprises:
setting the lens frame B′cam to B′cam = (x − x0, y − y0, W, H), where x0, y0 are predefined offset values, W is the predefined width of the lens frame, H is the predefined height of the lens frame, and W > w and H > h.
7. The method according to claim 5 or 6, further comprising:
comparing the lens frame with a previous lens frame for a previous video frame in the video sequence to determine a second offset;
in response to the second offset being no higher than a smoothing threshold, displaying the video sequence based on the lens frame; and
in response to the second offset being higher than the smoothing threshold, smoothing the lens frame.
8. The method according to claim 7, wherein the smoothing processing comprises:
obtaining an interpolation between the camera lens frame and the previous camera lens frame; and
setting the camera lens frame based on the interpolation.
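Claims 7 and 8 gate the displayed frame on a smoothing threshold: small camera motion is shown directly, while large jumps are damped by interpolating toward the previous frame. A sketch, assuming the corner-distance offset of claim 11 and the midpoint interpolation reconstructed in claim 12:

```python
import math

def smooth_camera_frame(cam, prev_cam, smooth_threshold):
    # Second offset s: distance between the lower-left corners of the new
    # and previous camera lens frames (claim 11).
    x1, y1, W, H = cam
    x0, y0, _, _ = prev_cam
    s = math.hypot(x1 - x0, y1 - y0)
    if s <= smooth_threshold:
        return cam  # small motion: display the new frame as-is
    # Large motion: use the interpolation between the two frames instead
    # (midpoint, per the reconstruction in claim 12) to avoid jitter.
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0, W, H)
```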
9. The method according to claim 1, wherein the tracking region is defined as B′track = (x, y, w, h), where x, y are the coordinates of the lower-left corner of the tracking region, w is the width of the tracking region, and h is the height of the tracking region; and
wherein determining the detection region in the second video frame based on the tracking region comprises: setting the detection region R′detect to R′detect = (x - w, y - h, 3*w, 3*h).
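The detection region of claim 9 is the tracking region inflated to three times its width and height about the same center: moving the lower-left corner by (-w, -h) and tripling the extent leaves a margin of one region on every side. A one-function sketch:

```python
def detection_region(tracking):
    # Inflate the tracking region to 3x its size, keeping it centered,
    # so the detector has a margin of one region width/height on each side.
    x, y, w, h = tracking
    return (x - w, y - h, 3 * w, 3 * h)

# e.g. detection_region((100, 100, 40, 60)) == (60, 40, 120, 180)
```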
10. The method according to claim 2, wherein the coordinates of the center of the second face region are (X, Y) and the coordinates of the center of the tracking region are (X′, Y′); and
wherein determining the first offset comprises: setting the first offset d to d = √((X - X′)² + (Y - Y′)²).
11. The method according to claim 8, wherein the coordinates of the lower-left corner of the camera lens frame are (X′, Y′) and the coordinates of the lower-left corner of the previous camera lens frame are (X, Y); and
wherein determining the second offset comprises: setting the second offset s to s = √((X - X′)² + (Y - Y′)²).
12. The method according to claim 11, wherein obtaining the interpolation comprises: setting the interpolation B″cam to B″cam = ((X + X′)/2, (Y + Y′)/2, W, H), where W is the predefined width of the camera lens frame and H is the predefined height of the camera lens frame.
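A worked numeric check of the formulas in claims 10 through 12. Note that the original formula images are not preserved in this text; the Euclidean distances for d and s and the midpoint form of the interpolation are reconstructions and should be read as assumptions:

```python
import math

# Claim 10: first offset between the region centers (X, Y) and (X', Y').
X, Y, Xp, Yp = 120.0, 80.0, 123.0, 84.0
d = math.sqrt((X - Xp) ** 2 + (Y - Yp) ** 2)   # sqrt(9 + 16) = 5.0

# Claim 11: second offset between lower-left corners, same Euclidean form.
s = math.hypot(X - Xp, Y - Yp)                 # also 5.0

# Claim 12 (assumed midpoint interpolation): keep the predefined size
# (W, H) and average the corner coordinates of the two camera lens frames.
W, H = 640.0, 480.0
B_cam_interp = ((X + Xp) / 2.0, (Y + Yp) / 2.0, W, H)
print(d, s, B_cam_interp)  # 5.0 5.0 (121.5, 82.0, 640.0, 480.0)
```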
13. An apparatus for face tracking, comprising:
a detection module configured to detect a target face in a first video frame of a video sequence to determine a first face region;
a computing module configured to compute, based on the first face region, a tracking region in a second video frame of the video sequence, the tracking region indicating a candidate region containing the target face;
an inspection module configured to:
determine, based on the tracking region, a detection region in the second video frame, the detection region including at least the tracking region; and
detect the target face within the detection region to determine a second face region; and
a determining module configured to determine, in response to the second face region matching the tracking region, that the tracking succeeds.
14. The apparatus according to claim 13, wherein determining that the tracking succeeds comprises:
comparing the second face region with the tracking region to determine a first offset; and
in response to the first offset being not higher than an inspection threshold, determining that the tracking succeeds.
15. The apparatus according to claim 14, wherein the apparatus is further configured to: in response to determining that the tracking succeeds, compute, based on the tracking region, a new tracking region in a third video frame of the video sequence, the new tracking region indicating a candidate region containing the target face.
16. The apparatus according to claim 14, wherein the apparatus is further configured to:
in response to the first offset being higher than the inspection threshold, determine that the tracking fails; and
in response to determining that the tracking fails, compute, based on the second face region, a new tracking region in a third video frame of the video sequence, the new tracking region indicating a candidate region containing the target face.
17. The apparatus according to claim 13, further comprising:
a display module configured to set a camera lens frame based on the tracking region, the camera lens frame being used to display an image region of the video sequence that contains the target face.
18. The apparatus according to claim 17, wherein the tracking region is defined as B′track = (x, y, w, h), where x, y are the coordinates of the lower-left corner of the tracking region, w is the width of the tracking region, and h is the height of the tracking region; and
wherein setting the camera lens frame comprises:
setting the camera lens frame B′cam to B′cam = (x - x0, y - y0, W, H), where x0, y0 are predefined offset values, W is a predefined width of the camera lens frame, and H is a predefined height of the camera lens frame, with W > w and H > h.
19. The apparatus according to claim 17 or 18, wherein the display module further comprises a smoothing submodule configured to:
compare the camera lens frame with a previous camera lens frame for a preceding video frame of the video sequence to determine a second offset;
in response to the second offset being not higher than a smoothing threshold, display the video sequence based on the camera lens frame; and
in response to the second offset being higher than the smoothing threshold, apply smoothing processing to the camera lens frame.
20. The apparatus according to claim 19, wherein the smoothing processing comprises:
obtaining an interpolation between the camera lens frame and the previous camera lens frame; and
setting the camera lens frame based on the interpolation.
21. The apparatus according to claim 13, wherein the tracking region is defined as B′track = (x, y, w, h), where x, y are the coordinates of the lower-left corner of the tracking region, w is the width of the tracking region, and h is the height of the tracking region; and
wherein determining the detection region in the second video frame based on the tracking region comprises: setting the detection region R′detect to R′detect = (x - w, y - h, 3*w, 3*h).
22. The apparatus according to claim 14, wherein the coordinates of the center of the second face region are (X, Y) and the coordinates of the center of the tracking region are (X′, Y′); and
wherein determining the first offset comprises: setting the first offset d to d = √((X - X′)² + (Y - Y′)²).
23. The apparatus according to claim 20, wherein the coordinates of the lower-left corner of the camera lens frame are (X′, Y′) and the coordinates of the lower-left corner of the previous camera lens frame are (X, Y); and
wherein determining the second offset comprises: setting the second offset s to s = √((X - X′)² + (Y - Y′)²).
24. The apparatus according to claim 23, wherein obtaining the interpolation comprises: setting the interpolation B″cam to B″cam = ((X + X′)/2, (Y + Y′)/2, W, H), where W is the predefined width of the camera lens frame and H is the predefined height of the camera lens frame.
25. A device, comprising:
one or more processors; and
a storage apparatus for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for face tracking according to any one of claims 1-12.
26. A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method for face tracking according to any one of claims 1-12.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810786024.2A CN109064489A (en) | 2018-07-17 | 2018-07-17 | Method, apparatus, equipment and medium for face tracking |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109064489A true CN109064489A (en) | 2018-12-21 |
Family
ID=64817130
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810786024.2A Pending CN109064489A (en) | 2018-07-17 | 2018-07-17 | Method, apparatus, equipment and medium for face tracking |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109064489A (en) |
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1924894A (en) * | 2006-09-27 | 2007-03-07 | 北京中星微电子有限公司 | Multiple attitude human face detection and track system and method |
CN102509070A (en) * | 2011-10-12 | 2012-06-20 | 西安理工大学 | Video-based human face area tracking method for counting people paying close attention to advertisement |
CN105264570A (en) * | 2013-06-14 | 2016-01-20 | 高通股份有限公司 | Tracker assisted image capture |
KR101381343B1 (en) * | 2013-09-14 | 2014-04-04 | 주식회사 코어셀 | The method for appling the masking for the protection of privacy to the face area detected by using face recognition technology |
CN104954664A (en) * | 2014-03-24 | 2015-09-30 | 东芝阿尔派·汽车技术有限公司 | Image processing apparatus and image processing method |
CN104915663A (en) * | 2015-07-03 | 2015-09-16 | 广东欧珀移动通信有限公司 | Method, system, and mobile terminal for improving face identification and display |
CN105117022A (en) * | 2015-09-24 | 2015-12-02 | 北京零零无限科技有限公司 | Method and device for controlling unmanned aerial vehicle to rotate along with face |
CN106709932A (en) * | 2015-11-12 | 2017-05-24 | 阿里巴巴集团控股有限公司 | Face position tracking method and device and electronic equipment |
CN106934818A (en) * | 2015-12-31 | 2017-07-07 | 芋头科技(杭州)有限公司 | A kind of hand exercise tracking and system |
CN105913028A (en) * | 2016-04-13 | 2016-08-31 | 华南师范大学 | Face tracking method and face tracking device based on face++ platform |
CN107959798A (en) * | 2017-12-18 | 2018-04-24 | 北京奇虎科技有限公司 | Video data real-time processing method and device, computing device |
Non-Patent Citations (1)
Title |
---|
Du Junping et al.: "Target Detection and Tracking in Cross-Scale Moving Images", 30 June 2018, Beijing University of Posts and Telecommunications Press *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111401111A (en) * | 2019-01-03 | 2020-07-10 | 瑞昱半导体股份有限公司 | Object tracking system, object tracking method and non-transitory computer readable medium |
CN113302911A (en) * | 2019-01-17 | 2021-08-24 | Mo-Sys工程有限公司 | Camera control |
US11949982B2 (en) | 2019-01-17 | 2024-04-02 | Mo-Sys Engineering Limited | Method of forming a processed video stream |
CN111612813A (en) * | 2019-02-26 | 2020-09-01 | 北京海益同展信息科技有限公司 | Face tracking method and device |
CN113689462A (en) * | 2020-05-19 | 2021-11-23 | 深圳绿米联创科技有限公司 | Target processing method and device and electronic equipment |
CN112102364A (en) * | 2020-09-22 | 2020-12-18 | 广州华多网络科技有限公司 | Target tracking method and device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109064489A (en) | Method, apparatus, equipment and medium for face tracking | |
CN106204660B (en) | A kind of Ground Target Tracking device based on characteristic matching | |
US20220051000A1 (en) | Method and apparatus for detecting face key point, computer device and storage medium | |
CN109359526B (en) | Human face posture estimation method, device and equipment | |
CN108921782B (en) | Image processing method, device and storage medium | |
US20150169938A1 (en) | Efficient facial landmark tracking using online shape regression method | |
CN110163211B (en) | Image recognition method, device and storage medium | |
US11380121B2 (en) | Full skeletal 3D pose recovery from monocular camera | |
CN109325456A (en) | Target identification method, device, target identification equipment and storage medium | |
WO2018102880A1 (en) | Systems and methods for replacing faces in videos | |
US20210056292A1 (en) | Image location identification | |
CN110147708B (en) | Image data processing method and related device | |
CN106713740A (en) | Positioning tracking camera shooting method and system | |
CN111563490B (en) | Face key point tracking method and device and electronic equipment | |
Firintepe et al. | The more, the merrier? A study on in-car IR-based head pose estimation | |
Kantarcı et al. | Shuffled patch-wise supervision for presentation attack detection | |
CN111582654A (en) | Service quality evaluation method and device based on deep cycle neural network | |
CN112712571B (en) | Object plane mapping method, device and equipment based on video | |
US10032080B2 (en) | Evaluation of models generated from objects in video | |
CN111401240B (en) | Classroom attention detection method, device, equipment and storage medium | |
CN110298229B (en) | Video image processing method and device | |
Hennings-Yeomans et al. | Recognition of low-resolution faces using multiple still images and multiple cameras | |
CN109284722A (en) | Image processing method, device, face recognition device and storage medium | |
Luo et al. | Alignment and tracking of facial features with component-based active appearance models and optical flow | |
JP2004157778A (en) | Nose position extraction method, program for operating it on computer, and nose position extraction device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| TA01 | Transfer of patent application right | Effective date of registration: 20190122. Address after: Room 1702-03, Blue Sky and Sheng Building, 32 Zhongguancun Street, Haidian District, Beijing. Applicant after: BEIJING CENTURY TAL EDUCATION TECHNOLOGY Co.,Ltd. Address before: Room 118, Building 3, 6-C, 8 High-tech Parks, Shijingshan District, Beijing, 100144. Applicant before: BEIJING XINTANG SICHUANG EDUCATION TECHNOLOGY Co.,Ltd. |