
CN111556350B - Intelligent terminal and man-machine interaction method - Google Patents

Intelligent terminal and man-machine interaction method

Info

Publication number
CN111556350B
Authority
CN
China
Prior art keywords
image
gesture
determining
display
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010315144.1A
Other languages
Chinese (zh)
Other versions
CN111556350A (en)
Inventor
孟祥奇
冯谨强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hisense Co Ltd
Original Assignee
Hisense Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hisense Co Ltd filed Critical Hisense Co Ltd
Priority to CN202010315144.1A priority Critical patent/CN111556350B/en
Publication of CN111556350A publication Critical patent/CN111556350A/en
Application granted granted Critical
Publication of CN111556350B publication Critical patent/CN111556350B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/422Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/4223Cameras
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107Static hand or arm
    • G06V40/113Recognition of static hand signs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/438Interfacing the downstream path of the transmission network originating from a server, e.g. retrieving encoded video stream packets from an IP network
    • H04N21/4383Accessing a communication channel
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/443OS processes, e.g. booting an STB, implementing a Java virtual machine in an STB or power management in an STB
    • H04N21/4438Window management, e.g. event handling following interaction with the user interface

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses an intelligent terminal and a human-computer interaction method. The intelligent terminal comprises a display and a processor: the display is used to display an interface, and the processor is configured to acquire a video stream captured by an image acquisition device, identify N frames of images in the video stream, and determine, according to the recognition result, the position variation and/or angle variation of M key pixel points in the hand image region of the N frames of images. In one case, when the position variation and/or angle variation of the M key pixel points satisfy a first set condition, text information corresponding to the N frames of images is determined, a first control signaling corresponding to the text information is generated, and the display is controlled to display the text information. With this method, a user can interact with the intelligent terminal through gestures without touching the terminal or its remote controller, which is more convenient and helps hearing-impaired users use the intelligent terminal.

Description

Intelligent terminal and man-machine interaction method
Technical Field
The application relates to the field of intelligent terminals, in particular to an intelligent terminal and a man-machine interaction method.
Background
At present, users generally control a smart television with a remote controller, but this approach has several drawbacks: careless users frequently cannot find the remote controller, its batteries must be replaced regularly, and problems such as hardware faults and button failure tend to appear after long-term use.
Based on the above analysis, the embodiments of the present application provide another way to control the smart TV, one that does not require a remote controller.
Disclosure of Invention
The invention provides an intelligent terminal and a human-computer interaction method that allow a user to interact with the terminal through gestures without touching the terminal or its remote controller, which is more convenient and helps hearing-impaired users use the intelligent terminal.
In a first aspect, the present invention provides an intelligent terminal, including: a display for displaying an interface, and a processor configured to:
acquire the video stream captured by the image acquisition device; identify N frames of images in the video stream, and determine the position variation and/or angle variation of M key pixel points in the hand image region of the N frames of images; and, when the position variation and/or angle variation of the M key pixel points satisfy a first set condition, determine text information corresponding to the N frames of images, generate a first control signaling corresponding to the text information, and control the display to display the text information based on the first control signaling.
Illustratively, the video stream captured by the image acquisition device comprises multiple frames of images. The processor detects whether a wake-up gesture appears in the recognized images, locates the corresponding human face through the wake-up gesture, and thereby locks onto that user's hand. The processor then recognizes the user's gestures in subsequent images, automatically distinguishes static from dynamic gestures, feeds them into the corresponding recognition models, and outputs a control command; the display shows the corresponding image or text according to that command.
In some embodiments, the first set condition includes that the position variation of the M key pixel points is greater than or equal to a first threshold, and/or that the angle variation of the M key pixel points is greater than or equal to a second threshold. Whether the gesture in the image has changed is determined from the position and angle variations of the M key pixel points between two images, so dynamic and static gestures are effectively distinguished and the corresponding control command is determined by the corresponding gesture recognition model.
In some embodiments, the processor is further configured to: when the position variation and/or angle variation of the M key pixel points satisfy a second set condition, determine a second control signaling corresponding to the gesture image in the N frames of images, where the second set condition includes that the position variation of the M key pixel points is smaller than the first threshold and/or that the angle variation of the M key pixel points is smaller than the second threshold; and send the second control signaling to the display.
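A minimal sketch of this dynamic/static decision is given below; the concrete threshold values, the array layout, and the rule that any key point exceeding its threshold marks the gesture as dynamic are assumptions chosen for illustration, not values fixed by the patent.

```python
import numpy as np

FIRST_THRESHOLD = 20.0    # position-variation threshold in pixels (assumed value)
SECOND_THRESHOLD = 15.0   # angle-variation threshold in degrees (assumed value)

def classify_gesture(position_variations: np.ndarray, angle_variations: np.ndarray) -> str:
    """position_variations: L_i for each of the M key pixel points between frame t and t-a.
    angle_variations: angle changes for the tracked key-point groups."""
    if (position_variations >= FIRST_THRESHOLD).any() or (angle_variations >= SECOND_THRESHOLD).any():
        return "dynamic"   # first set condition met -> dynamic gesture recognition model
    return "static"        # otherwise the second set condition holds -> static gesture model
```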
In some embodiments, before determining the position variation and/or angle variation of the M key pixel points of the hand image region in the N frames of images, the processor is further configured to: determine that an image corresponding to the set wake-up gesture image exists in the video stream.
Exemplarily, the embodiment of the application prevents false triggering during television control by presetting a wake-up gesture that is unlikely to be made accidentally. Once the wake-up gesture is recognized, the gesture operations of that user within a period of time are all treated as valid, and other people temporarily cannot take control. Optionally, when the user does not output a valid gesture within a set time, or actively performs a preset terminating gesture, other people can wake up the intelligent terminal with the wake-up gesture and then take over control. Setting a wake-up gesture effectively avoids the trouble of accidentally triggering the television and makes the scheme more efficient to implement.
In some embodiments, the processor is further configured to: determine face feature information in the image corresponding to the set wake-up gesture image. Identifying the N frames of images in the video stream and determining the position variation and/or angle variation of the M key pixel points in the hand image region then comprises: identifying the N frames of images in the video stream that contain the face feature information, and determining the position variation and/or angle variation of the M key pixel points in the hand image region of those frames.
By adopting face feature recognition, the embodiment of the application avoids the situation where even registered members cannot be recognized when the user shows a side face or has their back to the television, which further prevents false triggering. In the embodiment of the application, when the face is gazing at the intelligent terminal (the image acquisition device can be mounted directly above the television), the gesture performed at that moment is treated as a valid gesture; the complete facial features can be recognized at that point, and the user's hand is tracked through the tracked facial features, which solves the problem of mis-tracking gestures. Optionally, a hand detection area can be set with the size of the face in pixels as a reference before hand detection and recognition, which effectively avoids the hand being filtered out for being too small when it is tracked in the image, as illustrated in the sketch below.
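A possible illustration of the face-referenced hand detection area is sketched below; the scale factor and the placement of the search window are assumptions for illustration, since the patent does not fix concrete values.

```python
from typing import Tuple

def hand_detection_area(face_box: Tuple[int, int, int, int],
                        frame_w: int, frame_h: int,
                        scale: float = 3.0) -> Tuple[int, int, int, int]:
    """face_box = (x, y, w, h) of the detected face in pixels.
    Returns (x, y, w, h) of a search window for the same user's hand."""
    fx, fy, fw, fh = face_box
    w = int(fw * scale)                 # assume the hand stays within a few face-widths
    h = int(fh * scale)
    x = max(0, fx + fw // 2 - w // 2)   # centered horizontally on the face
    y = max(0, fy)                      # search from face level downwards
    return x, y, min(w, frame_w - x), min(h, frame_h - y)
```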
In certain embodiments, the processor is specifically configured to: when a plurality of images corresponding to the set wake-up gesture image exist in the video stream, calculate the similarity between each of these images and the set wake-up gesture image; determine the first image with the maximum similarity among them, and determine the face feature information in that first image. Selecting the first user from several possible users by similarity solves the problem that the user's control command cannot be accurately identified when multiple users try to control the television at the same time: only the first user's gestures are recognized, so the control command is identified accurately and the television is controlled efficiently.
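A minimal sketch of selecting the first image by similarity follows; using OpenCV's normalized template matching as the similarity measure is an assumption, since the patent does not prescribe a particular metric.

```python
import cv2
import numpy as np

def pick_first_user(candidates: list, wake_template: np.ndarray) -> int:
    """candidates: cropped gesture images of the users who made the wake-up gesture.
    Returns the index of the candidate most similar to the set wake-up gesture image."""
    scores = []
    for img in candidates:
        resized = cv2.resize(img, (wake_template.shape[1], wake_template.shape[0]))
        score = cv2.matchTemplate(resized, wake_template, cv2.TM_CCOEFF_NORMED).max()
        scores.append(float(score))
    return int(np.argmax(scores))
```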
In certain embodiments, the processor is specifically configured to:
For a first pixel point, which is any one of the M key pixel points, perform the following processing: determining the coordinates of the first pixel point in the t-th frame image of the N frames of images, and determining the Euclidean distance to its coordinates in the (t-a)-th frame image as the position variation, where the position variation satisfies the following formula:
L_i = \sqrt{(x_i^t - x_i^{t-a})^2 + (y_i^t - y_i^{t-a})^2}
where L_i denotes the position variation, (x_i^t, y_i^t) are the horizontal and vertical coordinates of the first pixel point in the t-th frame image, and (x_i^{t-a}, y_i^{t-a}) are the horizontal and vertical coordinates of the first pixel point in the (t-a)-th frame image;
and/or, for U pixel points among the M key pixel points, perform the following processing: determining a first angle formed by the U pixel points in the t-th frame image, determining a second angle formed by the U pixel points in the (t-a)-th frame image, and determining the absolute value of the difference between the second angle and the first angle as the angle variation. With these formulas, the moving distance or angle change of each key point can be calculated accurately; when the moving distance or angle change of some of the key points exceeds its respective threshold (or both exceed their thresholds), the gesture is judged to be a dynamic gesture, so dynamic and static gestures are accurately distinguished. A sketch of this computation is given below.
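The following is a minimal sketch of these computations; representing key points as (x, y) tuples and defining the angle at the middle point of a three-point group are assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

def position_variation(p_t: Point, p_ta: Point) -> float:
    """Euclidean distance L_i between the same key pixel point in frame t and frame t-a."""
    return math.hypot(p_t[0] - p_ta[0], p_t[1] - p_ta[1])

def angle(a: Point, b: Point, c: Point) -> float:
    """Angle at vertex b formed by key pixel points a, b, c, in degrees."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def angle_variation(points_t: Sequence[Point], points_ta: Sequence[Point]) -> float:
    """|second angle - first angle| for the same three key pixel points in frames t-a and t."""
    return abs(angle(*points_ta) - angle(*points_t))
```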
In some embodiments, the smart terminal may be a smart television.
In a second aspect, the present invention provides a human-computer interaction method, including: acquiring a video stream captured by an image acquisition device, identifying N frames of images in the video stream, and determining the position variation and/or angle variation of M key pixel points in the hand image region of the N frames of images; when the position variation and/or angle variation of the M key pixel points satisfy a first set condition, determining text information corresponding to the N frames of images, where M, N, and K are positive integers, generating a first control signaling corresponding to the text information, and controlling the display to display the text information based on the first control signaling.
The first setting condition includes that the position variation of the M key pixel points is greater than or equal to a first threshold, and/or the angle variation of the M key pixel points is greater than or equal to a second threshold.
When the position variation and/or the angle variation of the M key pixel points meet a second setting condition, determining a second control signaling corresponding to the gesture image in the N frames of images, wherein the second setting condition comprises that the position variation of the M key pixel points is smaller than a first threshold value, and/or the angle variation of the M key pixel points is smaller than a second threshold value;
and sending second control signaling to the display.
In some embodiments, before the intelligent terminal determines the position variation and/or the angle variation of M key pixel points in a hand image region in the N frames of images, it is determined that an image corresponding to the set wake-up gesture image exists in the video stream.
In some embodiments, the intelligent terminal determines face feature information in the image corresponding to the set wake-up gesture image. Identifying the N frames of images in the video stream and determining the position variation and/or angle variation of the M key pixel points in the hand image region then comprises: identifying the N frames of images in the video stream that contain the face feature information, and determining the position variation and/or angle variation of the M key pixel points in the hand image region of those frames.
In some embodiments, when a plurality of images corresponding to the set wake-up gesture image exist in the video stream, calculating the similarity between the plurality of images and the set wake-up gesture image; and determining a first image with the maximum similarity from the plurality of images, and determining the face feature information in the first image.
In some embodiments, the intelligent terminal performs the following processing for a first pixel point, where the first pixel point is any one of the M key pixel points: determining the coordinates of the first pixel point in the t-th frame image of the N frames of images, and determining the Euclidean distance to its coordinates in the (t-a)-th frame image as the position variation, where the position variation satisfies the following formula:
L_i = \sqrt{(x_i^t - x_i^{t-a})^2 + (y_i^t - y_i^{t-a})^2}
where L_i denotes the position variation, (x_i^t, y_i^t) are the horizontal and vertical coordinates of the first pixel point in the t-th frame image, and (x_i^{t-a}, y_i^{t-a}) are the horizontal and vertical coordinates of the first pixel point in the (t-a)-th frame image;
and/or, for U pixel points among the M key pixel points, performing the following processing: determining a first angle formed by the U pixel points in the t-th frame image, determining a second angle formed by the U pixel points in the (t-a)-th frame image, and determining the absolute value of the difference between the second angle and the first angle as the angle variation.
In some embodiments, the smart terminal may be a smart television.
The invention has the following beneficial effects: the intelligent terminal is controlled without a remote controller or voice, human-computer interaction efficiency is improved, and the terminal can quickly recognize the user's input gesture, thereby controlling the terminal or entering a command into it.
An embodiment of the present invention provides a computing device, including a memory for storing program instructions, and a processor for calling the program instructions stored in the memory and executing the above human-computer interaction method according to the obtained program.
An embodiment of the present invention provides a computer-readable non-volatile storage medium, which includes computer-readable instructions, and when the computer-readable instructions are read and executed by a computer, the computer is enabled to execute any one of the above human-computer interaction methods.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments of the present invention will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic view of an application scenario of an intelligent terminal according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a hardware configuration of a display according to an embodiment of the present invention;
FIG. 3 is a block diagram illustrating an architectural configuration of an operating system in a display memory according to an embodiment of the present invention;
fig. 4 is a schematic view of an application scenario provided in the embodiment of the present invention;
fig. 5 is a schematic flowchart of a human-computer interaction method according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an interactive interface according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a dynamic gesture according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a human hand key point setting provided by an embodiment of the present invention;
FIG. 9 is a schematic view of an angle of a key point according to an embodiment of the present invention;
FIG. 10 is a schematic diagram illustrating movement of key points in a dynamic gesture according to an embodiment of the present invention;
FIG. 11 is a schematic illustration of a static gesture according to an embodiment of the present invention;
FIG. 12 is a schematic diagram illustrating a hand detection area determination according to an embodiment of the present invention;
fig. 13 is a schematic structural diagram of a computer according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, the present invention is further described with reference to the accompanying drawings and examples. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, a schematic view of an application scenario of an intelligent terminal provided in an embodiment of the present invention is shown. As shown in fig. 1, the scenario includes an image capturing apparatus 10 and a smart terminal 20. The image acquisition device 10 and the intelligent terminal 20 can communicate in a wired or wireless manner. It should be noted that the image capturing apparatus 10 may be an external device of the intelligent terminal 20, or may be a component of the intelligent terminal 20, such as a camera installed on the intelligent terminal 20.
The image acquisition device 10 is configured to acquire an image of a surrounding environment of the intelligent terminal and generate a video stream, and the image acquisition device 10 may be a camera, a scanner, or other equipment with a camera function. The image capturing device 10 may also be an intelligent terminal, such as a mobile phone, a tablet computer, a notebook computer, a computer, or the like. For example, the mobile phone captures an image of an environment around the smart terminal 20, and the mobile phone transmits the image to the smart terminal 20 in a wireless or wired manner, where the smart terminal 20 may be a smart television.
The intelligent terminal 20 may provide a network TV function that combines a broadcast receiving function with computer support functions. Illustratively, the smart terminal may be a digital television, a web television, an Internet Protocol Television (IPTV), or the like.
The smart terminal 20 also performs data communication with the server 30 through various communication means. The intelligent terminal 20 may be allowed to make communication connections through a Local Area Network (LAN), a Wireless Local Area Network (WLAN), and other networks. The server 30 may provide various contents and interactions to the smart terminal 20. For example, the smart terminal 20 may send and receive information such as: receiving Electronic Program Guide (EPG) data, receiving software program updates, or accessing a remotely stored digital media library. The servers 30 may be a group or groups of servers, and may be one or more types of servers. Other web service content such as video on demand and advertising services are provided through the server 30.
In the embodiment of the present application, the intelligent terminal 20 includes: a processor 100 and a display 200.
The display 200 may be a liquid crystal display, an organic light emitting display, a projection device, among others. The specific display type, size, resolution, etc. are not limited.
A hardware configuration block diagram of the display 200 is exemplarily shown in fig. 2. As shown in fig. 2, the display 200 may include a tuner demodulator 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a memory 260, a user interface 265, a video processor 270, a display 275, an audio processor 280, an audio output interface 285, and a power supply 290.
The tuner demodulator 210 receives broadcast television signals by wire or wirelessly, may perform modulation and demodulation processes such as amplification, mixing, and resonance, and is configured to demodulate, from a plurality of wireless or wired broadcast television signals, the audio/video signals carried in the frequency of the television channel selected by the user, as well as additional information (e.g., Electronic Program Guide (EPG) data).
The tuner demodulator 210 is responsive to the user selected frequency of the television channel and the television signal carried by the frequency, as selected by the user and controlled by the controller 250.
The tuner demodulator 210 may receive signals according to different broadcasting systems of the smart device signals, such as: terrestrial broadcasting, cable broadcasting, satellite broadcasting, internet broadcasting, or the like; and according to different modulation types, a digital modulation mode or an analog modulation mode can be adopted; and can demodulate the analog signal and the digital signal according to the different kinds of the received television signals.
In other exemplary embodiments, the tuning demodulator 210 may also be in an external device, such as an external set-top box. In this way, the set-top box outputs the signal after modulation and demodulation, and inputs the signal into the display 200 through the external device interface 240.
The communicator 220 is a component for communicating with an external device or an external server according to various communication protocol types. For example, the display 200 may transmit content data to an external device connected via the communicator 220, or browse and download content data from an external device connected via the communicator 220. The communicator 220 may include a network communication protocol module or a near field communication protocol module, such as a WIFI module 221, a bluetooth communication protocol module 222, and a wired ethernet communication protocol module 223, so that the communicator 220 may receive a control signal of the processor 100 according to the control of the controller 250 and implement the control signal as a WIFI signal, a bluetooth signal, a radio frequency signal, and the like.
The detector 230 is a component of the display 200 for collecting signals of an external environment or interaction with the outside. The detector 230 may include a sound collector 231, such as a microphone, which may be used to receive a user's sound, such as a voice signal of a control instruction of the user to control the display 200; alternatively, ambient sounds may be collected that identify the type of ambient scene, enabling the display 200 to adapt to ambient noise. The detector 230 may include an image collector 232 for collecting images of the external environmental scene.
In some other exemplary embodiments, the detector 230, which may also be the image capturing device 10, such as a camera, a video camera, etc., may be used to capture the external environment scene to adaptively change the display parameters of the display 200; and the function of collecting the attribute of the user or interacting gestures with the user so as to realize the interaction between the display and the user.
In some other exemplary embodiments, the detector 230 may further include a light receiver for collecting the intensity of the ambient light to adapt to the display parameter variation of the display 200.
In some other exemplary embodiments, the detector 230 may further include a temperature sensor, such as by sensing an ambient temperature, and the display 200 may adaptively adjust a displayed color temperature of the image. For example, when the temperature is higher, the display 200 can be adjusted to display a cool color tone; when the temperature is lower, the display 200 can be adjusted to display a warmer color temperature of the image.
The external device interface 240 is a component for providing the controller 250 to control data transmission between the display 200 and an external device. The external device interface 240 may be connected to an external apparatus such as a set-top box, a game device, a notebook computer, etc. in a wired/wireless manner, and may receive data such as a video signal (e.g., moving image), an audio signal (e.g., music), additional information (e.g., EPG), etc. of the external apparatus.
The external device interface 240 may include: a High Definition Multimedia Interface (HDMI) terminal 241, a Composite Video Blanking Sync (CVBS) terminal 242, an analog or digital Component terminal 243, a Universal Serial Bus (USB) terminal 244, a Component terminal (not shown), a red, green, blue (RGB) terminal (not shown), and the like.
The controller 250 controls the operation of the display 200 and responds to the user's operation by running various software control programs (such as an operating system and various application programs) stored on the memory 260.
As shown in fig. 2, the controller 250 includes a Random Access Memory (RAM)251, a Read Only Memory (ROM)252, a graphics processor 253, a CPU processor 254, a communication interface 255, and a communication bus 256. The RAM251, the ROM252, the graphic processor 253, the CPU processor 254, and the communication interface 255 are connected by a communication bus 256.
The ROM252 stores various system boot instructions. If the display 200 is powered on upon receiving the power-on signal, the CPU processor 254 executes the system boot instruction in the ROM252 and copies the operating system stored in the memory 260 to the RAM251 to start running the boot operating system. After the start of the operating system is completed, the CPU processor 254 copies the various application programs in the memory 260 to the RAM251 and then starts running and starting the various application programs.
And a graphic processor 253 for generating various graphic objects such as icons, operation menus, and user input instruction display graphics, etc. The graphic processor 253 may include an operator for performing an operation by receiving various interactive instructions input by a user, and further displaying various objects according to display attributes; and a renderer for generating various objects based on the operator and displaying the rendered result on the display 275.
A CPU processor 254 for executing operating system and application program instructions stored in memory 260. And according to the received user input instruction, processing of various application programs, data and contents is executed so as to finally display and play various audio-video contents.
In some example embodiments, the CPU processor 254 may comprise a plurality of processors. The plurality of processors may include one main processor and a plurality of or one sub-processor. A main processor for performing some initialization operations of the display 200 in the display preload mode and/or operations for displaying a screen in the normal mode. A plurality of or a sub-processor for performing an operation in a display standby mode or the like.
The communication interface 255 may include a first interface to an nth interface. These interfaces may be network interfaces that are connected to external devices via a network.
The controller 250 may control the overall operation of the display 200. For example: in response to receiving a user input command for selecting a GUI object displayed on the display screen 275, the controller 250 may perform an operation related to the object selected by the user input command.
Where the object may be any one of the selectable objects, such as a hyperlink or an icon. The operation related to the selected object is, for example, an operation of displaying a link to a hyperlink page, document, image, or the like, or an operation of executing a program corresponding to the object. The user input command for selecting the GUI object may be a command input through various input devices (e.g., a mouse, a keyboard, a touch pad, etc.) connected to the display 200 or a voice command corresponding to a voice spoken by the user.
A memory 260 for storing various types of data, software programs, or applications that drive and control the operation of the display 200. The memory 260 may include volatile and/or nonvolatile memory. And the term "memory" includes the memory 260, the RAM251 and the ROM252 of the controller 250, or a memory card in the display 200.
In some embodiments, the memory 260 is specifically used for storing an operating program for driving the controller 250 of the display 200; storing various application programs built in the display 200 and downloaded by a user from an external device; data such as visual effect images for configuring various GUIs provided by the display 275, various objects related to the GUIs, and selectors for selecting GUI objects are stored.
In some embodiments, memory 260 is specifically configured to store drivers and associated data for tuner demodulator 210, communicator 220, detector 230, external device interface 240, video processor 270, display 275, audio processor 280, etc., such as external data (e.g., audio-visual data) received from the external device interface or user data (e.g., key information, voice information, touch information, etc.) received by the user interface.
In some embodiments, memory 260 specifically stores software and/or programs representing an Operating System (OS), which may include, for example: a kernel, middleware, an Application Programming Interface (API), and/or an application program. Illustratively, the kernel may control or manage system resources, as well as functions implemented by other programs (e.g., the middleware, APIs, or applications); at the same time, the kernel may provide an interface to allow middleware, APIs, or applications to access the controller to enable control or management of system resources.
In FIG. 2, user interface 265 may be used to receive various user interactions. Specifically, it is used to transmit an input signal of a user to the controller 250 or transmit an output signal from the controller 250 to the user. For example, the remote controller may transmit an input signal, such as a power switch signal, a channel selection signal, a volume adjustment signal, etc., input by the user to the user interface 265, and then the input signal is transferred to the controller 250 through the user interface 265; alternatively, the remote controller may receive an output signal such as audio, video, or data output from the user interface 265 via the controller 250, and display the received output signal or output the received output signal in audio or vibration form.
In some embodiments, a user may enter user commands on a Graphical User Interface (GUI) displayed on the display screen 275, and the user interface 265 receives the user input commands through the GUI. Specifically, the user interface 265 may receive user input commands for controlling the position of a selector in the GUI to select different objects or items. Among these, "user interfaces" are media interfaces for interaction and information exchange between an application or operating system and a user, which enable the conversion between an internal form of information and a form acceptable to the user. A commonly used presentation form of the user interface is a Graphical User Interface (GUI), which refers to a user interface related to computer operations and displayed in a graphical manner. It may be an interface element such as an icon, window, control, etc. displayed in the display of the electronic device, where the control may include a visual interface element such as an icon, control, menu, tab, text box, dialog box, status bar, channel bar, Widget, etc.
Alternatively, the user may input a user command by inputting a specific sound or gesture, and the user interface 265 receives the user input command by recognizing the sound or gesture through the sensor.
The video processor 270 is configured to receive an external video signal, and perform video data processing such as decompression, decoding, scaling, noise reduction, frame rate conversion, resolution conversion, and image synthesis according to a standard codec protocol of the input signal, so as to obtain a video signal that is directly displayed or played on the display screen 275.
Illustratively, the video processor 270 includes a demultiplexing module, a video decoding module, an image synthesizing module, a frame rate conversion module, a display formatting module, and the like.
The demultiplexing module is configured to demultiplex an input audio/video data stream, where, for example, an input MPEG-2 stream (based on a compression standard of a digital storage media moving image and voice), the demultiplexing module demultiplexes the input audio/video data stream into a video signal and an audio signal.
And the video decoding module is used for processing the video signal after demultiplexing, including decoding, scaling and the like.
And the image synthesis module is used for carrying out superposition mixing processing on the GUI signal input by the user or generated by the user and the video image after the zooming processing by the graphic generator so as to generate an image signal for display.
The frame rate conversion module is configured to convert the frame rate of the input video, for example converting a 60 Hz input video to a frame rate of 120 Hz or 240 Hz, which is commonly implemented by, for example, frame interpolation.
And a display formatting module for converting the signal output by the frame rate conversion module into a signal conforming to a display format of a display, such as converting the format of the signal output by the frame rate conversion module to output an RGB data signal.
And a display screen 275 for receiving the image signal from the video processor 270 and displaying the video content, the image and the menu manipulation interface. The display video content may be from the video content in the broadcast signal received by the tuner-demodulator 210, or from the video content input by the communicator 220 or the external device interface 240. A display screen 275, while displaying a user manipulation interface UI generated in the display 200 and used to control the display 200.
And, the display screen 275 may include a display component for presenting a picture and a driving component for driving the display of an image. Alternatively, in case the display 200 is a projection display, it may also comprise a projection device and a projection screen.
The audio processor 280 is configured to receive an external audio signal, decompress and decode the received audio signal according to a standard codec protocol of the input signal, and perform audio data processing such as noise reduction, digital-to-analog conversion, and amplification processing to obtain an audio signal that can be played by the speaker 286.
Illustratively, audio processor 280 may support various audio formats. Such as MPEG-2, MPEG-4, Advanced Audio Coding (AAC), high efficiency AAC (HE-AAC), and the like.
The audio output interface 285 is used for receiving the audio signal output by the audio processor 280 under the control of the controller 250, and may include a speaker 286 or an external sound output terminal 287, such as an earphone output terminal, for output to a sound-producing device of an external apparatus.
In other exemplary embodiments, video processor 270 may comprise one or more chips. Audio processor 280 may also comprise one or more chips.
And, in other exemplary embodiments, the video processor 270 and the audio processor 280 may be separate chips or may be integrated with the controller 250 in one or more chips.
And a power supply 290 for supplying power to the display 200 from the power input from the external power source under the control of the controller 250. The power supply 290 may be a built-in power supply circuit installed inside the display 200 or may be a power supply installed outside the display 200.
A block diagram of the architectural configuration of the operating system in the memory of the display device 200 is illustrated in fig. 3. The operating system architecture comprises an application layer, a middleware layer and a kernel layer from top to bottom.
The application layer: application programs built into the system and non-system-level application programs belong to the application layer, which is responsible for direct interaction with the user. The application layer may include a plurality of applications, such as a setup application, a post application, a media center application, and the like. These applications may be implemented as Web applications that execute based on a WebKit engine, and in particular may be developed and executed based on HTML5, Cascading Style Sheets (CSS), and JavaScript.
Here, HTML, which is called HyperText Markup Language (HyperText Markup Language), is a standard Markup Language for creating web pages, and describes the web pages by Markup tags, where the HTML tags are used to describe characters, graphics, animation, sound, tables, links, etc., and a browser reads an HTML document, interprets the content of the tags in the document, and displays the content in the form of web pages.
CSS, known as Cascading Style Sheets (Cascading Style Sheets), is a computer language used to represent the Style of HTML documents, and may be used to define Style structures, such as fonts, colors, locations, etc. The CSS style can be directly stored in the HTML webpage or a separate style file, so that the style in the webpage can be controlled.
JavaScript, a language applied to Web page programming, can be inserted into an HTML page and interpreted and executed by a browser. The interaction logic of the Web application is implemented in JavaScript. JavaScript can encapsulate a JavaScript extension interface through the browser to realize communication with the kernel layer.
the middleware layer may provide some standardized interfaces to support the operation of various environments and systems. For example, the middleware layer may be implemented as multimedia and hypermedia information coding experts group (MHEG) middleware related to data broadcasting, DLNA middleware which is middleware related to communication with an external device, middleware which provides a browser environment in which each application program in the display device operates, and the like.
The kernel layer provides core system services, such as: file management, memory management, process management, network management, system security authority management and the like. The kernel layer may be implemented as a kernel based on various operating systems, for example, a kernel based on the Linux operating system.
The kernel layer also provides communication between system software and hardware, and provides device driver services for various hardware, such as: provide display driver for the display, provide camera driver for the camera, provide button driver for the remote controller, provide wiFi driver for the WIFI module, provide audio driver for audio output interface, provide power management drive for Power Management (PM) module etc..
Referring to fig. 4, a schematic view of an application scenario provided by an embodiment of the present invention is shown, in which an image capturing device 10 captures an image located in an image capturing area (a front area of a display 200), if a user issues a gesture control instruction in the image capturing area, a processor 100 may obtain a video stream captured by the image capturing device 10, and the processor 100 identifies an image in the video stream, and determines at least one of a position variation and an angle variation of a plurality of key pixel points in a hand image area in the image.
In a possible case, when it is determined that at least one of the position variation and the angle variation satisfies the first set condition, the gesture of the user is judged to be a dynamic gesture, text information corresponding to the dynamic gesture is determined, a first control signaling corresponding to the text information is generated, and the display is controlled to display the text information based on the first control signaling.
In another possible case, when it is determined that at least one of the position variation and the angle variation satisfies the second setting condition, the gesture of the user is determined to be a static gesture, a second control signaling corresponding to the static gesture is determined, and the second control signaling is sent to the display.
According to this method, human-computer interaction with the intelligent terminal can be achieved through gestures without the user touching the intelligent terminal or its remote controller, which is more convenient; in particular, it helps users with hearing and speech impairments input control instructions with gestures and use the intelligent terminal.
It should be noted that the term "gesture" used in the embodiments of the present application refers to a user behavior that is used by a user to express an intended idea, action, purpose, or result through a change of hand shape or a motion of a hand.
Fig. 5 is a schematic flow chart of a human-computer interaction method according to the present application. For the actual control process, the processor 100 is further configured to perform the following program steps:
in step 501, the processor 100 obtains a video stream captured by the image capturing apparatus 10.
The image capturing device 10 may be a camera and the smart terminal may be a smart TV, with the camera mounted above the display 200 of the smart TV; it captures images of the area in front of the display 200 in real time to generate a video stream. The camera may actively send the video stream to the processor 100 in real time in a wired or wireless manner, or the processor 100 may periodically obtain the video stream from the camera.
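A minimal sketch of how the processor might pull frames from such a camera is shown below; using OpenCV's VideoCapture is an assumption made purely for illustration.

```python
import cv2

def read_frames(device_index: int = 0, n_frames: int = 30):
    """Yield up to n_frames consecutive frames from the camera in front of the display."""
    cap = cv2.VideoCapture(device_index)
    try:
        for _ in range(n_frames):
            ok, frame = cap.read()
            if not ok:
                break
            yield frame
    finally:
        cap.release()
```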
The image capturing device 10 captures the user, determines N frames of images, and generates a video stream. The intelligent terminal 20 may be a smart TV, and when the display 200 displays an interactive interface for changing channels as shown in fig. 6, the user may output a corresponding command to the smart TV with a gesture; for example, when the user makes the dynamic gesture shown in fig. 7 toward the smart TV, the processor 100 is configured to perform the following steps.
In step 502, the processor 100 identifies N frames of images in the video stream, and determines at least one of a position variation and an angle variation of M key pixels in the hand detection image region.
For example, the processor 100 may obtain coordinate information of each key pixel point in the image through human body pose recognition, e.g., OpenPose. Further, according to the coordinate information of the pixel point in the t-th frame image and the (t-a)-th frame image, the Euclidean distance between the two coordinates is calculated, thereby determining the position variation of the pixel point.
Exemplarily, the processor 100 may further calculate, for each group of at least three pixel points, such as pixel point 1, pixel point 2, and pixel point 3, the difference between the angle ∠1^t they form in the t-th frame image and the angle ∠1^{t-a} they form in the (t-a)-th frame image, so as to determine the angle variation Δ_1 corresponding to pixel points 1, 2, and 3.
The processor 100 may predefine the key pixel points of the hand detection image region to obtain a hand key-point diagram containing 20 key pixel points, as shown in fig. 8; exemplarily, the angle formed by pixel point 20, pixel point 5, and pixel point 17 is shown in fig. 9.
Illustratively, after the user makes the dynamic gesture shown in fig. 7 toward the smart terminal 20, the processor 100 determines the 2 pixel points shown in fig. 10 from the 20 predetermined key pixel points of the user's hand image region shown in fig. 8; the 2 pixel points are pixel point 12 and pixel point 8, which move to pixel point 12' and pixel point 8' respectively. The processor 100 is further configured to calculate at least one of the position variations L_12 and L_8 of pixel points 12 and 8 relative to 12' and 8', and the angle variations Δ_12 and Δ_8.
In step 503, the processor 100 determines whether at least one of the position variation and the angle variation of the M key pixels meets a first setting condition. When satisfied, execute step 504, and when not, execute step 505.
The first setting condition includes that the position variation of the M key pixel points is greater than or equal to a first threshold, and/or the angle variation of the M key pixel points is greater than or equal to a second threshold.
In the first case, in step 503, the processor 100 determines whether the position variation of the M key pixels is greater than or equal to the first threshold. In the second case, in step 503, the processor 100 determines whether the angle variation of the M key pixels is greater than or equal to the second threshold. In the third case, in step 503, the processor 100 determines whether the position variation of the M key pixels is greater than or equal to the first threshold, and whether the angle variation of the key pixels is greater than or equal to the second threshold. Optionally, the processor 100 may also select a part of the pixel points randomly or according to a certain rule from among the M key pixel points to perform position variation/angle variation calculation, so as to perform dynamic and static judgment on the gesture.
In step 504, the processor 100 determines text information and a first control signaling corresponding to the N frames of images.
For example, after it is determined in step 503 that the first setting condition is satisfied, the corresponding gesture is judged to be a dynamic gesture and is input into the dynamic gesture recognition model. As shown in fig. 7, the text information corresponding to the dynamic gesture is "20", and the corresponding first control signaling may be to change the channel of the smart television to "20".
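A minimal sketch of mapping the recognized text to a control signaling is given below; the dictionary-based signaling format and command names are assumptions, not part of the patent.

```python
def text_to_control_signaling(text: str) -> dict:
    """Map text recognized from a dynamic gesture to a control signaling."""
    if text.isdigit():
        return {"command": "change_channel", "channel": int(text)}  # e.g. "20" -> channel 20
    return {"command": "display_text", "text": text}                # fall back to showing the text
```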
Step 505, the processor 100 further determines whether at least one of the position variation and the angle variation of the M key pixels meets a second setting condition, and if so, executes step 506; when not, step 507 is executed.
The second setting condition includes that the position variation of the M key pixel points is smaller than a first threshold, and/or the angle variation of the M key pixel points is smaller than a second threshold. In the first case, step 505 is executed by the processor 100 to determine whether the position variation of the M key pixels is smaller than a first threshold. In the second case, in step 505, the processor 100 determines whether the angle variation of the M key pixels is smaller than a second threshold; in the third case, in step 505, the processor 100 determines whether the position variation of the M key pixels is smaller than the first threshold and whether the angle variation of the key pixels is smaller than the second threshold.
In step 506, the processor 100 determines the text information and the second control signaling corresponding to the N frames of images.
Illustratively, the corresponding gesture is judged to be a static gesture and is input into the static gesture recognition model. For example, as shown in fig. 11, the text information corresponding to the first static gesture from left to right is "up", and the corresponding second control signaling may be to switch the smart television to the previous channel.
At step 507, the processor 100 outputs a third control signaling, illustratively, the display 200 displays "this gesture cannot be recognized, please re-input".
In conjunction with the above scenario, in step 504, for example, the processor 100 sends the first control signaling corresponding to the text information "20" of the dynamic gesture in fig. 7 to the display 200, and the display 200 fills "20" into an input box of the interactive interface. If the user does not input any other gesture within a set time, for example 3 seconds, or inputs an "ok" gesture, the smart TV may change the channel to 20. In this way, the user inputs text information into the smart television by making gestures, realizing efficient human-computer interaction that does not depend on voice or a remote controller. In the embodiment of the present application, when the user issues a command by making a gesture, the intelligent terminal 10 recognizes the gesture through the processor 100 and thereby receives the user's input command.
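As an illustration of this confirm-or-edit flow, the sketch below assumes hypothetical display, tv, and wait_for_gesture interfaces; only the 3-second window and the channel-change behaviour come from the example above:

```python
def handle_text_signaling(display, tv, text, wait_for_gesture,
                          confirm_timeout=3.0):
    """Sketch of the flow after step 504: the recognized text is shown in the
    input box and committed if the user confirms or stays idle for a while.

    `display`, `tv`, and `wait_for_gesture` are hypothetical interfaces;
    only the 3-second window follows the example in the description.
    """
    display.fill_input_box(text)                 # first control signaling
    gesture = wait_for_gesture(timeout=confirm_timeout)
    if gesture in (None, "ok"):                  # idle or explicit "ok"
        tv.change_channel(int(text))             # e.g. switch to channel 20
    # any other gesture keeps the input box open for further editing
```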
The "interactive interface" is a medium interface for performing interaction and information exchange between the intelligent terminal 10 and the user, and realizes conversion between an internal form of information and a form acceptable to the user. A commonly used presentation form of the interactive interface is a Graphical User Interface (GUI), which refers to a user interface related to the operation of the intelligent terminal 10 and displayed in a graphical manner. It may be an interface element such as an icon, window, control, etc. displayed in the display 200, where the control may include a visual interface element such as an icon, control, menu, tab, text box, dialog box, status bar, channel bar, Widget, etc.
Further, step 502 specifically includes: the processor 100 executes the following processing for a first pixel point, where the first pixel point is any one of the M key pixel points: determining the coordinates of the first pixel point in the t-th frame image of the N frames of images, and determining the Euclidean distance between these coordinates and the coordinates of the first pixel point in the (t-a)-th frame image as the position variation, where the position variation satisfies the following formula:

L_i = \sqrt{(x_i^t - x_i^{t-a})^2 + (y_i^t - y_i^{t-a})^2}

where L_i is the position variation, (x_i^t, y_i^t) are the horizontal and vertical coordinates of the first pixel point in the t-th frame image, and (x_i^{t-a}, y_i^{t-a}) are the horizontal and vertical coordinates of the first pixel point in the (t-a)-th frame image;
and/or, aiming at U pixel points in the M key pixel points, executing the following processing: determining a first angle between U pixel points in the t-th frame image, determining a second angle between U pixel points in the t-a frame image, and determining the absolute value of the difference between the second angle and the first angle as an angle variation.
Illustratively, the processor 100 is configured to determine, based on the above formula, the position variations L12 and L8 of pixel points 12 and 8 in fig. 10, and/or to calculate the angle variations of the relevant key pixel points. By adopting this technical scheme, the intelligent terminal 10 automatically recognizes and distinguishes static gestures from dynamic sign language, so that the static and dynamic gestures made by the user can be effectively and automatically separated and sent to the corresponding recognition models, realizing efficient human-computer interaction.
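The position and angle variations above can be computed directly from the key-point coordinates. The sketch below follows the Euclidean-distance formula; treating the angle as the orientation of the segment joining two key points (U = 2) and the example coordinates are assumptions for illustration:

```python
import math


def position_variation(p_t, p_t_minus_a):
    """Euclidean distance between the same key point in frame t and frame t-a:
    L_i = sqrt((x_i^t - x_i^{t-a})^2 + (y_i^t - y_i^{t-a})^2)."""
    (x1, y1), (x2, y2) = p_t, p_t_minus_a
    return math.hypot(x1 - x2, y1 - y2)


def angle_variation(points_t, points_t_minus_a):
    """Absolute change, in degrees, of the angle formed by two key points
    between frame t and frame t-a (U = 2 is assumed here)."""
    def segment_angle(points):
        (x1, y1), (x2, y2) = points
        return math.degrees(math.atan2(y2 - y1, x2 - x1))
    return abs(segment_angle(points_t_minus_a) - segment_angle(points_t))


# Example with made-up coordinates for pixel points 12 and 8 of fig. 10.
L12 = position_variation((120, 80), (150, 60))
L8 = position_variation((100, 95), (128, 78))
delta_angle = angle_variation([(120, 80), (100, 95)], [(150, 60), (128, 78)])
```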
Optionally, in order to avoid confusion in the execution order of gesture commands, the processing method is executed in the order in which the corresponding images were acquired.
For example, in step 505, when the user makes any one of the static gestures shown in fig. 11, the intelligent terminal 10 may determine that the gesture is a static gesture when the key pixel points corresponding to the gesture satisfy at least one of the following: the position variation is smaller than the preset first threshold, and the angle variation is smaller than the preset second threshold. The relevant images are then input into the static gesture recognition model, and the second control signaling corresponding to the static gesture is determined. Optionally, the intelligent terminal 10 predefines each operation gesture in fig. 11 and builds a complete set of operation logic; the operations may include the following: up, down, left, right, ok, and cancel, as well as an end control command.
In one possible embodiment, static gestures may be responsible for control commands, and dynamic gestures may be responsible for text input. The smart terminal 10 can thus perform text input, deletion, sending, confirmation, and other functions according to sign language entry.
In one possible embodiment, the intelligent terminal 10 numbers each static gesture, the serial number of no gesture is 0, and the serial numbers of up, down, left, right, confirm, cancel, and end control are 1 to 7 respectively.
Further, when the intelligent terminal 10 is started, the gesture serial number is set to 0; when a gesture is recognized by the static/dynamic gesture recognition model, the gesture serial number is updated to the corresponding serial number only after several consecutive frames are recognized as the same gesture. Optionally, to avoid outputting multiple commands for a single gesture, the processor 100 is configured to execute the command of the latest gesture only once, namely when a change of gesture is recognized.
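This serial-number bookkeeping can be sketched as a small state machine. The numbering follows the description above; the choice of five consecutive frames is an assumption, since the embodiment only requires several consecutive frames:

```python
class GestureDebouncer:
    """Gesture-number bookkeeping: 0 = no gesture, 1-7 = up, down, left,
    right, confirm, cancel, end control. Requiring 5 identical consecutive
    frames is an assumed value."""

    def __init__(self, required_frames=5):
        self.required_frames = required_frames
        self.current = 0          # gesture serial number starts at 0 on boot
        self.candidate = 0
        self.streak = 0

    def update(self, recognized):
        """Feed one per-frame recognition result; return a command number
        exactly once when the stable gesture changes, else None."""
        if recognized == self.candidate:
            self.streak += 1
        else:
            self.candidate, self.streak = recognized, 1
        if (self.streak >= self.required_frames
                and self.candidate != self.current):
            self.current = self.candidate
            return self.current if self.current != 0 else None
        return None
```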
Optionally, the smart terminal 10 may also collect various sign language data sets covering different expression habits and speeds. Furthermore, a multi-modal model may be used algorithmically, combining the sign language sequence, facial expressions, and a recognition model related to the user's voice, so that multi-modal data can be comprehensively utilized to complement one another, eliminate ambiguity, and reduce recognition uncertainty. Because people in different regions may express the same word differently, and personal habits vary, even under the same standard the presented action may differ. The same action may also have very different meanings in different contexts, and the same word may be expressed with different actions, which makes sign language recognition difficult. Therefore, in the embodiment of the present application, the intelligent terminal 10 may further combine facial expressions and sounds to distinguish emotions and generate statements that fit the situation, thereby improving the analysis capability for various scenes, better identifying the meaning of sign language, helping the user interact with and control the intelligent terminal 10, and enabling deaf-mute users to watch the smart television conveniently.
Further, prior to step 502, the processor 100 is configured to determine that an image corresponding to the set wake-up gesture image exists in the video stream.
According to the above technical scheme, the wake-up gesture in the embodiment of the present application prevents the intelligent terminal 10 from being triggered by mistake (that is, operated by an unintentional gesture). A gesture that is not easily made by accident can be used as the wake-up gesture. Once the wake-up gesture is recognized, the gesture operations of that user within a period of time are all treated as effective gesture operations, and other people cannot operate the terminal temporarily; when the user does not output an effective gesture for a period of time, or actively performs a terminating gesture, other users can wake up the intelligent terminal 10 through the wake-up gesture and then operate it. This effectively prevents false triggering of the intelligent terminal 10, improves human-computer interaction efficiency, and improves user experience.
Further, in conjunction with the above description, the processor 100 is configured to determine facial feature information in an image corresponding to the set wake-up gesture image. Next, in step 502, the processor 100 is further configured to identify N frames of images in the video stream, where the N frames of images include facial feature information, and determine a position variation and/or an angle variation of M key pixel points of a hand image region in the N frames of images.
Optionally, when a plurality of images corresponding to the set wake-up gesture image exist in the video stream, the processor 100 calculates the similarity between each of the plurality of images and the set wake-up gesture image, determines from them a first image with the maximum similarity, and determines the face feature information in the first image.
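A minimal sketch of this selection step is given below; the wake_similarity and extract_face_features hooks are hypothetical (for example, a gesture classifier's confidence score and a face-embedding model), since this embodiment does not fix a particular similarity measure:

```python
def select_wake_frame(candidate_frames, wake_similarity, extract_face_features):
    """Pick the frame most similar to the set wake-up gesture image and
    return the face features found in it.

    `wake_similarity` and `extract_face_features` are hypothetical hooks,
    not part of the embodiment.
    """
    best_frame = max(candidate_frames, key=wake_similarity)
    return extract_face_features(best_frame)
```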
In one possible embodiment, the intelligent terminal 10 may employ face feature recognition. When the user shows only a side face or faces away from the television, even registered members cannot be recognized; therefore, a user's gesture operations are treated as effective gestures only when the face features of the user facing the intelligent terminal 10 are accurately recognized (the camera may be arranged directly above the intelligent terminal 10). This approach also enables face recognition, face tracking, and tracking of the corresponding hand detection area, effectively avoiding gesture mis-tracking.
As shown in fig. 12, after the wake-up gesture is determined, the intelligent terminal 10 determines the detection area of the user according to the wake-up gesture, identifies the facial features in the detection area, and uses the user's facial features as a reference to determine the hand detection area in subsequent images for hand detection and person identification. In this way, the hand detection area is accurately determined in the whole image, avoiding the situation where the hand is too small and is filtered out.
In this embodiment, the intelligent terminal 10 may determine the face position of the user according to the position of the wake-up gesture, since users are accustomed to holding the hand at chest level when expressing sign language. The gesture box is then enlarged by a certain proportion to determine the detection area of the user, that is, a search box that includes the face. Next, the intelligent terminal 10 identifies the face in the user's detection area using a convolutional-neural-network-based model, computes the face features, and tracks the face in subsequent images, so that mis-tracking is effectively reduced when multiple people appear simultaneously in the frames of the video stream, improving human-computer interaction efficiency.
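As a sketch of the box-enlargement step, the function below expands the wake-gesture bounding box into a search box expected to contain the face; the 2.5x scale and the upward bias are assumptions, since the embodiment only specifies a certain proportion:

```python
def expand_gesture_box(box, frame_w, frame_h, scale=2.5):
    """Enlarge the wake-gesture bounding box into a search box that should
    also contain the user's face (hands are usually held at chest level).

    `box` is (x, y, w, h); the scale factor and the upward bias are
    assumed values, not taken from the embodiment.
    """
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2
    new_w, new_h = w * scale, h * scale
    # Bias the box upward so the face (above the hand) stays inside it.
    new_x = max(0.0, cx - new_w / 2)
    new_y = max(0.0, cy - new_h * 0.7)
    new_w = min(new_w, frame_w - new_x)
    new_h = min(new_h, frame_h - new_y)
    return (new_x, new_y, new_w, new_h)
```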
Based on the same inventive concept, embodiments of the present invention also provide a computer-readable storage medium including a program that, when run on a computer, causes the computer to execute the above human-computer interaction method.
Based on the same inventive concept, the embodiment of the present invention further provides a computer program product, which, when the program runs on a computer, causes the computer to execute the above human-computer interaction method.
Based on the same technical concept, an embodiment of the present invention provides a computer, as shown in fig. 13, including at least one processor 1301 and a memory 1302 connected to the at least one processor, where a specific connection medium between the processor 1301 and the memory 1302 is not limited in the embodiment of the present invention, and the processor 1301 and the memory 1302 are connected through a bus in fig. 13 as an example. The bus may be divided into an address bus, a data bus, a control bus, etc.
In an embodiment of the present invention, the memory 1302 stores instructions executable by the at least one processor 1301, and the at least one processor 1301 may execute the steps of the above human-computer interaction method by executing the instructions stored in the memory 1302.
The processor 1301 is the control center of the computer; it may connect various parts of the computer by using various interfaces and lines, and implements data processing by running or executing the instructions stored in the memory 1302 and calling the data stored in the memory 1302. Optionally, the processor 1301 may include one or more processing units, and the processor 1301 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor mainly handles instructions issued by operation and maintenance personnel. It is to be appreciated that the modem processor described above may not be integrated into the processor 1301. In some embodiments, the processor 1301 and the memory 1302 may be implemented on the same chip, or, in some embodiments, they may be implemented separately on their own chips.
The processor 1301 may be a general-purpose processor, such as a Central Processing Unit (CPU), a digital signal processor, an Application Specific Integrated Circuit (ASIC), a field programmable gate array or other programmable logic device, discrete gate or transistor logic, or discrete hardware components, or any combination thereof, configured to implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the human-computer interaction method may be embodied directly in a hardware processor for execution, or in a combination of hardware and software modules within the processor.
The memory 1302, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 1302 may include at least one type of storage medium, for example, a flash memory, a hard disk, a multimedia card, a card-type memory, a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a magnetic memory, a magnetic disk, an optical disk, and so on. The memory 1302 may also be, but is not limited to, any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 1302 in the embodiments of the present invention may also be circuitry or any other device capable of performing a storage function, for storing program instructions and/or data.
It should be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
The embodiments provided in the present application are only a few examples of the concepts in the present application, and do not limit the scope of the present application. Any other embodiments extended according to the scheme of the present application without inventive efforts will be within the scope of protection of the present application for a person skilled in the art.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (7)

1. An intelligent terminal, comprising:
a display for displaying an interface;
a processor configured to:
acquiring a video stream acquired by an image acquisition device;
determining that an image corresponding to a set wake-up gesture image exists in the video stream; the set wake-up gesture image indicates a set wake-up gesture;
determining face feature information in an image corresponding to a set awakening gesture image; the face feature information indicates a user performing the set wake-up gesture;
based on the face feature information, identifying images including the face feature information in the video stream, and determining N frames of images including the face feature information;
for the N frames of images comprising the face feature information, determining hand image areas of the user in the N frames of images by taking the face feature information in the images as reference so as to obtain N hand image areas;
determining position variation and/or angle variation of M key pixel points in the hand image area aiming at any one hand image area in the N hand image areas; when the position variation and/or the angle variation of the M key pixels satisfy a first set condition, determining text information corresponding to the N frames of images including the face feature information, where M, N is a positive integer;
generating a first control signaling corresponding to the text information, and controlling the display to display the text information based on the first control signaling;
the first setting condition includes that the position variation of the M key pixels is greater than or equal to a first threshold, and/or the angle variation of the M key pixels is greater than or equal to a second threshold.
2. The intelligent terminal of claim 1, wherein the processor is further configured to:
when the position variation and/or the angle variation of the M key pixels meet a second setting condition, determining a second control signaling corresponding to the gesture image in the N-frame image, where the second setting condition includes that the position variation of the M key pixels is smaller than the first threshold, and/or the angle variation of the M key pixels is smaller than the second threshold;
sending the second control signaling to the display.
3. The intelligent terminal according to claim 1, wherein the processor is configured to specifically perform:
when a plurality of images corresponding to a set awakening gesture image exist in the video stream, calculating the similarity between the plurality of images and the set awakening gesture image;
and determining a first image with the maximum similarity from the plurality of images, and determining the face feature information in the first image.
4. The intelligent terminal according to claim 1, wherein the processor is configured to specifically perform:
aiming at a first pixel point, the first pixel point being any one of the M key pixel points, the following processing is executed: determining the Euclidean distance between the coordinates of the first pixel point in the t-th frame image of the N frames of images and the coordinates of the first pixel point in the (t-a)-th frame image as the position variation, wherein the position variation satisfies the following formula:

L_i = \sqrt{(x_i^t - x_i^{t-a})^2 + (y_i^t - y_i^{t-a})^2}

wherein L_i refers to the position variation, (x_i^t, y_i^t) are the horizontal and vertical coordinates of the first pixel point in the t-th frame image, and (x_i^{t-a}, y_i^{t-a}) are the horizontal and vertical coordinates of the first pixel point in the (t-a)-th frame image;
and/or, aiming at U pixel points in the M key pixel points, executing the following processing: determining a first angle between the U pixel points in the t-th frame image, determining a second angle between the U pixel points in the t-a frame image, and determining that the absolute value of the difference between the second angle and the first angle is the angle variation, wherein U, t and a are positive integers.
5. The intelligent terminal according to claim 1, wherein the intelligent terminal is an intelligent television.
6. A human-computer interaction method, comprising:
acquiring a video stream acquired by an image acquisition device;
determining that an image corresponding to a set wake-up gesture image exists in the video stream; the set wake-up gesture image indicates a set wake-up gesture;
determining face feature information in an image corresponding to a set awakening gesture image; the face feature information indicates a user performing the set wake-up gesture;
based on the face feature information, identifying images including the face feature information in the video stream, and determining N frames of images including the face feature information;
for the N frames of images comprising the face feature information, determining hand image areas of the user in the N frames of images by taking the face feature information in the images as reference so as to obtain N hand image areas;
determining position variation and/or angle variation of M key pixel points in the hand image area aiming at any one hand image area in the N hand image areas; when the position variation and/or the angle variation of the M key pixels satisfy a first set condition, determining text information corresponding to the N frames of images including the face feature information, where M, N is a positive integer;
generating a first control signaling corresponding to the text information, and controlling a display to display the text information based on the first control signaling;
the first setting condition includes that the position variation of the M key pixels is greater than or equal to a first threshold, and/or the angle variation of the M key pixels is greater than or equal to a second threshold.
7. The method of claim 6, further comprising:
when the position variation and/or the angle variation of the M key pixels meet a second setting condition, determining a second control signaling corresponding to the gesture image in the N-frame image, where the second setting condition includes that the position variation of the M key pixels is smaller than the first threshold, and/or the angle variation of the M key pixels is smaller than the second threshold;
sending the second control signaling to the display.
CN202010315144.1A 2020-04-21 2020-04-21 Intelligent terminal and man-machine interaction method Active CN111556350B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010315144.1A CN111556350B (en) 2020-04-21 2020-04-21 Intelligent terminal and man-machine interaction method

Publications (2)

Publication Number Publication Date
CN111556350A CN111556350A (en) 2020-08-18
CN111556350B true CN111556350B (en) 2022-03-25

Family

ID=72007558

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010315144.1A Active CN111556350B (en) 2020-04-21 2020-04-21 Intelligent terminal and man-machine interaction method

Country Status (1)

Country Link
CN (1) CN111556350B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112686169A (en) * 2020-12-31 2021-04-20 深圳市火乐科技发展有限公司 Gesture recognition control method and device, electronic equipment and storage medium
CN113842209B (en) * 2021-08-24 2024-02-09 深圳市德力凯医疗设备股份有限公司 Ultrasonic device control method, ultrasonic device and computer readable storage medium
CN113778217B (en) * 2021-09-13 2024-07-23 海信视像科技股份有限公司 Display device and display device control method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104331158A (en) * 2014-10-29 2015-02-04 山东大学 Gesture-controlled human-computer interaction method and device
CN109190461A (en) * 2018-07-23 2019-01-11 中南民族大学 A kind of dynamic gesture identification method and system based on gesture key point

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160078289A1 (en) * 2014-09-16 2016-03-17 Foundation for Research and Technology - Hellas (FORTH) (acting through its Institute of Computer Gesture Recognition Apparatuses, Methods and Systems for Human-Machine Interaction
CN104267819B (en) * 2014-10-09 2017-07-14 苏州触达信息技术有限公司 Can gesture wake up electronic equipment and electronic equipment gesture awakening method
CN105302301B (en) * 2015-10-15 2018-02-13 广东欧珀移动通信有限公司 A kind of awakening method of mobile terminal, device and mobile terminal
CN105700372A (en) * 2016-03-11 2016-06-22 珠海格力电器股份有限公司 Intelligent device and control method thereof


Also Published As

Publication number Publication date
CN111556350A (en) 2020-08-18

Similar Documents

Publication Publication Date Title
CN111200746B (en) Method for awakening display equipment in standby state and display equipment
CN111556350B (en) Intelligent terminal and man-machine interaction method
CN111343512B (en) Information acquisition method, display device and server
CN111818378B (en) Display device and person identification display method
CN112565839A (en) Display method and display device of screen projection image
CN113556593B (en) Display device and screen projection method
CN111045557A (en) Moving method of focus object and display device
CN112055256A (en) Image processing method and display device for panoramic image
CN112188249A (en) Electronic specification-based playing method and display device
CN111954059A (en) Screen saver display method and display device
CN111464869B (en) Motion position detection method, screen brightness adjustment method and intelligent device
CN112055245B (en) Color subtitle realization method and display device
CN111541924B (en) Display apparatus and display method
CN115836528A (en) Display device and screen projection method
CN113489938A (en) Virtual conference control method, intelligent device and terminal device
CN113115092A (en) Display device and detail page display method
CN113556590B (en) Method for detecting effective resolution of screen-projected video stream and display equipment
CN112004127B (en) Signal state display method and display equipment
CN113115093B (en) Display device and detail page display method
CN111526414B (en) Subtitle display method and display equipment
CN113015006A (en) Display apparatus and display method
CN111931692A (en) Display device and image recognition method
CN112565915A (en) Display apparatus and display method
CN113727162A (en) Display device, server and character introduction display method
CN113115081A (en) Display device, server and media asset recommendation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant