JP5704909B2

JP5704909B2 - Attention area detection method, attention area detection apparatus, and program

Info

Publication number: JP5704909B2
Application number: JP2010273899A
Authority: JP
Inventors: 正雄山中; 優和真継
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2010-12-08
Filing date: 2010-12-08
Publication date: 2015-04-22
Anticipated expiration: 2030-12-08
Also published as: JP2012123631A

Description

The present invention relates to the detection of attention areas in an image, and more particularly to a method of describing visual saliency for attention area detection and to attention area detection using that visual saliency.

For example, Patent Document 1 proposes the following method for detecting, from an input image, a coherent region that is meaningful to a human. First, several types of basic feature images are extracted from the input image, and a multi-resolution image (the multi-resolution representation of each basic feature image) is extracted. Next, for each type of multi-resolution image, several resolution difference images are extracted, each being the difference between images of different resolutions. Then, for each type of resolution difference image, the resolution difference images of different resolutions are integrated to produce a visual saliency image. Finally, the attention area is detected as the region of the visual saliency image whose saliency is at or above a threshold.

Patent Document 1: JP 2009-3615 A

Non-Patent Document 1: Hido, S., Tsuboi, Y., Kashima, H., Sugiyama, M., & Kanamori, T. Statistical outlier detection using direct density ratio estimation. Knowledge and Information Systems, to appear.

However, when visual saliency is calculated directly from the basic feature images extracted from the input image in this way, the result is easily affected by noise arising from environmental and observational factors, and the detection accuracy of the attention area is reduced.

To solve the above problem, an attention area detection method according to the present invention includes: an extraction region setting step of setting, for each of a plurality of points in an input image, a first extraction region and a second extraction region different from the first extraction region; a feature amount extraction step of extracting a feature amount from each of the first extraction region and the second extraction region; a calculation step of calculating visual saliency at the plurality of points based on the relationship between the probability densities of the feature amounts of the first extraction region and the second extraction region set for the same point; and a detection step of detecting an attention area based on the visual saliency at the plurality of points.

According to the present invention, the detection accuracy of the attention area can be improved.

FIG. 1 is a functional block diagram of the attention area detection device according to the first embodiment.
FIG. 2 is a diagram illustrating the functions of the detection unit.
FIG. 3 is a diagram illustrating the data extraction regions.
FIG. 4 is a diagram illustrating the detection points at which a local maximum of visual saliency was detected.
FIG. 5 is a diagram illustrating the local attention area groups.
FIG. 6 is a diagram illustrating the attention area that is set.
FIG. 7 is a diagram illustrating the functions of the learning unit.
FIG. 8 is a diagram illustrating the parameter candidates.
FIG. 9 is a diagram illustrating the processing of the coefficient calculation unit.
FIG. 10 is a diagram illustrating the determination of the parameter.
FIG. 11 is a diagram illustrating the parameter candidates.
FIG. 12 is a block diagram showing the hardware configuration of an information processing apparatus that implements the attention area detection device.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.

[First Embodiment]
FIG. 1 is a block diagram showing the functional configuration of an attention area detection device 1 according to the first embodiment of the present invention. The attention area detection device according to this embodiment is assumed to be realized using a semiconductor integrated circuit (LSI). As shown in FIG. 1, the attention area detection device 1 includes a learning unit 11, a detection unit 12, and a control unit 13. These components correspond to the respective functions performed by the attention area detection device 1.

The attention area detection device 1 performs two main functions. One is a learning function, executed by the learning unit 11. The other is a detection function, executed by the detection unit 12. The learning unit 11 calculates the parameter α used by the detection unit 12, and the detection unit 12 detects the attention area in the input image using the parameter α calculated by the learning unit 11. The control unit 13 controls each component of the attention area detection device 1.

Meanwhile, the attention area detection result is passed to a higher-level device above the attention area detection device 1 and applied to various applications (such as the autofocus function of a digital still camera or the anomaly detection function of a security camera).

FIG. 2 is a block diagram showing the functional configuration of the detection unit 12. As shown in FIG. 2, the detection unit 12 consists of a feature amount calculation unit 121 (a first calculation unit), a visual saliency calculation unit 122 (a second calculation unit), a local maximum search unit 123, and an integration unit 124.

FIG. 3 is a diagram illustrating the data extraction regions. As shown in FIG. 3, the feature amount calculation unit 121 performs extraction region setting, in which a training data extraction region and a verification data extraction region are set in the input image supplied from outside the attention area detection device 1. It then extracts, at random, a predetermined number of low-order feature amounts of several types from each region.

Here, the training data extraction region and the verification data extraction region are given as two circular regions of different sizes. The circular region with the larger radius is the verification data extraction region (the first extraction region), and the circular region with the smaller radius is the training data extraction region (the second extraction region).

The radii of the training data extraction region and the verification data extraction region, the number of low-order feature amounts to be extracted, and their types (luminance value, edge strength, texture, and so on) are determined in advance by the learning unit 11 and conveyed by the control unit 13.

The low-order feature amounts obtained here are output to the visual saliency calculation unit 122.
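The extraction region setting and random feature extraction described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the image is a plain 2D list of luminance values, both regions are full disks centred on the same point, and sampling is done with replacement. The names `sample_disk` and `extract_features` are hypothetical and do not come from the patent.

```python
import random

def sample_disk(image, cx, cy, radius, count, rng):
    """Randomly draw `count` pixel values from the disk of `radius` centred at (cx, cy)."""
    height, width = len(image), len(image[0])
    # Collect every pixel coordinate that lies inside both the disk and the image.
    inside = [
        (x, y)
        for y in range(max(0, cy - radius), min(height, cy + radius + 1))
        for x in range(max(0, cx - radius), min(width, cx + radius + 1))
        if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2
    ]
    # Assumption: sample with replacement so that small disks can still yield `count` values.
    return [image[y][x] for x, y in (rng.choice(inside) for _ in range(count))]

def extract_features(image, cx, cy, r_train, r_test, count, seed=0):
    """Return (training samples, verification samples) of luminance around (cx, cy)."""
    rng = random.Random(seed)
    train = sample_disk(image, cx, cy, r_train, count, rng)  # smaller disk
    test = sample_disk(image, cx, cy, r_test, count, rng)    # larger disk
    return train, test
```

Other low-order feature types (edge strength, texture) would be sampled the same way from their own feature images.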

The visual saliency calculation unit 122 calculates the visual saliency S at an arbitrary point in the input image based on the low-order feature amounts obtained by the feature amount calculation unit 121. Specifically, it calculates S from the ratio of the probability densities estimated from the low-order feature amounts of the training data extraction region and from those of the verification data extraction region.

Here, the density ratio (p_test / p_train) between the probability density p_train of the low-order feature amounts in the training data extraction region and the probability density p_test of the low-order feature amounts in the verification data extraction region can be calculated using, for example, the density ratio estimation method of Non-Patent Document 1.

From this, when the feature amount calculation unit 121 extracts a single type of low-order feature amount (for example, only the luminance value Y), the visual saliency S is given by the reciprocal 1/σ of the standard deviation σ of the probability density ratio for that low-order feature amount.
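To make the single-feature case concrete, the sketch below estimates the two densities with smoothed histograms and returns 1/σ. Two points are assumptions rather than statements of the patent: the histogram estimator is a simple stand-in for the density ratio estimation method of Non-Patent Document 1, and σ is taken over the ratio values evaluated at the verification samples. The function names are hypothetical.

```python
import statistics

def bin_index(value, bins, lo, hi):
    """Map a value to its histogram bin, clamping values outside [lo, hi)."""
    width = (hi - lo) / bins
    return min(bins - 1, max(0, int((value - lo) / width)))

def histogram_density(samples, bins, lo, hi, eps=1e-6):
    """Probability mass per bin, smoothed by eps so that no bin is exactly zero."""
    counts = [0] * bins
    for v in samples:
        counts[bin_index(v, bins, lo, hi)] += 1
    total = len(samples) + eps * bins
    return [(c + eps) / total for c in counts]

def saliency_single(train, test, bins=8, lo=0.0, hi=256.0):
    """Return S = 1 / sigma for one feature type.

    sigma is the standard deviation of p_test / p_train evaluated at the
    verification samples (an assumption about where the ratio is evaluated).
    """
    p_train = histogram_density(train, bins, lo, hi)
    p_test = histogram_density(test, bins, lo, hi)
    ratios = []
    for v in test:
        i = bin_index(v, bins, lo, hi)
        ratios.append(p_test[i] / p_train[i])
    sigma = statistics.pstdev(ratios)
    return 1.0 / max(sigma, 1e-12)  # guard against division by zero
```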

When the feature amount calculation unit 121 extracts several types of low-order feature amounts (for example, three types: luminance value Y, edge strength E, and texture T), the visual saliency S is given as follows. Using the parameter α = (α_Y, α_E, α_T) input from the learning unit 11 and the reciprocals (1/σ_Y, 1/σ_E, 1/σ_T) of the standard deviations (σ_Y, σ_E, σ_T) of the probability density ratios for the respective low-order feature amounts, S is given by their linear sum as in Expression (1).
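Expression (1) itself is not reproduced in this text. One reading that is consistent with the description above (a linear sum of the reciprocal standard deviations weighted by the components of α) would be the following; the exact form in the original publication may differ.

```latex
S = \frac{\alpha_Y}{\sigma_Y} + \frac{\alpha_E}{\sigma_E} + \frac{\alpha_T}{\sigma_T}
```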

Furthermore, when the feature amount calculation unit 121 extracts N types of low-order feature amounts, Expression (1) extends straightforwardly and the visual saliency S is given by Expression (2). More generally, the visual saliency S may be a nonlinear function of σ_n (n = 0 to N), as in Expression (3).
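Assuming the generalized form is a weighted sum with one term α_n/σ_n per feature type (the expression is not reproduced in this text, so this form is an inference from the description), the computation can be sketched as below. `combined_saliency` is a hypothetical name.

```python
def combined_saliency(alphas, sigmas, eps=1e-12):
    """Weighted sum of reciprocal standard deviations, one term per feature type."""
    if len(alphas) != len(sigmas):
        raise ValueError("alphas and sigmas must have the same length")
    # eps guards against division by zero when a ratio has no spread at all.
    return sum(a / max(s, eps) for a, s in zip(alphas, sigmas))
```

For example, `combined_saliency([0.5, 0.25, 0.25], [2.0, 1.0, 0.5])` weights luminance, edge strength, and texture with α = (0.5, 0.25, 0.25).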

The visual saliency S obtained here is output to the local maximum search unit 123.

The local maximum search unit 123 uses the feature amount calculation unit 121 and the visual saliency calculation unit 122 to calculate the visual saliency S = S(x, y) at various points (x, y) in the input image. To obtain the statistical distribution of the visual saliency S, it then detects the points that give a local maximum (or a value at or above a predetermined threshold).

The points p_k (k = 0 to K) at which a local maximum (or a value at or above the predetermined threshold) was detected are output to the integration unit 124. Here, K denotes the number of points (x, y) at which a local maximum of the visual saliency S (or a value at or above the predetermined threshold) was detected.
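The search for detection points can be sketched as follows, assuming the saliency values have already been computed on a regular grid and stored as a 2D list. Using a strict 8-neighbourhood comparison for the local maximum is an assumption; the text does not specify the neighbourhood. `find_peaks` is a hypothetical name.

```python
def find_peaks(saliency, threshold=None):
    """Return (x, y, S) for each detection point.

    With threshold=None, a point is kept if it is strictly greater than all
    of its 8 neighbours. Otherwise every point with S >= threshold is kept.
    """
    height, width = len(saliency), len(saliency[0])
    peaks = []
    for y in range(height):
        for x in range(width):
            s = saliency[y][x]
            if threshold is not None:
                if s >= threshold:
                    peaks.append((x, y, s))
                continue
            neighbours = [
                saliency[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (nx, ny) != (x, y)
            ]
            if all(s > n for n in neighbours):
                peaks.append((x, y, s))
    return peaks
```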

The integration unit 124 performs local attention area setting, in which the region corresponding to each detection point p_k (k = 0 to K) obtained by the local maximum search unit 123 is set as a local attention area. It then integrates these local attention areas based on the distance d between pairs of detection points.

For example, suppose that, as shown in FIG. 4, a local maximum of the visual saliency S (or a value at or above the predetermined threshold) is obtained at three points (p_i, p_j, p_k) in the input image. In FIG. 4, p_i, p_j, and p_k are the centers of the respective circular regions, and the radius of each circle corresponds to the radius of the training data extraction region. First, consider the points p_i and p_j: if the distance d_ij between them is smaller than a predetermined threshold d_th, the two points (p_i, p_j) are merged into the same group to form a local attention area group. Next, consider the points p_j and p_k: if the distance d_jk between them is smaller than d_th, the two points (p_j, p_k) are merged into the same group. Similarly, for the points p_i and p_k, if the distance d_ik between them is smaller than d_th, the two points are merged into the same group. The predetermined threshold d_th is determined by the learning unit 11 and conveyed by the control unit 13.

This is carried out for every pair of points p_k (k = 0 to K) at which a local maximum of the visual saliency S (or a value at or above the predetermined threshold) was detected. As a result, the local attention areas corresponding to the points p_k (k = 0 to K) are integrated into several groups (local attention area groups) g_m (m = 0 to M) (FIG. 5). Furthermore, the total value S_m (m = 0 to M) of the visual saliency S is calculated for each group g_m (m = 0 to M), and attention area setting is performed: a rectangular region enclosing the group g_{m'} that gives the maximum total is set (FIG. 6) and taken as the final attention area. Here, M denotes the number of groups (M = 3 in the example of FIG. 5).
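The pairwise merging and the selection of the final attention area can be sketched as follows. The sketch assumes that merging is transitive (if p_i joins p_j and p_j joins p_k, all three share a group), which a union-find structure expresses directly, and that the enclosing rectangle is padded by the radius of the local attention areas. The function names are hypothetical.

```python
import math

def group_peaks(peaks, d_th):
    """Merge peaks (x, y, S) whose pairwise distance is below d_th, transitively."""
    parent = list(range(len(peaks)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(peaks)):
        for j in range(i + 1, len(peaks)):
            (xi, yi, _), (xj, yj, _) = peaks[i], peaks[j]
            if math.hypot(xi - xj, yi - yj) < d_th:
                parent[find(i)] = find(j)

    groups = {}
    for i, peak in enumerate(peaks):
        groups.setdefault(find(i), []).append(peak)
    return list(groups.values())

def attention_rectangle(peaks, d_th, radius):
    """Return (x_min, y_min, x_max, y_max) enclosing the group with the largest total saliency."""
    groups = group_peaks(peaks, d_th)
    best = max(groups, key=lambda g: sum(s for _, _, s in g))
    return (
        min(x for x, _, _ in best) - radius,
        min(y for _, y, _ in best) - radius,
        max(x for x, _, _ in best) + radius,
        max(y for _, y, _ in best) + radius,
    )
```

The input format matches the `(x, y, S)` tuples a peak search would produce; `attention_rectangle` expects at least one peak.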

The position and size of the attention area on the input image obtained here are output to the control unit 13.

The means described above correspond to an example of the detection unit 12 in the attention area detection device 1.

FIG. 7 is a block diagram showing the functional configuration of the learning unit 11. As shown in FIG. 7, the learning unit 11 consists of an image database 111 and a coefficient calculation unit 112. The learning unit 11 determines the parameter α used by the detection unit 12 based on GT (ground truth) images stored in the image database 111. Here, a GT image is an image in which the attention area, a meaningful coherent object region in the input image, is defined by the rectangular frame shown by the dotted line in FIG. 6, with its position and size set in advance. The coefficient calculation unit 112 learns the parameter α so that the detection unit 12 yields matching attention area detection results.

Specifically, the candidates α_{0,m} (m = 0 to M) for the parameter α are given as the coordinates of random points in an N-dimensional space. Here, N denotes the number of types of low-order feature amounts; in this embodiment, three types (N = 3) are considered: luminance value Y, edge strength E, and texture T. M denotes the number of candidates for the parameter α.

The candidates α_{0,m} (m = 0 to M) for the parameter α are then given, as shown in FIG. 8, as the coordinates of random points inside a sphere of radius 1 in the three-dimensional space (α_Y, α_E, α_T). The number of candidates M is set relatively small (for example, 10 to 50) when a short learning time is desired, and relatively large (for example, 100 to 500) when the detection accuracy of the attention area is the priority.

The coefficient calculation unit 112 also uses the detection unit 12 to measure the detection accuracy R_{0,m} (m = 0 to M) obtained with each of the candidates α_{0,m} (m = 0 to M). As shown in FIG. 9, the coefficient calculation unit 112 compares each GT image stored in the image database 111 with the attention area detection result obtained using the detection unit 12. The detection accuracy R_{0,m} is given by the average of the area ratio (s / s'), where s is the area of their overlap and s' is the area of the attention area in the GT image.
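The area-ratio measure can be sketched as below, assuming both the detected attention area and the GT attention area are axis-aligned rectangles given as `(x_min, y_min, x_max, y_max)` and that the two lists are paired image by image. The function names are hypothetical.

```python
def overlap_area(a, b):
    """Area of the intersection of two rectangles (x_min, y_min, x_max, y_max)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, w) * max(0, h)

def detection_accuracy(detections, ground_truths):
    """Mean of s / s' over all images: overlap area divided by GT area."""
    ratios = []
    for det, gt in zip(detections, ground_truths):
        gt_area = (gt[2] - gt[0]) * (gt[3] - gt[1])
        ratios.append(overlap_area(det, gt) / gt_area)
    return sum(ratios) / len(ratios)
```

Note that this ratio does not penalize a detection that is larger than the GT rectangle, since only the overlap and the GT area enter the formula.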

Suppose that, as a result, the candidate α_{0,m'} that gives the best detection accuracy R_{m'} has been identified, as shown in FIG. 10. Next, as shown in FIG. 11, the coefficient calculation unit 112 generates new candidates α_{1,m} (m = 0 to M) for the parameter α as the coordinates of random points inside a sphere of radius 1 × γ (0 < γ < 1) centered on the point α_{0,m'} in the three-dimensional space (α_Y, α_E, α_T). In the same way as before, the coefficient calculation unit 112 uses the detection unit 12 to calculate the detection accuracy R_{1,m} (m = 0 to M) obtained with each of the candidates α_{1,m} (m = 0 to M).

The coefficient calculation unit 112 repeats the above operation in the same manner. When the radius of the sphere becomes smaller than a predetermined threshold γ_th, it terminates the process and outputs the parameter α obtained at that point to the control unit 13. The predetermined threshold γ_th is a non-negative real number determined by trial and error. The above corresponds to an example of the learning unit 11 in the attention area detection device 1.
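The iterative search over shrinking spheres can be sketched as follows. Several details are assumptions: the first sphere is centred on the origin, points are drawn uniformly by rejection sampling, the radius is multiplied by γ after every round, and `accuracy` is any callable that returns the detection accuracy for a candidate α. `learn_alpha` and `random_point_in_ball` are hypothetical names.

```python
import random

def random_point_in_ball(center, radius, rng):
    """Rejection-sample a point uniformly inside the ball of `radius` around `center`."""
    dim = len(center)
    while True:
        offset = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        if sum(o * o for o in offset) <= 1.0:
            return [c + radius * o for c, o in zip(center, offset)]

def learn_alpha(accuracy, dim=3, candidates=20, gamma=0.5, gamma_th=0.01, seed=0):
    """Shrinking-sphere random search for the parameter alpha."""
    rng = random.Random(seed)
    center, radius = [0.0] * dim, 1.0  # assumption: first sphere centred on the origin
    while radius >= gamma_th:
        round_alphas = [random_point_in_ball(center, radius, rng) for _ in range(candidates)]
        # The best candidate of this round becomes the centre of the next sphere.
        center = max(round_alphas, key=accuracy)
        radius *= gamma
    return center
```

In practice `accuracy` would run the detection unit over the GT images for the given α, so each round costs `candidates` full evaluations; this is why the text ties the candidate count to the learning time.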

The attention area detection result obtained in this way is passed to a higher level above the attention area detection device 1. For example, where a robot arm picks up the object of interest after the attention area has been detected, the result is passed to a device, program, or the like that controls the robot arm and the attention area detection device 1, and is applied to various applications.

The above corresponds to an example of the attention area detection device 1.

[Other Embodiments]
In the first embodiment, the attention area detection device executes both the learning function and the detection function. However, it may execute only one of the learning function and the detection function.

In the first embodiment, the training data extraction region and the verification data extraction region set by the feature amount calculation unit 121 are two circular regions. However, they are not limited to circles; ellipses, rectangles, or any other shape may be used. Likewise, the final attention area is not limited to a rectangular region and may have another shape.

In the first embodiment, the visual saliency calculation unit 122 calculates the visual saliency S using Expression (2). Instead of Expression (2), Expression (4), which uses the area s of the training data extraction region, may be used. Expression (5), which uses an arbitrary function f(s) of the area s of the training data extraction region, may also be used.

Also, in the first embodiment, the visual saliency calculation unit 122 calculates the visual saliency S based on the density ratio, as the relationship between the probability density of the low-order feature amounts in the training data extraction region and that in the verification data extraction region. The calculation is not limited to this: a histogram of the low-order feature amounts may be obtained for each extraction region, and the visual saliency S may be calculated based on the absolute differences between corresponding bins of the two histograms.
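A minimal sketch of the histogram-based alternative follows. The text does not state how the per-bin absolute differences are turned into S, so summing them and using the sum directly is an assumption, as are the bin count and value range. The function names are hypothetical.

```python
def histogram(samples, bins, lo, hi):
    """Normalised histogram; values outside [lo, hi) are clamped to the end bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in samples:
        counts[min(bins - 1, max(0, int((v - lo) / width)))] += 1
    return [c / len(samples) for c in counts]

def histogram_difference(train, test, bins=8, lo=0.0, hi=256.0):
    """Sum of absolute differences between corresponding bins of the two histograms."""
    h_train = histogram(train, bins, lo, hi)
    h_test = histogram(test, bins, lo, hi)
    return sum(abs(a - b) for a, b in zip(h_train, h_test))
```

The result ranges from 0 (identical histograms) to 2 (no shared bins).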

The present invention may be applied to a system composed of a plurality of devices (for example, a host computer, an interface device, a reader, and a printer) or to an apparatus consisting of a single device (for example, a copier or a facsimile machine).

The present invention can also be realized by the following processing: software (a program) that implements the functions of the above-described embodiments is supplied to a system or apparatus via a network or various storage media, and a computer (or a CPU or MPU) of that system or apparatus reads out and executes the program.

FIG. 12 is a block diagram showing the hardware configuration of an information processing apparatus that implements the above-described attention area detection device by executing a program.

The CPU 201 executes various programs and controls each part of the apparatus. The ROM 202 is a non-volatile memory that stores the programs and other data needed for the initial operation of the information processing apparatus. The RAM 203 provides a work area for the CPU 201 and temporarily stores programs and other data read from the secondary storage device 204. The secondary storage device 204 records the program 210 used by the CPU 201 and stores the images used by the image database. The program 210 consists of an OS 211, an application 212, a module 213, and data 214.

The devices 201 to 204 exchange information through a bus 205. The information processing apparatus is connected via the bus 205 to a display 206, a keyboard 207, a mouse 208, and an I/O device 209.

The display 206 is used to show the user information such as processing results and the progress of processing. The keyboard 207 and the mouse 208 are used to input instructions from the user; the mouse 208 in particular is used to input positions on the display. The I/O device 209 is used to capture the image to be processed. For example, the I/O device 209 captures an input image from an imaging device that photographs the target object. The I/O device 209 can also output the attention area obtained as the information processing result to another information processing apparatus such as an imaging device or an image processing device.

Claims

1. An attention area detection method comprising:
an extraction region setting step of setting, for each of a plurality of points in an input image, a first extraction region and a second extraction region different from the first extraction region;
a feature amount extraction step of extracting a feature amount from each of the first extraction region and the second extraction region;
a calculation step of calculating visual saliency at the plurality of points based on the relationship between the probability densities of the feature amounts of the first extraction region and the second extraction region set for the same point; and
a detection step of detecting an attention area based on the visual saliency at the plurality of points.

2. The attention area detection method according to claim 1, wherein, in the feature amount extraction step, a plurality of types of feature amounts are extracted from each of the first extraction region and the second extraction region.

3. The attention area detection method according to claim 2, wherein the plurality of types of feature amounts include any one of a luminance value, an edge strength, and a texture.

4. The attention area detection method according to claim 1, wherein, in the calculation step, the visual saliency is calculated based on the absolute difference between corresponding histogram bins of the histogram of the feature amount of the first extraction region and the histogram of the feature amount of the second extraction region.

5. The attention area detection method according to claim 1, wherein, in the calculation step, the visual saliency is calculated based on the ratio of the probability density of the feature amount obtained in the first extraction region to the probability density of the feature amount obtained in the second extraction region.

6. The attention area detection method according to claim 1, wherein, in the detection step, the attention area is detected based on a statistical distribution of the visual saliency.

7. The attention area detection method according to claim 6, wherein the detection step includes:
a local attention area setting step of setting a plurality of local attention areas based on the visual saliency;
an integration step of integrating the plurality of local attention areas based on the distances between them to generate local attention area groups;
a specifying step of calculating a total value of visual saliency for each local attention area group and specifying the local attention area group that gives the maximum value; and
an attention area setting step of setting an area including the specified local attention area group as the attention area.

8. An attention area detection apparatus comprising:
extraction region setting means for setting, for each of a plurality of points in an input image, a first extraction region and a second extraction region different from the first extraction region;
feature amount extraction means for extracting a feature amount from each of the first extraction region and the second extraction region;
calculation means for calculating visual saliency at the plurality of points based on the relationship between the probability densities of the feature amounts of the first extraction region and the second extraction region set for the same point; and
detection means for detecting an attention area based on the visual saliency at the plurality of points.

9. The attention area detection apparatus according to claim 8, wherein the feature amount extraction means extracts a plurality of types of feature amounts from each of the first extraction region and the second extraction region.

10. The attention area detection apparatus according to claim 9, wherein the plurality of types of feature amounts include any one of a luminance value, an edge strength, and a texture.

11. The attention area detection apparatus according to any one of claims 8 to 10, wherein the calculation means calculates the visual saliency based on the absolute difference between corresponding histogram bins of the histogram of the feature amount of the first extraction region and the histogram of the feature amount of the second extraction region.

12. The attention area detection apparatus according to any one of claims 8 to 10, wherein the calculation means calculates the visual saliency based on the ratio of the probability density of the feature amount obtained in the first extraction region to the probability density of the feature amount obtained in the second extraction region.

13. The attention area detection apparatus according to claim 8, wherein the detection means detects the attention area based on a statistical distribution of the visual saliency.

14. The attention area detection apparatus according to claim 13, wherein the detection means includes:
local attention area setting means for setting a plurality of local attention areas based on the visual saliency;
integration means for integrating the plurality of local attention areas based on the distances between them to generate local attention area groups;
specifying means for calculating a total value of visual saliency for each local attention area group and specifying the local attention area group that gives the maximum value; and
attention area setting means for setting an area including the specified local attention area group as the attention area.

15. A program for causing a computer to execute each step of the attention area detection method according to any one of claims 1 to 7.