JP4303490B2

JP4303490B2 - Image and document matching method and apparatus, and matching program

Info

Publication number: JP4303490B2
Application number: JP2003043657A
Authority: JP
Inventors: 浩明武部
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2003-02-21
Filing date: 2003-02-21
Publication date: 2009-07-29
Anticipated expiration: 2023-02-21
Also published as: JP2004252810A

Description

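[0001]
[Technical field of the invention]
The present invention relates to a method and apparatus for matching images and documents. More specifically, it relates to a method and apparatus that automatically associate a video with each slide of the presentation file shown in that video, and display the most appropriate slide in synchronization with video playback.
[0002]
[Prior art]
In recent years, e-Learning has become common in education: remote instruction delivered by computer over the Internet or similar networks, usually through web pages. WBT (Web Based Training), a form of e-Learning, is a new mode of instruction in which the user learns while viewing web pages in a browser. It lets users learn whenever and wherever they like.
[0003]
One of the main content types in WBT is learning content that shows a video together with an enlarged view of the materials synchronized with it. Creating such content requires an association task: for a video explaining materials (for example, a presentation file), identifying which slide is being shown in the video at each point in time.
[0004]
To realize such a system, the display of the video in which the presenter explains using a projector and the enlarged display of the slide image shown in it must be synchronized in time. Various authoring systems (authoring software) for creating such e-Learning content are currently on the market.
[0005]
To synchronize the video and the slides in this way, the similarity between the slide shown in the video and each of a plurality of slides stored in advance in a storage area is evaluated, and the slide with the highest similarity is judged to be the one shown in the video and is displayed on the computer screen.
[0006]
Doing so requires a technique for obtaining the similarity (matching) between the slide image in the video (the first image) and a slide image stored in the computer (the second image).
The following technique is known for matching a first image against a second image.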
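[0007]
In image matching that extracts feature points from image data and searches for corresponding points between frames of a video or between the left and right images of a stereo pair, a matching processing unit creates a template from the source image, sets a search area in the target image, correlates the features of the template with those of regions of the same size as the template within the search area, and finds the image corresponding to the template image. A correlation change determination unit determines the degree of change in the correlation value, or in its statistics, between consecutive times t1−1 and t1. If the change in correlation is larger than a predetermined reference, the search parameters that set the image search conditions are changed; if it is smaller, information about the position of the searched image is output (see, for example, Patent Document 1).
[0008]
However, conventional authoring software for associating video with materials requires this association to be done manually, so creating content takes an enormous amount of time. The present applicant therefore developed a technique (the prior technique) that generates still images from a video of a speaker presenting with a presentation tool such as PowerPoint, matches those still images against each slide of the presentation file, and automatically associates the video with the presentation file (Japanese Patent Application Nos. 2002-80500 and 2002-89338).
[0009]
Japanese Patent Application No. 2002-89338 proposes a method that performs character recognition on still images from a presentation video and matches the recognition results against each slide. FIG. 10 shows how a video and a presentation file are associated by in-video character recognition. In the figure, 1 is a video in which a speaker gives an explanation while referring to a slide image 2. This video changes over time.
[0010]
A still image is obtained from the video 1, binarized, and subjected to character extraction to obtain an image 3. This image 3 is matched against the plurality of slide images stored in the presentation file 4. That is, the slide with the highest degree of matching (similarity) to the image 3 is judged to be the slide image shown in the video 1.
[0011]
FIG. 11 shows the result of associating a video with a presentation file. The slides of the presentation file obtained in FIG. 10 are associated with time: period T1 corresponds to slide 4a, period T2 to slide 4b, and period T3 to slide 4c.
[0012]
[Patent Document 1]
Japanese Unexamined Patent Application Publication No. H11-120351 (pages 3 and 4, FIG. 1)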
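[0013]
[Problems to be solved by the invention]
The prior technique described above uses an algorithm that matches the character recognition results of the video against the slides while coping with the low reliability of in-video character recognition and absorbing the variations peculiar to video. However, matching fails when the presentation file contains only figures and graphs, or when it contains little text.
[0014]
In addition, when image features of a video frame and of a presentation slide are compared for matching, image degradation caused by the shooting environment and distortion of the presentation file in the video caused by projecting it have a large effect.
[0015]
In a presentation, the projected image of the presentation file (the slide image) varies greatly with factors such as lighting and weather, and is also strongly affected by the speaker and the speaker's shadow. Moreover, depending on how the projector or OHP (overhead projector) is set up, the outline of the projected slide is distorted and is not necessarily a rectangle.
[0016]
The present invention has been made in view of these problems. Its object is to provide an image and document matching method and apparatus, and a matching program, that can accurately match video frames against presentation slides despite video variation and distortion.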
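[0017]
[Means for solving the problems]
FIG. 1 is a flowchart showing the principle of the method of the present invention. The invention comprises a first step of calculating, when matching a first image against a second image, the similarity between pixels using normalized correlation values, and a second step of finding the optimal correspondence between pixel rows by nonlinear matching in the direction perpendicular to the pixel rows and taking the result as the similarity between the first and second images. Here, "image" in "first image" and "second image" is a concept that also includes slides and the like.
(1) The invention of claim 1 is an image and document matching method comprising a first step of calculating, when matching a first image against a second image, the similarity between pixels by template matching using normalized correlation, and a second step of finding the optimal correspondence between pixel rows by nonlinear matching in the direction perpendicular to the pixel rows and taking the result as the similarity between the images. The method identifies the presentation file in an image showing some presentation file by matching the image against the presentation files in a database: it performs matching based on the recognition results obtained by character recognition of the image, performs matching based on image features, and measures the reliability of both matching results. When both are reliable or both are unreliable, the sum of the two similarities is taken as the final similarity; otherwise, the similarity with the higher reliability is taken as the final similarity.
[0018]
With this configuration, matching based on character recognition and matching based on image features are both performed, and combining them yields a highly reliable similarity.
(2) The invention of claim 2 is a method for identifying the presentation file in an image showing some presentation file by matching the image against the presentation files in a database. A binary edge image and its inverted image are created from the image; among the connected components in those images, those with at least a certain size and an aspect ratio within a certain range are taken as candidates for the presentation file region in the image; the partial image corresponding to each candidate region is matched against the presentation files in the database; and the region showing the highest similarity is taken as the presentation file region in the image.
[0019]
With this configuration, by matching the image obtained by applying predetermined processing to the presentation file shown in the image against the presentation files in the database, the region with high similarity to a presentation file in the database can be taken as the presentation file region.
(3) The invention of claim 3 causes a computer to execute the processing of an image and document matching method comprising a first step of calculating, when matching a first image against a second image, the similarity between pixels by template matching using normalized correlation, and a second step of finding the optimal correspondence between pixel rows by nonlinear matching in the direction perpendicular to the pixel rows and taking the result as the similarity between the images. The processing identifies the presentation file in an image showing some presentation file by matching the image against the presentation files in a database: it performs matching based on the recognition results obtained by character recognition of the image, performs matching based on image features, and measures the reliability of both matching results. When both are reliable or both are unreliable, the sum of the two similarities is taken as the final similarity; otherwise, the similarity with the higher reliability is taken as the final similarity.
[0020]
With this configuration, matching based on character recognition and matching based on image features are both performed, and combining them yields a highly reliable similarity.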
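[0021]
FIG. 2 is a block diagram showing the principle of the present invention. In the figure, 10 is a sampling unit that samples the video signal to obtain frame images; 11 is a character recognition result matching unit that performs matching on the character recognition results of the sampled images obtained by the sampling unit 10; 12 is an image feature matching unit that performs image feature matching on the sampled images; and 13 is a slide determination calculation unit that receives the outputs of the character recognition result matching unit 11 and the image feature matching unit 12, performs a predetermined calculation, and determines and outputs the slide number of the presentation file slide with the highest similarity.
[0022]
With this configuration, similarity is judged using both matching based on character recognition results and image feature matching, and a predetermined calculation is performed, so that the slide number of the presentation file slide most similar to the slide shown in the video signal can be output. As a result, the e-Learning content (the computer display screen) can show the lecture video together with a detailed view of the slide used in the lecture, making the lecture more effective.
[0023]
In this invention, the character recognition result matching unit comprises a character recognition processing unit that performs character recognition on the sampled image, a character information extraction unit that extracts character information from the template image, and a correspondence generation unit that receives the outputs of the character recognition processing unit and the character information extraction unit, generates the correspondence between them, and outputs it as the similarity between the two images.
With this configuration, a similarity based on character recognition can be calculated.
[0024]
In this invention, the image feature matching unit comprises a graying unit that converts the template image into a gray image, a slide region candidate extraction unit that receives the output of the graying unit and extracts slide region candidates, a normalization unit that receives the output of the slide region candidate extraction unit and performs normalization, and a matching unit that receives the output of the normalization unit and the template image, matches the two images, and outputs the result as a similarity.
With this configuration, a similarity based on image features can be produced.
[0025]
In this invention, the slide determination calculation unit takes Sc as the similarity from matching based on character recognition results and S_I as the similarity from matching based on image features for a slide region candidate Ri and a slide x, uses these similarities Sc and S_I together with predetermined thresholds to find the maximum similarity S(X) over all slide region candidates and all slides, and determines the slide x corresponding to the slide region candidate that gives this similarity.
With this configuration, the most likely slide can be determined from the similarity based on character recognition results and the similarity based on image features.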
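[0026]
[Embodiments of the invention]
Embodiments of the present invention are described in detail below with reference to the drawings.
First, the basic elements of the invention are described. Video variation is handled by using normalized correlation as the measure of similarity between images. Normalized correlation is used to judge the similarity between two images. It is a method based on the cross-correlation coefficient between images and gives an evaluation value that suppresses the effects of noise and uniform brightness changes, so it allows matching that is stable against video variation.
[0027]
Next, distortion of the outline of the projected slide is absorbed by normalization based on the slide's boundary lines and by DP (dynamic programming). DP matching efficiently searches for the correspondence that gives the locally minimal cost, and it also absorbs nonlinear displacement.
[0028]
Concretely, DP matching with normalized correlation performs DP matching row by row along the column (vertical) direction of the image, using the normalized correlation between rows as the row-level similarity. The row-level similarity calculation copes with noise and illumination changes, and the DP matching along the column direction copes with distortion in that direction. Normalized correlation and DP matching are described below.
(Normalized correlation)
Let the template (that is, the slide) be t(i) (i = 0, 1, ..., N−1) and the image to be matched in the frame be f(i) (i = 0, 1, ..., N−1). The normalized correlation R between the template t(i) and the image f(i) is given by equation (1).
[0029]
[Expression 1]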
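The image of equation (1) is not reproduced in this text. As a hedged reconstruction, a commonly used form that is consistent with the description (a cross-correlation coefficient that is insensitive to uniform brightness changes) is the zero-mean normalized correlation below. Whether the original equation (1) subtracts the means, and the symbols \bar{t} and \bar{f}, are assumptions and not taken from the source.

```latex
R = \frac{\displaystyle\sum_{i=0}^{N-1}\bigl(t(i)-\bar{t}\bigr)\bigl(f(i)-\bar{f}\bigr)}
         {\sqrt{\displaystyle\sum_{i=0}^{N-1}\bigl(t(i)-\bar{t}\bigr)^{2}}\;
          \sqrt{\displaystyle\sum_{i=0}^{N-1}\bigl(f(i)-\bar{f}\bigr)^{2}}},
\qquad
\bar{t} = \frac{1}{N}\sum_{i=0}^{N-1} t(i),\quad
\bar{f} = \frac{1}{N}\sum_{i=0}^{N-1} f(i)
```

In this form R lies in [−1, 1], and adding a constant to f or multiplying it by a positive factor leaves R unchanged, which matches the stated robustness to uniform brightness variation.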

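[0030]
(DP matching)
DP matching is an algorithm that finds the optimum over the whole process recursively, based on the fact that the optimum up to stage i is the sum of the optimum up to the preceding stage (i−1) and the optimum at stage i. That is, it efficiently searches for the correspondence that gives the locally minimal cost and also absorbs nonlinear displacement. The calculation uses, for example, a recurrence such as the following, where f(xi, yj) denotes the value at coordinate (xi, yj).
[0031]
[Expression 2]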
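The image of equation (2) is likewise not reproduced here. A typical DP matching recurrence that fits the surrounding description (three candidate expressions, of which the smallest is taken, with d a distance function) can be sketched as follows. The equal weighting of the three transitions is an assumption; the original may weight the diagonal step differently.

```latex
f(x_i, y_j) = \min
\begin{cases}
  f(x_{i-1},\, y_j) + d(x_i, y_j) \\
  f(x_{i-1},\, y_{j-1}) + d(x_i, y_j) \\
  f(x_i,\, y_{j-1}) + d(x_i, y_j)
\end{cases}
```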

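[0032]
Here, d is a distance function. f(xi, yj) takes the smallest of the three expressions in equation (2). FIG. 3 illustrates the DP recurrence. If the values up to a stage (coordinate) (i−1, j−1) are given as shown in the figure, the route to the next point (xi, yj) is given by equation (2).
[0033]
An embodiment of the invention is described below. FIG. 4 shows the overall processing flow. First, the sampling unit 10 samples frames from the presentation video to obtain a color image of each frame (S1). Next, this image is matched against each slide stored in the database using the character recognition result matching unit 11 and the image feature matching unit 12 (S2), and the slide determination calculation unit 13 calculates the best-matching slide number (S3).
[0034]
FIG. 5 is a flowchart of the matching process. Matching based on character recognition results is performed first (S1), followed by matching based on image features (S2).
[0035]
FIG. 6 is a block diagram of one embodiment of the invention. Components identical to those in FIG. 2 carry the same reference numerals. In the figure, 10 is a sampling unit that samples the video signal to obtain frame images; 11 is a character recognition result matching unit that performs matching on the character recognition results of the sampled images obtained by the sampling unit 10; 12 is an image feature matching unit that performs image feature matching on the sampled images; and 13 is a slide determination calculation unit that receives the outputs of the character recognition result matching unit 11 and the image feature matching unit 12, performs a predetermined calculation, and determines and outputs the slide number of the presentation file slide with the highest similarity.
[0036]
In the character recognition result matching unit 11, 20 is a character recognition processing unit that performs character recognition on the frame image, 21 is a character information extraction unit that extracts character information from the template image, and 22 is a correspondence generation unit that receives the outputs of the character recognition processing unit 20 and the character information extraction unit 21 and judges the similarity between the two images. The correspondence generation unit 22 outputs the similarity Sc based on the character recognition results.
[0037]
In the image feature matching unit 12, 30 is a graying unit that receives the color frame image from the frame sampling unit 10 and converts it into a gray image, 31 is a slide region candidate extraction unit that receives the output of the graying unit 30 and extracts slide region candidates, 32 is a normalization unit that receives the output of the slide region candidate extraction unit 31 and normalizes the image, and 33 is a matching unit that receives the output of the normalization unit 32 and judges its similarity to the template image. The matching unit 33 outputs the similarity S_I based on image features.
[0038]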
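13 is a slide determination calculation unit that receives the similarities Sc and S_I, performs a predetermined calculation, and outputs the number of the template image with the highest similarity as the slide number. The operation of the apparatus configured in this way is described below.
(Matching based on character recognition results)
The operation of the character recognition result matching unit 11 is as follows. The character recognition processing unit 20 consists of a still image generation unit 20a that receives frame images from the frame sampling unit 10 and generates still images, a still image binarization unit 20b that receives the output of the still image generation unit 20a and binarizes the still images, and a character recognition unit 20c that receives the output of the still image binarization unit 20b and performs character recognition (not shown).
[0039]
The still image generation unit 20a takes a video file as input and generates still images at fixed time intervals. The video file format here is AVI. The still image binarization unit 20b binarizes all the still images generated by the still image generation unit 20a. The character recognition unit 20c recognizes all characters in each binarized still image, creates a four-element tuple of character information (x, y, code, certainty) for each recognized character, and associates the tuples with the still image. Here, x and y are the coordinates of the center of the character's bounding rectangle in the still image plane, code is the character code, and certainty is the confidence of the character recognition.
[0040]
The character information extraction unit 21 takes the presentation file (template images) as input, extracts all characters from each slide of the file, creates the same four-element tuple (x, y, code, certainty) for each extracted character as the character recognition unit does, and associates the tuples with the slide.
[0041]
The correspondence generation unit 22 calculates the degree of agreement between a still image and a slide from the still image character information created by the character recognition unit 20c and the slide character information created by the character information extraction unit 21. The correspondence generation unit 22 outputs the similarity Sc based on character recognition, which is passed to the slide determination calculation unit 13.
[0042]
With this configuration, a similarity based on character recognition can be calculated.
(Matching based on image features)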
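The image feature matching unit 12 first generates a gray image from the color image sampled from the presentation video, and extracts from it multiple candidate regions for the bounding rectangle of the area in which a slide of the presentation file is shown. Next, for each slide region candidate, it extracts the boundary lines of the slide region using the Hough transform and normalizes the region based on those boundary lines. The Hough transform here is a process for computing the parameters of straight lines to be extracted from a multi-level image. The similarity between the normalized image and the template image of each slide is then calculated by DP matching using normalized correlation.
[0043]
FIG. 7 is a flowchart of matching based on image features. First, the graying unit 30 converts the color image to gray (S1), and the slide region candidate extraction unit 31 extracts slide region candidates (S2). The extracted image is then normalized by the normalization unit 32 (S3). Finally, the matching unit 33 matches the normalized slide image against the slide images stored in advance in the database (S4). Matching on image features in this way uses normalized correlation to compensate for changes in the video caused by external factors. Each step is described in detail below.
(Graying the color image: S1)
The graying unit 30 extracts a grayscale image of the brightness component from the color image obtained by the sampling unit 10 from the presentation video. Graying is done because it makes image processing easier. The image can be converted to gray by mixing the R, G, and B images in suitable proportions.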
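The graying step (S1) can be sketched as a weighted sum of the R, G, and B channels. The source only says the channels are mixed "in suitable proportions"; the weights used here (0.299, 0.587, 0.114, the common ITU-R BT.601 luma coefficients) and the function name `to_gray` are assumptions for illustration.

```python
def to_gray(rgb_image, weights=(0.299, 0.587, 0.114)):
    """Convert an RGB image to grayscale by a weighted channel sum.

    rgb_image: list of rows, each row a list of (r, g, b) tuples.
    weights:   assumed mixing proportions (BT.601 luma), not from the source.
    Returns a list of rows of float brightness values.
    """
    wr, wg, wb = weights
    return [[wr * r + wg * g + wb * b for (r, g, b) in row] for row in rgb_image]


if __name__ == "__main__":
    frame = [[(255, 0, 0), (255, 255, 255)]]
    print(to_gray(frame))
```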
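(Extracting slide region candidates: S2)
The slide region candidate extraction unit 31 extracts from the gray image the bounding rectangle of the area in which a slide of the presentation file is shown, as follows. First, edges are extracted from the gray image with a Sobel filter. Next, binarization produces a binary edge image.
[0044]
Next, a set of bounding rectangles of connected components is obtained from this binary image by labeling (①). A set of bounding rectangles of connected components is likewise obtained by labeling from the inverted binary image (②). In addition, a new set of rectangles is obtained by merging overlapping rectangles in the set of bounding rectangles of connected components of the binary edge image (③). From the three rectangle sets ① to ③, rectangles with at least a certain size and an aspect ratio within a certain range are taken as slide region candidates. These candidates are denoted Ri (i = 1 to n) below.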
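The final filtering step of S2 can be sketched as follows, assuming the bounding rectangles from the three sets have already been computed (edge extraction, labeling, and overlap merging are not shown). The source gives no numeric limits, so `min_area_ratio` and `aspect_range`, like the `Rect` type and the function name, are hypothetical values chosen for illustration.

```python
from typing import List, NamedTuple, Tuple


class Rect(NamedTuple):
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height


def select_slide_candidates(
    rects: List[Rect],
    image_w: int,
    image_h: int,
    min_area_ratio: float = 0.05,                  # assumed "certain size"
    aspect_range: Tuple[float, float] = (1.0, 2.0),  # assumed aspect ratio range
) -> List[Rect]:
    """Keep rectangles that are large enough and roughly slide-shaped."""
    lo, hi = aspect_range
    min_area = min_area_ratio * image_w * image_h
    selected: List[Rect] = []
    for r in rects:
        if r.w <= 0 or r.h <= 0:
            continue  # skip degenerate rectangles
        aspect = r.w / r.h
        if r.w * r.h >= min_area and lo <= aspect <= hi and r not in selected:
            selected.append(r)  # also drops duplicates across the three sets
    return selected
```

Rectangles that appear in more than one of the three sets are kept once, so each candidate Ri is matched only once in the later steps.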
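[0045]
In this way, by matching the image obtained by applying predetermined processing to the presentation file shown in the image against the presentation files in the database, the region with high similarity to a presentation file in the database can be taken as the presentation file region.
(Normalization: S3)
Normalization is applied to every slide region candidate Ri described above. The normalization unit 32 operates as follows. Distortion of the projected slide is modeled as follows: the distortion is assumed to arise from rotation of the whole slide (by angle θ) and from the row length stretching or shrinking between top and bottom.
[0046]
Based on this model, the slide assumed to lie within each slide region candidate is normalized as shown in FIG. 8, which illustrates the distortion model. (a) shows the state before normalization. In this state, the boundary lines of the slide are first computed and extracted using the Hough transform. Next, based on the boundary lines, the whole slide is rotated by angle θ as shown in (b). The rotated shape is then stretched or shrunk row by row to obtain a right-angled slide as shown in (c).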
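The row-by-row stretching in step (c) can be sketched as below. This is a minimal illustration under stated assumptions: the rotation by θ is taken as already applied, the left and right slide boundaries per row (`left`, `right`) are taken as already known from the detected boundary lines, and nearest-neighbor resampling is used. None of these details, nor the function name, come from the source.

```python
def stretch_rows(gray, left, right, out_width):
    """Resample each row between its own left/right boundary to a fixed width.

    gray:      list of rows of pixel values (already rotated by theta).
    left:      per-row column index of the slide's left boundary.
    right:     per-row column index of the slide's right boundary.
    out_width: width of the normalized (rectangular) output.
    """
    normalized = []
    for row, x0, x1 in zip(gray, left, right):
        span = x1 - x0
        new_row = []
        for k in range(out_width):
            # Map output column k linearly onto the source interval [x0, x1].
            src = x0 + span * k / (out_width - 1) if out_width > 1 else x0
            new_row.append(row[int(src + 0.5)])  # nearest source pixel
        normalized.append(new_row)
    return normalized
```

Rows that were shorter in the frame are stretched more than longer ones, so a trapezoid-shaped slide becomes a rectangle of `out_width` columns.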
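(Matching: S4)
The matching unit 33 calculates the similarity between each normalized slide region candidate and the template image of each slide. It first performs template matching using the normalized correlation of equation (1); that is, equation (1) is evaluated over all pixels of the image to obtain the similarity.
[0047]
If, among the templates of all slides, the difference between the highest similarity (first candidate) and the second highest (second candidate) exceeds a certain threshold, the first candidate is output as the identification result. Otherwise, that is, when the difference does not exceed the threshold, DP matching using normalized correlation is performed on the candidates from the first up to the n-th, namely those whose similarity lies within a certain threshold of the first candidate.
[0048]
Concretely, as shown in FIG. 9, DP matching is performed row by row along the column (vertical) direction of the image, with the row-level similarity calculated as the normalized correlation between rows. In this way the matching unit 33 outputs the similarity S_I of the candidate with the highest similarity.
In this way, a similarity based on image features can be produced.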
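The two-stage matching of S4 can be sketched as follows. This is an illustrative reading, not the patented implementation: the row cost is taken as 1 minus a zero-mean normalized correlation, the three DP transitions are weighted equally, the path cost is averaged over the path length, and the `margin` value is arbitrary. All function names are hypothetical.

```python
import math


def ncc(t, f):
    """Zero-mean normalized correlation of two equal-length sequences."""
    n = len(t)
    mt, mf = sum(t) / n, sum(f) / n
    num = sum((a - mt) * (b - mf) for a, b in zip(t, f))
    den = math.sqrt(sum((a - mt) ** 2 for a in t) * sum((b - mf) ** 2 for b in f))
    return num / den if den > 0 else 0.0


def dp_row_similarity(template, candidate):
    """Align rows vertically by DP; rows are compared with ncc."""
    n, m = len(template), len(candidate)
    table = [[None] * m for _ in range(n)]  # each cell: (path cost, path length)
    for i in range(n):
        for j in range(m):
            d = 1.0 - ncc(template[i], candidate[j])  # row distance in [0, 2]
            if i == 0 and j == 0:
                table[i][j] = (d, 1)
                continue
            preds = []
            if i > 0:
                preds.append(table[i - 1][j])
            if j > 0:
                preds.append(table[i][j - 1])
            if i > 0 and j > 0:
                preds.append(table[i - 1][j - 1])
            cost, length = min(preds)  # cheapest predecessor
            table[i][j] = (cost + d, length + 1)
    total, length = table[n - 1][m - 1]
    return 1.0 - total / length


def match_slide(candidate, templates, margin=0.1):
    """Stage 1: whole-image ncc. Stage 2: DP only if the top scores are close."""
    flat = [p for row in candidate for p in row]
    scores = sorted(
        ((ncc([p for row in t for p in row], flat), idx)
         for idx, t in enumerate(templates)),
        reverse=True,
    )
    if len(scores) == 1 or scores[0][0] - scores[1][0] > margin:
        return scores[0][1], scores[0][0]
    close = [idx for s, idx in scores if scores[0][0] - s <= margin]
    dp_scores = {idx: dp_row_similarity(templates[idx], candidate) for idx in close}
    best = max(dp_scores, key=dp_scores.get)
    return best, dp_scores[best]
```

Stage 1 assumes the candidate and templates have the same dimensions; stage 2 tolerates a different number of rows, which is how vertical stretching is absorbed.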
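[0049]
The slide determination calculation unit 13 receives the similarity Sc output from the character recognition result matching unit 11 and the similarity S_I output from the matching unit 33 of the image feature matching unit, performs the calculation described below, and outputs the slide number of the most suitable slide.
[0050]
For a slide region candidate Ri and a slide x, let Sc(i, x) be the similarity from matching based on character recognition results and S_I(i, x) the similarity from matching based on image features. Let Sc be the highest similarity from matching based on character recognition results, and S_I the highest similarity from matching based on image features, over all slide region candidates and all slides. Then:
(a) when Sc ≥ th_c and S_I ≥ th_I, or when Sc < th_c and S_I < th_I,
S(i, x) = Sc(i, x) + S_I(i, x)
(b) when Sc < th_c and S_I ≥ th_I,
S(i, x) = S_I(i, x)
(c) when Sc ≥ th_c and S_I < th_I,
S(i, x) = Sc(i, x)
This gives the final similarity S(i, x) for the slide region candidate Ri and the slide x.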
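[0051]
In this way, matching based on character recognition and matching based on image features are both performed, and combining them yields the similarity.
[0052]
Then the maximum similarity S(x) over all slide region candidates and all slides is found; the slide region candidate Ri that gives it is taken as the slide region, and x is output as the best-matching slide number.
In this way, the most likely slide can be determined from the similarity based on character recognition results and the similarity based on image features.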
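The combination rules (a) to (c) and the final selection can be sketched as below. The dictionary layout keyed by (i, x), the function names, and the threshold values used in the example are assumptions for illustration; the source specifies only the rules themselves.

```python
def fuse_similarity(sc, s_i, best_sc, best_si, th_c, th_i):
    """Combine one pair of similarities according to rules (a)-(c)."""
    text_reliable = best_sc >= th_c
    image_reliable = best_si >= th_i
    if text_reliable == image_reliable:
        return sc + s_i          # (a) both reliable or both unreliable
    if image_reliable:
        return s_i               # (b) only image features are reliable
    return sc                    # (c) only character recognition is reliable


def pick_slide(sc_table, si_table, th_c, th_i):
    """Return (i, x, S) maximizing the fused similarity.

    sc_table, si_table: dicts mapping (candidate index i, slide number x)
    to Sc(i, x) and S_I(i, x); both must have the same keys.
    """
    best_sc = max(sc_table.values())
    best_si = max(si_table.values())
    fused = {
        key: fuse_similarity(sc_table[key], si_table[key], best_sc, best_si, th_c, th_i)
        for key in sc_table
    }
    i, x = max(fused, key=fused.get)
    return i, x, fused[(i, x)]


if __name__ == "__main__":
    sc = {(0, 1): 0.9, (0, 2): 0.2}
    si = {(0, 1): 0.3, (0, 2): 0.8}
    print(pick_slide(sc, si, th_c=0.5, th_i=0.5))
```

Note that reliability is judged once from the global maxima Sc and S_I, and the same rule is then applied to every pair (i, x).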
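[0053]
According to the invention, the sequence shown in FIG. 1 can be stored as a program and executed by a computer. This allows changes in the video caused by external factors to be compensated by normalized correlation.
[0054]
As described above, according to the invention, similarity is judged using both matching based on character recognition results and image feature matching, and a predetermined calculation is performed, so that the slide number of the presentation file slide most similar to the slide shown in the video signal can be output. As a result, the e-Learning content (the computer display screen) can show the lecture video together with a detailed view of the slide used in the lecture, making the lecture more effective.
[0055]
Also, according to the invention, when image features of video frames and presentation slides are compared for matching, normalized correlation suppresses the effects of disturbances such as illumination changes, and DP matching handles the distortion of the slide image in the column direction that image normalization does not absorb. This enables highly accurate matching even when the presentation file contains only figures and graphs or little text.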
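[0056]
(Appendix 1) An image and document matching method comprising: a first step of calculating, when matching a first image against a second image, the similarity between pixels using normalized correlation values; and a second step of finding the optimal correspondence between pixel rows by nonlinear matching in the direction perpendicular to the pixel rows and taking the result as the similarity between the images.
[0057]
(Appendix 2) A method for identifying the presentation file in an image showing some presentation file by matching the image against the presentation files in a database, wherein matching based on the recognition results obtained by character recognition of the image is performed together with the image-feature-based matching of Appendix 1, the reliability of both matching results is measured, and, when both are reliable or both are unreliable, the sum of the two similarities is taken as the final similarity, while otherwise the similarity with the higher reliability is taken as the final similarity.
[0058]
(Appendix 3) An image and document matching method for identifying the presentation file in an image showing some presentation file by matching the image against the presentation files in a database, wherein a binary edge image and its inverted image are created from the image; among the connected components in those images, those with at least a certain size and an aspect ratio within a certain range are taken as candidates for the presentation file region in the image; the partial image corresponding to each candidate region is matched against the presentation files in the database by the matching method of Appendix 2; and the region showing the highest similarity is taken as the presentation file region.
[0059]
(Appendix 4) An image and document matching program that causes a computer to execute: a first step of calculating, when matching a first image against a second image, the similarity between pixels using normalized correlation values; and a second step of finding the optimal correspondence between pixel rows by nonlinear matching in the direction perpendicular to the pixel rows and taking the result as the similarity between the images.
[0060]
(Appendix 5) An image and document matching apparatus comprising: a sampling unit that samples a video signal to obtain frame images; a character recognition result matching unit that performs matching on the character recognition results of the sampled images obtained by the sampling unit; an image feature matching unit that performs image feature matching on the sampled images; and a slide determination calculation unit that receives the outputs of the character recognition result matching unit and the image feature matching unit, performs a predetermined calculation, and determines and outputs the slide number of the presentation file slide with the highest similarity.
[0061]
(Appendix 6) The image and document matching apparatus of Appendix 5, wherein the character recognition result matching unit comprises a character recognition processing unit that performs character recognition on the sampled image, a character information extraction unit that extracts character information from the template image, and a correspondence generation unit that receives the outputs of the character recognition processing unit and the character information extraction unit, generates the correspondence between them, and outputs it as the similarity between the two images.
[0062]
(Appendix 7) The image and document matching apparatus of Appendix 5, wherein the image feature matching unit comprises a graying unit that converts the template image into a gray image, a slide region candidate extraction unit that receives the output of the graying unit and extracts slide region candidates, a normalization unit that receives the output of the slide region candidate extraction unit and performs normalization, and a matching unit that receives the output of the normalization unit and the template image, matches the two images, and outputs the result as a similarity.
[0063]
(Appendix 8) The image and document matching apparatus of Appendix 5, wherein the slide determination calculation unit takes Sc as the similarity from matching based on character recognition results and S_I as the similarity from matching based on image features for a slide region candidate Ri and a slide x, uses these similarities Sc and S_I together with predetermined thresholds to find the maximum similarity S(X) over all slide region candidates and all slides, and determines the slide x corresponding to the slide region candidate that gives this similarity.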
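[0064]
[Effects of the invention]
As described above, the invention provides the following effects.
(1) According to the invention of claim 1, matching based on character recognition and matching based on image features are both performed, and combining them yields a highly reliable similarity.
(2) According to the invention of claim 2, by matching the image obtained by applying predetermined processing to the presentation file shown in the image against the presentation files in the database, the region with high similarity to a presentation file in the database can be taken as the presentation file region.
(3) According to the invention of claim 3, matching based on character recognition and matching based on image features are both performed, and combining them yields a highly reliable similarity.
[0065]
Thus, the invention provides an image and document matching method and apparatus, and a matching program, that can accurately match video frames against presentation slides despite video variation and distortion.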
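[Brief description of the drawings]
[FIG. 1] A flowchart showing the principle of the method of the present invention.
[FIG. 2] A block diagram showing the principle of the present invention.
[FIG. 3] An explanatory diagram of the DP recurrence.
[FIG. 4] A diagram showing the overall processing flow of the present invention.
[FIG. 5] A flowchart of the matching process according to the present invention.
[FIG. 6] A block diagram showing one embodiment of the present invention.
[FIG. 7] A flowchart showing matching based on image features.
[FIG. 8] An explanatory diagram of the distortion model.
[FIG. 9] An explanatory diagram of the matching.
[FIG. 10] A diagram showing the association of a video with a presentation file by in-video character recognition.
[FIG. 11] A diagram showing the result of associating a video with a presentation file.
[Explanation of reference numerals]
10 Sampling unit
11 Character recognition result matching unit
12 Image feature matching unit
13 Slide determination calculation unit
[0001]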
BACKGROUND OF THE INVENTION
The present invention relates to an image and document matching method and apparatus. More specifically, it relates to a method and apparatus that automatically associate a video with each slide of the presentation file shown in that video and display the most appropriate slide in synchronization with the video.
[0002]
[Prior art]
In recent years, e-Learning has become common in education: remote instruction delivered by computer over the Internet or similar networks, usually through web pages. WBT (Web Based Training), a form of e-Learning, is a new mode of instruction in which the user learns while viewing web pages in a browser. It lets users learn whenever and wherever they like.
[0003]
One of the main content types in WBT is learning content that shows a video together with an enlarged view of the materials synchronized with it. Creating such content requires an association task: for a video explaining materials (for example, a presentation file), identifying which slide is being shown in the video at each point in time.
[0004]
In order to realize such a system, it is necessary to synchronize temporally the display of the moving image explained using the projector and the enlarged display of the image of the slide displayed there. Currently, various e-learning authoring systems (authoring software) for creating such contents have been commercialized.
[0005]
To synchronize the video and the slides in this way, the similarity between the slide shown in the video and each of a plurality of slides stored in advance in a storage area is evaluated, and the slide with the highest similarity is judged to be the one shown in the video and is displayed on the computer screen.
[0006]
Doing so requires a technique for obtaining the similarity (matching) between the slide image in the video (the first image) and a slide image stored in the computer (the second image).
The following techniques are known as techniques for matching the first image and the second image.
[0007]
In image matching that extracts feature points from image data and searches for corresponding points between frames of a video or between the left and right images of a stereo pair, a matching processing unit creates a template from the source image, sets a search area in the target image, correlates the features of the template with those of regions of the same size as the template within the search area, and finds the image corresponding to the template image. A correlation change determination unit determines the degree of change in the correlation value, or in its statistics, between consecutive times t1-1 and t1. If the change in correlation is larger than a predetermined reference, the search parameters that set the image search conditions are changed; if it is smaller, information about the position of the searched image is output (see, for example, Patent Document 1).
[0008]
However, conventional authoring software for associating video with materials requires this association to be done manually, so creating content takes an enormous amount of time. The present applicant therefore developed a technique (the prior technique) that generates still images from a video of a speaker presenting with a presentation tool such as PowerPoint, matches those still images against each slide of the presentation file, and automatically associates the video with the presentation file (Japanese Patent Application No. 2002-80500, Japanese Patent Application No. 2002-89338).
[0009]
Japanese Patent Application No. 2002-89338 proposes a method that performs character recognition on still images from a presentation video and matches the recognition results against each slide. FIG. 10 is a diagram showing how a video and a presentation file are associated by in-video character recognition. In the figure, reference numeral 1 denotes a video in which a speaker gives an explanation while referring to a slide image 2. This video changes over time.
[0010]
A still image is obtained from the video 1 and binarized, and character extraction is performed to obtain an image 3. This image 3 and a plurality of slide images stored in the presentation file 4 are matched. That is, the image having the highest matching degree (similarity) with the image 3 is determined as the slide image shown in the video 1.
[0011]
FIG. 11 is a diagram showing a result of associating a video with a presentation file. The slides of the presentation file obtained in FIG. 10 are associated with time. The period T1 is the slide 4a, the period T2 is the slide 4b, and the period T3 is the slide 4c.
[0012]
[Patent Document 1]
Japanese Patent Application Laid-Open No. 11-120351 (page 3, page 4, FIG. 1)
[0013]
[Problems to be solved by the invention]
The prior technique described above uses an algorithm that matches the character recognition results of the video against the slides while coping with the low reliability of in-video character recognition and absorbing the variations peculiar to video. However, matching fails when the presentation file contains only figures and graphs or when it contains little text.
[0014]
Further, when image features of a video frame and of a presentation slide are compared for matching, image degradation caused by the shooting environment and distortion of the presentation file in the video caused by projecting it have a large effect.
[0015]
In the presentation, the video (slide image) of the presentation file to be projected varies greatly depending on factors such as lighting and weather, and also varies greatly due to the influence of the speaker itself and its shadow. Further, the external shape of the projected slide is distorted depending on how the projector and OHP (overhead projector) are installed, and is not necessarily rectangular.
[0016]
The present invention has been made in view of these problems. Its object is to provide an image and document matching method and apparatus, and a matching program, that can accurately match video frames against presentation slides despite video variation and distortion.
[0017]
[Means for Solving the Problems]
FIG. 1 is a flowchart showing the principle of the method of the present invention. The invention comprises a first step of calculating, when matching a first image against a second image, the similarity between pixels using normalized correlation values, and a second step of finding the optimal correspondence between pixel rows by nonlinear matching in the direction perpendicular to the pixel rows and taking the result as the similarity between the first and second images. Here, "image" in "first image" and "second image" is a concept that also includes slides and the like.
(1) The invention of claim 1 is an image and document matching method comprising a first step of calculating, when matching a first image against a second image, the similarity between pixels by template matching using normalized correlation, and a second step of finding the optimal correspondence between pixel rows by nonlinear matching in the direction perpendicular to the pixel rows and taking the result as the similarity between the images. The method identifies the presentation file in an image showing some presentation file by matching the image against the presentation files in a database: it performs matching based on the recognition results obtained by character recognition of the image, performs matching based on image features, and measures the reliability of both matching results. When both are reliable or both are unreliable, the sum of the two similarities is taken as the final similarity; otherwise, the similarity with the higher reliability is taken as the final similarity.
[0018]
With this configuration, matching based on character recognition and matching based on image features are both performed, and combining them yields a highly reliable similarity.
(2) The invention of claim 2 is a method for identifying the presentation file in an image showing some presentation file by matching the image against the presentation files in a database. A binary edge image and its inverted image are created from the image; among the connected components in those images, those with at least a certain size and an aspect ratio within a certain range are taken as candidates for the presentation file region in the image; the partial image corresponding to each candidate region is matched against the presentation files in the database; and the region showing the highest similarity is taken as the presentation file region in the image.
[0019]
With this configuration, by matching the image obtained by applying predetermined processing to the presentation file shown in the image against the presentation files in the database, the region with high similarity to a presentation file in the database can be taken as the presentation file region.
(3) The invention of claim 3 causes a computer to execute the processing of an image and document matching method comprising a first step of calculating, when matching a first image against a second image, the similarity between pixels by template matching using normalized correlation, and a second step of finding the optimal correspondence between pixel rows by nonlinear matching in the direction perpendicular to the pixel rows and taking the result as the similarity between the images. The processing identifies the presentation file in an image showing some presentation file by matching the image against the presentation files in a database: it performs matching based on the recognition results obtained by character recognition of the image, performs matching based on image features, and measures the reliability of both matching results. When both are reliable or both are unreliable, the sum of the two similarities is taken as the final similarity; otherwise, the similarity with the higher reliability is taken as the final similarity.
[0020]
With this configuration, matching based on character recognition and matching based on image features are both performed, and combining them yields a highly reliable similarity.
[0021]
FIG. 2 is a block diagram showing the principle of the present invention. In the figure, 10 is a sampling unit that samples the video signal to obtain frame images; 11 is a character recognition result matching unit that performs matching on the character recognition results of the sampled images obtained by the sampling unit 10; 12 is an image feature matching unit that performs image feature matching on the sampled images; and 13 is a slide determination calculation unit that receives the outputs of the character recognition result matching unit 11 and the image feature matching unit 12, performs a predetermined calculation, and determines and outputs the slide number of the presentation file slide with the highest similarity.
[0022]
With this configuration, the similarity is determined using matching based on the character recognition result and image feature matching, and a predetermined calculation is performed, so that the slide of the presentation file having a high similarity with the slide shown in the video signal A number can be output. As a result, the lecture screen and the detailed screen of the slide used in the lecture can be displayed on the e-learning content (computer display screen), and the lecture can be effectively executed.
[0023]
In the present invention, the character recognition result matching unit comprises a character recognition processing unit that performs character recognition on the sampled image, a character information extraction unit that extracts character information from the template image, and a correspondence generation unit that receives the outputs of the character recognition processing unit and the character information extraction unit, generates the correspondence between them, and outputs it as the similarity between the two images.
With this configuration, a similarity based on character recognition can be calculated.
[0024]
Further, in this invention, the image feature matching unit comprises a graying unit that converts the template image into a gray image, a slide region candidate extraction unit that receives the output of the graying unit and extracts slide region candidates, a normalization unit that receives the output of the slide region candidate extraction unit and performs normalization, and a matching unit that receives the output of the normalization unit and the template image, matches the two images, and outputs the result as a similarity.
With this configuration, a similarity based on image features can be produced.
[0025]
Further, in the present invention, the slide determination calculation unit takes Sc as the similarity from matching based on character recognition results and S_I as the similarity from matching based on image features for a slide region candidate Ri and a slide x, uses these similarities Sc and S_I together with predetermined thresholds to find the maximum similarity S(X) over all slide region candidates and all slides, and determines the slide x corresponding to the slide region candidate that gives this similarity.
With this configuration, the most likely slide can be determined from the similarity based on character recognition results and the similarity based on image features.
[0026]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
First, the basic part of the present invention will be described. The problem of video fluctuation is dealt with by using normalized correlation as an evaluation measure representing similarity between images. Normalized correlation is used to determine the degree of similarity between two images. Normalized correlation is a method based on a cross-correlation coefficient between images, and gives an evaluation value that suppresses the influence of noise and uniform brightness fluctuation, so that stable matching can be achieved against video fluctuation.
[0027]
Next, the problem of the external distortion of the projected slide is absorbed by normalization based on the boundary line of the slide and DP (dynamic programming). DP matching is a matching that efficiently searches for a correspondence that locally gives the minimum cost, and absorbs a non-linear deviation.
[0028]
Concretely, DP matching with normalized correlation performs DP matching row by row along the column (vertical) direction of the image, using the normalized correlation between rows as the row-level similarity. The row-level similarity calculation copes with noise and illumination changes, and the DP matching along the column direction copes with distortion in that direction. Normalized correlation and DP matching are described below.
(Normalized correlation)
Let the template (that is, the slide) be t(i) (i = 0, 1, ..., N−1) and the image to be matched in the frame be f(i) (i = 0, 1, ..., N−1). The normalized correlation R between the template t(i) and the image f(i) is given by the following equation (1).
[0029]
[Expression 1]
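As a hedged illustration of the normalized correlation R between a template t(i) and an image f(i), the sketch below uses the zero-mean (correlation coefficient) form. Whether the original equation (1) subtracts the means is an assumption, as is the function name `normalized_correlation` and the choice to return 0.0 for a constant input.

```python
import math


def normalized_correlation(t, f):
    """Zero-mean normalized correlation R of two equal-length sequences.

    t: template pixel values t(0..N-1); f: frame pixel values f(0..N-1).
    Returns a value in [-1, 1]; 0.0 if either input has no variation
    (an assumed convention, since R is undefined in that case).
    """
    if len(t) != len(f) or not t:
        raise ValueError("t and f must be non-empty and of equal length")
    n = len(t)
    mean_t = sum(t) / n
    mean_f = sum(f) / n
    num = sum((a - mean_t) * (b - mean_f) for a, b in zip(t, f))
    den = math.sqrt(
        sum((a - mean_t) ** 2 for a in t) * sum((b - mean_f) ** 2 for b in f)
    )
    return num / den if den > 0 else 0.0


if __name__ == "__main__":
    slide_row = [10, 20, 30, 40]
    brighter_frame_row = [60, 70, 80, 90]  # same pattern, uniform offset
    print(normalized_correlation(slide_row, brighter_frame_row))
```

Because the means are subtracted and the result is divided by both standard deviations, a uniform brightness offset or gain between slide and frame does not change R.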

[0030]
(DP matching)
DP matching is an algorithm that finds the optimum over the whole process recursively, based on the fact that the optimum up to stage i is the sum of the optimum up to the preceding stage (i-1) and the optimum at stage i. That is, it efficiently searches for the correspondence that gives the locally minimal cost and also absorbs nonlinear displacement. The calculation uses, for example, a recurrence such as the following, where f(xi, yj) denotes the value at coordinate (xi, yj).
[0031]
[Expression 2]

[0032]
Here, d is a distance function. f(xi, yj) takes the smallest of the three expressions in equation (2). FIG. 3 is an explanatory diagram of the DP recurrence. If the values up to a stage (coordinate) (i-1, j-1) are given as shown in the figure, the route to the next point (xi, yj) is given by equation (2).
[0033]
Hereinafter, embodiments of the present invention will be described. FIG. 4 is a diagram showing an overall processing flow of the present invention. First, a frame is sampled from the presentation video by the sampling unit 10 to obtain a color image of the frame (S1). Next, the image and each slide stored in the database are matched using the character recognition result matching unit 11 and the image feature matching unit 12 (S2), and the slide determination calculation unit 13 calculates the best matching slide number. (S3).
[0034]
FIG. 5 is a flowchart of the matching process according to the present invention. In the matching process of the present invention, matching based on the character recognition result is performed first (S1), followed by matching based on image features (S2).
[0035]
FIG. 6 is a block diagram showing an embodiment of the present invention. The same components as those in FIG. 2 are denoted by the same reference numerals. In the figure, 10 is a sampling unit that samples a video signal to obtain a frame image, 11 is a character recognition result matching unit that matches character recognition results from the sampled image obtained by the sampling unit 10, 12 is an image feature matching unit that matches image features from the sampled image, and 13 is a slide determination calculation unit that receives the outputs of the character recognition result matching unit 11 and the image feature matching unit 12, performs a predetermined calculation, and determines and outputs the slide number of the presentation file with the highest similarity.
[0036]
In the character recognition result matching unit 11, 20 is a character recognition processing unit that performs character recognition on a frame image, 21 is a character information extraction unit that extracts character information from a template image, and 22 is a correspondence generation unit that receives the outputs of the character recognition processing unit 20 and the character information extraction unit 21 and determines the similarity between the images. The correspondence generation unit 22 outputs the similarity Sc based on the character recognition result.
[0037]
In the image feature matching unit 12, 30 is a graying unit that receives a color frame image from the sampling unit 10 and converts it into a gray image, 31 is a slide region candidate extraction unit that receives the output of the graying unit 30 and extracts slide region candidates, 32 is a normalization unit that receives the output of the slide region candidate extraction unit 31 and normalizes the image, and 33 is a matching unit that receives the output of the normalization unit 32 and determines the similarity to the template image. The matching unit 33 outputs the similarity S_I based on the image features.
[0038]
13 is the slide determination calculation unit, which receives the similarity Sc and the similarity S_I, performs a predetermined calculation, and outputs the number of the template image having the highest similarity as the slide number. The operation of the apparatus configured as described above will be described below.
(Matching process based on character recognition results)
The operation of the character recognition result matching unit 11 will be described. The character recognition processing unit 20 is composed of a still image generation unit 20a that receives a frame image from the sampling unit 10 and generates a still image, a still image binarization unit 20b that receives the output of the still image generation unit 20a and binarizes the still image, and a character recognition unit 20c that receives the output of the still image binarization unit 20b and performs character recognition (not shown).
[0039]
The still image generation unit 20a receives a video file and generates still images at regular time intervals. Here, the format of the video file is AVI. The still image binarization unit 20b binarizes all the still images generated by the still image generation unit 20a. The character recognition unit 20c recognizes all characters in each still image binarized by the still image binarization unit 20b, creates a 4-tuple (x, y, code, certainty) for each recognized character, and associates the tuples with the corresponding still image. Here, x and y are the coordinates of the center point of the character's circumscribed rectangle on the still image plane, code is the character code, and certainty is the reliability of the character recognition.
[0040]
The character information extraction unit 21 receives a presentation file (template image), extracts all characters from each slide of the presentation file, creates a 4-tuple of character information (x, y, code, certainty) for each extracted character, and associates the tuples with the corresponding slide.
[0041]
The correspondence generation unit 22 calculates the degree of coincidence between the still image and the slide from the character information of the still image created by the character recognition unit 20c and the character information of the slide created by the character information extraction unit 21. The correspondence generation unit 22 outputs the similarity Sc based on character recognition, which is input to the slide determination calculation unit 13.
[0042]
With this configuration, the similarity based on character recognition processing can be calculated.
(Matching process based on image features)
The image feature matching unit 12 first generates a gray image from the color image obtained by sampling the presentation video, and then extracts a plurality of circumscribed-rectangle candidates for the area in which the slide of the presentation file is shown. Next, the boundary lines of the slide area are extracted by applying the Hough transform to each slide area candidate, and normalization is performed based on those boundary lines. Here, the Hough transform is a process for calculating the parameters used to extract straight lines from a multi-valued image. Then, the similarity between the normalized image and the template image of each slide is calculated by DP matching using normalized correlation.
[0043]
FIG. 7 is a flowchart showing matching processing based on image features. First, the graying unit 30 grays the color image (S1), and the slide region candidate extraction unit 31 extracts slide region candidates (S2). Then, the extracted image is normalized using the normalizing unit 32 (S3). Finally, the matching unit 33 performs matching between the normalized slide image and the slide image stored in the database in advance (S4). If matching processing based on image features is performed in this way, changes due to external factors in the video can be corrected by taking a normalized correlation. Hereinafter, each process will be described in detail.
(Graying of color image: S1)
The graying unit 30 extracts a grayscale image of the brightness component from the color image obtained by the sampling unit 10 from the presentation video. Graying is performed because it makes the subsequent image processing easier. An image can be grayed by mixing its R, G, and B components at an appropriate ratio.
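The text specifies only "an appropriate ratio" for mixing R, G, and B. As one possible illustration, the sketch below uses the widely used ITU-R BT.601 luma weights (0.299, 0.587, 0.114). These weights and the name `to_gray` are assumptions for this example, not values taken from the patent.

```python
def to_gray(rgb_image, weights=(0.299, 0.587, 0.114)):
    """Convert rows of (R, G, B) pixels into rows of brightness values.

    ASSUMPTION: BT.601 luma weights; the patent does not state the ratio.
    """
    wr, wg, wb = weights
    return [[wr * r + wg * g + wb * b for (r, g, b) in row] for row in rgb_image]
```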
(Slide area candidate extraction: S2)
The slide area candidate extraction unit 31 extracts the circumscribed rectangle of the area where the slide of the presentation file is projected from the gray image as follows. First, an edge is extracted from a gray image by a Sobel filter. Next, an edge binary image is generated by binarization processing.
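The edge extraction and binarization steps can be sketched as follows. This is a simplified illustration, assuming the gradient magnitude is approximated by |gx| + |gy| and compared against a fixed threshold. The function name `sobel_edges` and the `threshold` parameter are hypothetical, and the patent does not state how its binarization threshold is chosen.

```python
def sobel_edges(gray, threshold):
    """Return a binary edge image (1 = edge) from a grayscale image.

    ASSUMPTIONS: 3x3 Sobel kernels, |gx| + |gy| as the magnitude, a fixed
    threshold, and border pixels left as 0.
    """
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column.
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            if abs(gx) + abs(gy) >= threshold:
                out[y][x] = 1
    return out
```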
[0044]
Next, a set of circumscribed rectangles of connected components is extracted from the binary image by labeling (1). Further, a set of circumscribed rectangles of connected components is extracted from the inverted image of the binary image by labeling (2). Further, a new set of rectangles is extracted by performing overlap integration on the set of circumscribed rectangles of connected components in the edge binary image (3). Of the three rectangle sets (1) to (3) thus obtained, rectangles having at least a certain size and an aspect ratio within a certain range are set as slide area candidates. Hereinafter, these slide area candidates are denoted Ri (i = 1 to n).
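The final filtering step, which keeps rectangles of at least a certain size and with an aspect ratio in a certain range, might look like the sketch below. It assumes that "size" means area and that rectangles are given as (x, y, width, height). The name `select_slide_candidates` and all parameter names are hypothetical, and the labeling and overlap integration steps are not shown.

```python
def select_slide_candidates(rects, min_area, min_aspect, max_aspect):
    """Keep rectangles that are large enough and have a plausible aspect ratio.

    rects: iterable of (x, y, width, height).
    ASSUMPTION: size is measured as area and aspect ratio as width / height.
    """
    candidates = []
    for (x, y, w, h) in rects:
        if w <= 0 or h <= 0:
            continue  # skip degenerate rectangles
        if w * h >= min_area and min_aspect <= w / h <= max_aspect:
            candidates.append((x, y, w, h))
    return candidates
```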
[0045]
In this way, by matching the image obtained by performing predetermined processing on the presentation file shown in the video against the presentation files in the database, an area having high similarity to a presentation file in the database can be taken as the presentation file area.
(Normalization: S3)
The normalization process is performed for all the slide area candidates Ri described above. The normalizing unit 32 performs the following processing. The distortion of the projected slide of the presentation file is modeled as follows: the entire slide is assumed to be rotated (rotation angle θ), and the row length is assumed to expand or contract depending on the vertical position.
[0046]
Therefore, based on this model, the slide assumed to be in each slide region candidate is normalized as shown in FIG. 8. FIG. 8 is an explanatory diagram of distortion modeling. (a) is the state before normalization. In this state, the boundary lines of the slide are first calculated and extracted by the Hough transform. Next, based on the obtained boundary lines, the entire slide is rotated by the angle θ as shown in (b). The rotated shape is then expanded or contracted row by row to obtain a right-angled slide as shown in (c).
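The row-by-row expansion and contraction can be illustrated with the sketch below, which resamples each row between its left and right slide boundary to a common width. Nearest-neighbor resampling, the `bounds` argument (per-row inclusive column indices, assumed to come from the detected boundary lines), and the name `stretch_rows` are all assumptions for this example. The rotation by θ and the Hough transform are not shown.

```python
def stretch_rows(image, bounds, width):
    """Resample each row between its boundaries to a fixed width.

    image:  list of rows of pixel values (already rotated upright).
    bounds: list of (left, right) inclusive column indices per row.
    ASSUMPTION: nearest-neighbor sampling; the patent does not specify
    the interpolation method.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    out = []
    for row, (left, right) in zip(image, bounds):
        span = right - left
        if width == 1:
            out.append([row[left]])
            continue
        out.append([row[left + round(k * span / (width - 1))] for k in range(width)])
    return out
```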
(Matching: S4)
The matching unit 33 calculates the similarity between the normalized slide region candidate and the template image of each slide. In this case, the matching unit 33 first performs template matching using the normalized correlation shown in Equation (1). That is, the calculation of equation (1) is performed for all the pixels of the image to calculate the similarity.
[0047]
If, among the templates of all slides, the difference between the similarity of the first candidate (the highest similarity) and that of the second candidate (the second highest) exceeds a certain threshold value, the first candidate is output as the identification result. Otherwise, that is, if the difference between the first candidate and the second candidate does not exceed the threshold value, DP matching using normalized correlation is performed on the candidates up to the n-th candidate whose difference from the first candidate is within the threshold value.
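This two-stage decision can be sketched as follows. The sketch assumes the similarities are held in a dictionary keyed by slide number. The name `shortlist_for_dp` and the return convention (a winner, or a shortlist to be re-scored by DP matching) are hypothetical.

```python
def shortlist_for_dp(scores, threshold):
    """Decide whether the top candidate wins outright or DP matching is needed.

    scores: dict mapping slide number -> similarity from template matching.
    Returns (winner, []) if the top candidate leads by more than threshold,
    otherwise (None, shortlist) where shortlist holds every slide whose
    similarity is within threshold of the top one, best first.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_slide, best_score = ranked[0]
    if len(ranked) == 1 or best_score - ranked[1][1] > threshold:
        return best_slide, []
    shortlist = [slide for slide, s in ranked if best_score - s <= threshold]
    return None, shortlist
```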
[0048]
Specifically, as shown in FIG. 9, DP matching is performed on the rows of the image in the column (vertical) direction of the image, and the similarity between rows is calculated by the normalized correlation between the rows. In this way, the matching unit 33 outputs the similarity S_I of the candidate with the highest similarity.
In this way, similarity based on image features can be created.
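Combining the two ideas, the sketch below runs DP over row indices with a local distance of 1 − R between rows, where R is a zero-mean normalized correlation. Converting the accumulated cost into a similarity by averaging over the path length is an assumption made for this example, as are the names `ncc` and `dp_row_similarity` and the unweighted three-way recurrence.

```python
import math


def ncc(t, f):
    """Zero-mean normalized correlation of two equal-length rows (assumed form)."""
    n = len(t)
    mt, mf = sum(t) / n, sum(f) / n
    num = sum((a - mt) * (b - mf) for a, b in zip(t, f))
    den = math.sqrt(sum((a - mt) ** 2 for a in t) * sum((b - mf) ** 2 for b in f))
    return num / den if den > 0 else 0.0


def dp_row_similarity(template, image):
    """Similarity of two images via DP alignment of their rows.

    ASSUMPTIONS: local distance 1 - ncc(row, row); unweighted recurrence;
    final similarity = 1 - (accumulated cost / path length).
    """
    n, m = len(template), len(image)
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]
    steps = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = 1.0 - ncc(template[i], image[j])
            if i == 0 and j == 0:
                cost[i][j], steps[i][j] = d, 1
                continue
            preds = []
            if i > 0:
                preds.append((cost[i - 1][j], steps[i - 1][j]))
            if i > 0 and j > 0:
                preds.append((cost[i - 1][j - 1], steps[i - 1][j - 1]))
            if j > 0:
                preds.append((cost[i][j - 1], steps[i][j - 1]))
            c, s = min(preds)  # cheapest predecessor route
            cost[i][j], steps[i][j] = c + d, s + 1
    return 1.0 - cost[-1][-1] / steps[-1][-1]
```

In this sketch, a frame whose rows are brighter than the template and in which one row is duplicated still scores close to 1, since the correlation ignores the offset and the DP path absorbs the extra row.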
[0049]
The slide determination calculation unit 13 receives the similarity Sc output from the character recognition result matching unit 11 and the similarity S_I output from the image feature matching unit 12, and performs the calculation described below to output the most appropriate slide number.
[0050]
For the slide area candidate Ri and the slide x, let Sc(i, x) be the similarity obtained by matching based on the character recognition result, and let S_I(i, x) be the similarity obtained by matching based on image features. Further, let Sc be the highest similarity obtained by matching based on the character recognition result among all slide area candidates and all slides, and let S_I be the highest similarity obtained by matching based on image features. Then, with predetermined threshold values th_c and th_I, the final similarity S(i, x) between the slide area candidate Ri and the slide x is calculated as follows.
(A) When Sc ≧ th_c and S_I ≧ th_I, or when Sc < th_c and S_I < th_I:
S(i, x) = Sc(i, x) + S_I(i, x)
(B) When Sc < th_c and S_I ≧ th_I:
S(i, x) = S_I(i, x)
(C) When Sc ≧ th_c and S_I < th_I:
S(i, x) = Sc(i, x)
[0051]
In this way, by performing both matching based on character recognition and matching based on image features, a highly reliable similarity can be obtained.
[0052]
Then, the maximum of the similarity S(i, x) over all slide area candidates and all slides is obtained, the slide area candidate Ri that gives it is regarded as the slide area, and x is output as the best-matching slide number.
In this way, the most probable slide can be determined from the similarity based on the character recognition result and the similarity based on the image feature.
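The rules (A) to (C) and the final selection can be sketched as follows. The sketch assumes both similarity tables are dictionaries with the same (i, x) keys. The names `final_similarity` and `best_match` are hypothetical, and the thresholds are supplied by the caller because the text does not give their values.

```python
def final_similarity(sc, s_i, th_c, th_i):
    """Combine character-based and image-based similarities per rules (A)-(C).

    sc, s_i: dicts mapping (i, x) -> similarity, with identical keys.
    The maxima over all (i, x) decide which source is considered reliable.
    """
    char_ok = max(sc.values()) >= th_c
    image_ok = max(s_i.values()) >= th_i
    if char_ok == image_ok:
        # (A) both reliable or both unreliable: use the sum
        return {k: sc[k] + s_i[k] for k in sc}
    if image_ok:
        # (B) only the image-feature result is reliable
        return {k: s_i[k] for k in sc}
    # (C) only the character-recognition result is reliable
    return {k: sc[k] for k in sc}


def best_match(final):
    """Return (i, x, similarity) for the pair with the maximum final similarity."""
    (i, x), s = max(final.items(), key=lambda kv: kv[1])
    return i, x, s
```

For example, when the best character-based score falls below th_c but the best image-based score clears th_I, only the image-based table is used, which corresponds to slides containing little or no text.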
[0053]
According to the present invention, the sequence shown in FIG. 1 can be stored as a program and executed by a computer. Thereby, changes in the video caused by external factors can be corrected by taking the normalized correlation.
[0054]
As described above, according to the present invention, the similarity is determined using both matching based on the character recognition result and matching based on image features, and a predetermined calculation is performed, whereby the slide number of the presentation file having high similarity to the slide shown in the video signal can be output. As a result, the lecture screen and a detailed view of the slide used in the lecture can be displayed in the e-learning content (computer display screen), and the lecture can be conducted effectively.
[0055]
In addition, according to the present invention, when image features of a video frame and a slide of a presentation file are compared and matched, the influence of disturbances such as illumination fluctuation is suppressed by using normalized correlation, and performing the image matching by DP matching makes it possible to cope with distortion in the row direction of the slide image that is not absorbed by normalization. As a result, highly accurate matching is possible even when the presentation file contains only figures or graphs, or when there is little text information.
[0056]
(Supplementary Note 1) A method for matching an image and a document, comprising: a first step of calculating, when matching a first image and a second image, the similarity between pixels by normalized correlation values; and a second step of obtaining the optimum correspondence between pixel columns by nonlinear matching in the vertical direction of the pixel columns and taking it as the similarity between the images.
[0057]
(Supplementary Note 2) A method for matching an image and a document, characterized in that, when an image showing a presentation file is matched against presentation files in a database, matching is performed based on the recognition result obtained by character recognition of the image, matching based on the image features described in Supplementary Note 1 is also performed, the reliability of both matching results is measured, and when both are high or both are low, the sum of the two similarities is taken as the final similarity, and otherwise the similarity with the higher reliability is taken as the final similarity.
[0058]
(Supplementary Note 3) A method for matching an image and a document, characterized in that, when an image showing a presentation file is matched against presentation files in a database, a binary edge image and its inverted image are created from the image; among the connected components in these images, connected components having at least a certain size and an aspect ratio within a certain range are set as presentation file region candidates in the image; the partial image corresponding to each candidate region is matched against the presentation files in the database by the matching method described in Supplementary Note 2; and the region showing the highest similarity is set as the presentation file region.
[0059]
(Supplementary Note 4) An image and document matching program that causes a computer to execute: a first step of calculating, when matching a first image and a second image, the similarity between pixels by normalized correlation values; and a second step of obtaining the optimum correspondence between pixel columns by nonlinear matching in the vertical direction of the pixel columns and taking it as the similarity between the images.
[0060]
(Supplementary Note 5) An image and document matching apparatus comprising: a sampling unit that samples a video signal to obtain a frame image; a character recognition result matching unit that matches character recognition results from the sampled image obtained by the sampling unit; an image feature matching unit that matches image features from the sampled image; and a slide determination calculation unit that receives the outputs of the character recognition result matching unit and the image feature matching unit, performs a predetermined calculation, and determines and outputs the slide number of the presentation file having the highest similarity.
[0061]
(Supplementary Note 6) The image and document matching apparatus according to Supplementary Note 5, wherein the character recognition result matching unit comprises: a character recognition processing unit that performs character recognition on the sampled image; a character information extraction unit that extracts character information from a template image; and a correspondence generation unit that receives the outputs of the character recognition processing unit and the character information extraction unit, generates a correspondence, and outputs it as the similarity between the two images.
[0062]
(Supplementary Note 7) The image and document matching apparatus according to Supplementary Note 5, wherein the image feature matching unit comprises: a graying unit that converts the template image into a gray image; a slide region candidate extraction unit that receives the output of the graying unit and extracts slide region candidates; a normalization unit that receives the output of the slide region candidate extraction unit and performs normalization; and a matching unit that receives the output of the normalization unit and the template image, matches the two images, and outputs the result as a similarity.
[0063]
(Supplementary Note 8) The image and document matching apparatus according to Supplementary Note 5, wherein the slide determination calculation unit, taking Sc as the similarity obtained by matching based on the character recognition result and S_I as the similarity obtained by matching based on image features for the slide region candidate Ri and the slide x, obtains from these similarities Sc and S_I, using predetermined threshold values, the maximum similarity S(x) among all slide region candidates and all slides, and determines the slide x corresponding to the slide region candidate that gives this similarity.
[0064]
[Effect of the invention]
As described above, according to the present invention, the following effects can be obtained.
(1) According to the invention described in claim 1, matching based on character recognition and matching based on image features are both performed, so that a highly reliable similarity can be obtained.
(2) According to the invention described in claim 2, by matching an image obtained by performing predetermined processing on the presentation file shown in the video against the presentation files in the database, a region having high similarity to a presentation file in the database can be set as the presentation file region.
(3) According to the invention described in claim 3, matching based on character recognition and matching based on image features are both performed, so that a highly reliable similarity can be obtained.
[0065]
As described above, according to the present invention, it is possible to provide an image and document matching method and apparatus, and a matching program, that can accurately match a video frame against the slides of a presentation file even in the presence of video fluctuation and distortion.
[Brief description of the drawings]
FIG. 1 is a flowchart showing the principle of the method of the present invention.
FIG. 2 is a principle block diagram of the present invention.
FIG. 3 is an explanatory diagram of a recurrence formula using DP.
FIG. 4 is a diagram showing an overall processing flow of the present invention.
FIG. 5 is a flowchart of matching processing according to the present invention.
FIG. 6 is a block diagram illustrating an exemplary embodiment of the present invention.
FIG. 7 is a flowchart showing matching processing based on image features.
FIG. 8 is an explanatory diagram of distortion modeling.
FIG. 9 is an explanatory diagram of matching.
FIG. 10 is a diagram illustrating a correspondence between a video and a presentation file by character recognition in the video.
FIG. 11 is a diagram illustrating a result of associating a video with a presentation file.
[Explanation of symbols]
10 Sampling unit
11 Character recognition result matching unit
12 Image feature matching unit
13 Slide determination calculation unit

Claims

A method for matching an image and a document, comprising:
a first step of calculating, when matching a first image and a second image, the similarity between pixels by template matching using normalized correlation; and
a second step of determining the optimum correspondence between pixel columns by nonlinear matching in the vertical direction of the pixel columns and taking it as the similarity between the images,
wherein, when an image showing a presentation file is matched against presentation files in a database, matching is performed based on the recognition result obtained by character recognition of the image, matching is also performed based on image features, the reliability of both matching results is measured, and when both are high or both are low, the sum of the two similarities is taken as the final similarity, and otherwise the similarity with the higher reliability is taken as the final similarity.

A method for identifying a presentation file in an image by matching an image showing a presentation file against presentation files in a database,
wherein a binary edge image and its inverted image are created from the image; among the connected components in those images, connected components having at least a certain size and an aspect ratio within a certain range are set as presentation file region candidates in the image; the partial image corresponding to each candidate region is matched against the presentation files in the database; and the region showing the highest similarity is taken as the presentation file region in the image.

A computer program for matching an image and a document, which causes a computer to execute:
a first step of calculating, when matching a first image and a second image, the similarity between pixels by template matching using normalized correlation;
a second step of determining the optimum correspondence between pixel columns by nonlinear matching in the vertical direction of the pixel columns and taking it as the similarity between the images; and
a process in which, when an image showing a presentation file is matched against presentation files in a database, matching is performed based on the recognition result obtained by character recognition of the image, matching is also performed based on image features, the reliability of both matching results is measured, and when both are high or both are low, the sum of the two similarities is taken as the final similarity, and otherwise the similarity with the higher reliability is taken as the final similarity.