JP6097776B2

JP6097776B2 - Word selection device, method, and program

Info

Publication number: JP6097776B2
Application number: JP2015035818A
Authority: JP
Inventors: 徳永 陽子; 長谷川 隆明; 吉岡 理; 鷲崎 誠司
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2015-02-25
Filing date: 2015-02-25
Publication date: 2017-03-15
Anticipated expiration: 2035-02-25
Also published as: JP2016157019A

Description

The present invention relates to a word selection device, method, and program, and more particularly to a word selection device, method, and program for identifying the content of an input utterance.

Conventionally, the reliability of each recognition candidate sentence has mainly been used to narrow the recognition candidate sentences for a user utterance down to the user utterance sentence. Reliability is an index of how plausible a recognition candidate sentence is from the viewpoint of the acoustic model and the language model, and methods for calculating it have been proposed in the prior art (Patent Document 1). However, there is a limit to how far the user utterance sentence can be narrowed down by reliability alone, so methods have also been proposed that identify the content of a user utterance by combining information other than that obtained from the speech recognition engine.

One conventional method scores the words in the recognition candidate sentences using the past utterance history and word category information, and determines the user utterance sentence based on those scores (Non-patent Document 1).

Another method, intended for map search in car navigation systems and the like, uses the hierarchical structure of addresses and the address information of facilities to propose a strategy that identifies the user's utterance by advancing the dialogue while asking confirmation questions (Non-patent Document 2).

Patent Document 1: JP 2013-072922 A

Non-patent Document 1: Takaki Fujiwara, Toshihiko Ito, Kenji Araki, Mitsuhiko Kai, Tatsuhiro Konishi, Yukihiro Ito, "Spoken Language Understanding Method Using Recognition Reliability and Dialog History", IEICE Transactions, Vol. J89-D, No. 7, pp. 1493-1503, 2006

Non-patent Document 2: Norihide Kitaoka, Hirotoshi Yano, Seiichi Nakagawa, "Natural and Efficient Spoken Dialogue Strategy for Repairing Misrecognition", Information Processing Society of Japan Research Report, SLP (Spoken Language Information Processing), pp. 37-42, 2006

However, the technique of Non-patent Document 1 requires a large utterance history rich in word variations. In addition, for a service that covers multiple domains, such as recipe search and map search, the dialogue strategy has to be designed separately for each domain.

The technique of Non-patent Document 2 has the problem that, when searching the domain targeted by the service involves search conditions that have no hierarchical structure, it is difficult to narrow down the candidates with this method.

Furthermore, the conventional techniques require external data such as past dialogue history or information on the hierarchical structure of words. To support various domains, it is necessary to examine, for each domain, which words are content words and to define an ask-back strategy accordingly. Moreover, when a conventional system asks the user about the utterance content, the dialogue has mainly been a confirmation such as "Is ... correct?". When the presented candidate differs from what the user intended to say, it is hard for the user to know how to respond.

The present invention has been made to solve the above problems, and its object is to provide a word selection device, method, and program that appropriately select the words to ask back to the user in order to identify the content of an input utterance.

To achieve the above object, the word selection device according to a first aspect of the invention includes a content analysis unit and a word candidate extraction unit. For each of a plurality of recognition candidate sentences that are speech recognition results for input voice data of a user, the content analysis unit generates a content array storing at least one specific character string extracted from the recognition candidate sentence by at least one dialogue function. The dialogue function is predetermined for a dialogue scenario, which is a set of rules for performing a specific operation for each specific character string, and serves to extract the specific character string. The word candidate extraction unit selects the words to ask back to the user based on the at least one specific character string stored in the content array generated by the content analysis unit.

The word selection method according to a second aspect of the invention is a word selection method in a word selection device that includes a content analysis unit and a word candidate extraction unit. For each of a plurality of recognition candidate sentences that are speech recognition results for input voice data of a user, the content analysis unit generates a content array storing at least one specific character string extracted from the recognition candidate sentence by at least one dialogue function, the dialogue function being predetermined for a dialogue scenario, which is a set of rules for performing a specific operation for each specific character string, and serving to extract the specific character string. The word candidate extraction unit selects the words to ask back to the user based on the at least one specific character string stored in the content array generated by the content analysis unit.

According to the first and second aspects, the content analysis unit generates, for each of a plurality of recognition candidate sentences that are speech recognition results for the input voice data of the user, a content array storing at least one specific character string extracted from the recognition candidate sentence by at least one dialogue function that is predetermined for the dialogue scenario and serves to extract the specific character string. The word candidate extraction unit then selects the words to ask back to the user based on the at least one specific character string stored in the generated content array.

In this way, a content array is generated for each of the plurality of recognition candidate sentences based on at least one dialogue function predetermined for the dialogue scenario, and the words to ask back to the user are selected based on the at least one specific character string stored in the generated content array. This makes it possible to appropriately select the words to ask back in order to identify the content of the input utterance.

In the first aspect, the word candidate extraction unit may create a content table in which each row represents a combination of a content array generated by the content analysis unit and the dialogue function name of the dialogue function that extracted the specific character strings stored in that content array. When only one kind of dialogue function name exists in the created content table, the unit may proceed as follows. When the table contains rows whose content arrays are identical, those rows are merged. When the table contains rows whose content arrays are in an inclusion relation, the row whose content array stores fewer specific character strings is deleted. Then, when exactly one of the columns corresponding to the elements of the content array contains differing specific character strings, each specific character string in that column is selected as a word to ask back.
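The row merging, row deletion, and single-column check described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `select_ask_back_words` and `is_included`, the use of `None` for an empty slot of a fixed-length content array, and the position-wise definition of "inclusion" are all assumptions made for the example. Cases that the paragraph does not cover (several dialogue function names, or more than one differing column) simply return an empty list here.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation: a content array is a fixed-length tuple whose
# empty slots are None; a content table row is (dialogue function name, array).
ContentArray = Tuple[Optional[str], ...]
Row = Tuple[str, ContentArray]


def is_included(small: ContentArray, large: ContentArray) -> bool:
    """True if every filled slot of `small` matches `large` and they differ."""
    return small != large and all(
        s is None or s == l for s, l in zip(small, large)
    )


def select_ask_back_words(table: Sequence[Row]) -> List[str]:
    # Only the case with exactly one dialogue function name is sketched.
    if len({name for name, _ in table}) != 1:
        return []
    # Merge rows whose content arrays are identical (order preserved).
    arrays = list(dict.fromkeys(array for _, array in table))
    # Delete rows that are included in another row (they hold fewer strings).
    arrays = [a for a in arrays if not any(is_included(a, b) for b in arrays)]
    # Find the columns in which the remaining rows disagree.
    differing = [
        i for i in range(len(arrays[0])) if len({a[i] for a in arrays}) > 1
    ]
    if len(differing) != 1:
        return []
    column = differing[0]
    return list(dict.fromkeys(a[column] for a in arrays if a[column] is not None))
```

With the two candidates "Thai curry" and "vegetable curry" extracted by the same hypothetical function, the single differing column yields both dish names as the options to present.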

The program of the present invention is a program for causing a computer to function as each unit of the word selection device described above.

As described above, according to the word selection device, method, and program of the present invention, a content array is generated for each of a plurality of recognition candidate sentences, which are speech recognition results for the input voice data of the user, based on at least one dialogue function predetermined for the dialogue scenario, and the words to ask back to the user are selected based on the at least one specific character string stored in the generated content array. This makes it possible to appropriately select the words to ask back in order to identify the content of the input utterance.

FIG. 1 is a diagram showing examples of dialogue functions.
FIG. 2 is a block diagram showing the functional configuration of the word selection device according to the first embodiment of the present invention.
FIG. 3 is a diagram showing an example of a content array.
FIG. 4 is a diagram showing an example of a content array.
FIG. 5 is a diagram showing an example of a content array.
FIG. 6 is a diagram showing an example of a content table.
FIG. 7 is a diagram showing an example of a content table.
FIG. 8 is a diagram showing an example of a content table.
FIG. 9 is a flowchart showing the word selection processing routine in the word selection device according to the first embodiment of the present invention.
FIG. 10 is a flowchart showing the routine for determining the ask-back method based on reliability in the word selection device according to the first embodiment of the present invention.
FIG. 11 is a flowchart showing the routine for determining the ask-back method based on dialogue functions in the word selection device according to the first embodiment of the present invention.
FIG. 12 is a flowchart showing the routine for deciding the ask-back method in the word selection device according to the first embodiment of the present invention.
FIG. 13 is a flowchart showing the routine for executing the ask-back in the word selection device according to the first embodiment of the present invention.
FIG. 14 is a block diagram showing the functional configuration of the word selection device according to the second embodiment of the present invention.
FIG. 15 is a flowchart showing the word selection processing routine in the word selection device according to the second embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. The embodiments assume a service in which the user achieves a goal through a spoken dialogue with the system. In the following, a "user utterance" is a spoken utterance by a user of the service, and the sentence obtained by correctly converting it into text is called the "user utterance sentence". When the speech recognition engine recognizes a user utterance, it may produce multiple candidates for the user utterance sentence; these are called "recognition candidate sentences". The system synthesizing speech and speaking is called a "system utterance", and the text from which the speech is synthesized is called a "system utterance sentence".

<Principle of the word selection device according to the present embodiment>
First, the principle of the word selection device according to the present embodiment will be described. The device analyzes multiple candidate speech recognition results for the speech input by the user, determines whether an ask-back is needed, and extracts the options to present in the ask-back. Note that the device uses a dialogue strategy written in advance for conducting the ordinary dialogue.

Next, the terms used in the description of the word selection device according to the present embodiment will be explained.

"Ask-back" means that the system determines the user utterance sentence through a dialogue in which it presents options in a system utterance and has the user choose among them, so that the user utterance sentence is set correctly. For example, suppose there are two recognition candidate sentences, (1) and (2) below.

(1) "I want to know a Thai curry recipe"
(2) "I want to know a vegetable curry recipe"

Choosing the user utterance sentence from the two recognition candidate sentences (1) and (2) amounts to identifying the user's utterance intention. If the wrong recognition candidate sentence is chosen, the search results presented by the system will not match the user's intention, which lowers user satisfaction. The idea is therefore to have the user pick the correct recognition candidate sentence through an ask-back dialogue.

Two ask-back strategies are conceivable: full-sentence ask-back and word ask-back. Full-sentence ask-back lists the recognition candidate sentences as they are. In this case, the system generates the system utterance sentence "Is it 'I want to know a Thai curry recipe' or 'I want to know a vegetable curry recipe'?", synthesizes it, and speaks it. The user speaks again with the difference in mind, and the system determines the user utterance sentence from that response. The full-sentence strategy is effective where the user's utterance must be entered without a single wrong character, and is expected to be used in services such as composing e-mail by voice input.

Word ask-back, on the other hand, extracts only the parts whose content differs among the recognition candidate sentences and lists them in the question, so that the user utterance sentence can be determined just by selecting a part of it. In examples (1) and (2) above, the system generates the system utterance sentence "Is it Thai curry or vegetable curry?", synthesizes it, and speaks it. The user determines the user utterance sentence by speaking one of the words presented as options. The word ask-back strategy is effective for database searches in various domains and for web search, and is expected to be used in search applications on smartphones and the like, tourist guidance using signage, and destination input in car navigation systems. The word selection device according to the present embodiment focuses on word ask-back.
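As a small illustration of how the system utterance sentence for word ask-back could be assembled from the selected options, consider the sketch below. The function name `build_word_query` and the English phrasing template are assumptions for this example; the source does not specify how the sentence is generated.

```python
from typing import Sequence


def build_word_query(options: Sequence[str]) -> str:
    """Build an ask-back sentence that lists the options for the user."""
    if len(options) < 2:
        # With fewer than two options there is nothing to choose between.
        raise ValueError("word ask-back needs at least two options")
    *head, last = options
    return "Is it " + ", ".join(head) + " or " + last + "?"


print(build_word_query(["Thai curry", "vegetable curry"]))
```

The resulting text would then be passed to speech synthesis to produce the system utterance.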

In the word ask-back strategy, the words presented as options must be limited to words that bear on the user's utterance intention. For example, suppose there are three recognition candidate sentences, (3) to (5) below.

(3) "カレーのレシピが知りたい" (I want to know a curry recipe, with the particle が)
(4) "カレーのレシピを知りたい" (I want to know a curry recipe, with the particle を)
(5) "カレーのレシピを聞きたい" (I want to hear a curry recipe)

The recognition candidate sentences (3) to (5) differ in some words, but the user's intention, to search for a curry recipe, is the same in all of them. The system's behavior is also the same whichever of them is taken as the user utterance sentence: it searches the recipe database using the dish name "curry" as the keyword.

Therefore, for services suited to the word ask-back strategy, it is unnecessary to determine the user utterance sentence character for character; it is enough that the parts bearing on the user's intention and the system's behavior are recognized correctly. In an ordinary system the behavior is determined by the user's intention, so as long as the intention is recognized correctly, misrecognition of the other parts is not considered a problem.

However, if the recognition candidate sentences are simply compared and the differing words are used for word ask-back, then for candidates (3) to (5) above the system asks questions such as the following:

"Is it が or を?"
"Is it 知りたい (want to know) or 聞きたい (want to hear)?"

Such questions are unnecessary for the user. The user may not understand what is being asked, and may find the unnecessary questions annoying. When the word ask-back strategy is adopted, it is therefore necessary to extract from each recognition candidate sentence the parts that bear on the user's intention, compare those parts, and only then decide whether to ask back.
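The problem with a naive comparison can be reproduced with a short sketch. It assumes the candidates have already been split into tokens of equal length and compares them position by position; both the tokenization shown and the helper name `naive_differing_tokens` are assumptions for illustration, not part of the source.

```python
from typing import List, Sequence


def naive_differing_tokens(candidates: Sequence[Sequence[str]]) -> List[List[str]]:
    """For each token position where candidates disagree, list the variants."""
    differences = []
    for column in zip(*candidates):
        variants = list(dict.fromkeys(column))  # distinct tokens, order kept
        if len(variants) > 1:
            differences.append(variants)
    return differences


# Hypothetical tokenization of candidates (3) to (5).
candidates = [
    ["カレー", "の", "レシピ", "が", "知りたい"],
    ["カレー", "の", "レシピ", "を", "知りたい"],
    ["カレー", "の", "レシピ", "を", "聞きたい"],
]
print(naive_differing_tokens(candidates))
```

The differences found are the particle and the verb, neither of which affects the search, which is why the comparison should be made on the parts that carry the user's intention instead.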

A "dialogue scenario" defines the strategy of a dialogue. In a dialogue scenario, the service provider, operator, or the like describes how the system behaves in response to user utterances. The scenario is prepared in advance to suit the service, and the dialogue between the user and the system is controlled according to it. Specifically, a dialogue scenario is a set of rules for performing specific operations based on a specific key such as a character string, a word, morphological information, or time. The dialogue scenario used in the present embodiment is assumed to define rules for performing specific operations based on specific character strings or words.

A "dialogue function" is a component used to write a dialogue scenario. It is a function that extracts the information needed for the dialogue from the user utterance sentence, or a function that selects or generates an appropriate next system response sentence or system action from the extracted result. Although it is called a function here, any implementation will do as long as it is a capability that is invoked according to the dialogue scenario, takes a user utterance sentence or recognition candidate sentence as input, and performs processing to advance the dialogue. A dialogue function can also obtain the specific key used in the dialogue scenario, such as a character string, a word, morphological information, or time. The dialogue functions used in the present embodiment are functions that extract specific character strings or words.

In the present embodiment, the text analysis result is input together with the user utterance sentence or recognition candidate sentence, but the input is not limited to this. In addition to the user utterance sentence or recognition candidate sentence, any information needed to advance the dialogue may be given as input, for example sensor information, image information, past usage history, time information, or information obtained from other services or other databases.

The present embodiment shows an example in which the output of a dialogue function is an array of words. This makes it easy to compare multiple outputs with one another; the output format is not limited to this. FIG. 1 shows examples of dialogue functions. The dialogue functions in FIG. 1 are some of those used to write the dialogue scenario of a service that supports recipe search and restaurant search.
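A dialogue function whose output is an array of words might look like the following sketch. It does not reproduce the functions of FIG. 1, which are not available in this text: the function name `extract_dish_name`, the small `DISH_NAMES` vocabulary, and the substring matching are all hypothetical stand-ins, and English sentences are used in place of the Japanese utterances.

```python
from typing import List

# Hypothetical vocabulary, ordered so that longer dish names match first.
DISH_NAMES = ("Thai curry", "vegetable curry", "curry")


def extract_dish_name(sentence: str) -> List[str]:
    """Return the first known dish name found in the sentence as a word array."""
    lowered = sentence.lower()
    for dish in DISH_NAMES:
        if dish.lower() in lowered:
            return [dish]
    return []  # an empty array means this function yields no result


print(extract_dish_name("I want to know a Thai curry recipe"))
print(extract_dish_name("I want to hear a curry recipe"))
```

Because the output ignores everything except the dish name, candidates that differ only in wording unrelated to the search produce identical arrays and can be compared directly.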

The words extracted by these dialogue functions are used as search conditions. The functions are assumed to be combined with, for example, functions that generate a search expression for the recipe database in the case of items 1 and 2, or for the restaurant database in the case of item 3, and to be used in that form in the actual dialogue scenario.

Here, the processing up to the extraction of the words that serve as search conditions is defined as one dialogue function, but the processing up to obtaining the search results for the extracted words may instead be defined as one dialogue function. The length and granularity of a dialogue function do not matter as long as it outputs some processing result for the input user utterance sentence or recognition candidate sentence. Dialogue functions also need not be written specially to implement the ask-back capability; it is assumed that functions already called within the dialogue scenario are reused.

The word selection device according to the present embodiment determines whether an ask-back to the user is needed, and which options to use in it, by comparing, across the recognition candidate sentences, the dialogue functions that yield a processing result and the processing results themselves.
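One way to picture this comparison is to run every dialogue function on every recognition candidate sentence and record which functions produced a result and what that result was. The sketch below does this under stated assumptions: the names `build_content_table` and `results_differ` are hypothetical, a dialogue function is modeled as a callable from a sentence to a list of words, and an empty list is taken to mean "no processing result".

```python
from typing import Callable, Dict, List, Sequence, Tuple

DialogueFunction = Callable[[str], List[str]]
Row = Tuple[str, Tuple[str, ...]]  # (dialogue function name, extracted words)


def build_content_table(
    candidates: Sequence[str], functions: Dict[str, DialogueFunction]
) -> List[Row]:
    """Collect one row per (candidate, function) pair that yields a result."""
    table: List[Row] = []
    for sentence in candidates:
        for name, function in functions.items():
            words = function(sentence)
            if words:
                table.append((name, tuple(words)))
    return table


def results_differ(table: Sequence[Row]) -> bool:
    """True if the candidates disagree on the function or on its output."""
    return len(set(table)) > 1
```

When `results_differ` is false, every candidate leads to the same system behavior, so no ask-back would be needed under this simplified view; when it is true, the differing rows are the raw material for choosing the options.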

<Configuration of the word selection device according to the first embodiment>
Next, the configuration of the word selection device according to the first embodiment will be described. As shown in FIG. 2, the word selection device 100 according to the first embodiment can be implemented as a computer that includes a CPU, a RAM, and a ROM storing various data and a program for executing the word selection processing routine described later. Functionally, the word selection device 100 includes an input unit 10, a calculation unit 20, and an output unit 90, as shown in FIG. 2.

The input unit 10 receives the user's voice data input from a microphone.

The calculation unit 20 includes a dialogue function storage unit 22, a speech recognition unit 24, a reliability determination unit 26, a content analysis unit 28, a word candidate extraction unit 30, an ask-back processing unit 32, and a speech synthesis unit 34.

The dialogue function storage unit 22 stores a dialogue function name set consisting of the dialogue function names n that correspond to the dialogue functions f_n predetermined for the dialogue system targeted by the word selection device 100 according to the first embodiment. Specifically, it stores the names of the dialogue functions that are called in the dialogue scenario targeted in the first embodiment and whose input information includes a recognition candidate sentence, described later. The dialogue function storage unit 22 also stores each dialogue function f_n corresponding to a dialogue function name n. In the first embodiment, the dialogue function f_n corresponding to a dialogue function name n can be called based on that name.
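The storage unit described above holds both a set of names and the functions themselves, and allows a function to be called by its name. A minimal sketch of such a registry follows; the class name `DialogueFunctionStore` and its methods are hypothetical, and a plain dictionary is only one of many possible ways to hold the data.

```python
from typing import Callable, Dict, List

DialogueFunction = Callable[[str], List[str]]


class DialogueFunctionStore:
    """Holds dialogue functions f_n keyed by their dialogue function name n."""

    def __init__(self) -> None:
        self._functions: Dict[str, DialogueFunction] = {}

    def register(self, name: str, function: DialogueFunction) -> None:
        self._functions[name] = function

    def names(self) -> List[str]:
        """Return the dialogue function name set in registration order."""
        return list(self._functions)

    def call(self, name: str, sentence: str) -> List[str]:
        """Call the dialogue function that corresponds to the given name."""
        return self._functions[name](sentence)
```

A content analysis step could then iterate over `names()` and invoke `call(name, sentence)` for each recognition candidate sentence.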

The speech recognition unit 24 performs speech recognition on the user's voice data received by the input unit 10, obtains as a recognition candidate group the set of recognition candidates formed by attaching a reliability to each of the N recognition candidate sentences with the highest reliability for that voice data (the N-best list), and outputs the group to the reliability determination unit 26. The reliability is meaningful as an absolute value: the higher the value, the more trustworthy the recognition candidate. The reliability is calculated using, for example, the method of Patent Document 1. The analysis result of the speech recognition unit 24 must include the voice data converted into text, and the recognition candidate group must contain not only the single recognition candidate sentence with the highest reliability, which is most likely to be the user utterance sentence, but also the multiple recognition candidate sentences (N-best) from which that sentence is chosen. The value of N is assumed to be predetermined. The way the data of each recognition candidate is held does not matter.

In addition, in response to the voice data output by the ask-back processing unit 32, the speech recognition unit 24 performs speech recognition on the user's voice data received by the input unit 10, obtains a response user utterance sentence, and outputs it to the ask-back processing unit 32. When multiple recognition candidates are obtained for the response, the candidate selected by narrowing them down using the reliability, the words contained in the recognition candidate sentences, and the like is used as the response user utterance sentence.

信頼度判定部２６は、音声認識部２４から入力される認識候補群に含まれる認識候補文を、当該認識候補文の信頼度に基づいて絞り込み、認識候補文群とする。また、信頼度判定部２６は、取得した認識候補文群の問返し方法を、当該認識候補文群に含まれる認識候補文の数（要素数）に基づいて、「再発話」、「確定」、又は「保留」に設定し、認識候補文群と共に内容解析部２８に出力する。ここで、「再発話」とは、“もう一度お願いします”のように、利用者に再度発話を促すためのシステム発話を再生し、最初の音声認識部２４の処理に戻ることである。また、「確定」とは、問返しをすることなく利用者発話文を決定できる状態をいう。また、「保留」とは、現段階において問返しの方法を決定することができず、後述する処理において決定するという状態をいう。 The reliability determination unit 26 narrows down the recognition candidate sentences included in the recognition candidate group input from the speech recognition unit 24, based on the reliability of each recognition candidate sentence, to form a recognition candidate sentence group. Based on the number of recognition candidate sentences (number of elements) in that group, the reliability determination unit 26 also sets the question return method for the group to "re-utterance" (再発話), "confirmed" (確定), or "pending" (保留), and outputs it to the content analysis unit 28 together with the recognition candidate sentence group. Here, "re-utterance" means playing a system utterance that prompts the user to speak again, such as "Please say that again", and returning to the initial processing of the speech recognition unit 24. "Confirmed" refers to a state in which the user utterance sentence can be determined without asking a question in return. "Pending" refers to a state in which the question return method cannot be determined at this stage and is determined in later processing.

具体的には、まず、信頼度判定部２６は、認識候補群に含まれる認識候補の各々について、メモリ（図示省略）に記憶されている信頼度の閾値と、当該認識候補の信頼度とを比較し、閾値以上である場合に、当該認識候補に含まれる認識候補文を認識候補文群に追加する。そして、信頼度判定部２６は、取得した認識候補文群に含まれる認識候補文の数（要素数）に基づいて、問返し方法を設定する。なお、認識候補文の数が０の場合、問返し方法を「再発話」に設定する。また、認識候補文の数が１である場合、問返し方法を「確定」に設定し、認識候補文群の認識候補文を、利用者発話文に設定する。また、認識候補文の数が１よりも大きい（２以上）場合、問返し方法を「保留」に設定する。 Specifically, for each recognition candidate in the recognition candidate group, the reliability determination unit 26 first compares the reliability of that candidate with a reliability threshold stored in a memory (not shown), and adds the recognition candidate sentence of that candidate to the recognition candidate sentence group if the reliability is equal to or greater than the threshold. The reliability determination unit 26 then sets the question return method based on the number of recognition candidate sentences (number of elements) in the resulting group. When the number is 0, the question return method is set to "re-utterance". When the number is 1, the question return method is set to "confirmed", and the sole recognition candidate sentence in the group is set as the user utterance sentence. When the number is greater than 1 (2 or more), the question return method is set to "pending".
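The threshold filtering and three-way decision described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `judge_by_reliability`, the tuple layout, and the English method labels are assumptions introduced here.

```python
from typing import List, Optional, Tuple

Candidate = Tuple[str, float]  # (recognition candidate sentence, reliability)


def judge_by_reliability(
    candidates: List[Candidate], threshold: float
) -> Tuple[str, List[str], Optional[str]]:
    """Return (question return method, candidate sentence group, user utterance).

    Hypothetical helper: keeps every sentence whose reliability is at or
    above the threshold, then decides by the size of the resulting group.
    """
    group = [sentence for sentence, reliability in candidates
             if reliability >= threshold]
    if len(group) == 0:
        return "re-utterance", group, None   # nothing trustworthy: ask again
    if len(group) == 1:
        return "confirmed", group, group[0]  # single survivor is the utterance
    return "pending", group, None            # decided by later processing
```

With a threshold of 0.5, an N-best list holding one candidate at 0.9 and one at 0.2 is confirmed immediately, while two candidates above the threshold are passed on as "pending".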

内容解析部２８は、信頼度判定部２６から入力された問返し方法が「保留」である場合、認識候補文群について、対話関数記憶部２２に記憶されている対話関数の各々に基づいて、当該認識候補文群の問返しの方法を設定する。 When the question return method input from the reliability determination unit 26 is "pending", the content analysis unit 28 sets the question return method for the recognition candidate sentence group based on each of the dialogue functions stored in the dialogue function storage unit 22.

具体的には、まず、内容解析部２８は、認識候補文群に含まれる認識候補文の各々について、テキスト解析を行う。ここで、テキスト解析とは、形態素解析、係り受け解析、固有表現抽出、及び述部正規化などの言語処理や、分野単語のカテゴライズなど、認識候補文とそれに含まれる単語に関する分析であればよい。 Specifically, the content analysis unit 28 first performs text analysis on each recognition candidate sentence in the recognition candidate sentence group. Here, text analysis may be any analysis of a recognition candidate sentence and the words it contains, such as language processing (morphological analysis, dependency parsing, named entity extraction, predicate normalization) or categorization of domain words.

次に、内容解析部２８は、対話関数記憶部２２に記憶されている対話関数名ｎの各々について、当該対話関数名ｎに基づいて呼び出される対話関数ｆ_ｎに、認識候補文ｔと、当該認識候補文のテキスト解析結果とを入力する。そして、対話関数ｆ_ｎから出力結果が得られた場合、出力された配列を内容配列оｕｔ（ｔ,ｎ）として生成する。図３、図４、及び図５に、内容配列оｕｔ（ｔ,ｎ）の例を示す。図３〜図５は、各々異なる３パターンの利用者発話を入力した場合の、各々の認識候補文群を対象とした例である。図３の例では、認識候補文が３つある場合の例を示しており、図１に示した対話関数に入力した場合の結果を表している。図４、及び図５の例では、認識候補文が２つである場合の例を示しており、図３と同様の結果を示している。 Next, for each dialogue function name n stored in the dialogue function storage unit 22, the content analysis unit 28 inputs the recognition candidate sentence t and the text analysis result of that sentence to the dialogue function f_n called by that name. When the dialogue function f_n returns an output, the returned array is taken as the content array out(t, n). FIGS. 3, 4, and 5 show examples of content arrays out(t, n). FIGS. 3 to 5 are examples for the recognition candidate sentence groups obtained when three different user utterances are input. FIG. 3 shows a case with three recognition candidate sentences and gives the results of inputting them to the dialogue functions shown in FIG. 1. FIGS. 4 and 5 show cases with two recognition candidate sentences and present the results in the same manner as FIG. 3.

次に、内容解析部２８は、取得した内容配列оｕｔ（ｔ,ｎ）の数に基づいて、問返し方法を設定する。ここで、内容配列оｕｔ（ｔ,ｎ）の数が０である場合には、問返し方法を「再発話」に設定する。また、内容配列оｕｔ（ｔ,ｎ）の数が１である場合には、問返し方法を「確定」に設定し、内容配列の要素に対応する認識候補文を利用者発話文として設定する。また、内容配列оｕｔ（ｔ,ｎ）の数が１よりも大きい（２以上）である場合には、問返し方法を「保留」に設定し、内容配列оｕｔ（ｔ,ｎ）の各々と、内容配列оｕｔ（ｔ,ｎ）の各々の要素に対応する対話関数名、及び認識候補文の各々と共に、単語候補抽出部３０に出力する。 Next, the content analysis unit 28 sets a query return method based on the number of acquired content arrays оut (t, n). Here, if the number of the content array оut (t, n) is 0, the question answering method is set to “recurrent speech”. When the number of content arrays оut (t, n) is 1, the question answering method is set to “confirmed”, and the recognition candidate sentence corresponding to the element of the content array is set as the user utterance sentence. When the number of content arrays оut (t, n) is larger than 1 (2 or more), the inquiry method is set to “hold”, and each of the content arrays оut (t, n) The dialogue function name corresponding to each element of the content array оut (t, n) and each recognition candidate sentence are output to the word candidate extraction unit 30.
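The loop over recognition candidate sentences and dialogue functions, followed by the decision on the number of content arrays, might look like the sketch below. The tokenizer `analyze_text` and the dialogue function `recipe_food_menu` are hypothetical stand-ins: the actual text analysis and the dialogue functions of the figures are not reproduced in this section, so their behaviour here is an assumption.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A dialogue function takes a sentence and its text-analysis result and
# returns a content array, or None when it produces no output.
DialogueFn = Callable[[str, List[str]], Optional[List[str]]]


def analyze_text(sentence: str) -> List[str]:
    """Stand-in for text analysis: lower-cased whitespace tokens (assumption)."""
    return sentence.lower().split()


def recipe_food_menu(sentence: str, tokens: List[str]) -> Optional[List[str]]:
    """Hypothetical dialogue function: the dish named in a recipe request."""
    if "recipe" not in tokens:
        return None
    dishes = [d for d in ("vegetable curry", "thai curry") if d in sentence.lower()]
    return dishes or None


def analyze_contents(
    sentences: List[str], functions: Dict[str, DialogueFn]
) -> Tuple[str, Dict[Tuple[str, str], List[str]]]:
    """Collect out(t, n) for every sentence t and function name n, then decide."""
    outs: Dict[Tuple[str, str], List[str]] = {}
    for t in sentences:
        tokens = analyze_text(t)
        for n, f_n in functions.items():
            result = f_n(t, tokens)
            if result is not None:       # keep only functions that produced output
                outs[(t, n)] = result
    if len(outs) == 0:
        method = "re-utterance"
    elif len(outs) == 1:
        method = "confirmed"
    else:
        method = "pending"               # handed on to word candidate extraction
    return method, outs
```

Keying the result by the pair (t, n) keeps each content array associated with its sentence and dialogue function name, which is the association the later content table relies on.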

単語候補抽出部３０は、内容解析部２８から入力された問返し方法が「保留」である場合、入力された内容配列оｕｔ（ｔ,ｎ）の各々と、内容配列оｕｔ（ｔ,ｎ）の各々の要素に対応する対話関数名、及び認識候補文の各々とに基づいて、内容表を作成し、当該内容表に基づいて、「再発話」、「単語問返し」、又は「確定」の何れか１つの問返し方法を決定する。ここで、「単語問返し」とは、利用者発話の一部を羅列して問い返すシステム発話を再生し、利用者に選択させることで、利用者発話文を決定することである。 When the question return method input from the content analysis unit 28 is "pending", the word candidate extraction unit 30 creates a content table based on each input content array out(t, n) and on the dialogue function name and recognition candidate sentence corresponding to each content array, and based on that content table selects one question return method from "re-utterance", "word question return" (単語問返し), and "confirmed". Here, "word question return" means playing a system utterance that lists parts of the candidate user utterances and asks the user to choose among them, thereby determining the user utterance sentence.

具体的には、まず、単語候補抽出部３０は、入力された内容配列оｕｔ（ｔ,ｎ）の各々を一行とし、内容配列оｕｔ（ｔ,ｎ）に認識候補文ｔと対話関数名ｎとを対応づけた内容表を作成する。内容表は、内容配列оｕｔ（ｔ,ｎ）の数と同じ行数となる。各行において、要素がない列は空欄とする。内容表の例を図６、図７、及び図８に示す。図６は、図３の内容配列оｕｔ（ｔ,ｎ）に対応し、図７は、図４の内容配列оｕｔ（ｔ,ｎ）に対応し、図８は、図５の内容配列оｕｔ（ｔ,ｎ）に対応している。なお、図６〜図８の一列目の項番は説明の都合上記載したものであり、値の大きさは意味をなさない。 Specifically, the word candidate extraction unit 30 first creates a content table in which each input content array out(t, n) forms one row and is associated with its recognition candidate sentence t and dialogue function name n. The content table has as many rows as there are content arrays out(t, n). In each row, columns with no element are left blank. Examples of content tables are shown in FIGS. 6, 7, and 8. FIG. 6 corresponds to the content arrays out(t, n) of FIG. 3, FIG. 7 to those of FIG. 4, and FIG. 8 to those of FIG. 5. The item numbers in the first column of FIGS. 6 to 8 are included only for convenience of explanation, and their magnitudes carry no meaning.

次に、単語候補抽出部３０は、作成された内容表において、対話関数名ｎが複数種類あるか否かを判定する。ここで、単語候補抽出部３０が、図８で示した例のように、対話関数名ｎが複数種類あると判定した場合には、問返し方法を「再発話」に設定する。一方、単語候補抽出部３０が、図６、及び図７で示した例のように、対話関数名ｎが複数種類ないと判定した場合には、内容表において、内容配列оｕｔ（ｔ,ｎ）の要素が完全一致する行がある場合、一方の行を削除する。つまり、内容配列оｕｔ（ｔ,ｎ）の全要素の内容が重複する行を統合する。例えば、図６においては、項番１の行の内容配列оｕｔ（ｔ_１,Ｒｅｃｉｐｅ＿ｆｏｏｄ＿ｍｅｎｕ）の全要素と項番３の行の内容配列оｕｔ（ｔ_３,Ｒｅｃｉｐｅ＿ｆｏｏｄ＿ｍｅｎｕ）の全要素との、内容が完全一致する。そのため、項番１の行又は項番３の行の何れか一方を削除する。なお、削除するのはどちらでもよく、認識候補文に対応する信頼度の高い方を選ぶか、内容表の上に存在するものを選ぶなどの方法が考えられる。第１の実施形態においては、項番３の行を削除した場合について以後説明する。 Next, the word candidate extraction unit 30 determines whether the created content table contains more than one kind of dialogue function name n. If it determines that there is more than one kind, as in the example of FIG. 8, it sets the question return method to "re-utterance". If it determines that there is only one kind, as in the examples of FIGS. 6 and 7, and the content table contains rows whose content arrays out(t, n) match exactly, it deletes one of those rows. In other words, rows whose content arrays have identical elements are merged. For example, in FIG. 6, all elements of the content array out(t_1, Recipe_food_menu) in the row of item number 1 match all elements of the content array out(t_3, Recipe_food_menu) in the row of item number 3. Therefore, either the row of item number 1 or the row of item number 3 is deleted. Either row may be deleted; possible approaches include keeping the row whose recognition candidate sentence has the higher reliability, or keeping the row that appears higher in the content table. For the first embodiment, the following description assumes that the row of item number 3 is deleted.

次に、単語候補抽出部３０は、内容表のうち、内容配列оｕｔ（ｔ,ｎ）の配列の内容と包含関係にある行がある場合、部分集合となる方の行（要素数が少ない方）を削除する。例えば、図７について説明すると。項番４の内容配列оｕｔ（ｔ_４,Ｒｅｃｉｐｅ＿ｆｏｏｄ＿ｍｅｎｕ）の配列の内容と項番５の内容配列оｕｔ（ｔ_５,Ｒｅｃｉｐｅ＿ｆｏｏｄ＿ｍｅｎｕ）の配列の内容とが包含関係にあるため、要素に含まれる特定の単語の数が少ない項番５の行を削除する。 Next, if the content table contains rows whose content arrays out(t, n) are in an inclusion relation, the word candidate extraction unit 30 deletes the row that is the subset (the one with fewer elements). For example, in FIG. 7, the content array out(t_4, Recipe_food_menu) of item number 4 and the content array out(t_5, Recipe_food_menu) of item number 5 are in an inclusion relation, so the row of item number 5, whose elements contain fewer specific words, is deleted.
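The two pruning steps on the content table (merging rows with identical content arrays, then dropping rows that are subsets of another row) can be sketched as below. The row layout with `sentence`, `function`, and `out` keys is an assumption, as is reading "inclusion relation" as a proper-subset test on the set of elements; the sample values are made up and do not reproduce the figures.

```python
from typing import Dict, List

Row = Dict[str, object]  # {"sentence": str, "function": str, "out": List[str]}


def prune_rows(rows: List[Row]) -> List[Row]:
    """Merge exact duplicates (keeping the upper row), then remove subset rows."""
    unique: List[Row] = []
    for row in rows:
        # Keep a row only if no earlier kept row has the same content array.
        if not any(row["out"] == kept["out"] for kept in unique):
            unique.append(row)
    # Drop every row whose elements form a proper subset of another row's.
    return [
        row for row in unique
        if not any(set(row["out"]) < set(other["out"]) for other in unique)
    ]
```

Keeping the first of two identical rows corresponds to the "row that appears higher in the content table" option; choosing by reliability instead would require carrying the reliability in each row.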

次に、単語候補抽出部３０は、内容表のうち、内容配列оｕｔ（ｔ,ｎ）の要素に対応する各列について、値の差異があるかどうかを判定する。単語候補抽出部３０は、内容配列оｕｔ（ｔ,ｎ）の要素に対応する各列のうち、値に差異のある列をカウントし、ｃとする。例えば、図６の例の場合、残っている項番１の行の内容配列оｕｔ（ｔ_１,Ｒｅｃｉｐｅ＿ｆｏｏｄ＿ｍｅｎｕ）の要素と項番２の行の内容配列оｕｔ（ｔ_２,Ｒｅｃｉｐｅ＿ｆｏｏｄ＿ｍｅｎｕ）の要素とを比較すると、内容配列оｕｔ（ｔ,ｎ）の１つの要素に対応する列１のみが異なる値を持つことから、ｃ＝１となる。ここで、単語候補抽出部３０は、ｃ＝０の場合、問返し方法を「確定」に設定し、内容表に残っている要素の認識候補文を利用者発話文に設定する。また、単語候補抽出部３０は、ｃ＝１の場合、問返し方法を「単語問返し」設定し、差異のある列の値からなる単語候補配列を生成し、単語候補配列と内容表とを問返し処理部３２に出力する。図６の例の場合、単語候補配列は、（“野菜カレー”,“タイカレー”）となる。また、単語候補抽出部３０は、ｃ＞１（ｃ≧２）の場合、問返し方法を「再発話」に設定する。 Next, the word candidate extraction unit 30 determines, for each column of the content table corresponding to an element of the content arrays out(t, n), whether the values differ. It counts the columns whose values differ and denotes the count by c. For example, in FIG. 6, comparing the elements of the content array out(t_1, Recipe_food_menu) in the remaining row of item number 1 with those of the content array out(t_2, Recipe_food_menu) in the row of item number 2, only column 1, which corresponds to one element of the content arrays, has different values, so c = 1. When c = 0, the word candidate extraction unit 30 sets the question return method to "confirmed" and sets the recognition candidate sentence of the row remaining in the content table as the user utterance sentence. When c = 1, it sets the question return method to "word question return", generates a word candidate array consisting of the values in the differing column, and outputs the word candidate array and the content table to the question return processing unit 32. In the example of FIG. 6, the word candidate array is ("vegetable curry", "Thai curry"). When c > 1 (c ≥ 2), it sets the question return method to "re-utterance".
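Counting the differing columns and branching on c can be sketched as follows. The function name, the use of an empty string for blank cells, and the handling of an empty table are assumptions made for this illustration.

```python
from typing import List, Tuple


def decide_by_columns(contents: List[List[str]]) -> Tuple[str, List[str]]:
    """Return (question return method, word candidate array).

    `contents` holds the content array of each remaining row of the
    content table; missing columns are treated as blank ("").
    """
    if not contents:
        return "re-utterance", []        # assumption: empty table asks again
    width = max(len(row) for row in contents)
    padded = [row + [""] * (width - len(row)) for row in contents]
    # A column "differs" when its values are not all identical.
    differing = [i for i in range(width)
                 if len({row[i] for row in padded}) > 1]
    c = len(differing)
    if c == 0:
        return "confirmed", []
    if c == 1:
        column = differing[0]
        return "word question return", [row[column] for row in padded]
    return "re-utterance", []
```

For two rows holding "vegetable curry" and "Thai curry" in their only column, the sketch yields c = 1 and the word candidate array ["vegetable curry", "Thai curry"], matching the FIG. 6 walkthrough.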

問返し処理部３２は、信頼度判定部２６、内容解析部２８、又は単語候補抽出部３０から入力される設定された問返し方法の種類に基づいて、必要に応じて利用者に問返しを行うシステム発話を行い、利用者発話文を確定し、出力部９０から出力する。なお、問返し方法は必ず入力され、問返し方法が「確定」の場合には、利用者発話文、問返し方法が「単語問返し」の場合には、単語候補配列と内容表とが問返し処理部３２に入力される。 Based on the type of question return method set by and input from the reliability determination unit 26, the content analysis unit 28, or the word candidate extraction unit 30, the question return processing unit 32 issues, as needed, a system utterance that asks the user a question in return, determines the user utterance sentence, and outputs it from the output unit 90. A question return method is always input. In addition, when the method is "confirmed", the user utterance sentence is input to the question return processing unit 32, and when the method is "word question return", the word candidate array and the content table are input.

具体的には、問返し処理部３２は、入力された設定された問返し方法の種類が「単語問返し」の場合、問返し処理部３２は、入力された単語候補配列を用いて、選択を促すシステム発話文を生成する。例えば、単語候補配列に含まれている単語の各々を読点でつなげ、最後に「どちらでしょうか。」という文をつなげると、図６の例の場合、「野菜カレー、タイカレー、どちらでしょうか。」という文が作成される。システム発話文は、単語候補配列の単語群からの選択を促すような文であればどのような文でも構わない。また、利用者に音声によって提示するのではなく画面で表示する場合、インタフェースに合わせた記述に変更する。 Specifically, when the input question return method is "word question return", the question return processing unit 32 uses the input word candidate array to generate a system utterance sentence that prompts a selection. For example, joining the words in the word candidate array with commas and appending the sentence "Which one?" (「どちらでしょうか。」) yields, for the example of FIG. 6, the sentence "Vegetable curry, Thai curry, which one?" (「野菜カレー、タイカレー、どちらでしょうか。」). The system utterance sentence may be any sentence that prompts a selection from the words in the word candidate array. When the choices are shown on a screen rather than presented to the user by voice, the wording is adapted to the interface.

次に、問返し処理部３２は、作成したシステム発話文を、音声合成部３４に出力し、音声合成部３４において生成された音声データを出力部９０から出力して再生する。 Next, the question answering processing unit 32 outputs the created system utterance sentence to the speech synthesis unit 34, and the speech data generated in the speech synthesis unit 34 is output from the output unit 90 and reproduced.

次に、問返し処理部３２は、音声認識部２４から入力される反応利用者発話文に、単語候補配列の何れか１つの要素のみが含まれているか判定する。問返し処理部３２が、取得した反応利用者発話文に、単語候補配列の何れか１つの要素のみが含まれていないと判定した場合には、再度同じ音声データを出力部９０から出力して再生する処理を繰り返す。ここで、繰り返しを行う回数の上限値を予め設定し、上限値を超えた場合は問返し方法を「再発話」としてもよい。一方、問返し処理部３２が、取得した反応利用者発話文に、単語候補配列の何れか１つの要素のみが含まれていると判定した場合には、含まれていると判定された単語候補に対応する認識候補文を内容表から取得し、利用者発話文として設定する。例えば、図６の例の場合、利用者が「野菜」と発話した場合、内容表の項番１の行の認識候補文「野菜カレーのレシピが知りたい」を利用者発話文として設定し、当該利用者発話文を出力部９０から出力する。 Next, the question return processing unit 32 determines whether the reaction user utterance sentence input from the speech recognition unit 24 contains exactly one element of the word candidate array. If it determines that the sentence does not contain exactly one element, it outputs the same voice data from the output unit 90 again and repeats the playback. An upper limit on the number of repetitions may be set in advance, and when the limit is exceeded, the question return method may be set to "re-utterance". If it determines that the sentence contains exactly one element, it obtains from the content table the recognition candidate sentence corresponding to that word candidate and sets it as the user utterance sentence. For example, in FIG. 6, when the user says "vegetable" (「野菜」), the recognition candidate sentence "I want to know the vegetable curry recipe" (「野菜カレーのレシピが知りたい」) in the row of item number 1 of the content table is set as the user utterance sentence and output from the output unit 90.
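A rough sketch of the prompt construction and the check on the user's reaction is given below. The English prompt wording, the function names, and the word-to-sentence mapping are assumptions. The check uses literal substring containment, so a shortened reply such as "vegetable" alone would not match "vegetable curry"; handling such partial replies would need looser matching that this sketch does not attempt.

```python
from typing import Dict, List, Optional


def build_prompt(words: List[str]) -> str:
    """Join the word candidates with commas and append a selection question."""
    return ", ".join(words) + ", which one?"


def resolve_reply(
    reply: str, words: List[str], sentence_for_word: Dict[str, str]
) -> Optional[str]:
    """Return the recognition candidate sentence if exactly one word matches.

    Returns None when zero or several word candidates appear in the reply,
    which corresponds to replaying the same question.
    """
    hits = [w for w in words if w in reply]
    if len(hits) != 1:
        return None
    return sentence_for_word[hits[0]]


def ask_until_resolved(
    replies: List[str], words: List[str],
    sentence_for_word: Dict[str, str], max_attempts: int = 3
) -> Optional[str]:
    """Try successive replies up to a preset limit; None means re-utterance."""
    for reply in replies[:max_attempts]:
        sentence = resolve_reply(reply, words, sentence_for_word)
        if sentence is not None:
            return sentence
    return None
```

Here `replies` stands in for the sequence of reaction user utterance sentences that the speech recognition unit would return after each playback of the prompt.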

また、問返し処理部３２は、入力された設定された問返し方法が「確定」である場合、利用者発話文を出力部９０から出力する。 In addition, the question answering processing unit 32 outputs a user utterance sentence from the output unit 90 when the inputted question answering method is “confirmed”.

また、問返し処理部３２は、入力された設定された問返し方法が「再発話」である場合、再発話を促すシステム発話文を生成し、音声合成部３４に出力する。次に、音声合成部３４から取得した再発話を促すシステム発話文に対応する音声データを出力部９０から出力して再生し、利用者発話文にｎｕｌｌを設定する。 Further, when the input question return method is "re-utterance", the question return processing unit 32 generates a system utterance sentence that prompts the user to speak again and outputs it to the speech synthesis unit 34. It then outputs from the output unit 90, and plays, the voice data for that system utterance sentence obtained from the speech synthesis unit 34, and sets the user utterance sentence to null.

音声合成部３４は、与えられたテキストやその特徴に基づいて、合成音声データを生成し、出力する機能を持つ。第１の実施形態においては、問返し処理部３２から入力されるテキスト形式のシステム発話文を入力とし、合成音声データ生成し、問返し処理部３２に出力する。 The speech synthesizer 34 has a function of generating and outputting synthesized speech data based on the given text and its features. In the first embodiment, text-format system utterances input from the question answering processing unit 32 are input, synthesized speech data is generated, and output to the question answering processing unit 32.

入力には、声質、抑揚、アクセントなどの発音記号等、音声の属性に関するパラメータが含まれていてもよいし、システム発話文の中にこれらのパラメータを埋め込んだ１つのテキスト文でもよい。例えば、ＳＳＭＬ形式などが挙げられる。合成音声データの形式は限定しない。音声合成部３４は、システム発話文を含む情報を入力し、合成音声データが出力される機能であれば、どのような構成でも構わない。なお、第１の実施形態においては、特許文献３（特開２０１２−２３７９２５号公報）の技術を用いる。 The input may include parameters relating to speech attributes such as voice quality, intonation, accents, etc., or a single text sentence in which these parameters are embedded in the system utterance sentence. For example, the SSML format can be used. The format of the synthesized voice data is not limited. The speech synthesizer 34 may have any configuration as long as it receives information including system utterances and outputs synthesized speech data. In the first embodiment, the technique disclosed in Patent Document 3 (Japanese Patent Laid-Open No. 2012-237925) is used.

＜第１の実施形態に係る単語選択装置の作用＞ <Operation of the word selection device according to the first embodiment>
次に、第１の実施形態に係る単語選択装置１００の作用について説明する。入力部１０においてマイクから入力された利用者の音声データを受け付けると、単語選択装置１００によって図９〜図１３に示す単語選択処理ルーチンを実行する。 Next, the operation of the word selection device 100 according to the first embodiment will be described. When the input unit 10 receives the user's voice data input from the microphone, the word selection device 100 executes the word selection processing routine shown in FIGS. 9 to 13.

まず、図９のステップＳ１００で、音声認識部２４は、対話関数記憶部２２に記憶されている対話関数名集合を取得する。 First, in step S100 of FIG. 9, the speech recognition unit 24 acquires the dialogue function name set stored in the dialogue function storage unit 22.

次に、ステップＳ１０２で、音声認識部２４は、入力部１０において受け付けた音声データについて、音声認識を行い、Ｎ個の認識候補文を取得する。 Next, in step S102, the voice recognition unit 24 performs voice recognition on the voice data received by the input unit 10, and acquires N recognition candidate sentences.

次に、ステップＳ１０４で、音声認識部２４は、ステップＳ１０２において取得したＮ個の認識候補文の各々について信頼度を算出する。 Next, in step S104, the speech recognition unit 24 calculates the reliability for each of the N recognition candidate sentences acquired in step S102.

次に、ステップＳ１０６で、音声認識部２４は、ステップＳ１０２において取得したＮ個の認識候補文の各々に、ステップＳ１０４において取得した当該認識候補文の信頼度を付加し、認識候補群を取得する。 Next, in step S106, the speech recognition unit 24 adds the reliability of the recognition candidate sentence acquired in step S104 to each of the N recognition candidate sentences acquired in step S102, and acquires a recognition candidate group. .

次に、ステップＳ１０８で、信頼度判定部２６は、ステップＳ１０６において取得した認識候補群から当該認識候補群に含まれる認識候補文の各々の信頼度に基づいて、認識候補文群を取得し、取得した認識候補文群に含まれる認識候補文の数に基づいて、問返し方法を設定する。 Next, in step S108, the reliability determination unit 26 acquires a recognition candidate sentence group from the recognition candidate group acquired in step S106 based on the reliability of each of the recognition candidate sentences included in the recognition candidate group, A question answering method is set based on the number of recognition candidate sentences included in the acquired recognition candidate sentence group.

次に、ステップＳ１１０で、内容解析部２８は、ステップＳ１００において取得した対話関数名集合と、ステップＳ１０６において取得した、認識候補文群、及び問返し方法とに基づいて、問返し方法を設定する。 Next, in step S110, the content analysis unit 28 sets a question answering method based on the conversation function name set obtained in step S100, the recognition candidate sentence group and the question answering method obtained in step S106. .

次に、ステップＳ１１２で、単語候補抽出部３０は、ステップＳ１１０において取得した問返し方法に基づいて、問返し方法を決定する。 Next, in step S112, the word candidate extraction unit 30 determines a question answering method based on the question answering method acquired in step S110.

次に、ステップＳ１１４で、問返し処理部３２は、ステップＳ１１２において取得した問返し方法に基づいて、問返し処理を実行し、単語選択処理を終了する。 Next, in step S114, the question return processing unit 32 executes the question return processing based on the question return method acquired in step S112, and ends the word selection processing.

上記ステップＳ１０８の信頼度に基づく問返し方法の判定処理について、図１０において詳細に説明する。 The question answering method determination process based on the reliability in step S108 will be described in detail with reference to FIG.

図１０のステップＳ２００で、信頼度判定部２６は、メモリ（図示省略）に記憶されている信頼度の閾値を取得する。 In step S200 of FIG. 10, the reliability determination unit 26 acquires a reliability threshold stored in a memory (not shown).

次に、ステップＳ２０２で、信頼度判定部２６は、ステップＳ１０６において取得した認識候補群から処理対象となる認識候補を決定する。 Next, in step S202, the reliability determination unit 26 determines recognition candidates to be processed from the recognition candidate group acquired in step S106.

次に、ステップＳ２０４で、信頼度判定部２６は、処理対象となる認識候補の信頼度が、ステップＳ２００において取得した閾値以上か否かを判定する。信頼度判定部２６が、処理対象となる認識候補の信頼度が閾値以上であると判定した場合には、信頼度に基づく問返し方法の判定処理は、ステップＳ２０６へ移行する。一方、信頼度判定部２６が、処理対象となる認識候補の信頼度が閾値未満であると判定した場合には、信頼度に基づく問返し方法の判定処理は、ステップＳ２０８へ移行する。 Next, in step S204, the reliability determination unit 26 determines whether or not the reliability of the recognition candidate to be processed is equal to or greater than the threshold acquired in step S200. When the reliability determination unit 26 determines that the reliability of the recognition candidate to be processed is greater than or equal to the threshold, the determination process for the question return method based on the reliability proceeds to step S206. On the other hand, when the reliability determination unit 26 determines that the reliability of the recognition candidate to be processed is less than the threshold value, the determination process of the question answering method based on the reliability proceeds to step S208.

次に、ステップＳ２０６で、信頼度判定部２６は、処理対象となる認識候補の認識候補文を認識候補文群に追加する。 Next, in step S206, the reliability determination unit 26 adds the recognition candidate sentence of the recognition candidate to be processed to the recognition candidate sentence group.

次に、ステップＳ２０８で、信頼度判定部２６は、ステップＳ１０６において取得した認識候補群に含まれる認識候補の全てについてステップＳ２０２〜ステップＳ２０４、又はステップＳ２０６までの処理を終了したか否かを判定する。信頼度判定部２６が、ステップＳ１０６において取得した認識候補群に含まれる認識候補の全てについてステップＳ２０２〜ステップＳ２０４、又はステップＳ２０６までの処理を終了したと判定した場合には、信頼度に基づく問返し方法の判定処理は、ステップＳ２１０へ移行する。一方、信頼度判定部２６が、ステップＳ１０６において取得した認識候補群に含まれる認識候補の全てについてステップＳ２０２〜ステップＳ２０４、又はステップＳ２０６までの処理を終了していないと判定した場合には、信頼度に基づく問返し方法の判定処理は、ステップＳ２０２へ移行し、処理対象となる認識候補を変更し、ステップＳ２０４〜ステップＳ２０８までの処理を繰り返す。 Next, in step S208, the reliability determination unit 26 determines whether or not the processing from step S202 to step S204 or step S206 has been completed for all of the recognition candidates included in the recognition candidate group acquired in step S106. To do. If the reliability determination unit 26 determines that the processing from step S202 to step S204 or step S206 has been completed for all of the recognition candidates included in the recognition candidate group acquired in step S106, a question based on the reliability is obtained. The return method determination processing moves to step S210. On the other hand, when the reliability determination unit 26 determines that the processing from step S202 to step S204 or step S206 has not been completed for all the recognition candidates included in the recognition candidate group acquired in step S106, the reliability is determined. The determination process of the question return method based on the degree proceeds to step S202, changes the recognition candidate to be processed, and repeats the processes from step S204 to step S208.

次に、ステップＳ２１０で、信頼度判定部２６は、ステップＳ２０６において取得した認識候補文群に含まれる認識候補文の数が０であるか否かを判定する。信頼度判定部２６が、認識候補文群に含まれる認識候補文の数が０であると判定した場合には、信頼度に基づく問返し方法の判定処理は、ステップＳ２１２へ移行する。一方、信頼度判定部２６が、認識候補文群に含まれる認識候補文の数が０でないと判定した場合には、信頼度に基づく問返し方法の判定処理は、ステップＳ２１４へ移行する。 Next, in step S210, the reliability determination unit 26 determines whether or not the number of recognition candidate sentences included in the recognition candidate sentence group acquired in step S206 is zero. When the reliability determination unit 26 determines that the number of recognition candidate sentences included in the recognition candidate sentence group is 0, the determination process of the question return method based on the reliability proceeds to step S212. On the other hand, when the reliability determination unit 26 determines that the number of recognition candidate sentences included in the recognition candidate sentence group is not 0, the determination process of the question return method based on the reliability proceeds to step S214.

次に、ステップＳ２１２で、信頼度判定部２６は、問返し方法を「再発話」に設定し、信頼度に基づく問返し方法の判定処理を終了する。 Next, in step S212, the reliability determination unit 26 sets the question answering method to “recurrent speech”, and ends the question answering method determination process based on the reliability.

ステップＳ２１４で、信頼度判定部２６は、ステップＳ２０６において取得した認識候補文群に含まれる認識候補文の数が１であるか否かを判定する。信頼度判定部２６が、認識候補文群に含まれる認識候補文の数が１であると判定した場合には、信頼度に基づく問返し方法の判定処理は、ステップＳ２１６へ移行する。一方、信頼度判定部２６が、認識候補文群に含まれる認識候補文の数が１でないと判定した場合には、信頼度に基づく問返し方法の判定処理は、ステップＳ２２０へ移行する。 In step S214, the reliability determination unit 26 determines whether or not the number of recognition candidate sentences included in the recognition candidate sentence group acquired in step S206 is one. When the reliability determination unit 26 determines that the number of recognition candidate sentences included in the recognition candidate sentence group is 1, the determination process of the question return method based on the reliability proceeds to step S216. On the other hand, when the reliability determination unit 26 determines that the number of recognition candidate sentences included in the recognition candidate sentence group is not 1, the determination process of the question return method based on the reliability proceeds to step S220.

次に、ステップＳ２１６で、信頼度判定部２６は、問返し方法を「確定」に設定する。 Next, in step S216, the reliability determination unit 26 sets the inquiry method to “determined”.

次に、ステップＳ２１８で、信頼度判定部２６は、ステップＳ２０６において取得した認識候補文群に含まれる認識候補文を利用者発話文として設定し、信頼度に基づく問返し方法の判定処理を終了する。 Next, in step S218, the reliability determination unit 26 sets the recognition candidate sentence included in the recognition candidate sentence group acquired in step S206 as a user uttered sentence, and ends the determination process of the question return method based on the reliability. To do.

ステップＳ２２０で、信頼度判定部２６は、問返し方法を「保留」に設定し、信頼度に基づく問返し方法の判定処理を終了する。 In step S220, the reliability determination unit 26 sets the question return method to “hold”, and ends the question return method determination process based on the reliability.

上記ステップＳ１１０の対話関数に基づく問返し方法の判定処理について、図１１において詳細に説明する。 The question answering method determination process based on the interactive function in step S110 will be described in detail with reference to FIG.

図１１のステップＳ３００で、内容解析部２８は、ステップＳ１０８において取得した問返し方法が「保留」であるか否かを判定する。内容解析部２８が、問返し方法が「保留」であると判定した場合には、対話関数に基づく問返し方法の判定処理は、ステップＳ３０２へ移行する。一方、内容解析部２８が、問返し方法が「保留」でないと判定した場合には、対話関数に基づく問返し方法の判定処理を終了する。 In step S300 of FIG. 11, the content analysis unit 28 determines whether or not the question return method acquired in step S108 is “pending”. When the content analysis unit 28 determines that the question answering method is “pending”, the question answering method determination process based on the dialogue function proceeds to step S302. On the other hand, when the content analysis unit 28 determines that the question answering method is not “pending”, the question answering method determination process based on the interactive function is terminated.

次に、ステップＳ３０２で、内容解析部２８は、ステップＳ２０６において取得した認識候補文群に含まれる認識候補文のうち、処理対象となる認識候補文を決定する。 Next, in step S302, the content analysis unit 28 determines a recognition candidate sentence to be processed among the recognition candidate sentences included in the recognition candidate sentence group acquired in step S206.

次に、ステップＳ３０４で、内容解析部２８は、処理対象となる認識候補文についてテキスト解析を行う。 Next, in step S304, the content analysis unit 28 performs text analysis on the recognition candidate sentence to be processed.

次に、ステップＳ３０６で、内容解析部２８は、ステップＳ１００において取得した対話関数名集合から、処理対象となる対話関数名を決定する。 Next, in step S306, the content analysis unit 28 determines a dialogue function name to be processed from the dialogue function name set acquired in step S100.

次に、ステップＳ３０７で、内容解析部２８は、処理対象となる対話関数名ｎに基づいて呼び出される対話関数ｆ_ｎに、処理対象となる認識候補文ｔと、ステップＳ３０４において取得したテキスト解析結果とを入力する。 Next, in step S307, the content analysis unit 28 adds the recognition candidate sentence t to be processed and the text analysis result acquired in step S304 to the interaction function f _n called based on the interaction function name n to be processed. Enter.

次に、ステップＳ３０８で、内容解析部２８は、ステップＳ３０７において出力結果が得られたか否かを判定する。内容解析部２８が、出力結果が得られたと判定した場合には、対話関数に基づく問返し方法の判定処理は、ステップＳ３１０へ移行する。一方、内容解析部２８が、出力結果が得られなかったと判定した場合には、対話関数に基づく問返し方法の判定処理は、ステップＳ３１２へ移行する。 Next, in step S308, the content analysis unit 28 determines whether an output result is obtained in step S307. When the content analysis unit 28 determines that an output result has been obtained, the determination process of the question answering method based on the interactive function proceeds to step S310. On the other hand, when the content analysis unit 28 determines that the output result has not been obtained, the determination process of the question return method based on the dialogue function proceeds to step S312.

ステップＳ３１０で、内容解析部２８は、ステップＳ３０７において取得した出力された配列を内容配列оｕｔ（ｔ,ｎ）として生成する。 In step S310, the content analysis unit 28 takes the array output in step S307 as the content array out(t, n).

ステップＳ３１２で、内容解析部２８は、ステップＳ１００において取得した対話関数名集合に含まれる全ての対話関数名についてステップＳ３０６〜ステップＳ３０８、又はステップＳ３１０までの処理を終了したか否かを判定する。内容解析部２８が、対話関数名集合に含まれる全ての対話関数名についてステップＳ３０６〜ステップＳ３０８、又はステップＳ３１０までの処理を終了したと判定した場合には、対話関数に基づく問返しの判定処理は、ステップＳ３１４へ移行する。一方、内容解析部２８が、対話関数名集合に含まれる全ての対話関数名についてステップＳ３０６〜ステップＳ３０８、又はステップＳ３１０までの処理を終了していないと判定した場合には、対話関数に基づく問返しの判定処理は、ステップＳ３０６へ移行し、処理対象となる対話関数名を変更し、ステップＳ３０７〜ステップＳ３１２までの処理を繰り返す。 In step S312, the content analysis unit 28 determines whether or not the processing from step S306 to step S308 or step S310 has been completed for all the dialog function names included in the dialog function name set acquired in step S100. When the content analysis unit 28 determines that the processes from step S306 to step S308 or step S310 have been completed for all the dialog function names included in the dialog function name set, the question return determination process based on the dialog function Proceeds to step S314. On the other hand, when the content analysis unit 28 determines that the processing from step S306 to step S308 or step S310 has not been completed for all the dialogue function names included in the dialogue function name set, the question based on the dialogue function is used. In the return determination process, the process proceeds to step S306, the dialogue function name to be processed is changed, and the process from step S307 to step S312 is repeated.

次に、ステップＳ３１４で、内容解析部２８は、ステップＳ２０６において取得した認識候補文群に含まれる全ての認識候補文について、ステップＳ３０２〜ステップＳ３１２までの処理を終了したか否かを判定する。内容解析部２８が、全ての認識候補文について、ステップＳ３０２〜ステップＳ３１２までの処理を終了したと判定した場合には、対話関数に基づく問返し方法の判定処理は、ステップＳ３１６へ移行する。一方、内容解析部２８が、全ての認識候補文について、ステップＳ３０２〜ステップＳ３１２までの処理を終了していないと判定した場合には、対話関数に基づく問返し方法の判定処理は、ステップＳ３０２へ移行し、処理対象となる認識候補文を変更し、ステップＳ３０４〜ステップＳ３１４までの処理を繰り返す。 Next, in step S314, the content analysis unit 28 determines whether the processing of steps S302 to S312 has been completed for every recognition candidate sentence in the recognition candidate sentence group acquired in step S206. If it has, the determination process of the question-return method based on the dialogue functions proceeds to step S316. If it has not, the process returns to step S302, switches to the next recognition candidate sentence, and repeats steps S304 to S314.

次に、ステップＳ３１６で、内容解析部２８は、ステップＳ３１０において取得した内容配列оｕｔ（ｔ,ｎ）の数が０であるか否かを判定する。内容解析部２８が、取得した内容配列оｕｔ（ｔ,ｎ）の数が０であると判定した場合には、対話関数に基づく問返し方法の判定処理は、ステップＳ３１８へ移行する。一方、内容解析部２８が、取得した内容配列оｕｔ（ｔ,ｎ）の数が０でないと判定した場合には、対話関数に基づく問返し方法の判定処理は、ステップＳ３２０へ移行する。 Next, in step S316, the content analysis unit 28 determines whether the number of content arrays out(t, n) acquired in step S310 is 0. If the number is 0, the determination process of the question-return method based on the dialogue functions proceeds to step S318. If the number is not 0, the process proceeds to step S320.

次に、ステップＳ３１８で、内容解析部２８は、問返し方法を「再発話」に設定し、対話関数に基づく問返し方法の判定処理を終了する。 Next, in step S318, the content analysis unit 28 sets the question-return method to “re-utterance” and ends the determination process of the question-return method based on the dialogue functions.

ステップＳ３２０で、内容解析部２８は、ステップＳ３１０において取得した内容配列оｕｔ（ｔ,ｎ）の数が１であるか否かを判定する。内容解析部２８が、取得した内容配列оｕｔ（ｔ,ｎ）の数が１であると判定した場合には、対話関数に基づく問返し方法の判定処理は、ステップＳ３２２へ移行する。一方、内容解析部２８が、取得した内容配列оｕｔ（ｔ,ｎ）の数が１でないと判定した場合には、対話関数に基づく問返し方法の判定処理は、ステップＳ３２６へ移行する。 In step S320, the content analysis unit 28 determines whether the number of content arrays out(t, n) acquired in step S310 is 1. If the number is 1, the determination process of the question-return method based on the dialogue functions proceeds to step S322. If the number is not 1, the process proceeds to step S326.

次に、ステップＳ３２２で、内容解析部２８は、問返し方法を「確定」に設定する。 Next, in step S322, the content analysis unit 28 sets the question-return method to “confirmed”.

次に、ステップＳ３２４で、内容解析部２８は、ステップＳ３１０において取得した唯一の内容配列оｕｔ（ｔ,ｎ）に対応する認識候補文を利用者発話文として設定し、対話関数に基づく問返し方法の判定処理を終了する。 Next, in step S324, the content analysis unit 28 sets the recognition candidate sentence corresponding to the only content array out(t, n) acquired in step S310 as the user utterance sentence, and ends the determination process of the question-return method based on the dialogue functions.

ステップＳ３２６で、内容解析部２８は、問返し方法を「保留」に設定して、対話関数に基づく問返し方法の判定処理を終了する。 In step S326, the content analysis unit 28 sets the question-return method to “pending” and ends the determination process of the question-return method based on the dialogue functions.
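The loop over candidate sentences and dialogue functions (steps S302 to S314) and the branch on the number of content arrays (steps S316 to S326) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `judge_by_dialogue_functions`, the `DialogueFunction` type, and the sample dialogue function `find_city` are all assumptions introduced here, and a dialogue function is assumed to return `None` when it yields no output.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Assumed shape of a dialogue function: it takes a recognition candidate
# sentence and returns a content array, or None when it yields no output.
DialogueFunction = Callable[[str], Optional[List[str]]]
ContentArray = Tuple[int, str, List[str]]  # (candidate index t, function name n, array)


def judge_by_dialogue_functions(
    candidates: List[str], functions: Dict[str, DialogueFunction]
) -> Tuple[str, Optional[str], List[ContentArray]]:
    """Return (question-return method, user utterance or None, content arrays)."""
    content_arrays: List[ContentArray] = []
    for t, sentence in enumerate(candidates):        # loop closed by S314
        for name, func in functions.items():         # loop closed by S312
            output = func(sentence)                   # S307
            if output:                                # S308
                content_arrays.append((t, name, output))  # S310
    if len(content_arrays) == 0:                      # S316 -> S318
        return "re-utterance", None, content_arrays
    if len(content_arrays) == 1:                      # S320 -> S322, S324
        t, _, _ = content_arrays[0]
        return "confirmed", candidates[t], content_arrays
    return "pending", None, content_arrays            # S326


def find_city(sentence: str) -> Optional[List[str]]:
    """Hypothetical dialogue function: extract known city names."""
    found = [city for city in ("Tokyo", "Kyoto") if city in sentence]
    return found or None
```

With a single dialogue function, two candidates that both yield output leave the method at “pending”, which hands the decision to the later content-table comparison.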

上記ステップＳ１１２の問返し方法の決定処理について、図１２において詳細に説明する。 The question-return method decision process of step S112 is described in detail with reference to FIG. 12.

図１２のステップＳ４００で、単語候補抽出部３０は、ステップＳ１０８、又はステップＳ１１０において取得した問返し方法が「保留」であるか否かを判定する。単語候補抽出部３０が、問返し方法が「保留」であると判定した場合には、問返し方法の決定処理は、ステップＳ４０２へ移行する。一方、単語候補抽出部３０が、問返し方法が「保留」でないと判定した場合には、問返し方法の決定処理は終了する。 In step S400 of FIG. 12, the word candidate extraction unit 30 determines whether the question-return method acquired in step S108 or step S110 is “pending”. If it is “pending”, the question-return method decision process proceeds to step S402. If it is not “pending”, the question-return method decision process ends.

次に、ステップＳ４０２で、単語候補抽出部３０は、ステップＳ３１０において取得した内容配列に基づいて、内容表を作成する。 Next, in step S402, the word candidate extraction unit 30 creates a content table based on the content array acquired in step S310.

次に、ステップＳ４０４で、単語候補抽出部３０は、ステップＳ４０２において取得した内容表に含まれている対話関数名が複数種類存在するか否かを判定する。単語候補抽出部３０が、内容表に含まれている対話関数名が複数種類存在すると判定した場合には、問返し方法の決定処理は、ステップＳ４０６へ移行する。一方、単語候補抽出部３０が、内容表に含まれている対話関数名が１つだけ存在すると判定した場合には、問返し方法の決定処理は、ステップＳ４０８へ移行する。 Next, in step S404, the word candidate extraction unit 30 determines whether there are a plurality of types of dialogue function names included in the content table acquired in step S402. If the word candidate extraction unit 30 determines that there are a plurality of types of dialogue function names included in the content table, the question answering method determination process proceeds to step S406. On the other hand, if the word candidate extraction unit 30 determines that there is only one interactive function name included in the content table, the question return method determination process proceeds to step S408.

次に、ステップＳ４０６で、単語候補抽出部３０は、問返し方法を「再発話」に設定し、問返し方法の決定処理を終了する。 Next, in step S406, the word candidate extraction unit 30 sets the question-return method to “re-utterance” and ends the question-return method decision process.

ステップＳ４０８で、単語候補抽出部３０は、ステップＳ４０２において取得した内容表に、内容配列оｕｔ（ｔ,ｎ）の全要素が完全一致する行が存在するか否かを判定する。単語候補抽出部３０が、内容表に、内容配列оｕｔ（ｔ,ｎ）の全要素が完全一致する行が存在すると判定した場合には、問返し方法の決定処理は、ステップＳ４１０に移行する。一方、単語候補抽出部３０が、内容表に、内容配列оｕｔ（ｔ,ｎ）の全要素が完全一致する行が存在しない判定した場合には、問返し方法の決定処理は、ステップＳ４１２に移行する。 In step S408, the word candidate extraction unit 30 determines whether the content table acquired in step S402 contains rows in which all elements of the content array out(t, n) match exactly. If such rows exist, the question-return method decision process proceeds to step S410. If no such rows exist, the process proceeds to step S412.

次に、ステップＳ４１０で、単語候補抽出部３０は、ステップＳ４０８において取得した完全一致する行を一つだけ残すように統合し、その他の完全一致する行の各々を削除する。 Next, in step S410, the word candidate extraction unit 30 performs integration so as to leave only one completely matching line acquired in step S408, and deletes each of the other completely matching lines.

次に、ステップＳ４１２で、単語候補抽出部３０は、ステップＳ４０２、又は、ステップＳ４１０において取得した内容表の行ペアの間で、内容配列оｕｔ（ｔ,ｎ）からなる集合に包含関係が存在するか否かを判定する。単語候補抽出部３０が、内容表の行ペアの間で、内容配列оｕｔ（ｔ,ｎ）からなる集合に包含関係が存在すると判定した場合には、問返し方法の決定処理は、ステップＳ４１４へ移行する。一方、単語候補抽出部３０は、内容表の行ペアの間で、内容配列оｕｔ（ｔ,ｎ）からなる集合に包含関係が存在しないと判定した場合には、問返し方法の決定処理は、ステップＳ４１６へ移行する。 Next, in step S412, the word candidate extraction unit 30 determines whether, between any pair of rows of the content table acquired in step S402 or step S410, an inclusion relationship exists between the sets formed by their content arrays out(t, n). If an inclusion relationship exists, the question-return method decision process proceeds to step S414. If no inclusion relationship exists, the process proceeds to step S416.

次に、ステップＳ４１４で、単語候補抽出部３０は、ステップＳ４１２において取得した包含関係が存在する行ペアのうちについて、要素に含まれる単語の数が少ない行を削除する。 Next, in step S414, for each row pair found in step S412 to have an inclusion relationship, the word candidate extraction unit 30 deletes the row whose elements contain fewer words.

次に、ステップＳ４１６で、単語候補抽出部３０は、ステップＳ４０２、ステップＳ４１０、又はステップＳ４１４において取得した内容表において、内容配列оｕｔ（ｔ,ｎ）の要素に対応する各列について値に差異があるか否かを判定する。単語候補抽出部３０が、内容表において、内容配列оｕｔ（ｔ,ｎ）の要素に対応する各列について値の差異があると判定した場合には、問返し方法の決定処理は、ステップＳ４２２へ移行する。一方、単語候補抽出部３０が、内容表において、内容配列оｕｔ（ｔ,ｎ）の要素に対応する各列について値の差異がないと判定した場合には、問返し方法の決定処理は、ステップＳ４１８へ移行する。 Next, in step S416, the word candidate extraction unit 30 determines, for each column of the content table acquired in step S402, step S410, or step S414 that corresponds to an element of the content array out(t, n), whether the values in that column differ. If any column has differing values, the question-return method decision process proceeds to step S422. If no column has differing values, the process proceeds to step S418.

次に、ステップＳ４１８で、単語候補抽出部３０は、問返し方法を「確定」に設定する。 Next, in step S418, the word candidate extraction unit 30 sets the question answering method to “confirmed”.

次に、ステップＳ４２０で、単語候補抽出部３０は、ステップＳ４０２、ステップＳ４１０、又はステップＳ４１４において取得した内容表に残っている要素の認識候補文を利用者発話文に設定し、問返し方法の決定処理を終了する。 Next, in step S420, the word candidate extraction unit 30 sets the recognition candidate sentence of the element remaining in the content table acquired in step S402, step S410, or step S414 as the user utterance sentence, and ends the question-return method decision process.

ステップＳ４２２で、単語候補抽出部３０は、ステップＳ４１６において取得した差異がある列数ｃをカウントする。 In step S422, the word candidate extraction unit 30 counts the number of columns c having the difference acquired in step S416.

次に、ステップＳ４２４で、単語候補抽出部３０は、ステップＳ４２２において取得した列数ｃが１であるか否かを判定する。単語候補抽出部３０が、列数ｃが１であると判定した場合には、問返し方法の決定処理は、ステップＳ４２６へ移行する。一方、単語候補抽出部３０が、列数ｃが１でないと判定した場合には、問返し方法の決定処理は、ステップＳ４０６へ移行する。 Next, in step S424, the word candidate extraction unit 30 determines whether or not the number of columns c acquired in step S422 is 1. If the word candidate extraction unit 30 determines that the number of columns c is 1, the determination process of the question answering method proceeds to step S426. On the other hand, if the word candidate extraction unit 30 determines that the number of columns c is not 1, the question answering method determination process proceeds to step S406.

次に、ステップＳ４２６で、単語候補抽出部３０は、問返し方法を「単語問返し」に設定する。 Next, in step S426, the word candidate extraction unit 30 sets the question-return method to “word question-return”.

次に、ステップＳ４２８で、単語候補抽出部３０は、ステップＳ４１６において取得した差異のある列の値からなる単語候補配列を生成し、問返し方法の決定処理を終了する。 Next, in step S428, the word candidate extraction unit 30 generates a word candidate array composed of the column values with differences acquired in step S416, and ends the question return method determination process.
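The decision process of steps S404 to S428 can be sketched as below. This is an illustrative reading of the flow, under stated assumptions: the `Row` structure and the function name `decide_method` are hypothetical, the content table is assumed to be non-empty (the process runs only when the method is “pending”), “fewer words” in step S414 is approximated by a proper-subset test on the element sets, and rows shorter than the widest row are padded with `None` when columns are compared.

```python
from typing import List, NamedTuple, Optional, Tuple


class Row(NamedTuple):
    """One row of the content table (hypothetical layout)."""
    function_name: str
    sentence: str                # recognition candidate sentence
    content: Tuple[str, ...]     # content array out(t, n)


def decide_method(rows: List[Row]) -> Tuple[str, Optional[str], List[Optional[str]]]:
    """Return (question-return method, user utterance or None, word candidates)."""
    if len({r.function_name for r in rows}) > 1:           # S404 -> S406
        return "re-utterance", None, []
    unique: List[Row] = []                                  # S408/S410: merge exact matches
    for r in rows:
        if all(r.content != u.content for u in unique):
            unique.append(r)
    kept = [r for r in unique                               # S412/S414: drop included rows
            if not any(set(r.content) < set(o.content) for o in unique)]
    width = max(len(r.content) for r in kept)

    def cell(row: Row, i: int) -> Optional[str]:
        return row.content[i] if i < len(row.content) else None

    differing = [i for i in range(width)                    # S416/S422: columns that differ
                 if len({cell(r, i) for r in kept}) > 1]
    if not differing:                                       # S418, S420
        return "confirmed", kept[0].sentence, []
    if len(differing) == 1:                                 # S424 -> S426, S428
        return "word question-return", None, [cell(r, differing[0]) for r in kept]
    return "re-utterance", None, []                         # S424 -> S406
```

For example, two rows with content arrays `("weather", "Tokyo")` and `("weather", "Kyoto")` differ in exactly one column, so the sketch returns “word question-return” with the candidates `["Tokyo", "Kyoto"]`.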

上記ステップＳ１１４の問返しの実行処理について、図１３において詳細に説明する。 The question-return execution process of step S114 is described in detail with reference to FIG. 13.

図１３のステップＳ５００で、問返し処理部３２は、ステップＳ１０８、ステップＳ１１０、又は、ステップＳ１１２において取得した問返し方法が「単語問返し」であるか否かを判定する。問返し処理部３２が、取得した問返し方法が「単語問返し」であると判定した場合には、問返しの実行処理は、ステップＳ５０２へ移行する。一方、問返し処理部３２が、取得した問返し方法が「単語問返し」でないと判定した場合には、問返しの実行処理は、ステップＳ５１８へ移行する。 In step S500 of FIG. 13, the question-return processing unit 32 determines whether the question-return method acquired in step S108, step S110, or step S112 is “word question-return”. If it is, the question-return execution process proceeds to step S502. If it is not, the process proceeds to step S518.

次に、ステップＳ５０２で、問返し処理部３２は、ステップＳ４２８において取得した単語候補配列に基づいて、選択を促すシステム発話文を生成し、音声合成部３４に出力する。 Next, in step S 502, the question return processing unit 32 generates a system utterance sentence that prompts selection based on the word candidate sequence acquired in step S 428, and outputs it to the speech synthesis unit 34.

次に、ステップＳ５０４で、音声合成部３４は、ステップＳ５０２において取得したシステム発話文に対応する音声データを生成し、問返し処理部３２に出力する。 Next, in step S 504, the speech synthesizer 34 generates speech data corresponding to the system utterance sentence acquired in step S 502 and outputs the speech data to the question answering processor 32.

次に、ステップＳ５０６で、問返し処理部３２は、ステップＳ５０４において取得した音声データを、出力部９０から出力して再生する。 Next, in step S506, the inquiry processing unit 32 outputs the audio data acquired in step S504 from the output unit 90 and reproduces it.

次に、ステップＳ５０８で、音声認識部２４は、利用者の発話を受け付けたか否かを判定する。音声認識部２４が、利用者の発話を受け付けたと判定した場合には、問返しの実行処理は、ステップＳ５１０へ移行する。一方、音声認識部２４が、利用者の発話を受け付けていないと判定した場合には、問返しの実行処理は、ステップＳ５０８を繰り返す。 Next, in step S508, the voice recognition unit 24 determines whether or not the user's utterance has been accepted. If the speech recognition unit 24 determines that the user's utterance has been received, the query execution process proceeds to step S510. On the other hand, when the voice recognition unit 24 determines that the user's utterance has not been accepted, the inquiry execution process repeats step S508.

次に、ステップＳ５１０で、音声認識部２４は、ステップＳ５０８において取得した利用者の音声データについて、音声認識を行い、反応利用者発話文を取得する。 Next, in step S510, the voice recognition unit 24 performs voice recognition on the voice data of the user acquired in step S508, and acquires a reaction user utterance.

次に、ステップＳ５１２で、問返し処理部３２は、ステップＳ５０８において取得した反応利用者発話文に、ステップＳ４２８において取得した単語候補配列の何れか１つの要素（単語候補）のみが含まれているか否かを判定する。問返し処理部３２が、反応利用者発話文に、単語候補配列の何れか１つの要素のみが含まれている場合には、問返しの実行処理は、ステップＳ５１４へ移行する。一方、問返し処理部３２が、反応利用者発話文に、単語候補配列の何れの要素も含まれていない場合、又は単語候補配列の２つ以上の要素が含まれている場合には、問返しの実行処理は、ステップＳ５０６へ移行し、処理を繰り返す。 Next, in step S512, the question-return processing unit 32 determines whether the response user utterance sentence acquired in step S508 contains exactly one element (word candidate) of the word candidate array acquired in step S428. If it contains exactly one element, the question-return execution process proceeds to step S514. If it contains no element of the word candidate array, or two or more elements, the process returns to step S506 and repeats.

次に、ステップＳ５１４で、問返し処理部３２は、ステップＳ４０２、ステップＳ４１０、又はステップＳ４１４において取得した内容表から、ステップＳ５１２において取得した、反応利用者発話文に含まれていた単語候補に対応する認識候補文を取得し、当該認識候補文を利用者発話文として設定する。 Next, in step S514, the inquiry processing unit 32 corresponds to the word candidate included in the reaction user utterance sentence acquired in step S512 from the content table acquired in step S402, step S410, or step S414. A recognition candidate sentence to be acquired is acquired, and the recognition candidate sentence is set as a user utterance sentence.

次に、ステップＳ５１６で、問返し処理部３２は、ステップＳ５１４、ステップＳ４２０、ステップＳ３２４、又はステップＳ２１８において取得した利用者発話文を出力部９０から出力して問返しの実行処理を終了する。 Next, in step S516, the question return processing unit 32 outputs the user utterance sentence acquired in step S514, step S420, step S324, or step S218 from the output unit 90, and ends the question return execution process.

ステップＳ５１８で、問返し処理部３２は、ステップＳ１０８、ステップＳ１１０、又は、ステップＳ１１２において取得した問返し方法が「確定」であるか否かを判定する。問返し処理部３２が、取得した問返し方法が「確定」であると判定した場合には、問返しの実行処理は、ステップＳ５１６へ移行する。一方、問返し処理部３２が、取得した問返し方法が「確定」でないと判定した場合には、問返しの実行処理は、ステップＳ５１９へ移行する。 In step S518, the inquiry return processing unit 32 determines whether or not the inquiry return method acquired in step S108, step S110, or step S112 is “confirmed”. If the question answering processing unit 32 determines that the acquired question answering method is “confirmed”, the question answering execution process proceeds to step S516. On the other hand, when the question answering processing unit 32 determines that the acquired question answering method is not “confirmed”, the question answering execution process proceeds to step S519.

次に、ステップＳ５１９で、問返し処理部３２は、再発話を促すシステム発話文を生成し、音声合成部３４に出力する。 Next, in step S519, the question-return processing unit 32 generates a system utterance sentence that prompts the user to speak again, and outputs it to the speech synthesis unit 34.

次に、ステップＳ５２０で、音声合成部３４は、ステップＳ５１９で取得したシステム発話文に対応する音声データを生成し、問返し処理部３２に出力する。 Next, in step S520, the speech synthesizer 34 generates speech data corresponding to the system utterance sentence acquired in step S519, and outputs the speech data to the question return processing unit 32.

次に、ステップＳ５２２で、問返し処理部３２は、ステップＳ５２０で取得した音声データを出力部９０から出力して再生する。 Next, in step S522, the inquiry processing unit 32 outputs the audio data acquired in step S520 from the output unit 90 and reproduces it.

次に、ステップＳ５２４で、問返し処理部３２は、利用者発話文にｎｕｌｌを設定する。 Next, in step S524, the question return processing unit 32 sets null to the user utterance sentence.

次に、ステップＳ５２６で、音声認識部２４は、利用者の発話を受け付けたか否かを判定する。音声認識部２４が、利用者の発話を受け付けたと判定した場合には、問返しの実行処理は、図９のステップＳ１０２へ移行する。一方、音声認識部２４が、利用者の発話を受け付けていないと判定した場合には、問返しの実行処理は、ステップＳ５２６を繰り返す。 Next, in step S526, the speech recognition unit 24 determines whether the user's utterance has been received. If it has, the question-return execution process proceeds to step S102 of FIG. 9. If it has not, the process repeats step S526.
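The check in step S512 and the lookup in step S514 can be sketched as follows. The function name `resolve_response` and the mapping `sentence_by_word` (from each word candidate to its recognition candidate sentence in the content table) are assumptions for illustration, and containment is approximated here by a plain substring test, which would need refinement if one word candidate is a substring of another.

```python
from typing import Dict, List, Optional


def resolve_response(
    response: str,
    word_candidates: List[str],
    sentence_by_word: Dict[str, str],
) -> Optional[str]:
    """Return the user utterance sentence, or None when the prompt must be replayed."""
    # S512: collect the word candidates that appear in the user's response.
    matched = [word for word in word_candidates if word in response]
    if len(matched) != 1:
        # Zero or several candidates matched: go back to S506 and ask again.
        return None
    # S514: the single matched candidate selects its recognition candidate sentence.
    return sentence_by_word[matched[0]]
```

A response that names both candidates, such as “Tokyo or Kyoto”, returns `None`, so the caller replays the selection prompt instead of guessing.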

以上説明したように、第１の実施形態に係る単語選択装置によれば、入力された利用者の音声データに対する音声認識結果である複数の認識候補文毎に、対話シナリオについて予め定められている、少なくとも１つの対話関数に基づいて、内容配列を生成し、生成された内容配列に格納された少なくとも１つの特定の文字列に基づいて利用者に対して問い返す単語を選択することにより、入力された発話内容を特定するための問い返す単語を適切に選択することができる。 As described above, the word selection device according to the first embodiment generates, for each of a plurality of recognition candidate sentences that are speech recognition results for the input speech data of the user, a content array based on at least one dialogue function predetermined for the dialogue scenario, and selects the word to ask back to the user based on at least one specific character string stored in the generated content array. This makes it possible to appropriately select the word to ask back in order to identify the input utterance content.

また、外部の情報を用いることなく利用者発話の内容を絞り込むことができる。 In addition, the contents of user utterances can be narrowed down without using external information.

また、問返しに特化した戦略を考えるのではなく、通常の対話戦略を流用することで、問返しのためだけの対話戦略を別途作るコストを軽減することができる。 In addition, instead of thinking about a strategy specialized in answering questions, by diverting a normal dialogue strategy, it is possible to reduce the cost of separately creating a dialogue strategy only for answering questions.

また、対話戦略を用いて内容を判断することで、問返しの必要有無を認識候補文の表層だけでなく、内容に基づいて判断することもでき、不要な問返しを減らすことができる。 In addition, by determining the content using the dialogue strategy, it is possible to determine whether or not a query is necessary based not only on the surface layer of the recognition candidate sentence but also on the content, thereby reducing unnecessary questions.

また、対話戦略を流用することで、対話を進める上で必要な内容のうち、どの部分の認識が曖昧かを特定できるため、利用者に選択股を提示して問返すことができる。 Moreover, by reusing the dialogue strategy, it is possible to identify which part of the content needed to advance the dialogue was recognized ambiguously, so the system can present choices to the user when asking back.

また、上述の内容解析部及び単語候補抽出部を有することにより、複数の認識候補文について、通常の対話戦略を実現する対話関数の処理結果を用いて内容を判断することで、不要な問返しをすることなく利用者発話文を決定することができる。また、問返しが必要な際に利用者に提示する単語の選択股を抽出し、選択行為によって利用者発話文を確保することができる。 In addition, by having the content analysis unit and word candidate extraction unit described above, the content of a plurality of recognition candidate sentences is judged using the processing results of the dialogue functions that implement the normal dialogue strategy, so the user utterance sentence can be determined without unnecessary question-returns. Furthermore, when a question-return is needed, the word choices to present to the user can be extracted, and the user utterance sentence can be obtained from the user's selection.

なお、本発明は、上述した実施形態に限定されるものではなく、この発明の要旨を逸脱しない範囲内で様々な変形や応用が可能である。 Note that the present invention is not limited to the above-described embodiment, and various modifications and applications are possible without departing from the gist of the present invention.

例えば、第１の実施形態において、問返しの方法は、「単語問合せ」、「確定」、及び「再発話」の３種類を用いる場合について説明したが、これに限定されるものではなく、問返し方法は、「単語問合せ」、「確定」、及び「再発話」の３種類の何れかに限らず、他の種類とそれを判別する機能を組み合わせて利用してもよい。 For example, in the first embodiment, the case of using three question-return methods, “word question-return”, “confirmed”, and “re-utterance”, has been described, but the invention is not limited to this. The question-return method is not restricted to these three types, and other types may be used in combination with a function that discriminates them.

また、第１の実施形態において、内容解析部２８において、対話関数記憶部２２に記憶されている対話関数を全て用いる場合について説明したが、これに限定されるものではない。例えば、対話関数記憶部２２に記憶されている対話関数の各々に優先度が設定されている場合には、対象となる対訳候補文に当該優先度順に対話関数を用いて、一番最初に出力が得られた対話関数の結果のみを保持し、次の対象となる対訳候補文の処理に移行してもよい。このようにすることで、優先度を反映した結果を得ることができる。また、対話シナリオの中で、優先的に処理される対話がある場合、当該処理に対応する対話関数の名前を前に、優先度が低い対話関数の名前を後ろに並べることで、より対話戦略を反映した内容解析が可能になる。また、対話関数に優先度が設定されていない場合、出力が得られた全ての対話関数の結果を保持し、別の処理で、当該結果を比較する何らかの処理を加えてもよい。例えば、対話関数の出力結果である内容配列のうち当該内容配列に含まれる要素数が一番多いもののみ結果として保持してもよい。 In the first embodiment, the case where the content analysis unit 28 uses all the dialogue functions stored in the dialogue function storage unit 22 has been described, but the invention is not limited to this. For example, when a priority is set for each dialogue function stored in the dialogue function storage unit 22, the dialogue functions may be applied to the target candidate sentence in priority order, only the result of the first dialogue function that yields an output may be kept, and processing may then move on to the next target candidate sentence. This gives a result that reflects the priorities. Also, when the dialogue scenario contains a dialogue that is processed preferentially, listing the name of the corresponding dialogue function first and the names of lower-priority dialogue functions after it enables content analysis that better reflects the dialogue strategy. When no priority is set for the dialogue functions, the results of all dialogue functions that yielded an output may be kept and compared in a separate process. For example, among the content arrays output by the dialogue functions, only the one containing the most elements may be kept as the result.
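The two variations described here, keeping the first output in priority order and keeping the content array with the most elements, can be sketched as below. The function names and the `DialogueFunction` type are assumptions for illustration; the list of `(name, function)` pairs is assumed to be ordered from highest to lowest priority.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Assumed shape of a dialogue function: returns a content array or None.
DialogueFunction = Callable[[str], Optional[List[str]]]
NamedFunction = Tuple[str, DialogueFunction]
Result = Tuple[str, List[str]]  # (dialogue function name, content array)


def first_output_by_priority(
    sentence: str, prioritized: Sequence[NamedFunction]
) -> Optional[Result]:
    """Keep only the result of the first dialogue function that yields output."""
    for name, func in prioritized:  # highest priority first
        output = func(sentence)
        if output:
            return name, output
    return None


def largest_output(
    sentence: str, functions: Sequence[NamedFunction]
) -> Optional[Result]:
    """Without priorities: keep the content array with the most elements."""
    results: List[Result] = []
    for name, func in functions:
        output = func(sentence)
        if output:
            results.append((name, output))
    # On a tie, max() keeps the earliest result in the list.
    return max(results, key=lambda r: len(r[1]), default=None)
```

The priority variant stops at the first hit and so skips the remaining dialogue functions, while the size-based variant always evaluates every function before choosing.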

また、第１の実施形態において、信頼度は、値の絶対値に意味がある信頼度を用いる場合について説明したが、これに限定されるものではない。例えば、値の絶対値に意味を持たないが、認識候補文間で信頼度の大きさ意を比較した差や割合に意味を持つ値等、性質が異なる信頼度を用いてもよい。この場合、上述の信頼度判定部２６において、各因子気候補の信頼度同士を比較するなど、比較方法の変更が必要となる、信頼度判定部２６の処理を一部変更する必要がある。なお、何らかの方法で信頼できる認識候補を選出することができれば、処理方法は問わない。 In the first embodiment, the case of using a reliability whose absolute value is meaningful has been described, but the invention is not limited to this. For example, a reliability of a different nature may be used, such as one whose absolute value carries no meaning but whose difference or ratio between recognition candidate sentences is meaningful. In this case, the processing of the reliability determination unit 26 described above needs to be partly changed, for example so that it compares the reliabilities of the recognition candidates with each other. Any processing method may be used as long as reliable recognition candidates can be selected.

また、第１の実施形態において、認識候補は、認識候補文と当該認識候補文の信頼度との組み合わせの場合について説明したが、これに限定されるものではない。例えば、認識候補は、認識候補文と、テキストの読み仮名情報、品詞情報などの情報が付随してもよい。 In the first embodiment, the case where the recognition candidate is a combination of the recognition candidate sentence and the reliability of the recognition candidate sentence has been described. However, the present invention is not limited to this. For example, a recognition candidate may accompany a recognition candidate sentence and information such as text reading kana information and part-of-speech information.

また、第１の実施形態において、音声認識部は、音声データを入力し、認識候補群を出力する機能を有していれば、どのような構成をとっても構わない。例えば、特許文献１、又は特許文献２（特開２０１２−０３２５３８号公報）などの方法を用いてもよい。 In the first embodiment, the speech recognition unit may have any configuration as long as it takes speech data as input and outputs a recognition candidate group. For example, the methods of Patent Document 1 or Patent Document 2 (Japanese Unexamined Patent Application Publication No. 2012-032538) may be used.

また、第１の実施形態において、信頼度の閾値は、メモリに記憶されている固定値を用いる場合について説明したが、これに限定されるものではない。例えば、呼び出す毎に指定しても、サービスを利用する周囲の環境や、利用者等に応じて変更してもよい。 In the first embodiment, the case where the fixed value stored in the memory is used as the reliability threshold value has been described. However, the present invention is not limited to this. For example, it may be specified each time it is called or may be changed according to the surrounding environment where the service is used, the user, and the like.

また、第１の実施形態においては、内容表は表の形で実現する場合について説明したが、これに限定されるものではない。例えば、各認識候補文における対話関数の出力結果を比較できる方法であればよい。 In the first embodiment, the case where the content table is realized in the form of a table has been described. However, the present invention is not limited to this. For example, any method that can compare the output results of interactive functions in the respective recognition candidate sentences may be used.

また、第１の実施形態においては、音声による問返しを想定したサービスの例について説明したが、これに限定されるものではない。例えば、サイネージやスマートフォンに問返し内容を表示する場合、システム応答文や選択を促す表示するインタフェースを作成し、画面に表示する機能として実現されるようにしてもよい。 Further, in the first embodiment, an example of a service that assumes an answering question by voice has been described, but the present invention is not limited to this. For example, when displaying the contents of an inquiry on a signage or a smartphone, a system response sentence or an interface for prompting selection may be created and realized as a function of displaying on a screen.

また、第１の実施形態においては、システムが音声で返答することを想定しているがこれに限定されるものではない。例えば、画面に文章を表示するなど、システム側は音声以外のインタフェースを用いて利用者と対話を行ってもよい。 In the first embodiment, it is assumed that the system responds by voice, but the present invention is not limited to this. For example, the system side may interact with the user using an interface other than voice, such as displaying text on the screen.

また、第１の実施形態においては、対話関数に基づいて出力される内容配列の各要素は単語である場合について説明したが、これに限定されるものではない。例えば、対話関数に基づいて出力される内容配列の各要素は文字列であってもよい。 In the first embodiment, the case where each element of the content array output based on the interactive function is a word has been described. However, the present invention is not limited to this. For example, each element of the content array output based on the interactive function may be a character string.

次に、第２の実施形態に係る単語選択装置について説明する。 Next, a word selection device according to the second embodiment will be described.

第２の実施形態においては、信頼度を用いない点が第１の実施形態と異なる。なお、第１の実施形態に係る単語選択装置と同様の構成及び作用については、同一の符号を付して説明を省略する。 The second embodiment differs from the first embodiment in that reliability is not used. Components and operations that are the same as in the word selection device according to the first embodiment are given the same reference numerals, and their description is omitted.

＜第２の実施形態に係る単語選択装置の構成＞ <Configuration of Word Selection Device According to Second Embodiment>
次に、第２の実施形態に係る単語選択装置の構成について説明する。図１４に示すように、第２の実施形態に係る単語選択装置２００は、ＣＰＵと、ＲＡＭと、後述する単語選択処理ルーチンを実行するためのプログラムや各種データを記憶したＲＯＭと、を含むコンピュータで構成することができる。この単語選択装置は、機能的には図１４に示すように入力部１０と、演算部２２０と、出力部９０とを含んで構成されている。 Next, the configuration of the word selection device according to the second embodiment will be described. As shown in FIG. 14, the word selection device 200 according to the second embodiment can be configured as a computer including a CPU, a RAM, and a ROM that stores various data and a program for executing the word selection processing routine described later. Functionally, as shown in FIG. 14, this word selection device includes an input unit 10, a calculation unit 220, and an output unit 90.

演算部２２０は、対話関数記憶部２２と、音声認識部２２４と、内容解析部２８と、単語候補抽出部３０と、問返し処理部３２と、音声合成部３４と、を含んで構成されている。 The calculation unit 220 includes a dialogue function storage unit 22, a speech recognition unit 224, a content analysis unit 28, a word candidate extraction unit 30, a question-return processing unit 32, and a speech synthesis unit 34.

音声認識部２２４は、入力部１０において受け付けた利用者の音声データについて、音声認識を行い、当該音声データについてＮ個の認識候補文（Ｎ−ｂｅｓｔ）を認識候補文群として取得し、内容解析部２８に出力する。また、音声認識部２２４は、問返し方法を「保留」と設定し、内容解析部２８に出力する。なお、音声認識部２２４における他の処理については第１の実施の形態に係る単語選択装置における音声認識部２４と同様の処理を行うため、説明を省略する。 The speech recognition unit 224 performs speech recognition on the user's speech data received by the input unit 10, acquires N recognition candidate sentences (N-best) for the speech data as a recognition candidate sentence group, and outputs the group to the content analysis unit 28. The speech recognition unit 224 also sets the question-return method to “pending” and outputs it to the content analysis unit 28. The other processing of the speech recognition unit 224 is the same as that of the speech recognition unit 24 in the word selection device according to the first embodiment, so its description is omitted.

＜第２の実施形態に係る単語選択装置の作用＞ <Operation of Word Selection Device According to Second Embodiment>
次に、第２の実施形態に係る単語選択装置２００の作用について説明する。入力部１０においてマイクから入力された利用者の音声データを受け付けると、単語選択装置２００によって図１５に示す単語選択処理ルーチンを実行する。 Next, the operation of the word selection device 200 according to the second embodiment will be described. When the input unit 10 receives the user's speech data input from a microphone, the word selection device 200 executes the word selection processing routine shown in FIG. 15.

まず、図１５のステップＳ６００で、音声認識部２２４は、ステップＳ１０２において取得したＮ個の認識候補文を認識候補文群とする。 First, in step S600 of FIG. 15, the speech recognition unit 224 sets the N recognition candidate sentences acquired in step S102 as a recognition candidate sentence group.

次に、ステップＳ６０２で、音声認識部２２４は、問返し方法を「保留」に設定する。 Next, in step S602, the speech recognition unit 224 sets the question-return method to “pending”.

以上説明したように、第２の実施形態に係る単語選択装置によれば、入力された利用者の音声データに対する音声認識結果である複数の認識候補文毎に、対話シナリオについて予め定められている、少なくとも１つの対話関数に基づいて、内容配列を生成し、生成された内容配列に格納された少なくとも１つの特定の文字列に基づいて利用者に対して問い返す単語を選択することにより、入力された発話内容を特定するための問い返す単語を適切に選択することができる。 As described above, the word selection device according to the second embodiment generates, for each of a plurality of recognition candidate sentences that are speech recognition results for the input speech data of the user, a content array based on at least one dialogue function predetermined for the dialogue scenario, and selects the word to ask back to the user based on at least one specific character string stored in the generated content array. This makes it possible to appropriately select the word to ask back in order to identify the input utterance content.

また、本願明細書中において、プログラムが予めインストールされている実施形態として説明したが、当該プログラムを、コンピュータ読み取り可能な記録媒体に格納して提供することも可能であるし、ネットワークを介して提供することも可能である。 In this specification, an embodiment in which the program is installed in advance has been described, but the program can also be provided stored on a computer-readable recording medium or provided via a network.

DESCRIPTION OF SYMBOLS
10 Input unit
20 Operation unit
22 Interactive function storage unit
24 Speech recognition unit
26 Reliability determination unit
28 Content analysis unit
30 Word candidate extraction unit
32 Processing unit
34 Speech synthesis unit
90 Output unit
100 Word selection device
200 Word selection device
220 Operation unit
224 Speech recognition unit

Claims

A word selection device comprising:
a content analysis unit that, for each of a plurality of recognition candidate sentences that are speech recognition results for input voice data of a user, generates, for each of a plurality of interactive functions, a content array storing at least one character string of a specific type extracted from the recognition candidate sentence by that interactive function, the plurality of interactive functions being predetermined for a dialogue scenario, which is a rule for performing a specific operation for each specific character string, and each extracting character strings of the specific type; and
a word candidate extraction unit that selects a word to ask back to the user based on the at least one character string of the specific type stored in the content array generated for each of the plurality of interactive functions by the content analysis unit.

A word selection device comprising:
a content analysis unit that, for each of a plurality of recognition candidate sentences that are speech recognition results for input voice data of a user, generates a content array storing at least one specific character string extracted from the recognition candidate sentence by an interactive function, based on at least one interactive function that is predetermined for a dialogue scenario, which is a rule for performing a specific operation for each specific character string, and that extracts the specific character string; and
a word candidate extraction unit that selects a word to ask back to the user based on the at least one specific character string stored in the content array generated by the content analysis unit,
wherein the word candidate extraction unit:
creates a content table in which each row represents a combination of a content array generated by the content analysis unit and the interactive function name of the interactive function that extracted the specific character strings stored in that content array;
when the created content table contains only one type of interactive function name and contains rows whose content arrays match, integrates the matching rows;
when the content table contains rows whose content arrays are in an inclusion relationship, deletes, from among the rows in the inclusion relationship, the row whose content array stores fewer specific character strings; and
when, among the columns of the content table corresponding to the elements of the content array, there is only one column in which the specific character strings differ, selects each of the specific character strings in that differing column as a word to ask back.

A word selection method in a word selection device including a content analysis unit and a word candidate extraction unit, the method comprising:
the content analysis unit generating, for each of a plurality of recognition candidate sentences that are speech recognition results for input voice data of a user, and for each of a plurality of interactive functions, a content array storing at least one character string of a specific type extracted from the recognition candidate sentence by that interactive function, the plurality of interactive functions being predetermined for a dialogue scenario, which is a rule for performing a specific operation for each specific character string, and each extracting character strings of the specific type; and
the word candidate extraction unit selecting a word to ask back to the user based on the at least one character string of the specific type stored in the content array generated for each of the plurality of interactive functions by the content analysis unit.

A word selection method in a word selection device including a content analysis unit and a word candidate extraction unit, the method comprising:
the content analysis unit generating, for each of a plurality of recognition candidate sentences that are speech recognition results for input voice data of a user, a content array storing at least one specific character string extracted from the recognition candidate sentence by an interactive function, based on at least one interactive function that is predetermined for a dialogue scenario, which is a rule for performing a specific operation for each specific character string, and that extracts the specific character string; and
the word candidate extraction unit selecting a word to ask back to the user based on the at least one specific character string stored in the content array generated by the content analysis unit,
wherein the word candidate extraction unit:
creates a content table in which each row represents a combination of a content array generated by the content analysis unit and the interactive function name of the interactive function that extracted the specific character strings stored in that content array;
when the created content table contains only one type of interactive function name and contains rows whose content arrays match, integrates the matching rows;
when the content table contains rows whose content arrays are in an inclusion relationship, deletes, from among the rows in the inclusion relationship, the row whose content array stores fewer specific character strings; and
when, among the columns of the content table corresponding to the elements of the content array, there is only one column in which the specific character strings differ, selects each of the specific character strings in that differing column as a word to ask back.

A program for causing a computer to function as each unit of the word selection device according to claim 1 or claim 2.