JP2000181475A

JP2000181475A - Voice answering device

Info

Publication number: JP2000181475A
Application number: JP10362898A
Authority: JP
Inventors: Masakazu Hattori; 雅一服部; Takashi Sasai; 崇司笹井; Hiroshi Tsunoda; 弘史角田; Yasuhiko Kato; 靖彦加藤
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1998-12-21
Filing date: 1998-12-21
Publication date: 2000-06-30
Anticipated expiration: 2018-12-21
Also published as: JP4228442B2

Abstract

PROBLEM TO BE SOLVED: To inform a user of the reliability of an answer returned by an answering system to a question input by the user, and also to adaptively vary the level of detail of the answer. SOLUTION: An input analysis block 2 causes a retrieval block 5 to retrieve data according to information input from an input block 1, an answer text generation block 4 generates an answer sentence from the retrieved data, and a voice synthesis block 3 outputs a synthesized voice of the answer sentence. The answer text generation block 4 generates an answer sentence that reflects the reliability of the answer, varies the degree of abstraction of the expression of the answer sentence according to the accuracy of the retrieved data, and also varies the degree of abstraction of the answer sentence according to the demand for accuracy estimated from the input information and from the history held by a history management block 9.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention: The present invention relates to a system that responds to questions and the like input by, for example, a user, and more particularly to a voice response device that responds in natural-language speech.

[0002]

2. Description of the Related Art: Systems that respond to questions and the like input by a user already exist. One known example is a voice response device that answers a user's question in natural-language speech.

[0003] Speech synthesis systems and speech synthesis methods that change the response voice according to the content of the response and the source of its information also exist.

[0004]

PROBLEMS TO BE SOLVED BY THE INVENTION: When a response system returns an answer to a question input by a user, it is sometimes desirable to also inform the user of how reliable that answer is. However, many conventional voice response devices use a fixed tone and voice quality. As a result, the user cannot judge the reliability of a response without listening closely to its content.

[0005] Moreover, even a system that changes the response voice according to the content of the response and the source of its information does not sufficiently convey the reliability of the response. For example, if the system responds while unsure that it has understood the user's question, the reliability of that response is low. Conversely, even if the fine details of the information used for the response are uncertain, a satisfactory response can still be given as long as the user is not asking for detail.

[0006] Adaptively changing the level of detail of a response is also important in voice response. For example, a spoken response that explains more than necessary is not an appropriate response. One approach, disclosed in Japanese Unexamined Patent Application Publication No. H8-137698, is to reflect the form and history of the user's questions, explain only the parts considered necessary, and omit the rest. Another approach is to vary the level of detail through the degree of abstraction applied when data is converted into a response sentence. For example, when announcing the schedule of an event, the system may give only the approximate "month or season", or it may give the specific "date and time".

[0007] However, many conventional voice response systems convert data into a response sentence by inserting data items into a prepared response sentence template. Because the data is converted into the response sentence one-to-one, such systems cannot perform the abstraction needed for a concise response.

[0008] The present invention has been made in view of these circumstances. Its object is to provide a voice response device that, when a response system returns an answer to a question input by a user, can inform the user of the reliability of the answer and can also adaptively change the level of detail of the answer.

[0009]

MEANS FOR SOLVING THE PROBLEMS: A voice response device of the present invention comprises input means for inputting information, search means for searching data based on the input information, response sentence generation means for generating a response sentence from the retrieved data, and synthesized speech conversion output means for converting the response sentence into synthesized speech and outputting it. When generating a response sentence based on the retrieved data, the device generates a response sentence that reflects the reliability of the response, thereby solving the problem described above.

[0010] Another voice response device of the present invention comprises input means for inputting information, search means for searching data based on the input information, response sentence generation means for generating a response sentence from the retrieved data, and synthesized speech conversion output means for converting the response sentence into synthesized speech and outputting it. When generating a response sentence based on the retrieved data, the device changes the degree of abstraction of the expression of the response sentence according to the accuracy of the retrieved data, thereby solving the problem described above.

[0011] Another voice response device of the present invention comprises input means for inputting information, search means for searching data based on the input information, response sentence generation means for generating a response sentence from the retrieved data, history management means for managing a history of past inputs and of the responses to those inputs, and synthesized speech conversion output means for converting the response sentence into synthesized speech and outputting it. When generating a response sentence based on the retrieved data, the device changes the degree of abstraction of the expression of the response sentence according to the demand for accuracy inferred from the input information and the history, thereby solving the problem described above.

[0012] That is, in the present invention, when a response is returned to a question input by a user, the tone, voice, and so on of the response sentence are changed according to, for example, the type, certainty, level of detail, and amount of the data. This makes it possible to inform the user of the reliability of the response and to adaptively change its level of detail. In addition, the response sentence is output concisely according to the key points indicated by the user.

[0013]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS: Preferred embodiments of the present invention will be described with reference to the drawings.

[0014] FIG. 1 shows a configuration example of a voice response system according to an embodiment to which the voice response device of the present invention is applied.

[0015] First, as the voice response system configuration of the first embodiment, an information search system is taken as an example. It searches for information corresponding to a question from the user, generates speech from the retrieved information, and outputs that speech.

[0016] In FIG. 1, the input block 1 is input conversion means that receives input from the user and converts it into numerical values, symbols, or text. The user's input is, for example, a keyword, a command, or a question. It may be entered by operating a button, keyboard, touch panel, or the like, or by speaking into a microphone.

[0017] The input analysis block 2 receives the processing result of the input block 1, interprets the user input, decides which type of information to search and which keywords to search with, and notifies the search block 5 of that decision. The input analysis block 2 also decides in how much detail the data retrieved by the search block 5 should be output as speech, and notifies the response text generation block 4 of that decision.

[0018] The history management block 9 supplies information that the input analysis block 2 lacks when making these decisions. It manages a history of the user's past inputs and of the responses output for them. When the input analysis block 2 inquires about missing information, the history management block 9 returns an answer.

[0019] The search block 5 searches the information stored in the database block 6 for the desired data, based on the information from the input analysis block 2. The data obtained by this search may include information other than what the user ultimately wants to know, or more detail than necessary, but it is passed unchanged to the response text generation block 4.

[0020] The database block 6 is where the information to be searched is recorded. It holds information prepared in advance on the system side, such as maps and dictionaries, as well as information that the user later edits and adds, such as schedules and telephone directories. It also holds information acquired over a network, such as traffic information, news, and weather forecasts.

[0021] The response text generation block 4 generates the response text from the data received from the search block 5. Based on, for example, the information from the input analysis block 2, it cuts data items the user has not asked for and decides the degree of abstraction of the expression. It also looks up words in the text generation data storage block 8 and joins them to form the text, and it decides the voice quality, volume, and so on to be used when the text is output as speech.

[0022] The text generation data storage block 8 holds the words that correspond to each element of the data and a set of sentence endings for expressing the characteristics of the output information. The correspondence between data and words is one-to-many: for example, the expressions "October 5", "early October", and "this autumn" all correspond to a single piece of data. Which one is chosen depends on the accuracy of the data and on the degree of abstraction decided by the response text generation block 4. The sentence endings are selected so that the characteristics of the output information are conveyed to the user intuitively. For example, the assertive ending "~desu" ("it is ...") is used when the reliability of the response is high, and the tentative ending "~kana" ("I wonder if it is ...") is used when it is low.
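
The one-to-many mapping and the choice of sentence ending described above could be sketched roughly as follows. This is an illustrative sketch only: the function names, the three abstraction levels, the month-to-season table, and the 0.5 reliability threshold are assumptions made for the example and do not come from the publication.

```python
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Assumed month-to-season table (northern hemisphere), for illustration only.
SEASONS = {12: "winter", 1: "winter", 2: "winter",
           3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer",
           9: "autumn", 10: "autumn", 11: "autumn"}


def express_date(d: date, abstraction: int) -> str:
    """Return one of several expressions for the same date.

    abstraction 0 = exact day, 1 = third of the month, anything else = season.
    """
    month = MONTHS[d.month - 1]
    if abstraction == 0:
        return f"{month} {d.day}"
    if abstraction == 1:
        part = "early" if d.day <= 10 else "mid" if d.day <= 20 else "late"
        return f"{part} {month}"
    return f"this {SEASONS[d.month]}"


def choose_ending(reliability: float, threshold: float = 0.5) -> str:
    """Pick the assertive or the tentative ending (threshold is an assumption)."""
    return "desu" if reliability >= threshold else "kana"


print(express_date(date(1998, 10, 5), 1), choose_ending(0.9))
```

Keeping the candidate expressions separate from the selection logic mirrors the split between the text generation data storage block 8 and the response text generation block 4.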

[0023] The speech synthesis block 3 receives the text and its output method from the response text generation block 4. Based on these, it acquires the necessary data from the speech synthesis data storage block 7 and joins the data to generate synthesized speech data. The synthesized speech data of the response sentence is finally sent to a speaker as an audio signal, and the speaker outputs the synthesized speech of the response sentence.

[0024] The speech synthesis data storage block 7 holds data sets, such as phonemes, for converting text into synthesized speech. The data held in the speech synthesis data storage block 7 is also selected so that the characteristics of the output information are conveyed to the user intuitively. In the system of this embodiment, for example, an adult voice is used when the reliability of the response is high and a child's voice is used when it is low.

[0025] The factors that determine the reliability of the response include, for example, the confidence in the interpretation of the user input, the source of the information used for the response, and the accuracy that the user expects of the response.

[0026] First, an example in which the confidence in the interpretation of the user input determines the reliability of the response will be described.

[0027] For example, suppose the user inputs the question "What is the travel time from Tokyo to Shizuoka?" This question does not say whether the user will travel from Tokyo to Shizuoka by car or by train. The input analysis block 2 manages the history of user input stored in the history management block 9, and suppose this history shows that the user usually travels by car. In that case, the system of this embodiment does not ask the user to supply the missing means of travel. It simply looks up the travel time by car and returns a response. However, because it is not certain that the travel time by car is the right answer, the reliability of the response drops accordingly. In other words, the confidence in the interpretation of the user input expresses how certain an interpretation such as the one above is, and this confidence determines the reliability of the response.
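
One way to picture filling in the missing means of travel from the history is the sketch below. It is a minimal illustration under stated assumptions: the history is taken to be a plain list of past transport choices, the share of the most frequent choice is used as the confidence, and `infer_transport` and the 0.5 cut-off are hypothetical names and values, not part of the publication.

```python
from collections import Counter


def infer_transport(history, min_share=0.5):
    """Guess the means of travel from past choices.

    Returns (mode, confidence). The confidence is the share of the most
    frequent past choice; (None, share) means the history is too mixed to
    guess, and (None, 0.0) means there is no history at all.
    """
    if not history:
        return None, 0.0
    mode, count = Counter(history).most_common(1)[0]
    share = count / len(history)
    if share < min_share:
        return None, share
    return mode, share


mode, confidence = infer_transport(["car", "car", "train", "car"])
print(mode, confidence)
```

A confidence below 1.0 corresponds to the reduced reliability that the paragraph above attributes to an answer built on an assumed means of travel.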

[0028] Next, an example in which the source of the information used for the response determines the reliability of the response will be described.

[0029] For example, the travel time by car from Tokyo to Shizuoka can be calculated from a road map or from traffic information. With a road map, the road distance between the departure point and the destination is calculated and divided by the expected speed to obtain the travel time. With traffic information, the required times between the measurement points for traffic volume, congestion, and so on that lie between the departure point and the destination are looked up and summed to obtain the travel time. If the information used for the response can thus come from either of two sources, a road map or traffic information, the response is more reliable when it uses traffic information, which can take real-time traffic volume and congestion into account. In this way, the reliability of the response is determined by the source of the information used for it.
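
The two calculations described above can be written compactly as follows. The symbols are introduced here only for illustration and are not used in the publication: $D$ is the road distance between the departure point and the destination, $v$ is the expected speed, and $t_i$ is the required time between the $i$-th pair of adjacent measurement points, of which there are $n$.

```latex
T_{\text{map}} = \frac{D}{v},
\qquad
T_{\text{traffic}} = \sum_{i=1}^{n} t_i
```

The first estimate depends on a single assumed speed, while the second is built from per-section measurements, which is why the text treats the traffic-information result as the more reliable one.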

[0030] Next, an example in which the accuracy that the user expects of the response determines the reliability of the response will be described.

[0031] For example, suppose the travel time by car from Tokyo to Shizuoka is calculated from a road map and the result is 2.1 hours, and suppose this result is accurate only to the nearest hour. Then the answer "about 2 hours" to the question "What is the travel time from Tokyo to Shizuoka?" can be considered reliable. However, the answer "2 hours 6 minutes" to the question "About how many minutes does it take from Tokyo to Shizuoka?" must be considered unreliable. That is, even responses drawn from the same data differ in reliability depending on how they compare with the user's expectation. In this way, the reliability of the response is determined by the accuracy that the user expects of it.

[0032] The factors that determine the degree of abstraction of the data expression, on the other hand, include, for example, the accuracy of the data itself and the accuracy that the user expects of the response.

[0033] First, an example in which the accuracy of the data itself determines the degree of abstraction of the data expression will be described.

[0034] For example, suppose the travel time by car from Tokyo to Shizuoka is calculated from a road map and the result is 2.1 hours. The response to the user can then express this 2.1 hours as "2 hours 6 minutes", as "a little over 2 hours", or as "about 2 hours", depending on the expected error. In this way, the degree of abstraction of the data expression in the response is determined by the accuracy of the data itself.
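
The choice among the three expressions could be sketched as below. This is only an illustration: the error thresholds of 5 and 30 minutes and the function name `express_duration` are assumptions, since the publication does not state how large an error maps to which expression.

```python
def express_duration(hours: float, error_minutes: float) -> str:
    """Express a duration more or less abstractly, depending on expected error.

    The 5-minute and 30-minute thresholds are illustrative assumptions.
    """
    total_minutes = round(hours * 60)
    h, m = divmod(total_minutes, 60)
    if error_minutes < 5:
        # Data is precise enough to state the minutes.
        return f"{h} hours {m} minutes"
    if error_minutes < 30:
        # Minutes are uncertain: only hint that the value exceeds the hour.
        return f"a little over {h} hours" if m > 0 else f"{h} hours"
    # Only the rough hour count can be trusted.
    return f"about {round(hours)} hours"


for err in (1, 15, 60):
    print(express_duration(2.1, err))
```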

[0035] Next, an example in which the accuracy that the user expects of the response determines the degree of abstraction of the data expression will be described.

[0036] For example, to the user's question "What are my plans for tomorrow?" the system answers "Tennis in the morning." To the question "What are my plans for tomorrow morning?" it answers with specific times: "Tennis from 9:00 to 11:00." In this way, the degree of abstraction of the data expression is determined by the accuracy that the user expects of the response. The accuracy that the user expects can be inferred not only from the form of the question but also from the past history. For example, for a user who always asks for exact times, the system answers "Tennis from 9:00 to 11:00." even when asked only "What are my plans for tomorrow?"
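
The schedule example can be sketched as two small decisions: whether the user wants exact times, and how to phrase the event accordingly. All names below, the 0.8 history ratio, and the morning/afternoon/evening boundaries are assumptions for illustration; the publication does not specify how the history is evaluated.

```python
def wants_exact(question_names_period: bool,
                past_exact_requests: int, past_total: int,
                ratio: float = 0.8) -> bool:
    """Decide whether to answer with exact times.

    True if the question already narrows the time period (for example
    "tomorrow morning"), or if the user's history shows a habit of asking
    for exact times. The 0.8 ratio is an illustrative assumption.
    """
    if question_names_period:
        return True
    return past_total > 0 and past_exact_requests / past_total >= ratio


def describe_event(name: str, start_hour: int, end_hour: int, exact: bool) -> str:
    """Phrase an event with exact times or with only the part of the day."""
    if exact:
        return f"{name} from {start_hour}:00 to {end_hour}:00"
    if start_hour < 12:
        period = "in the morning"
    elif start_hour < 18:
        period = "in the afternoon"
    else:
        period = "in the evening"
    return f"{name} {period}"


print(describe_event("Tennis", 9, 11, wants_exact(False, 1, 10)))
print(describe_event("Tennis", 9, 11, wants_exact(True, 1, 10)))
```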

[0037] The operation of the system of the first embodiment will now be described in detail with reference to FIGS. 1 and 2.

[0038] Here, consider the case in which the user inputs a question containing keywords and the system of this embodiment returns a response as speech output.

[0039] First, as an initial setting, the user operates the input block 1 to put the system into the mode for obtaining information about roads.

[0040] Next, in step S1, the user inputs the question "What is the travel time from Tokyo to Shizuoka?" Here, speech input is assumed as the interface for entering questions, and the user speaks into the microphone of the system. The speech from the microphone undergoes speech recognition in the input block 1 and is converted into text data.

[0041] The input analysis block 2 analyzes the text data sent from the input block 1 and decides on the corresponding search processing. The input analysis block 2 holds a list of keywords used to decide the processing, and it first looks for words that match these keywords.

[0042] From the question "What is the travel time from Tokyo to Shizuoka?" it finds three words that match keywords: "Tokyo", "Shizuoka", and "time". The place names "Tokyo" and "Shizuoka" and the keyword "time" show that traffic information should be looked up, but they do not reveal the means of transport (means of travel). The input analysis block 2 therefore queries the history management block 9. On learning from this query that in the past a car was chosen as the means of transport in most cases, the input analysis block 2 assumes, in step S10, that the means of transport is a car. The user has not stated the means of transport explicitly, but the history makes a car highly likely, so the system is taken to be confident in its understanding of the user input.
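
The keyword matching step, and the detection that the means of transport is missing, could look roughly like the sketch below. The keyword table, its categories, and the function names are hypothetical, and the matching is done on an English rendering of the question purely for illustration.

```python
import re

# Hypothetical keyword list held by the input analysis block: word -> category.
KEYWORDS = {
    "tokyo": "place",
    "shizuoka": "place",
    "time": "quantity",
    "car": "transport",
    "train": "transport",
}


def find_keywords(text: str):
    """Return (word, category) pairs for every word that matches a keyword."""
    words = re.findall(r"[a-z]+", text.lower())
    return [(w, KEYWORDS[w]) for w in words if w in KEYWORDS]


def missing_slots(matches, required=("place", "quantity", "transport")):
    """List the required categories that no matched keyword covers."""
    found = {category for _, category in matches}
    return [slot for slot in required if slot not in found]


matches = find_keywords("What is the travel time from Tokyo to Shizuoka?")
print(matches)
print(missing_slots(matches))  # the means of transport is not stated
```

A non-empty result from `missing_slots` is the point at which, in the text above, the input analysis block 2 turns to the history management block 9.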

[0043] Next, in step S2, the input analysis block 2 instructs the search block 5 to "acquire traffic information for the Tomei Expressway", and instructs the response text generation block 4 to "calculate the required time between Tokyo and Shizuoka from the retrieved information and output it". The input analysis block 2 also judges from the form of the user's question that an approximate time is sufficient as a response, and instructs the response text generation block 4 to express the minutes abstractly.

[0044] In accordance with the instruction from the input analysis block 2, the search block 5 accesses the database block 6 and obtains the traffic information. The search block 5 first checks the information stored locally in the database block 6, which is the storage unit of the system. If that information is out of date or missing, it acquires the latest information, for example over a network. It is assumed that where on the network the traffic information is available, and how the system operates to obtain it, are determined and known in advance. The traffic information obtained by the search block 5 is sent to the response text generation block 4.

[0045] The response text generation block 4 has been told by the input analysis block 2 to "calculate the required time between Tokyo and Shizuoka and output it". First, in step S3, it picks out as the necessary data the required times for the sections "Tokyo-Atsugi", "Atsugi-Gotemba", and "Gotemba-Shizuoka" from the traffic information obtained by the search block 5. Then, in step S4, it calculates the total time (215 minutes in this example).
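
Steps S3 and S4 amount to picking the relevant sections out of the retrieved traffic data and summing their times. The sketch below illustrates this. The per-section minutes are hypothetical values chosen only so that they add up to the 215 minutes given in the text; the publication states the total but not the individual section times, and the extra section is included just to show the selection step.

```python
# Hypothetical traffic data: required minutes per expressway section.
traffic_minutes = {
    "Tokyo-Atsugi": 60,
    "Atsugi-Gotemba": 75,
    "Gotemba-Shizuoka": 80,
    "Shizuoka-Hamamatsu": 50,  # outside the requested route; must be ignored
}

route = ["Tokyo-Atsugi", "Atsugi-Gotemba", "Gotemba-Shizuoka"]


def total_minutes(data, sections):
    """Step S3: pick the needed sections. Step S4: sum their required times."""
    return sum(data[section] for section in sections)


print(total_minutes(traffic_minutes, route))
```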

[0046] Next, the response text generation block 4 generates the response text. It first decides how to express the time. In this example, following the instruction from the input analysis block 2, it decides to express the minutes abstractly. To do so, it refers to the text generation data storage block 8 and selects one of the four abstract expressions "(say nothing)", "a little over", "and a half", and "a little under", in this case "and a half", so that in step S5 the time of 215 minutes is expressed as "three and a half hours". Because the calculated time is based on traffic information and is highly accurate, the sentence ending is set to the assertive "desu" ("it is ..."). The response is also judged to be sufficiently reliable with respect to the user input, so the block outputs to the speech synthesis block 3 the text together with speech synthesis instructions for output in an adult voice with a loud, clear tone.
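
The selection among the four abstract expressions could be sketched as follows. The minute boundaries (10, 25, and 45) and the function name are assumptions made for illustration; the publication names the four expressions but does not say which range of minutes maps to which one.

```python
def abstract_minutes(total_minutes: int) -> str:
    """Express a duration in hours, with the minutes abstracted away.

    Chooses among four qualifiers: nothing, "a little over",
    "and a half", and "a little under". The boundaries are assumptions.
    """
    hours, minutes = divmod(total_minutes, 60)
    if minutes < 10:
        return f"{hours} hours"                    # say nothing
    if minutes < 25:
        return f"a little over {hours} hours"      # slightly more than the hour
    if minutes < 45:
        return f"{hours} and a half hours"         # roughly half past
    return f"a little under {hours + 1} hours"     # just short of the next hour


print(abstract_minutes(215))  # 3 h 35 min
print(abstract_minutes(126))  # 2 h 6 min
```

With these assumed boundaries, 215 minutes falls in the "and a half" range and 126 minutes in the "say nothing" range, which matches the two outcomes described in the text.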

[0047] The speech synthesis block 3 receives the text "It is three and a half hours" and the speech synthesis instructions from the response text generation block 4. In step S11, the speech synthesis block 3 then generates speech output data that says "It is three and a half hours" in an adult voice with a loud, clear tone, and outputs it from the speaker as the response.

[0048] Traffic information is not the only source of information for finding the "travel time from Tokyo to Shizuoka". Although less accurate, the required time can also be estimated by looking up the distance on a road map.

[0049] In this case, in step S6, the input analysis block 2 instructs the search block 5 to "acquire map information for the Tomei Expressway".

[0050] The search block 5 then accesses the database block 6 and retrieves the map information. The information obtained by this search is sent to the response text generation block 4.

[0051] At this time, the response text generation block 4 has been informed by the input analysis block 2 that it is to "output the required time between Tokyo and Shizuoka". At step S7, it picks out as the necessary data the distances of the individual sections, for example "Tokyo-Atsugi", "Atsugi-Gotemba" and "Gotemba-Shizuoka", from the map information obtained by the search block 5, and at step S8 it calculates the total distance (210 km in this example). The response text generation block 4 then assumes an average vehicle speed of 100 km/h and, from the total distance and this average speed, roughly estimates the required time as 2.1 hours.
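The rough estimate in steps S7 and S8 is a sum followed by a division. The sketch below makes that arithmetic explicit. Only the 210 km total and the 100 km/h average speed come from the text; the per-section distances are hypothetical placeholders chosen to add up to that total.

```python
# Hypothetical per-section distances in km; only their 210 km sum is from the text.
SECTIONS_KM = {
    ("Tokyo", "Atsugi"): 35,
    ("Atsugi", "Gotemba"): 50,
    ("Gotemba", "Shizuoka"): 125,
}

AVERAGE_SPEED_KMH = 100  # average vehicle speed assumed in the text


def estimate_hours(sections_km, speed_kmh):
    """Return (total distance in km, estimated travel time in hours)."""
    total_km = sum(sections_km.values())   # step S8: total distance
    return total_km, total_km / speed_kmh  # rough required time


total_km, hours = estimate_hours(SECTIONS_KM, AVERAGE_SPEED_KMH)
print(total_km, hours)
```

Because the estimate ignores congestion and speed changes, it is the kind of figure whose minute-level digits should not be trusted, which is the point the following paragraphs build on.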

[0052] Next, the response text generation block 4 generates the response text. Here too, it first decides how to express the time. In this example, following the instruction from the input analysis block 2, the minutes are to be expressed abstractly. To choose the abstract expression, the response text generation block 4 refers to the text generation data storage block 8 and, from the same four candidates as before, "(say nothing)", "a little over" (chotto), "half" (han) and "just under" (jaku), selects for example "(say nothing)", so that at step S9 the 2.1 hours are expressed as "2 hours". The calculated time is assumed to be about right as an approximate figure, so the sentence ending is the assertive "desu". However, a response based on map information is less reliable than one based on the traffic information described above, so the text and a speech synthesis instruction to output it in an adult but quiet voice are sent to the speech synthesis block 3.

[0053] The speech synthesis block 3 receives from the response text generation block 4 the text "It is 2 hours" together with the speech synthesis instruction. At step S12, the speech synthesis block 3 generates speech output data that says "It is 2 hours" in an adult but quiet voice, and outputs it from the speaker as the response.

[0054] When the time is calculated from the road map as described above, suppose the user's question is not "How long does it take from Tokyo to Shizuoka?" but "How many minutes from Tokyo to Shizuoka?". The following response is then conceivable.

[0055] In this case, the input analysis block 2 judges that the user wants a response expressed down to the minute.

[0056] Accordingly, the response text generation block 4 expresses the 2.1 hours obtained by the above calculation as "2 hours 6 minutes". However, if the minutes of this calculation result cannot be trusted, the "6 minutes" part of the response is quite likely to be inaccurate (false). Since the reliability of the response is therefore insufficient, the response text generation block 4 uses the sentence ending "kana", which expresses uncertainty. It then outputs to the speech synthesis block 3 the text and a speech synthesis instruction to output it in, for example, a child's voice.

[0057] The speech synthesis block 3 receives from the response text generation block 4 the text "2 hours 6 minutes, I think" ("2-jikan 6-pun kana") together with the speech synthesis instruction. The speech synthesis block 3 then generates speech output data that says this in a child's voice, and outputs it from the speaker as the response.
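The minute-level response above combines a unit conversion with a hedged ending. A minimal sketch follows, assuming a boolean `minutes_reliable` flag and a plain dictionary as the instruction passed to speech synthesis. Both are illustrative and not specified in the text, and the branch for reliable minutes is likewise an assumption.

```python
def split_hours(hours: float):
    """Convert decimal hours to whole hours and minutes."""
    total_minutes = round(hours * 60)  # round() absorbs floating-point error
    return divmod(total_minutes, 60)


def minute_level_response(hours: float, minutes_reliable: bool) -> dict:
    """Build the response text and voice for a minute-level answer.

    When the minutes cannot be trusted, the uncertain ending "kana" and a
    child's voice are used, as in the text. The reliable branch is assumed.
    """
    h, m = split_hours(hours)
    ending = "desu" if minutes_reliable else "kana"
    voice = "adult" if minutes_reliable else "child"
    return {"text": f"{h} hours {m} minutes {ending}", "voice": voice}


print(minute_level_response(2.1, minutes_reliable=False))
```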

[0058] Furthermore, suppose a policy is adopted of not using data expressions whose accuracy cannot be guaranteed. When the time is calculated from the road map as described above and the user's question is "How many minutes from Tokyo to Shizuoka?", the following response is conceivable.

[0059] In this example, the input analysis block 2 judges that the user wants a response expressed down to the minute.

[0060] Accordingly, the response text generation block 4 first attempts to express the 2.1 hours obtained by the above calculation as "2 hours 6 minutes". However, if the minutes of the calculation result cannot be trusted, the "6 minutes" part of the response is quite likely to be inaccurate (false). The response text generation block 4 therefore raises the level of abstraction and settles on the expression "2 hours", without stating specific minutes. To reflect the fact that it cannot answer more specifically, it also appends the ending "kurai desu" ("it is about"). In this case, however, the response does not satisfy the user's request, so the response text generation block 4 outputs to the speech synthesis block 3 the text and a speech synthesis instruction to output it in, for example, a child's voice.

[0061] The speech synthesis block 3 receives from the response text generation block 4 the text "It is about 2 hours" together with the speech synthesis instruction. The speech synthesis block 3 then generates speech output data that says "It is about 2 hours" in a child's voice, and outputs it from the speaker as the response.
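The stricter policy described above, under which unverifiable minutes are never spoken, can be sketched as a single decision function. The `Response` type, the function name and the voice labels are hypothetical. The branch for reliable minutes is an assumption, and the last branch mirrors the earlier map-based "It is 2 hours" case.

```python
from dataclasses import dataclass


@dataclass
class Response:
    text: str   # response text, with a romanized sentence ending
    voice: str  # voice instruction for speech synthesis


def respond_strict(hours: float, wants_minutes: bool, minutes_reliable: bool) -> Response:
    """Apply the policy of not stating data whose accuracy cannot be guaranteed."""
    h, m = divmod(round(hours * 60), 60)
    if wants_minutes and minutes_reliable:
        # Assumed branch: minutes can be trusted, so state them assertively.
        return Response(f"{h} hours {m} minutes desu", "adult, loud and clear")
    if wants_minutes:
        # Request cannot be met: raise abstraction, hedge with "kurai desu".
        return Response(f"{h} hours kurai desu", "child")
    # Minutes not requested: approximate answer, lower-reliability voice.
    return Response(f"{h} hours desu", "adult, quiet")


print(respond_strict(2.1, wants_minutes=True, minutes_reliable=False))
```

The design point is that the text and the voice carry separate signals: the text stays truthful by dropping the minutes, while the voice tells the listener that the request was not fully met.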

[0062] Next, a scheduler that manages a user's schedule is taken as an example of the voice response system configuration according to the second embodiment of the present invention. The operation of the system of the second embodiment is described in detail below with reference to FIGS. 1 and 3.

[0063] First, as an initial setting, the user operates the input block 1 to select the mode for obtaining personal information such as appointments and telephone numbers.

[0064] Next, at step S31, the user inputs a question such as "What is tomorrow's schedule?". Voice input is assumed here as the interface for entering questions, with the user speaking into the system's microphone. The speech from the microphone undergoes speech recognition in the input block 1 and is converted into text data.

[0065] The input analysis block 2 analyzes the text data sent from the input block 1 and decides on the corresponding search process. Specifically, the input analysis block 2 holds a list of keywords used to decide the processing, and it begins by looking for words that match these keywords.

[0066] From the question "What is tomorrow's schedule?", it finds two words that match keywords: "tomorrow" and "schedule". From the keyword "schedule", the input analysis block 2 decides to search the schedule table, and from the keyword "tomorrow" it decides to look up tomorrow.
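The keyword lookup described in the two preceding paragraphs can be sketched as a table from keyword to decision. The table contents, the slot names and the `analyze` function are assumptions for illustration. Substring matching is used because the Japanese input has no word boundaries, which is also an assumption about how matching is done.

```python
# Hypothetical keyword table: keyword -> (slot, value) decided by the analysis.
KEYWORD_ACTIONS = {
    "予定": ("search_target", "schedule"),  # "schedule": search the schedule table
    "明日": ("date", "tomorrow"),           # "tomorrow": look up tomorrow
}


def analyze(text: str) -> dict:
    """Return the search decisions triggered by keywords found in the text."""
    plan = {}
    for keyword, (slot, value) in KEYWORD_ACTIONS.items():
        if keyword in text:  # simple substring match
            plan[slot] = value
    return plan


print(analyze("明日の予定は？"))  # "What is tomorrow's schedule?"
```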

[0067] Accordingly, at step S32, the input analysis block 2 instructs the search block 5 to "acquire the data for tomorrow's schedule" and instructs the response text generation block 4 to "report all of tomorrow's schedule". In this case, however, the input analysis block 2 judges from the form of the user's question that the user wants an overview, and arranges for the detailed times and detailed contents of the appointments not to be stated.

[0068] On receiving the search instruction from the input analysis block 2, the search block 5 accesses the database block 6 at step S33 and retrieves tomorrow's schedule. Assume that the information retrieved is "9:00-11:00 tennis at Shinagawa" and "14:00- shopping at Shinjuku with friend". The information obtained by this search is sent to the response text generation block 4.

[0069] Next, the response text generation block 4 generates the response text. It first decides how to express the time. Because it has been informed by the input analysis block 2 that it is to "output an overview of tomorrow's schedule", it decides to raise the level of abstraction of the time expressions. For example, the text generation data storage block 8 contains three categories, "morning", "daytime" and "night", so the response text generation block 4 decides that, of tomorrow's appointments, "9:00-11:00" corresponds to "morning" and "14:00-" (from 14:00 onward) corresponds to "daytime". As for the contents of the appointments, the place and companion are omitted and only the activity is stated. That is, at step S34, the response text generation block 4 generates response text such as "tennis in the morning" and "shopping in the afternoon".
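Raising the abstraction level of the schedule, as in step S34, amounts to mapping each start time to a coarse category and dropping the place and companion. The sketch below assumes category boundaries at 12:00 and 18:00 and a dictionary layout for the appointments. Neither is given in the text, which names only the three categories.

```python
# Appointments from the example; the field names are illustrative assumptions.
SCHEDULE = [
    {"start": 9, "end": 11, "what": "tennis", "where": "Shinagawa"},
    {"start": 14, "end": None, "what": "shopping", "where": "Shinjuku", "with": "friend"},
]


def time_of_day(hour: int) -> str:
    """Map an hour to one of the three categories (boundaries are assumed)."""
    if hour < 12:
        return "morning"
    if hour < 18:
        return "daytime"
    return "night"


def summarize(entries):
    """Keep only the coarse time category and the activity of each entry."""
    return [f"{time_of_day(e['start'])}: {e['what']}" for e in entries]


print(summarize(SCHEDULE))
```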

[0070] Next, the response text generation block 4 decides the sentence ending and the speech synthesis instruction. In a scheduler such as that of this second embodiment, there is no other way to misinterpret the user's input, and the response data was entered by the user and can be taken as accurate and correct, so the reliability of the response is sufficient. Accordingly, at step S35, the response text generation block 4 uses the assertive "desu" as the ending and outputs to the speech synthesis block 3 the text and a speech synthesis instruction to output it in a loud, clear adult voice.

[0071] The speech synthesis block 3 receives from the response text generation block 4 the text "Tennis in the morning, and shopping in the afternoon." together with the speech synthesis instruction. At step S39, the speech synthesis block 3 generates speech output data that says "Tennis in the morning, and shopping in the afternoon." in a loud, clear adult voice, and outputs it from the speaker as the response.

[0072] However, even when the user asks "What is tomorrow's schedule?" as in the above example, the system may answer with the date, time and activity in detail.

[0073] For example, suppose the input analysis block 2 finds from the information in the history management block 9 that "when this user is given an overview, the user often asks again for the detailed date and time afterwards". In this case, the input analysis block 2 gives the history priority over the wording of the current input and instructs the response text generation block 4 to "report tomorrow's schedule in detail, item by item".
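Giving the history priority over the wording of the current question can be sketched as follows. The history format (pairs of the detail level given and whether the user then asked again), the 0.5 threshold and the function names are all assumptions. The text says only that the user "often" asks again after an overview.

```python
def followup_rate(history) -> float:
    """Fraction of overview answers after which the user asked for details."""
    asked_again = [again for level, again in history if level == "summary"]
    return sum(asked_again) / len(asked_again) if asked_again else 0.0


def choose_detail(question_wants_detail: bool, history, threshold: float = 0.5) -> str:
    """Decide the detail level; the history overrides the question's wording."""
    if followup_rate(history) > threshold:
        return "detailed"
    return "detailed" if question_wants_detail else "summary"


# Hypothetical history: (detail level given, did the user ask again?)
HISTORY = [("summary", True), ("summary", True), ("summary", False), ("detailed", False)]
print(choose_detail(False, HISTORY))
```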

[0074] In this case, the response text generation block 4 outputs to the speech synthesis block 3 the text "Tennis from 9 a.m. to 11, and shopping from 2 p.m." together with a speech synthesis instruction to output it in a loud, clear adult voice.

[0075] On receiving this text and the speech synthesis instruction from the response text generation block 4, the speech synthesis block 3 generates speech output data that says "Tennis from 9 a.m. to 11, and shopping from 2 p.m." in a loud, clear adult voice, and outputs it from the speaker as the response.

[0076] Now consider the case in which the user operates the input block 1 to select the mode for obtaining personal information and then, at step S36, inputs the question "What is the schedule for tomorrow morning?". As before, voice input is assumed as the interface for entering questions, with the user speaking into the system's microphone. The speech from the microphone undergoes speech recognition in the input block 1 and is converted into text data.

[0077] In this case, the input analysis block 2 analyzes the text data sent from the input block 1 and decides on the corresponding search process. The input analysis block 2 holds a list of keywords used to decide the processing, and it begins by looking for words that match these keywords.

[0078] From the question "What is the schedule for tomorrow morning?", it finds three words that match keywords: "tomorrow", "morning" and "schedule". From the combination of the keywords "tomorrow" and "schedule", the input analysis block 2 decides to "search tomorrow's schedule". At step S32, it instructs the search block 5 to "acquire the data for tomorrow's schedule" and instructs the response text generation block 4 to "report only tomorrow morning's schedule". In this example, the input analysis block 2 also judges from the form of the user's question that the user wants to know the detailed time, and arranges for the detailed times, and along with them the detailed contents, to be stated.

[0079] On receiving the search instruction from the input analysis block 2, the search block 5 accesses the database block 6 at step S33 and retrieves tomorrow's schedule. The information obtained by this search is sent to the response text generation block 4.

[0080] Next, the response text generation block 4 generates the response text. Because it has been informed by the input analysis block 2 that it is to "output only the morning schedule", at step S37 it excludes the "14:00-" appointment from the output, since that is not a morning appointment, but keeps the time, place and activity of the remaining appointment for output. To convey the time in as much detail as possible, the response text generation block 4 expresses "9:00-11:00" as "from 9 o'clock to 11".

[0081] Next, the response text generation block 4 decides the sentence ending and the speech synthesis instruction. In this example, there is no other way to misinterpret the user's input, and the response data was entered by the user and can be taken as correct, so the reliability of the response is sufficient. Accordingly, at step S38, the response text generation block 4 uses the assertive "desu" as the ending and outputs to the speech synthesis block 3 the text "Tennis in Shinagawa from 9 o'clock to 11 o'clock." together with a speech synthesis instruction to output it in a loud, clear adult voice.
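The morning-only, fully detailed response of steps S37 and S38 can be sketched as a filter followed by a formatter. The appointment layout, the 12:00 cutoff for "morning" and the English phrasing of the output are assumptions made for illustration.

```python
# Appointments from the example; the field names are illustrative assumptions.
SCHEDULE = [
    {"start": "9:00", "end": "11:00", "what": "tennis", "where": "Shinagawa"},
    {"start": "14:00", "end": None, "what": "shopping", "where": "Shinjuku", "with": "friend"},
]


def hour_of(clock: str) -> int:
    """Extract the hour from an 'H:MM' string."""
    return int(clock.split(":")[0])


def morning_entries(entries):
    """Step S37: keep only appointments that start before noon (assumed cutoff)."""
    return [e for e in entries if hour_of(e["start"]) < 12]


def describe(entry) -> str:
    """Step S38: state time, place and activity in full detail."""
    end = f" to {hour_of(entry['end'])}" if entry["end"] else ""
    return f"{entry['what']} in {entry['where']} from {hour_of(entry['start'])}{end} o'clock"


print([describe(e) for e in morning_entries(SCHEDULE)])
```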

[0082] On receiving the text and the speech synthesis instruction from the response text generation block 4, the speech synthesis block 3, at step S40, generates speech output data that says "Tennis in Shinagawa from 9 o'clock to 11 o'clock." in a loud, clear adult voice, and outputs it from the speaker as the response.

[0083] Next, FIG. 4 shows a hardware configuration for realizing the voice response system of the present invention.

[0084] In FIG. 4, the input unit 11 corresponds to the input block 1 of FIG. 1 and is provided with interfaces such as buttons, a keyboard, a touch panel and a microphone. The user enters questions and keywords through the input unit 11.

[0085] The CPU 12 controls each part of the system and executes the various programs. The CPU 12 handles the processing of the input analysis block 2, the speech synthesis block 3, the response text generation block 4 and so on in FIG. 1.

[0086] The ROM 13 stores fixed data and fixed programs. The data and processing programs for the text generation and speech synthesis described above are also stored in the ROM 13.

[0087] The RAM 14 temporarily holds data needed while the CPU 12 is executing programs. User input, the results of text generation and speech synthesis, information obtained by the communication unit 16 and the like are stored temporarily in the RAM 14.

[0088] The auxiliary storage unit 15 consists of flash memory or EEPROM and stores data that is added to or rewritten and that should be retained even when no program is running. The personal information of the history management block 9 in FIG. 1, data obtained through communication, and other data used in CPU processing are stored in the auxiliary storage unit 15.

[0089] The communication unit 16 is used when the system acquires or sends information over a network. It communicates by, for example, telephone, the Internet, infrared or radio.

[0090] The speaker unit 17 outputs audio such as the synthesized speech of a response.

[0091] The display unit 18 includes a display device such as a liquid crystal panel. It presents a GUI (Graphical User Interface) screen and displays various information such as tables and lists.

[0092] The auxiliary storage unit 15, the communication unit 16 and the display unit 18 are not strictly required in the system of the embodiment of the present invention.

[0093] As described above, according to the embodiment of the present invention, the tone, voice quality and so on used when speaking a response to a user's question are varied according to the reliability of the response, so the user can listen to the speech output while knowing to some extent whether the response matches the request. Furthermore, even when the user only listens to the speech output, the state of the system is easier to grasp, and the user can move more quickly to the next action (for example, listening carefully, supplying a missing condition, or asking a different question). In addition, by varying the level of abstraction of the data expressions in the response output, the embodiment can indicate the degree of accuracy of the data and can keep the output concise.

[0094]

[Effects of the Invention] As is apparent from the above description, the voice response device of the present invention performs a data search based on input information, generates a response sentence from the retrieved data, and outputs it as synthesized speech. In doing so, it generates a response sentence that reflects the reliability of the response, or varies the level of abstraction of the response sentence according to the accuracy of the retrieved data, or varies the level of abstraction of the response sentence according to the required accuracy inferred from the input information and the history. As a result, when the response system replies to a question input by a user, it can inform the user of the reliability of the response and can also adaptively vary the level of detail of the response.

[0095] That is, according to the present invention, the tone, voice quality and so on used when speaking a response sentence are varied according to the reliability of the response, so the user can listen to the speech output while knowing to some extent whether the response matches the request. Even when the user only listens to the speech output, the state of the system is easier to grasp and the next action can be taken sooner. Furthermore, by varying the level of abstraction of the data expressions in the response output, the invention can indicate the degree of accuracy of the data and can keep the output concise.

[Brief description of the drawings]

FIG. 1 is a functional block diagram showing the schematic configuration of a voice response system according to an embodiment of the present invention.

FIG. 2 is a flowchart showing the flow of an example operation of the system of the first embodiment.

FIG. 3 is a flowchart showing the flow of an example operation of the system of the second embodiment.

FIG. 4 is a block circuit diagram showing a hardware configuration for realizing the voice response system of the present invention.

[Explanation of symbols]

1 input block, 2 input analysis block, 3 speech synthesis block, 4 response text generation block, 5 search block, 6 database block, 7 speech synthesis data storage block, 8 text generation data storage block, 9 history management block, 11 input unit, 12 CPU, 13 ROM, 14 RAM, 15 auxiliary storage unit, 16 communication unit, 17 speaker unit, 18 display unit

Continued from the front page. (72) Inventor: Hiroshi Tsunoda (角田弘史), c/o Sony Corporation, 6-7-35 Kita-Shinagawa, Shinagawa-ku, Tokyo. (72) Inventor: Yasuhiko Kato (加藤靖彦), c/o Sony Corporation, 6-7-35 Kita-Shinagawa, Shinagawa-ku, Tokyo. F-terms (reference): 5D015 AA05 KK04; 5D045 AA07 AB21

Claims

[Claims]

1. A voice response device comprising: input means for inputting information; search means for performing a data search based on the input information; response sentence generation means for generating a response sentence from the retrieved data; and synthesized speech conversion output means for converting the response sentence into synthesized speech and outputting it, wherein the response sentence generation means, when generating a response sentence based on the retrieved data, generates a response sentence that reflects the reliability of the response.

2. A voice response device comprising: input means for inputting information; search means for performing a data search based on the input information; response sentence generation means for generating a response sentence from the retrieved data; and synthesized speech conversion output means for converting the response sentence into synthesized speech and outputting it, wherein the response sentence generation means, when generating a response sentence based on the retrieved data, varies the level of abstraction of the expression of the response sentence according to the accuracy of the retrieved data.

3. A voice response device comprising: input means for inputting information; search means for performing a data search based on the input information; response sentence generation means for generating a response sentence from the retrieved data; history management means for managing a history of responses to the input; and synthesized speech conversion output means for converting the response sentence into synthesized speech and outputting it, wherein the response sentence generation means, when generating a response sentence based on the retrieved data, varies the level of abstraction of the expression of the response sentence according to the required accuracy inferred from the input information and the history.