US20110184738A1 - Navigation and orientation tools for speech synthesis - Google Patents
- Publication number
- US20110184738A1 (application US 13/012,989)
- Authority
- US
- United States
- Prior art keywords
- text
- indicator
- read
- reading
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0487—Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
- G06F3/0488—Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures
- G06F3/04883—Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures for inputting data by handwriting, e.g. gesture or text
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B5/00—Electrically-operated educational appliances
- G09B5/06—Electrically-operated educational appliances with both visual and audible presentation of the material to be studied
- G09B5/062—Combinations of audio and printed presentations, e.g. magnetically striped cards, talking books, magnetic tapes with printed texts thereon
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/027—Concept to speech synthesisers; Generation of natural phrases from machine-based concepts
Definitions
- FIG. 6d and FIG. 6e illustrate a case where the metaphor used for the text presentation is a book, where the book comprises more than one page and a timeline 647.
- When the current time indicator 648 is moved using mouse dragging, touch dragging, multitouch dragging, or a single touch or multitouch gesture, the page 634 will change to a different page 639, revealing the text that should be displayed at the new point in time 646.
- When the user 638 drags the portrayed text indicator, in this case the portrayed magnifying glass 637 (using mouse dragging, single touch dragging, multitouch dragging, or touchpad dragging), over the timeline area 647, the page 634 will change to a different page 639, revealing new text that was not visible at the time the user started dragging. When the user 645 is already on a different page 639 and releases the mouse button, finger, or fingers, this signals that the dragging is completed.
- The magnifying glass 646 will be placed on new text on the different page 639, setting a new reading point for playback of the TTS engine, as sketched below.
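- A minimal sketch of this timeline-to-page mapping, under the proportional reading-time model of FIG. 8 (time proportional to character count); the function and the page-length list are illustrative assumptions, not taken from the patent:

```python
def locate(knob_pos: float, page_lengths: list[int]) -> tuple[int, int]:
    """Map a timeline position in [0.0, 1.0] to (page index, offset in page)."""
    target = int(knob_pos * sum(page_lengths))      # global character offset
    for page, length in enumerate(page_lengths):
        if target < length:
            return page, target                     # new reading point on this page
        target -= length
    return len(page_lengths) - 1, page_lengths[-1] - 1  # clamp to the last character

# Example: dragging the knob to 0.5 over three 1,000-character pages
# flips to page index 1 and places the reading point at offset 500.
print(locate(0.5, [1000, 1000, 1000]))
```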
- A drag gesture may be a double tap followed by a drag motion, or a single tap-and-hold followed by a drag motion.
- FIGS. 7a-b illustrate a case where the metaphor used for the text presentation is a book, where the book comprises more than one page and may have a timeline and other controls, such as but not limited to a play/stop button, skip forward and skip backward buttons, a line indicator (in this case the portrayed triangle), a text indicator (in this case the portrayed magnifying glass), find controls, text size controls, and other navigation controls 702.
- During a page turn, some or all of the controls 702 on the screen will disappear, an animation will occur, and the controls that disappeared will reappear once the animation is done.
- When the TTS engine reads the last word of a page, the page will turn to a different page and the TTS engine will continue reading from the first word of the new page.
- When the text layout is in scrolling format and the TTS engine reaches, or nearly reaches, the last word in the visible screen area, the text on the screen will scroll, revealing new text to be read by the TTS engine and enabling continuous reading for the user.
- When the TTS engine is in stop mode and the user scrolls the text or flips to a new page, a new reading point will be set automatically at a visible place in the text viewing area.
- In another embodiment, when the TTS engine is in stop mode and the user scrolls the text or flips to a new page, a new reading point will be set automatically at the first word of the text that is currently visible in the text viewing area, as in the sketch below.
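- The sketch below illustrates that second variant; visible_range(), word_at_or_after(), and the tts object are hypothetical stand-ins, not interfaces defined by the patent:

```python
def reset_reading_point(view, text_layout, tts) -> None:
    """On scroll or page flip while stopped, snap the reading point to the
    first word visible in the text viewing area."""
    if tts.is_playing():
        return                                   # only applies in stop mode
    first_visible, _last = view.visible_range()  # character offsets on screen
    word = text_layout.word_at_or_after(first_visible)
    tts.set_reading_point(word.start)            # playback will resume here
```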
- FIG. 8 is a flow chart for initializing the Synchronization Unit 204 and the Navigation and Orientation Unit 201.
- The Navigation and Orientation Unit is initialized 802, including the display area and the word and line indicators. The timeline is also initialized. If bookmarks are present they are rendered 803 on the timeline 217, representing the bookmarks associated with the entire text. If bookmarks are present they are also rendered 804 in the text presented in the display area 206.
- Other display parameters are initialized 805, for example elapsed time, current word, current line, font, font size, page number, search string, search results, etc.
- Calculating the entire text reading time 801 is done by multiplying the average time of reading a single character by the total number of characters in the entire text.
- The average time of reading a single character depends on the TTS engine reading speed. For example, when the TTS engine is set to read slowly the average character reading time may be 90 ms, and when the TTS engine is set to read fast the average character reading time may be 40 ms.
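- A minimal sketch of this estimate (step 801), using the per-character times quoted above; the speed labels and the function name are illustrative assumptions:

```python
AVG_CHAR_TIME_MS = {"slow": 90, "fast": 40}  # values quoted in the text

def estimate_total_reading_ms(text: str, speed: str = "slow") -> int:
    """Total reading time ~= average per-character reading time * character count."""
    return AVG_CHAR_TIME_MS[speed] * len(text)

# Example: a 6,000-character text read slowly ~= 540,000 ms, i.e. 9 minutes.
print(estimate_total_reading_ms("x" * 6000, "slow"))
```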
- FIG. 9 is a flow chart showing an example of a Synchronization Unit process.
- Step 901 gets the text to be displayed and synchronized with the TTS engine 200.
- Step 902 gets a "word is about to start" event, including the next word that should be displayed and synthesized by the TTS engine 200.
- Step 903 calculates the display parameters for that word. The display parameters include at least one of the following: 1) the word to highlight in a line; 2) the location of the word in a line; and 3) the position of the word on the timeline.
- Step 904 transfers the word and the display parameters to be displayed by the Navigation and Orientation Unit 201.
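- A compact sketch of this loop (steps 901-904), assuming a hypothetical blocking event source fed by the Synchronization Unit 204; a fuller version of the parameter calculation appears under FIG. 10 below:

```python
def sync_unit_process(text: str, events, nav_unit) -> None:
    """Per-word pipeline: wait for the event, derive display parameters, hand off."""
    while True:
        word = events.wait_for("word_is_about_to_start")  # 902: next word + offsets
        knob_pos = word.start / max(len(text), 1)         # 903: timeline position
        nav_unit.display(word, knob_pos)                  # 904: render the indicators
```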
- FIG. 10 is an example of a detailed flow chart of step 903, which calculates the display parameters for the next word.
- Step 1001 obtains access to the entire text.
- Step 1002 determines the next word that needs to be displayed and synchronized with the TTS engine 200.
- Step 1003 calculates the highlight of the next word.
- Step 1004 determines whether the word is in a new line and therefore whether the line indicator 208 should be updated for that word.
- Step 1005 calculates the respective point in time of that word in the entire reading sequence. The result of that calculation is used for determining the position of the knob 214 over the timeline 217 and for calculating and displaying the elapsed 220 and remaining 221 times.
- Get next word 1002 will be triggered by a "word is about to start" event generated by the Synchronization Unit 204.
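- One way steps 1003-1005 could be realized under the proportional time model of FIG. 8; the dataclass and its field names are assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass
class DisplayParams:
    start: int           # character offset where the highlight begins (1003)
    end: int             # character offset just past the word (1003)
    line: int            # line index; a change here moves the line indicator (1004)
    timeline_pos: float  # knob position in [0.0, 1.0] along the timeline (1005)

def display_params(text: str, word_start: int, word_end: int) -> DisplayParams:
    line = text.count("\n", 0, word_start)         # which line holds the word
    timeline_pos = word_start / max(len(text), 1)  # proportional point in time
    return DisplayParams(word_start, word_end, line, timeline_pos)
```

- The elapsed and remaining times then follow as timeline_pos and (1 - timeline_pos) multiplied by the total reading time estimated per FIG. 8.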
Abstract
TTS is a technology that has been well known for decades, used for applications ranging from artificial call center attendants to PC software that allows people with visual impairments or reading disabilities to listen to written works on a home computer. However, to date TTS has not been widely adopted by PC and mobile users for daily reading tasks such as reading emails, reading PDF and Word documents, reading through website content, and reading books. The present invention offers a new user experience for operating TTS in day-to-day usage. More specifically, this invention describes a synchronization technique for following text being read by TTS engines, together with specific interfaces for touch pads and touch and multitouch screens. This invention also describes the usage of other input methods such as the touchpad, mouse, and keyboard.
Description
- Priority is claimed from U.S. provisional application No. 61/297,921, entitled "Navigation and orientation tools for speech synthesis", filed 25 Jan. 2010, and from U.S. provisional application No. 61/347,575, entitled "Navigation and orientation tools for speech synthesis", filed 24 May 2010.
- The present invention is in the field of navigation and orientation tools for speech synthesis.
- According to Wikipedia: Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech.
- Since its invention, speech technology has constantly improved. Most efforts have centered on imitating a human voice and reading fluently, while the user interface and text navigation were neglected. From the user's point of view, such systems are still complicated to use, since current common user interfaces are limited. For example:
- The existing products/applications are far from comfortable for the end users:
- a. In most cases, the user needs to select the text, by marking it, before listening to it.
- b. If the user stops in the middle of reading, playing the text again will start from the beginning of the marked text.
- c. During reading there are no text pointers, and users lose their orientation very quickly.
- d. Device-specific input methods and apparatuses, such as touchpads and touch and multitouch screens, which would make navigation easier and more intuitive, are not used.
- e. Reading large amounts of content is almost impossible.
- f. Current audio book navigation is cumbersome.
- There is a need in the art to provide new controls for text-to-speech navigation and reading orientation, adding new orientation abilities that will enable easy navigation through large documents and help readers follow the text as it is being read by the TTS engine.
- There is a need in the art to provide a solution that will work on any device (Mac/PC, mobile smartphone, or tablet) by touch, voice, mouse, or keyboard.
- According to Wikipedia: A text-to-speech (TTS) system (or “engine”) is composed of two parts: a front-end and a back-end. The front-end has two major tasks. First, it converts raw text containing symbols like numbers and abbreviations into the equivalent of written-out words. This process is often called text normalization, pre-processing, or tokenization. The front-end then assigns phonetic transcriptions to each word, and divides and marks the text into prosodic units, like phrases, clauses, and sentences. The process of assigning phonetic transcriptions to words is called text-to-phoneme or grapheme-to-phoneme conversion. Phonetic transcriptions and prosody information together make up the symbolic linguistic representation that is output by the front-end. The back-end—often referred to as the synthesizer—then converts the symbolic linguistic representation into sound.
- In one embodiment of the present invention the engine will provide portrayed text indications every time a new sentence, a new word, or a new character (collectively referred to hereunder as the "text") is output by the back-end. Based on these indications the system will mark the text being read, for example but not limited to portraying a magnifying glass over it, providing the user with orientation to the current text being read.
- In a second embodiment of the present invention the engine will provide portrayed line indications every time the text being read is in the next line or the previous line relative to the text that was read immediately before it. A line indication can be, for example, a small needle portrayed at the beginning of the line that is currently being read.
- In a third embodiment of the present invention the user may click, double click, drag, or apply a single touch or a multitouch gesture on the portrayed text indicator in order to start or stop playback of the TTS engine.
- In a fourth embodiment of the present invention the user may drag, or apply a single touch or a multitouch gesture on, the portrayed text indicator in order to set a new reading point for playback of the TTS engine.
- In a fifth embodiment of the present invention the user may drag, or apply a single touch or a multitouch gesture on, the portrayed text indicator in order to set a new reading point for playback of the TTS engine, where said reading point is not in the same page of the book.
- In accordance with an aspect of the invention, there is provided a method for outputting a text, comprising:
- a. indicating read text on a screen by using a portrayed text indicator, for example portraying a magnifying glass on the read text;
- b. synchronizing the read text and audio playback of the indicated text.
- In accordance with an embodiment of the invention, there is provided a method, wherein the synchronization is at word boundary.
- In accordance with an embodiment of the invention, there is further provided a method, wherein the synchronization is at sentence boundary.
- In accordance with an embodiment of the invention, there is still further provided a method, further comprising providing a scroll indicator for scrolling the text by a user dragging the scroll indicator.
- In accordance with an embodiment of the invention, there is still further provided a method, further comprising providing a page flipping indicator for flipping page by means of a user swipe gesture on the flipping indicator.
- In accordance with an embodiment of the invention, there is still further provided a method, further comprising displaying the text in a screen layout that portrays a text book.
- In accordance with an embodiment of the invention, there is still further provided a method further comprising removing text controls when portraying flipping of a page in the text book.
- In accordance with an embodiment of the invention, there is still further provided a method configured to operate on IPAD™, IPOD™, IPHONE™, and Android™ devices.
- In accordance with an aspect of the invention, there is provided a method for outputting a text, comprising:
- indicating read text on a touch screen by portraying a text indicator on the read text;
- applying a swipe gesture by a user touch on the text indicator to start or stop reading the text; and
- synchronizing the read text and audio playback of the indicated text.
- In accordance with an embodiment of the invention, there is provided a method, wherein the direction of the swipe gesture prescribes starting or stopping playback, respectively.
- In accordance with an aspect of the invention, there is provided a method for outputting a text, comprising:
- indicating read text on a touch screen by portraying a text indicator on the read text;
- dragging the text indicator by a user touch to a different location in the text;
- synchronizing the read text starting from the new position and audio playback of the indicated text.
- In accordance with an embodiment of the invention, there is provided a method, further comprising changing the position of a time indicator to reflect the text that has already been processed up to the new position, wherein the time indicator indicates the proportion of the text that has already been processed compared to the entire text passage for reading.
- In accordance with an embodiment of the invention, there is provided a method comprising:
- indicating read text on a touch screen by portraying a text indicator on the read text;
- calculating the entire reading time of a text as proportional to the average time required to read a character multiplied by the total number of characters in the text;
- portraying a time indicator to reflect the text that has already been processed up to the new position, wherein the time indicator indicates the proportion of the text that has already been processed compared to said calculated entire reading time.
- In accordance with an embodiment of the invention, there is provided a method, wherein the average time required to read a character is configurable according to a desired text playback rate.
- In accordance with certain other aspects of the invention there are provided counterpart system configurations configured to perform the specified method steps.
- In order to understand the invention and to see how it may be carried out in practice, embodiments will now be described, by way of non-limiting example only, with reference to the accompanying drawings, in which:
- FIG. 1 illustrates platforms that can be used in a system in accordance with certain embodiments of the invention;
- FIG. 2a illustrates a system architecture in accordance with certain embodiments of the invention;
- FIG. 2b is a screen layout of a text and associated controls in accordance with certain embodiments of the invention;
- FIGS. 3a, 3b, 3c, and 3d are screen layouts of a text and associated controls in accordance with certain embodiments of the invention;
- FIG. 4 is a screen layout of a text and associated controls in accordance with certain embodiments of the invention;
- FIGS. 5a and 5b are screen layouts of displayed controls in accordance with certain embodiments of the invention;
- FIGS. 6a, 6b, 6c, 6d, and 6e are screen layouts of displayed controls in accordance with certain embodiments of the invention;
- FIG. 7 is a screen layout of a text and associated controls in accordance with certain embodiments of the invention;
- FIG. 8 illustrates a flow chart of a sequence of operations in a system in accordance with certain embodiments of the invention;
- FIG. 9 illustrates a flow chart of a sequence of operations in a system in accordance with certain embodiments of the invention; and
- FIG. 10 illustrates a flow chart of a sequence of operations in a system in accordance with certain embodiments of the invention.
- The subject matter of the present application can have features of different aspects described above, or their equivalents, in any combination thereof, which can also be combined with any feature/s of the subject matter described in the detailed description of embodiments presented below, or their equivalents.
- The current invention describes a system and methods for users that need to read text from a display. This system is useful for mobile users who would like the computer to read the text for them because they are in constant movement, say walking, driving, or riding a train, where they often need to move their eyes away from the display and therefore lose the last point they read in the text, making a continuous experience impossible. Also, when reading a large text, even when stationary, there is a need for a pointer; instead of using the finger or the mouse, the system reads the text for the user while the current word and line are highlighted, keeping the listener visually oriented to the text being read. Readers working through text that is not written in their mother tongue, and readers who are not yet highly skilled at reading (for example, children in kindergarten or in their first school years) who find it hard to pronounce some of the words and generally read slower than usual, will find that with the TTS system and the navigation and orientation tools detailed hereunder they can read through the text faster and more easily, while also expanding their language skills.
- According to FIG. 2a, the system includes portrayed text indicators, in this case a word indicator 207 and a portrayed line indicator 208. These indicators help the user immediately focus on the text and keep the user in context with the text being read. The portrayed word indicator 207 moves to the next word as the TTS engine progresses through the text. The text will scroll into alignment with the portrayed line indicator when a new line is about to be read by the TTS engine. In this case, since the line indicator is in a fixed place on the screen, the user knows exactly where to look for the text on the display. The controls include navigation buttons 219 and 211, helping the user move forward and backward in the text.
- The Timeline 217 represents the duration it will take to read through the entire text, and the knob 214 represents the current time of the text being read. For each word that is read by the TTS engine the system generates: 1) a portrayal indicating the word being read 207; 2) a portrayal indicating the line of the word being read 208; and 3) a time indication, by moving the knob 214 to the respective point in time of that word in the entire reading sequence. The time elapsed 220, time remaining 221, and page number 215 are also updated accordingly.
- The user can at any time see the time elapsed in the reading 220 and the reading time remaining 221 to complete the reading of the text. These indications are imperative for the user to understand where he is in the process of reading and how much time remains, and to plan his time accordingly. Additionally, the timeline has a knob 214 that represents the current time of the text being read; that knob 214 moves along the timeline as the reading progresses, giving the user a feeling of continuity and progress while listening to the text. The progress bar gives the user a further indication of the pace at which the text is being read, and since progress is perceived by the mind as something positive, it actually encourages the user to keep listening and to complete the task of listening to the entire text. The knob 214 is also used for navigation: by scrolling the knob 214 along the timeline the user can access any point in the text. When the knob is scrolled, the text in the display area 206 is moved accordingly, keeping it in sync with the knob 214, as sketched below.
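- A small sketch of the two-way knob/text mapping implied here, again assuming reading time proportional to character count; the function names are assumptions:

```python
def knob_to_offset(knob_pos: float, text_len: int) -> int:
    """Scrubbing: a knob position in [0.0, 1.0] selects a character offset."""
    return max(0, min(int(knob_pos * text_len), text_len - 1))

def offset_to_knob(offset: int, text_len: int) -> float:
    """Playback: the current word's offset places the knob on the timeline."""
    return offset / max(text_len, 1)
```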
- Note that the description above was provided for understanding the need and typical use of the system of the invention. The invention is by no means bound by this exemplary description which is provided for illustrative purposes only.
- Bearing this in mind, attention is drawn to
FIG. 1 which demonstrates a NAVIGATION AND ORIENTATION TOOLS FOR SPEECH SYNTHESIS implemented on a PC, MAC, Tablet, and Smart phone. The mouse, keyboard, touch screen, touch gestures 101,stylus 103 and voice activated 102 are examples of input devices enabling pointing and selecting referred here under as “selection”, “selections”, “selected”, “selects”, etc. Where a mouse, keyboard, touch screen, touch gestures 101,stylus 103 and voice activated 102 are examples of input devices that enables text scrolling referred hereunder as “scroll”, “scrolls”, “scrolling”, etc. When selecting controls on the screen buttons or menus will be refereed collectively herein as “buttons”. Such buttons can be operated by touch screen, touch gestures 101,stylus 103, mouse, keyboard, andvoice control 102. -
- FIG. 2a shows that a text-to-speech (TTS) system (or "engine") is composed of two parts: a front-end 202 and 203 and a back-end 205. The front-end 202 has two major tasks. First, it converts raw text containing symbols like numbers and abbreviations into the equivalent of written-out words. This process is often called text normalization, pre-processing, or tokenization. The front-end then assigns phonetic transcriptions to each word, and divides and marks the text into prosodic units, like phrases, clauses, and sentences. The process of assigning phonetic transcriptions to words 203 is called text-to-phoneme or grapheme-to-phoneme conversion. Phonetic transcriptions and prosody information together make up the symbolic linguistic representation that is output by the front-end 203. The back-end 205, often referred to as the synthesizer, then converts the symbolic linguistic representation into sound.
Orientation Unit 201 andSynchronization Unit 204. TheSynchronization unit 204 is responsible for taking the output of the front-end 203 and feed it synchronically to theWave Form Generation 205 and the Navigation andOrientation Unit 201. In one embodiment of the present invention theSynchronization unit 204 may Synchronize theWave Form Generation 205 and the Navigation andOrientation Unit 201 for every new word. In another embodiment of the present invention theSynchronization Unit 204 may Synchronize theWave Form Generation 205 and the Navigation andOrientation Unit 201 for every character, line, sentence, paragraph, bookmark, or page. In another embodiment of the present invention theSynchronization Unit 204 may Synchronize theWave Form Generation 205 and the Navigation andOrientation Unit 201 for every segment of the text as defined by the application. Synchronization is achieved for example, in the case were theSynchronization Unit 204 synchronizes for every word, by having theSynchronization Unit 204 waiting for the word to be played by theWave Form Generation 205, and only then continue to the next word if such word exists. By waiting for the word to be played the system achieve synchronization. - The Navigation and
Orientation Unit 201 has Human Interface—UI, that enables the user to interact with the displayed text and other UI parts such as buttons. In one embodiment of the present invention the user may start a TTS session by ordering theText Analysis Unit 202 to start reading text by selecting a “Play” button. In another embodiment of the present inventions the user interacts with the Navigation andOrientation Unit 201 UI, the Navigation andOrientation Unit 201 may change the text feed to theText Analysis Unit 202 and start a new TTS session. -
- FIGS. 3a-d show a portrayed text indicator 302 that is portrayed on the text 301 as a magnifying glass effect. The magnifying glass 302 surrounds at least one character in the text 301. In a preferred embodiment of the present invention the magnifying glass 302 surrounds at least one word. In another embodiment of the present invention the magnifying glass 302 is aligned to the direction of the text reading; for example, when the TTS engine is reading English the magnifying glass 302 will be aligned to the left of the word that is currently being read. In yet another preferred embodiment of the present invention the magnifying glass 302 will not jump from word to word but rather will animate from its position on word 302 to its new position on word 308, giving the user a progressive experience. By animating the magnifying glass 302, the user feels as if the text is being followed with a finger in continuous motion. In another embodiment of the present invention, when the magnifying glass 313 moves from its current line (FIG. 3c, line 2) to a new line (FIG. 3d, line 3), it will animate from the last word of the line 313 to the beginning of the first word in the next line 319. FIG. 3c and FIG. 3d further show a line indicator 314 and 318 portrayed as a triangle. In a preferred embodiment of the present invention the line indicator 314 looks like a needle that points to the current line. In yet another preferred embodiment of the present invention the line indicator 314 will animate when moving from one line (FIG. 3c, line 2) to another (FIG. 3d, line 3).
words synchronization unit 204 will send “word is about to start” event to the Navigation andorientation tool unit 201. The “word is about to start” event is fired immediately after thewaveform generator 205 completed synthesizing thecurrent word 302. The time between the “word is about to start” event and the time the waveform generator will start synthesizing thenext word 308 is the time duration for the magnifying glass to animate between thecurrent word 302 to thenext word 308. For example in case the time gap in between words is 200 ms, the time gap between the “word is about to start” event and actual speech synthesis of that word will be 200 ms, and that would also be the time for the magnifying glass to animate fromword 302 toword 308. In another embodiment of the present invention during the time gap between the “word is about to start” event and actual speech synthesis of that word the word indicator and the line indicator will animate from theircurrent position new position -
- FIG. 4 illustrates a user interacting with the portrayed text indicator 402, in this case the portrayed magnifying glass. In one embodiment of the present invention the user may click, double click, drag, touch, or apply a single touch or a multitouch gesture on the magnifying glass 402 in order to start playback of the TTS engine. In a preferred embodiment of the present invention, on a computer connected to a mouse or similar apparatus, the user shall double click on the magnifying glass 402 in order to start playback of the TTS engine. In another preferred embodiment, on a computer connected to a mouse or similar apparatus, the user shall single click on the magnifying glass 402 in order to stop playback of the TTS engine. In another embodiment, when a keyboard is connected, the user may use the spacebar to toggle between starting and stopping playback of the TTS engine. In yet another preferred embodiment, on a device with a touch pad, touch screen, multitouch screen, or similar apparatus, the user shall use a swipe gesture 403 in the direction of reading on the magnifying glass 402 in order to start playback of the TTS engine; for example, in English the user will use a swipe gesture from left to right applied on the magnifying glass 402 in order to start reading. In one embodiment the user may click, double click, drag, touch, or apply a single touch or a multitouch gesture on the magnifying glass 402 in order to stop playback of the TTS engine. In yet another preferred embodiment, on a device with a touchpad, touch screen, multitouch screen, or similar apparatus, the user shall use a swipe gesture 403 opposite to the direction of reading on the magnifying glass 402 in order to stop playback of the TTS engine; for example, in English the user will use a swipe gesture from right to left to stop reading. In another embodiment of the present invention, when the user stops the playback of the TTS engine, the place of the magnifying glass 402 within the entire text will be saved in memory. When reading is resumed, the TTS engine will resume playback from the saved position, enabling precise resuming.
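- A small sketch of the direction-sensitive swipe handling described above; the tts object and the sign convention for swipe_dx are assumptions:

```python
def on_swipe(swipe_dx: float, reading_rtl: bool, tts) -> None:
    """A swipe in the reading direction starts playback; the opposite stops it."""
    in_reading_direction = (swipe_dx < 0) if reading_rtl else (swipe_dx > 0)
    if in_reading_direction:
        tts.start()  # e.g. a left-to-right swipe for English
    else:
        tts.stop()   # e.g. a right-to-left swipe for English
```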
FIGS. 5 a-b Illustrate a case where the user drags, uses a single touch or a multitouch drag gesture applied on the portrayed text indicator 502, in this case the portrayed magnifying glass, in order to set a new reading point for playback of the TTS engine. The new location the user set by the user maybe anywhere on the screen where the text 500 played by the TTS engine. In a preferred embodiment of the present invention in a computer connected to a mouse or similar apparatus the user shall use a dragging method on the magnifying glass 502 to set a new reading point for playback of the TTS engine. Dragging means pointing to the magnifying glass 502, pressing a mouse button down and moving the magnifying glass using the mouse to a new location 509 over the text 500 while the mouse button is being pressed. Releasing the mouse button means that dragging is completed. The magnifying glass 509 will be placed on new text setting a new reading point for playback of the TTS engine. When the magnifying glass 502 is dragged from one position placed on one line to a new position 509 placed on a different line, the line indicator 501 will also change its position pointing to the new line 507. In another embodiment of the present invention when the magnifying glass 502 is dragged from one position to a new position 509, the current time indicator 504 and the time elapsed 512 and time remaining 506 will also change their position and value respectively representing a new current time 508, and new time elapsed 513 and time remaining 510. - In yet another preferred embodiment of the present invention a device with a touch, multi touch screen or similar apparatus the user shall use a drag by touching the magnifying glass 502 and move it to a new location 509 on the text in order to set the current position for text to be read by the TTS engine. In another embodiment of the present invention when the user drag the text indicator 502 using his finger 503 to a new location 509 the magnifying glass 502 will follow the finger 503 while it is being drag to the new location 509. For example in English the user will use his finger 503 touching the magnifying glass 502 and then the user will drag his finger 503 over the touch screen to a new location 509 over the text. When the user is removing his finger from the screen (i.e not touching it) that will mean that dragging is completed. The magnifying glass 502 will be placed on the text setting a new reading 509 point for playback of the TTS engine.
- A drag gesture may be a double tap followed by a drag motion, or a single tap and hold followed by a drag motion.
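A non-limiting sketch of completing such a drag, assuming a monospaced layout in which a pixel position maps onto a line and column; the Layout interface and dropPointToOffset function are illustrative, and a real implementation would more likely use the platform's caret hit test:

```typescript
// Minimal sketch, assuming monospaced text: translate the pixel position
// where the drag ends into a character offset, which becomes the new
// reading point for TTS playback.
interface Layout {
  lineHeight: number;   // pixels per text line
  charWidth: number;    // pixels per character (monospace assumption)
  charsPerLine: number; // characters per displayed line
}

function dropPointToOffset(x: number, y: number, layout: Layout): number {
  const line = Math.max(0, Math.floor(y / layout.lineHeight));
  const col = Math.min(
    Math.max(0, Math.floor(x / layout.charWidth)),
    layout.charsPerLine - 1
  );
  return line * layout.charsPerLine + col; // offset handed to the TTS engine
}
```

The same offset can drive the line indicator and the time indicators, since the line is recoverable from the offset and the layout.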
-
FIG. 6a, FIG. 6b, and FIG. 6c illustrate a case where the user navigates through the text using mouse dragging, touch, multi-touch, or a single-touch or multi-touch gesture applied on the portrayed text indicator 601, in this case the portrayed magnifying glass, in order to set a new reading point for playback of the TTS engine, where that playback point in the text 603 is not visible to the user at the time the user starts navigation. Three methods are illustrated in FIGS. 6a, 6b, and 6c. -
FIG. 6a illustrates a case where the page can be scrolled in any direction 610. In one embodiment of the present invention the user navigates through the text 603 using mouse dragging, single-touch dragging, or multi-touch dragging applied on the text indicator 601. When the mouse pointer, finger, or fingers are dragged beyond the text area border 604, the text is scrolled, revealing new text 615 that was not visible at the time the user started the navigation. When the user releases the mouse button, finger, or fingers, it signals that the dragging is completed. The text indicator 601 is then placed on new text 617, setting a new reading point for playback of the TTS engine. - In another embodiment of the present invention, when the user drags the text indicator 601 using his finger 608, the magnifying glass 601 follows the finger 608 while it is being dragged. When the magnifying glass 601 is dragged from a position on one line to a new position 617 on a different line, the portrayed line indicator 602 also changes its position, pointing to the new line 614. In another embodiment of the present invention, when the magnifying glass 601 is dragged from one position to a new position 617, the current time indicator 606, the time elapsed 622, and the time remaining 623 also change their position and value, respectively, representing a new current time 612, new time elapsed 624, and new time remaining 625. - In another embodiment of the present invention, when the user drags the magnifying glass 601 using his finger 608, the magnifying glass 617 follows the finger 620 while it is being dragged. When the magnifying glass 601 is dragged from a position on one line to a new position 617 on a different line, the portrayed line indicator 602 also changes its position, pointing to the new line 614. In another embodiment of the present invention, when the magnifying glass 601 is dragged from one position to a new position 617, the current time indicator 606, the time elapsed 622, and the time remaining 623 also change their position and value, respectively, representing a new current time 612, new time elapsed 624, and new time remaining 625. -
FIG. 6c illustrates a case where the metaphor used for the text presentation is portrayed as a book, where the book is made of more than one page. The illustration shows page flipping areas 626. When these areas are clicked, double clicked, dragged, touched, or applied with a single-touch or multi-touch gesture, the page turns to a different page by portraying a page curl and turn-over using a multi-frame animation, and the text 627 changes to a different text. In one embodiment of the present invention, when the user 632 drags the magnifying glass 631 (using mouse dragging, single-touch dragging, or multi-touch dragging) over the page flipping areas, the page 627 turns to a different page by portraying a page curl and turn-over using a multi-frame animation, revealing new text 626 that was not visible at the time the user started the navigation. When the user 632 releases the mouse button, finger, or fingers, it signals that the dragging is completed. The magnifying glass 631 is then placed on new text, setting a new reading point for playback of the TTS engine. -
FIG. 6d and FIG. 6e illustrate a case where the metaphor used for the text presentation is portrayed as a book, where the book is made of more than one page and a time line 647. When the current time indicator 648 is moved using mouse dragging, touch dragging, multi-touch dragging, or a single-touch or multi-touch gesture, the page 634 changes to a different page 639, revealing the text that should be displayed at the new point in time 646. In one embodiment of the present invention, when the user 638 drags the portrayed text indicator, in this case the portrayed magnifying glass 637 (using mouse dragging, single-touch dragging, multi-touch dragging, or touchpad dragging), over the time line area 647, the page 634 changes to a different page 639, revealing new text that was not visible at the time the user started dragging. When the user 645 is already on a different page 639 and releases the mouse button, finger, or fingers, it signals that the dragging is completed. The magnifying glass 646 is then placed on new text in the different page 639, setting a new reading point for playback of the TTS engine. - A drag gesture may be a double tap followed by a drag motion, or a single tap and hold followed by a drag motion.
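A non-limiting sketch of navigating beyond the visible text while dragging, combining the border scrolling of FIG. 6a with the page flipping areas of FIG. 6c; the View interface and the per-event scroll step are illustrative assumptions, not the patent's API:

```typescript
// Minimal sketch: while the indicator is dragged, crossing the text-area
// border scrolls the view to reveal hidden text, and crossing a page
// flipping area turns the page of the book metaphor.
interface View {
  top: number;
  bottom: number;
  scrollBy(px: number): void;                    // reveal text above/below
  flipPage(dir: 1 | -1): void;                   // page-curl to adjacent page
  inFlipArea(x: number, y: number): -1 | 0 | 1;  // -1 back, 1 forward, 0 none
}

function onDragMove(x: number, y: number, view: View): void {
  const STEP = 20; // pixels scrolled per drag event beyond the border (assumed)
  const flip = view.inFlipArea(x, y);
  if (flip !== 0) {
    view.flipPage(flip);   // book metaphor: turn to a different page
  } else if (y < view.top) {
    view.scrollBy(-STEP);  // reveal text above the visible area
  } else if (y > view.bottom) {
    view.scrollBy(STEP);   // reveal text below the visible area
  }
}
```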
-
FIGS. 7a-b illustrate a case where the metaphor used for the text presentation is portrayed as a book, where the book is made of more than one page and may have a time line and other controls such as, but not limited to, a play/stop button, skip forward and backward buttons, a line indicator (in this case the portrayed triangle), a text indicator (in this case the portrayed magnifying glass), find controls, text size controls, and other navigation controls 702. In one embodiment of the present invention, when the user clicks, double clicks, touches, or applies a touch or multi-touch gesture on the page flipping areas, some or all of the controls 702 on the screen disappear, an animation occurs, and the controls that disappeared reappear once the animation is done. In a preferred embodiment of the present invention, when the user clicks, double clicks, touches, or applies a touch or multi-touch gesture on the page flipping areas 702, some or all of the controls on the screen disappear, a page flipping animation occurs, and the controls that disappeared reappear once the animation is done. - In some embodiments of the present invention (not shown in the figures), in a book format, when the TTS engine reads the last word of a page, the page turns to a different page and the TTS engine continues reading from the first word on the new page. When the text layout is in scrolling format and the TTS engine reaches, or nearly reaches, the last word of the visible screen area, the text on the screen scrolls, revealing new text to be read by the TTS engine and enabling the user to read continuously. In some embodiments of the present invention, when the TTS engine is in stop mode and the user scrolls the text or flips to a new page, a new reading point is set automatically at a visible place in the text viewing area. In a preferred embodiment of the present invention, when the TTS engine is in stop mode and the user scrolls the text or flips to a new page, a new reading point is set automatically at the first word of the text that is currently visible in the text viewing area.
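A non-limiting sketch of this continuous-reading behavior, assuming hypothetical visibleRange and scrollToReveal helpers (scrollToReveal would scroll in scrolling format or flip a page in book format):

```typescript
// Minimal sketch: when the word about to be read falls outside the visible
// range, reveal it so the TTS engine can keep reading without interruption.
interface Reader {
  visibleRange(): { first: number; last: number }; // visible character offsets
  scrollToReveal(offset: number): void;            // scroll, or flip a page
}

function onWordAboutToStart(wordOffset: number, reader: Reader): void {
  const { first, last } = reader.visibleRange();
  if (wordOffset < first || wordOffset > last) {
    reader.scrollToReveal(wordOffset); // new text appears; reading continues
  }
}
```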
-
FIG. 8 is a flow chart for initializing the Synchronization Unit 204 and the Navigation and Orientation Unit 201. In 801 the system calculates the reading time it will take the TTS engine 200 to read through the text. The value of the total reading time is presented in the remaining time 221. The Navigation and Orientation Unit is initialized 802, including the display area and the word and line indicators. The time line is also initialized. If bookmarks are present, they are rendered 803 on the time line 217, representing the bookmarks associated with the entire text. If bookmarks are present, they are also rendered 804 for the text presented in the display area 206. Other display parameters are initialized 805, for example elapsed time, current word, current line, font, font size, page number, search string, search results, etc. In one embodiment of the present invention, calculating the entire text reading time 801 is done by multiplying the average time to read a single character by the total number of characters in the entire text. In another embodiment of the present invention, the average time to read a single character depends on the TTS engine reading speed. For example, when the TTS engine is set to read slowly, the average character reading time may be 90 ms, and when the TTS engine is set to read fast, the average character reading time may be 40 ms. -
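A non-limiting sketch of the calculation in 801, reusing the 90 ms and 40 ms per-character examples from the text; the rate names, and the timelineState helper anticipating step 1005 of FIG. 10, are illustrative:

```typescript
// Minimal sketch: total reading time = rate-dependent average per-character
// time multiplied by the character count; a character offset then yields the
// knob position on the time line and the elapsed/remaining times.
const MS_PER_CHAR: Record<"slow" | "fast", number> = { slow: 90, fast: 40 };

function totalReadingTimeMs(text: string, rate: "slow" | "fast"): number {
  return text.length * MS_PER_CHAR[rate]; // step 801
}

function timelineState(charOffset: number, text: string, rate: "slow" | "fast") {
  const totalMs = totalReadingTimeMs(text, rate);
  const elapsedMs = charOffset * MS_PER_CHAR[rate];
  return {
    knobProportion: text.length ? charOffset / text.length : 0, // knob over time line
    elapsedMs,                        // displayed as elapsed time
    remainingMs: totalMs - elapsedMs, // displayed as remaining time
  };
}
```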
FIG. 9 is a flow chart showing an example of a Synchronization Unit process. 901 gets the text to be displayed and synchronized with the TTS engine 200. 902 gets a "word is about to start" event, including the next word that should be displayed and synthesized by the TTS engine 200. 903 calculates the display parameters for that word. The display parameters include at least one of the following: 1) the word to highlight in a line, 2) the location of the word in a line, 3) the position of the word on the time line. 904 transfers the word and display parameters to be displayed by the Navigation and Orientation Unit 201. 905 animates at least one of the following: 1) the movement of the word indicator from the previous word to the next word, 2) the movement of the line indicator to the following line, 3) the movement of the current time knob over the time line to the new time representation of the next word. 906 transfers the word to the waveform generation unit 205 to start reading through the word. In 907 the Synchronization Unit waits for the word to be played. By waiting for the word to be played, the system achieves synchronization. 908 goes back to 902 to get the next word that should be displayed and synchronized with the TTS engine 200. If 902 determines that there are no further words to be displayed and synchronized with the TTS engine 200, the process stops. In a preferred embodiment of the present invention, in order to achieve precise synchronization, the animated display 905 is completed first, and immediately afterwards the word is synthesized by the waveform generation unit 205. -
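A non-limiting sketch of this loop, under the assumption that animation and synthesis each expose a completion promise; all interface and function names are illustrative:

```typescript
// Minimal sketch of the FIG. 9 loop: animate the indicators first, then
// synthesize the word and wait for it to finish. The waiting is what keeps
// the displayed indicators and the audio in step.
interface WordEvent { word: string; offset: number; }

async function synchronizationLoop(
  nextWord: () => WordEvent | null,            // 902: "word is about to start"
  animate: (w: WordEvent) => Promise<void>,    // 903-905: indicators and knob
  synthesize: (w: WordEvent) => Promise<void>, // 906-907: play word and wait
): Promise<void> {
  let w = nextWord();
  while (w !== null) {        // stop when 902 yields no further words
    await animate(w);         // completed first, per the preferred embodiment
    await synthesize(w);      // then the word is played to completion
    w = nextWord();           // 908: back for the next word
  }
}
```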
FIG. 10 is an example of a detailed flow chart of 903—calculating the display parameters for the next word. 1001 obtains access to the entire text. 1002 determines the next word that needs to be displayed and synchronized with the TTS engine 200. 1003 calculates the highlight of the next word. 1004 determines whether the word is on a new line, and therefore whether the line indicator 208 should be updated for that word. 1005 calculates the respective point in time of that word in the entire reading sequence. The result of that calculation is used for determining the position of the knob 214 over the time line 217 and for calculating and displaying the elapsed time 220 and remaining time 221. - In a preferred embodiment of the present invention, Get next word 1002 is triggered by a "word is about to start" event generated by the Synchronization Unit 204. - The present invention has been described with a certain degree of particularity, but those versed in the art will readily appreciate that various modifications and alterations may be carried out without departing from the scope of the following Claims:
Claims (14)
1. A method for outputting a text, comprising
a. Indicating read text on a touch screen by portraying a text indicator on the read text;
b. Synchronizing the read text and audio playback of the indicated text.
2. The method according to claim 1, wherein the synchronization is at word boundary.
3. The method according to claim 1, wherein the synchronization is at sentence boundary.
4. The method according to claim 1, further comprising providing a scroll indicator for scrolling the text by a user dragging the scroll indicator.
5. The method according to claim 1, further comprising providing a page flipping indicator for flipping a page by means of a user swipe gesture on the flipping indicator.
6. The method according to claim 1, further comprising displaying the text in a screen layout that portrays a text book.
7. The method according to claim 6, further comprising removing text controls when portraying flipping of a page in the text book.
8. The method according to claim 1, configured to operate on any of the following devices: IPAD™, IPOD™, IPHONE™, Android™, Kindle™, Nook™.
9. A method for outputting a text, comprising
a. Indicating read text on a touch screen by portraying a text indicator on the read text;
b. Applying a swipe gesture by a user touch on the text indicator to start or stop reading the text;
c. Synchronizing the read text and audio playback of the indicated text.
10. The method according to claim 9, wherein the direction of the swipe gesture prescribes start or stop of playback, respectively.
11. A method for outputting a text, comprising
a. Indicating read text on a touch screen by portraying a text indicator on the read text;
b. Dragging the text indicator by a user touch to a different position of the text;
c. Synchronizing the read text starting from the new position and audio playback of the indicated text.
12. The method according to claim 11, further comprising changing the position of a time indicator to reflect the text that has already been processed up to the new position, wherein the time indicator indicates the proportion of the text that has already been processed compared to the entire text passage for reading.
13. A method according to claim 11, comprising:
a. Indicating read text on a touch screen by portraying a text indicator on the read text;
b. Calculating the entire reading time of a text by multiplying the average time required to read a character by the total number of characters in the text;
c. Portraying a time indicator to reflect the text that has already been processed up to the new position, wherein the time indicator indicates the proportion of the text that has already been processed compared to said calculated entire reading time.
14. The method according to claim 13, wherein said average time required to read a character is configurable according to the desired text playback rate.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/012,989 US20110184738A1 (en) | 2010-01-25 | 2011-01-25 | Navigation and orientation tools for speech synthesis |
US15/678,615 US10649726B2 (en) | 2010-01-25 | 2017-08-16 | Navigation and orientation tools for speech synthesis |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US29792110P | 2010-01-25 | 2010-01-25 | |
US34757510P | 2010-05-24 | 2010-05-24 | |
US13/012,989 US20110184738A1 (en) | 2010-01-25 | 2011-01-25 | Navigation and orientation tools for speech synthesis |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/678,615 Continuation US10649726B2 (en) | 2010-01-25 | 2017-08-16 | Navigation and orientation tools for speech synthesis |
Publications (1)
Publication Number | Publication Date |
---|---|
US20110184738A1 true US20110184738A1 (en) | 2011-07-28 |
Family
ID=44309628
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/012,989 Abandoned US20110184738A1 (en) | 2010-01-25 | 2011-01-25 | Navigation and orientation tools for speech synthesis |
US15/678,615 Active US10649726B2 (en) | 2010-01-25 | 2017-08-16 | Navigation and orientation tools for speech synthesis |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/678,615 Active US10649726B2 (en) | 2010-01-25 | 2017-08-16 | Navigation and orientation tools for speech synthesis |
Country Status (1)
Country | Link |
---|---|
US (2) | US20110184738A1 (en) |
Cited By (54)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110320206A1 (en) * | 2010-06-29 | 2011-12-29 | Hon Hai Precision Industry Co., Ltd. | Electronic book reader and text to speech converting method |
US20130138443A1 (en) * | 2010-08-24 | 2013-05-30 | Call Gate Co., Ltd. | Voice-screen ars service system, method for providing same, and computer-readable recording medium |
US20130145290A1 (en) * | 2011-12-06 | 2013-06-06 | Google Inc. | Mechanism for switching between document viewing windows |
US20130232413A1 (en) * | 2012-03-02 | 2013-09-05 | Samsung Electronics Co. Ltd. | System and method for operating memo function cooperating with audio recording function |
US20140013192A1 (en) * | 2012-07-09 | 2014-01-09 | Sas Institute Inc. | Techniques for touch-based digital document audio and user interface enhancement |
US20140012583A1 (en) * | 2012-07-06 | 2014-01-09 | Samsung Electronics Co. Ltd. | Method and apparatus for recording and playing user voice in mobile terminal |
US8948892B2 (en) | 2011-03-23 | 2015-02-03 | Audible, Inc. | Managing playback of synchronized content |
US9003325B2 (en) | 2012-09-07 | 2015-04-07 | Google Inc. | Stackable workspaces on an electronic device |
US20150112465A1 (en) * | 2013-10-22 | 2015-04-23 | Joseph Michael Quinn | Method and Apparatus for On-Demand Conversion and Delivery of Selected Electronic Content to a Designated Mobile Device for Audio Consumption |
CN104756484A (en) * | 2012-11-01 | 2015-07-01 | 索尼公司 | Information processing device, reproduction state control method, and program |
US9075760B2 (en) | 2012-05-07 | 2015-07-07 | Audible, Inc. | Narration settings distribution for content customization |
US9099089B2 (en) | 2012-08-02 | 2015-08-04 | Audible, Inc. | Identifying corresponding regions of content |
US20150243294A1 (en) * | 2012-10-31 | 2015-08-27 | Nec Casio Mobile Communications, Ltd. | Playback apparatus, setting apparatus, playback method, and program |
US9141257B1 (en) | 2012-06-18 | 2015-09-22 | Audible, Inc. | Selecting and conveying supplemental content |
US20150362991A1 (en) * | 2014-06-11 | 2015-12-17 | Drivemode, Inc. | Graphical user interface for non-foveal vision |
US9223830B1 (en) | 2012-10-26 | 2015-12-29 | Audible, Inc. | Content presentation analysis |
US9264475B2 (en) | 2012-12-31 | 2016-02-16 | Sonic Ip, Inc. | Use of objective quality measures of streamed content to reduce streaming bandwidth |
US9280906B2 (en) | 2013-02-04 | 2016-03-08 | Audible. Inc. | Prompting a user for input during a synchronous presentation of audio content and textual content |
US9313510B2 (en) | 2012-12-31 | 2016-04-12 | Sonic Ip, Inc. | Use of objective quality measures of streamed content to reduce streaming bandwidth |
US9317486B1 (en) | 2013-06-07 | 2016-04-19 | Audible, Inc. | Synchronizing playback of digital content with captured physical content |
US9317500B2 (en) | 2012-05-30 | 2016-04-19 | Audible, Inc. | Synchronizing translated digital content |
US9367196B1 (en) | 2012-09-26 | 2016-06-14 | Audible, Inc. | Conveying branched content |
US9472113B1 (en) | 2013-02-05 | 2016-10-18 | Audible, Inc. | Synchronizing playback of digital content with physical content |
US9489360B2 (en) | 2013-09-05 | 2016-11-08 | Audible, Inc. | Identifying extra material in companion content |
US9536439B1 (en) | 2012-06-27 | 2017-01-03 | Audible, Inc. | Conveying questions with content |
CN106557296A (en) * | 2015-09-28 | 2017-04-05 | 百度在线网络技术(北京)有限公司 | Method and apparatus for obtaining acoustic information |
US9621522B2 (en) | 2011-09-01 | 2017-04-11 | Sonic Ip, Inc. | Systems and methods for playing back alternative streams of protected content protected using common cryptographic information |
US9632647B1 (en) * | 2012-10-09 | 2017-04-25 | Audible, Inc. | Selecting presentation positions in dynamic content |
US9679608B2 (en) | 2012-06-28 | 2017-06-13 | Audible, Inc. | Pacing content |
US9703781B2 (en) | 2011-03-23 | 2017-07-11 | Audible, Inc. | Managing related digital content |
US9712890B2 (en) | 2013-05-30 | 2017-07-18 | Sonic Ip, Inc. | Network video streaming with trick play based on separate trick play files |
US9734153B2 (en) | 2011-03-23 | 2017-08-15 | Audible, Inc. | Managing related digital content |
US9792027B2 (en) | 2011-03-23 | 2017-10-17 | Audible, Inc. | Managing playback of synchronized content |
US9866878B2 (en) | 2014-04-05 | 2018-01-09 | Sonic Ip, Inc. | Systems and methods for encoding and playing back video at different frame rates using enhancement layers |
US9883204B2 (en) | 2011-01-05 | 2018-01-30 | Sonic Ip, Inc. | Systems and methods for encoding source media in matroska container files for adaptive bitrate streaming using hypertext transfer protocol |
US9906785B2 (en) | 2013-03-15 | 2018-02-27 | Sonic Ip, Inc. | Systems, methods, and media for transcoding video data according to encoding parameters indicated by received metadata |
US9967305B2 (en) | 2013-06-28 | 2018-05-08 | Divx, Llc | Systems, methods, and media for streaming media content |
US20180143800A1 (en) * | 2016-11-22 | 2018-05-24 | Microsoft Technology Licensing, Llc | Controls for dictated text navigation |
US10212486B2 (en) | 2009-12-04 | 2019-02-19 | Divx, Llc | Elementary bitstream cryptographic material transport systems and methods |
US10225299B2 (en) | 2012-12-31 | 2019-03-05 | Divx, Llc | Systems, methods, and media for controlling delivery of content |
US10397292B2 (en) | 2013-03-15 | 2019-08-27 | Divx, Llc | Systems, methods, and media for delivery of content |
US10437896B2 (en) | 2009-01-07 | 2019-10-08 | Divx, Llc | Singular, collective, and automated creation of a media guide for online content |
US20190361975A1 (en) * | 2018-05-22 | 2019-11-28 | Microsoft Technology Licensing, Llc | Phrase-level abbreviated text entry and translation |
US10498795B2 (en) | 2017-02-17 | 2019-12-03 | Divx, Llc | Systems and methods for adaptive switching between multiple content delivery networks during adaptive bitrate streaming |
US10664658B2 (en) | 2018-08-23 | 2020-05-26 | Microsoft Technology Licensing, Llc | Abbreviated handwritten entry translation |
US10687095B2 (en) | 2011-09-01 | 2020-06-16 | Divx, Llc | Systems and methods for saving encoded media streamed using adaptive bitrate streaming |
US20200196006A1 (en) * | 2018-12-14 | 2020-06-18 | Orange | Spatio-temporal navigation of content |
US10878065B2 (en) | 2006-03-14 | 2020-12-29 | Divx, Llc | Federated digital rights management scheme including trusted systems |
WO2021096507A1 (en) | 2019-11-14 | 2021-05-20 | Google Llc | Automatic audio playback of displayed textual content |
DE102021100581A1 (en) | 2021-01-13 | 2022-07-14 | Bayerische Motoren Werke Aktiengesellschaft | conveying textual information |
US11457054B2 (en) | 2011-08-30 | 2022-09-27 | Divx, Llc | Selection of resolutions for seamless resolution switching of multimedia content |
IT202100025436A1 (en) * | 2021-10-06 | 2023-04-06 | D D Innovation Srl | System and method for displaying an electronic text and controlling the playback of at least one synchronized soundtrack |
CN116052671A (en) * | 2022-11-21 | 2023-05-02 | 深圳市东象设计有限公司 | Intelligent translator and translation method |
KR102717737B1 (en) * | 2019-11-14 | 2024-10-16 | 구글 엘엘씨 | Automatic audio playback of displayed textual content |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110184738A1 (en) * | 2010-01-25 | 2011-07-28 | Kalisky Dror | Navigation and orientation tools for speech synthesis |
Citations (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5463725A (en) * | 1992-12-31 | 1995-10-31 | International Business Machines Corp. | Data processing system graphical user interface which emulates printed material |
US5697793A (en) * | 1995-12-14 | 1997-12-16 | Motorola, Inc. | Electronic book and method of displaying at least one reading metric therefor |
US5893132A (en) * | 1995-12-14 | 1999-04-06 | Motorola, Inc. | Method and system for encoding a book for reading using an electronic book |
US6115482A (en) * | 1996-02-13 | 2000-09-05 | Ascent Technology, Inc. | Voice-output reading system with gesture-based navigation |
US20020019950A1 (en) * | 1997-11-26 | 2002-02-14 | Huffman James R. | System for inhibiting the operation of an electronic device during take-off and landing of an aircraft |
US20020099552A1 (en) * | 2001-01-25 | 2002-07-25 | Darryl Rubin | Annotating electronic information with audio clips |
US20020133349A1 (en) * | 2001-03-16 | 2002-09-19 | Barile Steven E. | Matching a synthetic disc jockey's voice characteristics to the sound characteristics of audio programs |
US20030014674A1 (en) * | 2001-07-10 | 2003-01-16 | Huffman James R. | Method and electronic book for marking a page in a book |
US20030229494A1 (en) * | 2002-04-17 | 2003-12-11 | Peter Rutten | Method and apparatus for sculpting synthesized speech |
US6871107B1 (en) * | 1999-07-01 | 2005-03-22 | Ftr Pty, Ltd. | Digital audio transcription system |
US7174295B1 (en) * | 1999-09-06 | 2007-02-06 | Nokia Corporation | User interface for text to speech conversion |
US20070083828A1 (en) * | 2005-06-15 | 2007-04-12 | Nintendo Co., Ltd. | Information processing program and information processing apparatus |
US20080086303A1 (en) * | 2006-09-15 | 2008-04-10 | Yahoo! Inc. | Aural skimming and scrolling |
US20080122796A1 (en) * | 2006-09-06 | 2008-05-29 | Jobs Steven P | Touch Screen Device, Method, and Graphical User Interface for Determining Commands by Applying Heuristics |
US20080168349A1 (en) * | 2007-01-07 | 2008-07-10 | Lamiraux Henri C | Portable Electronic Device, Method, and Graphical User Interface for Displaying Electronic Documents and Lists |
US20080228590A1 (en) * | 2007-03-13 | 2008-09-18 | Byron Johnson | System and method for providing an online book synopsis |
US20080256479A1 (en) * | 2000-09-07 | 2008-10-16 | Virtual Publishing Company Ltd. | Electronic publication and methods and components thereof |
US20090202226A1 (en) * | 2005-06-06 | 2009-08-13 | Texthelp Systems, Ltd. | System and method for converting electronic text to a digital multimedia electronic book |
US20090239202A1 (en) * | 2006-11-13 | 2009-09-24 | Stone Joyce S | Systems and methods for providing an electronic reader having interactive and educational features |
US20090319265A1 (en) * | 2008-06-18 | 2009-12-24 | Andreas Wittenstein | Method and system for efficient pacing of speech for transription |
US20100324902A1 (en) * | 2009-01-15 | 2010-12-23 | K-Nfb Reading Technology, Inc. | Systems and Methods Document Narration |
US20100324905A1 (en) * | 2009-01-15 | 2010-12-23 | K-Nfb Reading Technology, Inc. | Voice models for document narration |
US20110050594A1 (en) * | 2009-09-02 | 2011-03-03 | Kim John T | Touch-Screen User Interface |
US20110153047A1 (en) * | 2008-07-04 | 2011-06-23 | Booktrack Holdings Limited | Method and System for Making and Playing Soundtracks |
US20110208614A1 (en) * | 2010-02-24 | 2011-08-25 | Gm Global Technology Operations, Inc. | Methods and apparatus for synchronized electronic book payment, storage, download, listening, and reading |
US20110288861A1 (en) * | 2010-05-18 | 2011-11-24 | K-NFB Technology, Inc. | Audio Synchronization For Document Narration with User-Selected Playback |
US20120046947A1 (en) * | 2010-08-18 | 2012-02-23 | Fleizach Christopher B | Assisted Reader |
US20120245720A1 (en) * | 2011-03-23 | 2012-09-27 | Story Jr Guy A | Managing playback of synchronized content |
US20120311438A1 (en) * | 2010-01-11 | 2012-12-06 | Apple Inc. | Electronic text manipulation and display |
US8793575B1 (en) * | 2007-03-29 | 2014-07-29 | Amazon Technologies, Inc. | Progress indication for a digital work |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5657426A (en) * | 1994-06-10 | 1997-08-12 | Digital Equipment Corporation | Method and apparatus for producing audio-visual synthetic speech |
WO2001096264A1 (en) * | 2000-06-12 | 2001-12-20 | Conoco Inc. | Fischer-tropsch processes and catalysts using polyacrylate matrix structures |
JP3884951B2 (en) * | 2001-12-14 | 2007-02-21 | キヤノン株式会社 | Information processing apparatus and method, and program |
KR100463655B1 (en) * | 2002-11-15 | 2004-12-29 | 삼성전자주식회사 | Text-to-speech conversion apparatus and method having function of offering additional information |
EP1768630B1 (en) * | 2004-06-16 | 2015-01-07 | Machine Solutions, Inc. | Stent crimping device |
JP5259050B2 (en) * | 2005-03-30 | 2013-08-07 | 京セラ株式会社 | Character information display device with speech synthesis function, speech synthesis method thereof, and speech synthesis program |
US7761789B2 (en) * | 2006-01-13 | 2010-07-20 | Ricoh Company, Ltd. | Methods for computing a navigation path |
US8584042B2 (en) * | 2007-03-21 | 2013-11-12 | Ricoh Co., Ltd. | Methods for scanning, printing, and copying multimedia thumbnails |
US8812969B2 (en) * | 2007-03-21 | 2014-08-19 | Ricoh Co., Ltd. | Methods for authoring and interacting with multimedia representations of documents |
US8346049B2 (en) * | 2007-05-21 | 2013-01-01 | Casio Hitachi Mobile Communications Co., Ltd. | Captioned video playback apparatus and recording medium |
US8751562B2 (en) * | 2009-04-24 | 2014-06-10 | Voxx International Corporation | Systems and methods for pre-rendering an audio representation of textual content for subsequent playback |
US20110184738A1 (en) * | 2010-01-25 | 2011-07-28 | Kalisky Dror | Navigation and orientation tools for speech synthesis |
JP5728913B2 (en) * | 2010-12-02 | 2015-06-03 | ヤマハ株式会社 | Speech synthesis information editing apparatus and program |
-
2011
- 2011-01-25 US US13/012,989 patent/US20110184738A1/en not_active Abandoned
-
2017
- 2017-08-16 US US15/678,615 patent/US10649726B2/en active Active
Patent Citations (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5463725A (en) * | 1992-12-31 | 1995-10-31 | International Business Machines Corp. | Data processing system graphical user interface which emulates printed material |
US5697793A (en) * | 1995-12-14 | 1997-12-16 | Motorola, Inc. | Electronic book and method of displaying at least one reading metric therefor |
US5893132A (en) * | 1995-12-14 | 1999-04-06 | Motorola, Inc. | Method and system for encoding a book for reading using an electronic book |
US6115482A (en) * | 1996-02-13 | 2000-09-05 | Ascent Technology, Inc. | Voice-output reading system with gesture-based navigation |
US20020019950A1 (en) * | 1997-11-26 | 2002-02-14 | Huffman James R. | System for inhibiting the operation of an electronic device during take-off and landing of an aircraft |
US6871107B1 (en) * | 1999-07-01 | 2005-03-22 | Ftr Pty, Ltd. | Digital audio transcription system |
US7974715B2 (en) * | 1999-07-01 | 2011-07-05 | Ftr Pty, Ltd. | Audio/video transcription system |
US7174295B1 (en) * | 1999-09-06 | 2007-02-06 | Nokia Corporation | User interface for text to speech conversion |
US20080256479A1 (en) * | 2000-09-07 | 2008-10-16 | Virtual Publishing Company Ltd. | Electronic publication and methods and components thereof |
US20020099552A1 (en) * | 2001-01-25 | 2002-07-25 | Darryl Rubin | Annotating electronic information with audio clips |
US20020133349A1 (en) * | 2001-03-16 | 2002-09-19 | Barile Steven E. | Matching a synthetic disc jockey's voice characteristics to the sound characteristics of audio programs |
US20030014674A1 (en) * | 2001-07-10 | 2003-01-16 | Huffman James R. | Method and electronic book for marking a page in a book |
US20030229494A1 (en) * | 2002-04-17 | 2003-12-11 | Peter Rutten | Method and apparatus for sculpting synthesized speech |
US20090202226A1 (en) * | 2005-06-06 | 2009-08-13 | Texthelp Systems, Ltd. | System and method for converting electronic text to a digital multimedia electronic book |
US20070083828A1 (en) * | 2005-06-15 | 2007-04-12 | Nintendo Co., Ltd. | Information processing program and information processing apparatus |
US20080122796A1 (en) * | 2006-09-06 | 2008-05-29 | Jobs Steven P | Touch Screen Device, Method, and Graphical User Interface for Determining Commands by Applying Heuristics |
US20080086303A1 (en) * | 2006-09-15 | 2008-04-10 | Yahoo! Inc. | Aural skimming and scrolling |
US20090239202A1 (en) * | 2006-11-13 | 2009-09-24 | Stone Joyce S | Systems and methods for providing an electronic reader having interactive and educational features |
US20080168349A1 (en) * | 2007-01-07 | 2008-07-10 | Lamiraux Henri C | Portable Electronic Device, Method, and Graphical User Interface for Displaying Electronic Documents and Lists |
US20080228590A1 (en) * | 2007-03-13 | 2008-09-18 | Byron Johnson | System and method for providing an online book synopsis |
US8793575B1 (en) * | 2007-03-29 | 2014-07-29 | Amazon Technologies, Inc. | Progress indication for a digital work |
US20090319265A1 (en) * | 2008-06-18 | 2009-12-24 | Andreas Wittenstein | Method and system for efficient pacing of speech for transription |
US20110153047A1 (en) * | 2008-07-04 | 2011-06-23 | Booktrack Holdings Limited | Method and System for Making and Playing Soundtracks |
US20100324902A1 (en) * | 2009-01-15 | 2010-12-23 | K-Nfb Reading Technology, Inc. | Systems and Methods Document Narration |
US20100324905A1 (en) * | 2009-01-15 | 2010-12-23 | K-Nfb Reading Technology, Inc. | Voice models for document narration |
US20110050594A1 (en) * | 2009-09-02 | 2011-03-03 | Kim John T | Touch-Screen User Interface |
US20120311438A1 (en) * | 2010-01-11 | 2012-12-06 | Apple Inc. | Electronic text manipulation and display |
US20110208614A1 (en) * | 2010-02-24 | 2011-08-25 | Gm Global Technology Operations, Inc. | Methods and apparatus for synchronized electronic book payment, storage, download, listening, and reading |
US8103554B2 (en) * | 2010-02-24 | 2012-01-24 | GM Global Technology Operations LLC | Method and system for playing an electronic book using an electronics system in a vehicle |
US20110288861A1 (en) * | 2010-05-18 | 2011-11-24 | K-NFB Technology, Inc. | Audio Synchronization For Document Narration with User-Selected Playback |
US20120046947A1 (en) * | 2010-08-18 | 2012-02-23 | Fleizach Christopher B | Assisted Reader |
US20120245720A1 (en) * | 2011-03-23 | 2012-09-27 | Story Jr Guy A | Managing playback of synchronized content |
Non-Patent Citations (1)
Title |
---|
Keller, et al. "A serial prediction component for speech timing." Speech and Signals. Aspects of Speech Synthesis and Automatic Speech Recognition, 2000, pp. 41-49. * |
Cited By (99)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11886545B2 (en) | 2006-03-14 | 2024-01-30 | Divx, Llc | Federated digital rights management scheme including trusted systems |
US10878065B2 (en) | 2006-03-14 | 2020-12-29 | Divx, Llc | Federated digital rights management scheme including trusted systems |
US10437896B2 (en) | 2009-01-07 | 2019-10-08 | Divx, Llc | Singular, collective, and automated creation of a media guide for online content |
US10212486B2 (en) | 2009-12-04 | 2019-02-19 | Divx, Llc | Elementary bitstream cryptographic material transport systems and methods |
US10484749B2 (en) | 2009-12-04 | 2019-11-19 | Divx, Llc | Systems and methods for secure playback of encrypted elementary bitstreams |
US11102553B2 (en) | 2009-12-04 | 2021-08-24 | Divx, Llc | Systems and methods for secure playback of encrypted elementary bitstreams |
US20110320206A1 (en) * | 2010-06-29 | 2011-12-29 | Hon Hai Precision Industry Co., Ltd. | Electronic book reader and text to speech converting method |
US20130138443A1 (en) * | 2010-08-24 | 2013-05-30 | Call Gate Co., Ltd. | Voice-screen ars service system, method for providing same, and computer-readable recording medium |
US10368096B2 (en) | 2011-01-05 | 2019-07-30 | Divx, Llc | Adaptive streaming systems and methods for performing trick play |
US11638033B2 (en) | 2011-01-05 | 2023-04-25 | Divx, Llc | Systems and methods for performing adaptive bitrate streaming |
US10382785B2 (en) | 2011-01-05 | 2019-08-13 | Divx, Llc | Systems and methods of encoding trick play streams for use in adaptive streaming |
US9883204B2 (en) | 2011-01-05 | 2018-01-30 | Sonic Ip, Inc. | Systems and methods for encoding source media in matroska container files for adaptive bitrate streaming using hypertext transfer protocol |
US9734153B2 (en) | 2011-03-23 | 2017-08-15 | Audible, Inc. | Managing related digital content |
US9792027B2 (en) | 2011-03-23 | 2017-10-17 | Audible, Inc. | Managing playback of synchronized content |
US8948892B2 (en) | 2011-03-23 | 2015-02-03 | Audible, Inc. | Managing playback of synchronized content |
US9703781B2 (en) | 2011-03-23 | 2017-07-11 | Audible, Inc. | Managing related digital content |
US11457054B2 (en) | 2011-08-30 | 2022-09-27 | Divx, Llc | Selection of resolutions for seamless resolution switching of multimedia content |
US10687095B2 (en) | 2011-09-01 | 2020-06-16 | Divx, Llc | Systems and methods for saving encoded media streamed using adaptive bitrate streaming |
US10244272B2 (en) | 2011-09-01 | 2019-03-26 | Divx, Llc | Systems and methods for playing back alternative streams of protected content protected using common cryptographic information |
US10225588B2 (en) | 2011-09-01 | 2019-03-05 | Divx, Llc | Playback devices and methods for playing back alternative streams of content protected using a common set of cryptographic keys |
US10341698B2 (en) | 2011-09-01 | 2019-07-02 | Divx, Llc | Systems and methods for distributing content using a common set of encryption keys |
US9621522B2 (en) | 2011-09-01 | 2017-04-11 | Sonic Ip, Inc. | Systems and methods for playing back alternative streams of protected content protected using common cryptographic information |
US10856020B2 (en) | 2011-09-01 | 2020-12-01 | Divx, Llc | Systems and methods for distributing content using a common set of encryption keys |
US11178435B2 (en) | 2011-09-01 | 2021-11-16 | Divx, Llc | Systems and methods for saving encoded media streamed using adaptive bitrate streaming |
US11683542B2 (en) | 2011-09-01 | 2023-06-20 | Divx, Llc | Systems and methods for distributing content using a common set of encryption keys |
US20130145290A1 (en) * | 2011-12-06 | 2013-06-06 | Google Inc. | Mechanism for switching between document viewing windows |
US9645733B2 (en) * | 2011-12-06 | 2017-05-09 | Google Inc. | Mechanism for switching between document viewing windows |
US10007403B2 (en) * | 2012-03-02 | 2018-06-26 | Samsung Electronics Co., Ltd. | System and method for operating memo function cooperating with audio recording function |
EP2634773B1 (en) * | 2012-03-02 | 2020-12-30 | Samsung Electronics Co., Ltd | System and method for operating memo function cooperating with audio recording function |
US20130232413A1 (en) * | 2012-03-02 | 2013-09-05 | Samsung Electronics Co. Ltd. | System and method for operating memo function cooperating with audio recording function |
US9075760B2 (en) | 2012-05-07 | 2015-07-07 | Audible, Inc. | Narration settings distribution for content customization |
US9317500B2 (en) | 2012-05-30 | 2016-04-19 | Audible, Inc. | Synchronizing translated digital content |
US9141257B1 (en) | 2012-06-18 | 2015-09-22 | Audible, Inc. | Selecting and conveying supplemental content |
US9536439B1 (en) | 2012-06-27 | 2017-01-03 | Audible, Inc. | Conveying questions with content |
US9679608B2 (en) | 2012-06-28 | 2017-06-13 | Audible, Inc. | Pacing content |
US9786267B2 (en) * | 2012-07-06 | 2017-10-10 | Samsung Electronics Co., Ltd. | Method and apparatus for recording and playing user voice in mobile terminal by synchronizing with text |
US20140012583A1 (en) * | 2012-07-06 | 2014-01-09 | Samsung Electronics Co. Ltd. | Method and apparatus for recording and playing user voice in mobile terminal |
US20140013192A1 (en) * | 2012-07-09 | 2014-01-09 | Sas Institute Inc. | Techniques for touch-based digital document audio and user interface enhancement |
US10109278B2 (en) | 2012-08-02 | 2018-10-23 | Audible, Inc. | Aligning body matter across content formats |
US9799336B2 (en) | 2012-08-02 | 2017-10-24 | Audible, Inc. | Identifying corresponding regions of content |
US9099089B2 (en) | 2012-08-02 | 2015-08-04 | Audible, Inc. | Identifying corresponding regions of content |
US9003325B2 (en) | 2012-09-07 | 2015-04-07 | Google Inc. | Stackable workspaces on an electronic device |
US9639244B2 (en) | 2012-09-07 | 2017-05-02 | Google Inc. | Systems and methods for handling stackable workspaces |
US9696879B2 (en) | 2012-09-07 | 2017-07-04 | Google Inc. | Tab scrubbing using navigation gestures |
US9367196B1 (en) | 2012-09-26 | 2016-06-14 | Audible, Inc. | Conveying branched content |
US9632647B1 (en) * | 2012-10-09 | 2017-04-25 | Audible, Inc. | Selecting presentation positions in dynamic content |
US9223830B1 (en) | 2012-10-26 | 2015-12-29 | Audible, Inc. | Content presentation analysis |
US20150243294A1 (en) * | 2012-10-31 | 2015-08-27 | Nec Casio Mobile Communications, Ltd. | Playback apparatus, setting apparatus, playback method, and program |
US9728201B2 (en) * | 2012-10-31 | 2017-08-08 | Nec Corporation | Playback apparatus, setting apparatus, playback method, and program |
US9761277B2 (en) * | 2012-11-01 | 2017-09-12 | Sony Corporation | Playback state control by position change detection |
US20150248919A1 (en) * | 2012-11-01 | 2015-09-03 | Sony Corporation | Information processing apparatus, playback state controlling method, and program |
CN104756484A (en) * | 2012-11-01 | 2015-07-01 | 索尼公司 | Information processing device, reproduction state control method, and program |
USRE49990E1 (en) | 2012-12-31 | 2024-05-28 | Divx, Llc | Use of objective quality measures of streamed content to reduce streaming bandwidth |
US10805368B2 (en) | 2012-12-31 | 2020-10-13 | Divx, Llc | Systems, methods, and media for controlling delivery of content |
US9264475B2 (en) | 2012-12-31 | 2016-02-16 | Sonic Ip, Inc. | Use of objective quality measures of streamed content to reduce streaming bandwidth |
US11785066B2 (en) | 2012-12-31 | 2023-10-10 | Divx, Llc | Systems, methods, and media for controlling delivery of content |
US9313510B2 (en) | 2012-12-31 | 2016-04-12 | Sonic Ip, Inc. | Use of objective quality measures of streamed content to reduce streaming bandwidth |
US11438394B2 (en) | 2012-12-31 | 2022-09-06 | Divx, Llc | Systems, methods, and media for controlling delivery of content |
US10225299B2 (en) | 2012-12-31 | 2019-03-05 | Divx, Llc | Systems, methods, and media for controlling delivery of content |
USRE48761E1 (en) | 2012-12-31 | 2021-09-28 | Divx, Llc | Use of objective quality measures of streamed content to reduce streaming bandwidth |
US9280906B2 (en) | 2013-02-04 | 2016-03-08 | Audible. Inc. | Prompting a user for input during a synchronous presentation of audio content and textual content |
US9472113B1 (en) | 2013-02-05 | 2016-10-18 | Audible, Inc. | Synchronizing playback of digital content with physical content |
US10715806B2 (en) | 2013-03-15 | 2020-07-14 | Divx, Llc | Systems, methods, and media for transcoding video data |
US10397292B2 (en) | 2013-03-15 | 2019-08-27 | Divx, Llc | Systems, methods, and media for delivery of content |
US10264255B2 (en) | 2013-03-15 | 2019-04-16 | Divx, Llc | Systems, methods, and media for transcoding video data |
US9906785B2 (en) | 2013-03-15 | 2018-02-27 | Sonic Ip, Inc. | Systems, methods, and media for transcoding video data according to encoding parameters indicated by received metadata |
US11849112B2 (en) | 2013-03-15 | 2023-12-19 | Divx, Llc | Systems, methods, and media for distributed transcoding video data |
US10462537B2 (en) | 2013-05-30 | 2019-10-29 | Divx, Llc | Network video streaming with trick play based on separate trick play files |
US9712890B2 (en) | 2013-05-30 | 2017-07-18 | Sonic Ip, Inc. | Network video streaming with trick play based on separate trick play files |
US9317486B1 (en) | 2013-06-07 | 2016-04-19 | Audible, Inc. | Synchronizing playback of digital content with captured physical content |
US9967305B2 (en) | 2013-06-28 | 2018-05-08 | Divx, Llc | Systems, methods, and media for streaming media content |
US9489360B2 (en) | 2013-09-05 | 2016-11-08 | Audible, Inc. | Identifying extra material in companion content |
US20150112465A1 (en) * | 2013-10-22 | 2015-04-23 | Joseph Michael Quinn | Method and Apparatus for On-Demand Conversion and Delivery of Selected Electronic Content to a Designated Mobile Device for Audio Consumption |
US9866878B2 (en) | 2014-04-05 | 2018-01-09 | Sonic Ip, Inc. | Systems and methods for encoding and playing back video at different frame rates using enhancement layers |
US11711552B2 (en) | 2014-04-05 | 2023-07-25 | Divx, Llc | Systems and methods for encoding and playing back video at different frame rates using enhancement layers |
US10893305B2 (en) | 2014-04-05 | 2021-01-12 | Divx, Llc | Systems and methods for encoding and playing back video at different frame rates using enhancement layers |
US10321168B2 (en) | 2014-04-05 | 2019-06-11 | Divx, Llc | Systems and methods for encoding and playing back video at different frame rates using enhancement layers |
US10488922B2 (en) | 2014-06-11 | 2019-11-26 | Drivemode, Inc. | Graphical user interface for non-foveal vision |
US20150362991A1 (en) * | 2014-06-11 | 2015-12-17 | Drivemode, Inc. | Graphical user interface for non-foveal vision |
US9898079B2 (en) * | 2014-06-11 | 2018-02-20 | Drivemode, Inc. | Graphical user interface for non-foveal vision |
CN106557296A (en) * | 2015-09-28 | 2017-04-05 | 百度在线网络技术(北京)有限公司 | Method and apparatus for obtaining acoustic information |
US20180143800A1 (en) * | 2016-11-22 | 2018-05-24 | Microsoft Technology Licensing, Llc | Controls for dictated text navigation |
CN109983432A (en) * | 2016-11-22 | 2019-07-05 | 微软技术许可有限责任公司 | Control for dictated text navigation |
US11343300B2 (en) | 2017-02-17 | 2022-05-24 | Divx, Llc | Systems and methods for adaptive switching between multiple content delivery networks during adaptive bitrate streaming |
US10498795B2 (en) | 2017-02-17 | 2019-12-03 | Divx, Llc | Systems and methods for adaptive switching between multiple content delivery networks during adaptive bitrate streaming |
US20190361975A1 (en) * | 2018-05-22 | 2019-11-28 | Microsoft Technology Licensing, Llc | Phrase-level abbreviated text entry and translation |
US10699074B2 (en) * | 2018-05-22 | 2020-06-30 | Microsoft Technology Licensing, Llc | Phrase-level abbreviated text entry and translation |
US10664658B2 (en) | 2018-08-23 | 2020-05-26 | Microsoft Technology Licensing, Llc | Abbreviated handwritten entry translation |
US12096068B2 (en) * | 2018-12-14 | 2024-09-17 | Orange | Spatio-temporal navigation of content of different types with synchronous displacement of playback position indicators |
US20200196006A1 (en) * | 2018-12-14 | 2020-06-18 | Orange | Spatio-temporal navigation of content |
JP2022510528A (en) * | 2019-11-14 | 2022-01-27 | グーグル エルエルシー | Automatic audio playback of displayed text content |
JP7395505B2 (en) | 2019-11-14 | 2023-12-11 | グーグル エルエルシー | Automatic audio playback of displayed text content |
US11887581B2 (en) | 2019-11-14 | 2024-01-30 | Google Llc | Automatic audio playback of displayed textual content |
WO2021096507A1 (en) | 2019-11-14 | 2021-05-20 | Google Llc | Automatic audio playback of displayed textual content |
EP3841458B1 (en) * | 2019-11-14 | 2024-07-03 | Google LLC | Automatic audio playback of displayed textual content |
KR102717737B1 (en) * | 2019-11-14 | 2024-10-16 | 구글 엘엘씨 | Automatic audio playback of displayed textual content |
DE102021100581A1 (en) | 2021-01-13 | 2022-07-14 | Bayerische Motoren Werke Aktiengesellschaft | conveying textual information |
IT202100025436A1 (en) * | 2021-10-06 | 2023-04-06 | D D Innovation Srl | System and method for displaying an electronic text and controlling the playback of at least one synchronized soundtrack |
CN116052671A (en) * | 2022-11-21 | 2023-05-02 | 深圳市东象设计有限公司 | Intelligent translator and translation method |
Also Published As
Publication number | Publication date |
---|---|
US10649726B2 (en) | 2020-05-12 |
US20180032309A1 (en) | 2018-02-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10649726B2 (en) | Navigation and orientation tools for speech synthesis | |
AU2021269318B2 (en) | Devices, methods, and graphical user interfaces for messaging | |
EP3376358B1 (en) | Devices, methods, and graphical user interfaces for messaging | |
US9552015B2 (en) | Device, method, and graphical user interface for navigating through an electronic document | |
US9009612B2 (en) | Devices, methods, and graphical user interfaces for accessibility using a touch-sensitive surface | |
DK179747B1 (en) | Devices, methods and graphical user interfaces for messaging | |
US20120311508A1 (en) | Devices, Methods, and Graphical User Interfaces for Providing Accessibility Using a Touch-Sensitive Surface | |
US20120327009A1 (en) | Devices, methods, and graphical user interfaces for accessibility using a touch-sensitive surface | |
JP2013536528A (en) | How to create and navigate link-based multimedia | |
US20150324074A1 (en) | Digital Book Graphical Navigator | |
US20150067489A1 (en) | Zoomable pages for continuous digital writing | |
US9087508B1 (en) | Presenting representative content portions during content navigation | |
CN104102438A (en) | Chinese Pinyin input method based on touch screen equipment | |
KR20170009487A (en) | Chunk-based language learning method and electronic device to do this | |
KR20120123752A (en) | Method of displaying text data in information terminal | |
Kajastila et al. | Funkyplayer: Integrating Auditory and Visual Menus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |