

Long running transcription using webgpu-whisper #802

Open
iamhitarth opened this issue Jun 10, 2024 · 1 comment
Labels
question Further information is requested

Comments

iamhitarth commented Jun 10, 2024

Question

Noob question: the webgpu-whisper demo does real-time transcription, but it doesn't build up a full transcript from the start, i.e. 2 minutes into transcription, the first few transcribed lines disappear.

Transcript at time x 👇

Cool, let's test this out. We'll see how this works. So turns out that the transcription when I try to access it is actually just empty. And so the only thing that actually comes through is. So yeah, so the output that's getting cut is basically coming from the

Transcript at time x+1 👇

this out, we'll see how this works. So turns out that the transcription when I try to access it is actually just empty. And so the only thing that actually comes through is. So yeah, so the output that's getting cut is basically coming from the work

Note how the "Cool, let's test" is missing from the start of the second transcript.

I'm wondering what it would take to keep building the transcript for a long-running meeting without losing any of the previously transcribed text?

I tried a naive appending approach and that just results in a transcript full of repetition.
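(For context, a less naive merge than plain appending would deduplicate the overlap between consecutive window outputs. A hedged sketch, not part of the demo or transformers.js; `mergeTranscripts` is a hypothetical helper that joins two window transcripts on their longest word-level overlap:)

```javascript
// Merge two overlapping window transcripts by finding the longest
// word-level overlap between the end of `prev` and the start of
// `next`, then appending only the genuinely new tail of `next`.
function mergeTranscripts(prev, next) {
  const a = prev.trim().split(/\s+/);
  const b = next.trim().split(/\s+/);
  const maxK = Math.min(a.length, b.length);
  // Search from the longest possible overlap down to none.
  for (let k = maxK; k > 0; k--) {
    const tail = a.slice(a.length - k).join(" ").toLowerCase();
    const head = b.slice(0, k).join(" ").toLowerCase();
    if (tail === head) {
      return [...a, ...b.slice(k)].join(" ");
    }
  }
  // No overlap found: fall back to plain concatenation.
  return [...a, ...b].join(" ");
}
```

This assumes the model transcribes the shared audio identically in both windows, which real ASR output often violates, so it is only a first step past naive appending.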

So I'm very curious about what it would take to build out a streaming transcription similar to what something like Deepgram would offer. Would that require a change to the pipeline? Are there models that can take an appended transcript with lots of repetition and trim it down to a clean transcript?

Please let me know if my questions are unclear. Just looking for some direction so that I can potentially put up a PR for this (if needed).

xenova (Owner) commented Jun 19, 2024

Hi there 👋 Indeed, that demo only considers the latest 30 seconds of audio, and was more to showcase the ability of the model to run in real-time with WebGPU. The rest of the pipeline should be implemented by the user, since this is out of scope for the transformers.js library (at least for now). I suggest you take a look at this paper, which details a nice way of doing this.
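(The linked paper isn't named in the thread, but a commonly cited streaming policy for this problem commits only the prefix on which two consecutive re-transcriptions of the same audio agree, holding back the still-unstable tail. A minimal sketch of that idea; `commitAgreedPrefix` is a hypothetical helper and assumes both word arrays start at the same audio position:)

```javascript
// "Local agreement" commit policy sketch: given the previous and
// current transcriptions of the same growing audio window, extend the
// committed prefix only while the two hypotheses produce the same word.
// Words beyond the returned index are considered unstable and withheld.
function commitAgreedPrefix(prevWords, currWords, committedCount) {
  let i = committedCount;
  while (
    i < prevWords.length &&
    i < currWords.length &&
    prevWords[i].toLowerCase() === currWords[i].toLowerCase()
  ) {
    i++;
  }
  return i; // words [0, i) are now considered final
}
```

A full implementation would also trim committed audio from the buffer so the window stays bounded, but that bookkeeping is omitted here.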

Hope that helps!
