Noob question - the webgpu-whisper demo does real-time transcription, but it doesn't build a full transcript from the start; i.e., 2 minutes into transcription, the first few transcribed lines disappear.
Transcript at time x 👇
Cool, let's test this out. We'll see how this works. So turns out that the transcription when I try to access it is actually just empty. And so the only thing that actually comes through is. So yeah, so the output that's getting cut is basically coming from the
Transcript at time x+1 👇
this out, we'll see how this works. So turns out that the transcription when I try to access it is actually just empty. And so the only thing that actually comes through is. So yeah, so the output that's getting cut is basically coming from the work
Note how the "Cool, let's test" is missing from the start of the second transcript.
I'm wondering what it would take to keep building the transcript for a long-running meeting without losing any of the previously transcribed text.
I tried a naive appending approach, but that just results in a transcript full of repetition.
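For reference, here's roughly what I mean by "naive appending" and one obvious improvement I experimented with: instead of blindly concatenating each new window's transcript, look for the longest word-level overlap between the end of the accumulated transcript and the start of the new chunk, and splice at that point. (`mergeTranscripts` is a name I made up for this sketch, not anything from the library.)

```javascript
// Hypothetical helper: merge a new window transcript into the running
// transcript by finding the longest word-level overlap between the
// suffix of the accumulated text and the prefix of the new chunk.
function mergeTranscripts(accumulated, chunk) {
  const accWords = accumulated.split(/\s+/).filter(Boolean);
  const newWords = chunk.split(/\s+/).filter(Boolean);

  // Try the longest possible overlap first, then shrink until a suffix
  // of `accumulated` exactly matches a prefix of `chunk`.
  const max = Math.min(accWords.length, newWords.length);
  for (let n = max; n > 0; n--) {
    const suffix = accWords.slice(accWords.length - n).join(" ");
    const prefix = newWords.slice(0, n).join(" ");
    if (suffix === prefix) {
      return [...accWords, ...newWords.slice(n)].join(" ");
    }
  }

  // No overlap found: fall back to plain appending.
  return [...accWords, ...newWords].join(" ");
}
```

This mostly works on the happy path, but exact string matching is fragile: Whisper often re-punctuates or re-cases the same audio on successive decodes (e.g. "this out, we'll" vs "this out we'll"), so the overlap is missed and the repetition comes back.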
So I'm very curious about what it would take to build out a streaming transcription similar to what something like Deepgram would offer. Would that require a change to the pipeline? Are there models that can take an appended transcript with lots of repetition and trim it down to a clean transcript?
Please let me know if my questions are unclear. Just looking for some direction so that I can potentially put up a PR for this (if needed).
Hi there 👋 Indeed, that demo only considers the latest 30 seconds of audio, and was meant more to showcase the model's ability to run in real time with WebGPU. The rest of the pipeline should be implemented by the user, since this is out of scope for the transformers.js library (at least for now). I suggest you take a look at this paper, which details a nice way of doing this.
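To make the idea concrete (this is my own hedged sketch, not the paper's or the library's implementation): one common streaming policy is to only "commit" text that two consecutive hypotheses agree on, since the tail of each window's transcript is unstable and keeps changing as more audio arrives. The sketch below assumes the audio buffer grows between decodes; a real implementation would also trim committed audio from the front of the buffer to keep the window bounded. All names here (`onNewHypothesis`, `commonPrefixWords`) are hypothetical.

```javascript
// Return the longest common word-level prefix of two transcripts.
function commonPrefixWords(a, b) {
  const aw = a.split(/\s+/).filter(Boolean);
  const bw = b.split(/\s+/).filter(Boolean);
  const out = [];
  for (let i = 0; i < Math.min(aw.length, bw.length); i++) {
    if (aw[i] !== bw[i]) break;
    out.push(aw[i]);
  }
  return out.join(" ");
}

let committed = "";          // text we will never revise again
let previousHypothesis = ""; // transcript from the previous decode

// Called each time the model re-transcribes the current audio window.
// Only text that appears identically at the start of two successive
// hypotheses is treated as stable and added to the committed transcript.
function onNewHypothesis(hypothesis) {
  const stable = commonPrefixWords(previousHypothesis, hypothesis);
  if (stable.length > committed.length) {
    committed = stable; // extend the committed transcript
  }
  previousHypothesis = hypothesis;
  return committed;
}
```

The key property is that committed text never disappears or gets rewritten, which is exactly the behavior the original question is after; the cost is that the committed transcript lags the live hypothesis by one decode.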