

Long running transcription using webgpu-whisper #802

Open
iamhitarth opened this issue Jun 10, 2024 · 1 comment
Labels
question Further information is requested

Comments

iamhitarth commented Jun 10, 2024

Question

Noob question: the webgpu-whisper demo does real-time transcription, but it doesn't build up a full transcript from the start, i.e. 2 minutes into transcription, the first few transcribed lines disappear.

Transcript at time x 👇

Cool, let's test this out. We'll see how this works. So turns out that the transcription when I try to access it is actually just empty. And so the only thing that actually comes through is. So yeah, so the output that's getting cut is basically coming from the

Transcript at time x+1 👇

this out, we'll see how this works. So turns out that the transcription when I try to access it is actually just empty. And so the only thing that actually comes through is. So yeah, so the output that's getting cut is basically coming from the work

Note how the "Cool, let's test" is missing from the start of the second transcript.

I'm wondering what it would take to keep building the transcript for a long-running meeting without losing any of the previously transcribed text?

I tried a naive appending approach and that just results in a transcript full of repetition.
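(For context, a less naive merge than plain appending would deduplicate the overlap between consecutive window outputs. A hedged sketch, not part of the demo or transformers.js; `mergeTranscripts` is a hypothetical helper that joins two window transcripts on their longest word-level overlap:)

```javascript
// Merge two overlapping window transcripts by finding the longest
// word-level overlap between the end of `prev` and the start of
// `next`, then appending only the genuinely new tail of `next`.
function mergeTranscripts(prev, next) {
  const a = prev.trim().split(/\s+/);
  const b = next.trim().split(/\s+/);
  const maxK = Math.min(a.length, b.length);
  // Search from the longest possible overlap down to none.
  for (let k = maxK; k > 0; k--) {
    const tail = a.slice(a.length - k).join(" ").toLowerCase();
    const head = b.slice(0, k).join(" ").toLowerCase();
    if (tail === head) {
      return [...a, ...b.slice(k)].join(" ");
    }
  }
  // No overlap found: fall back to plain concatenation.
  return [...a, ...b].join(" ");
}
```

This assumes the model transcribes the shared audio identically in both windows, which real ASR output often violates, so it is only a first step past naive appending.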

So I'm very curious about what it would take to build out a streaming transcription similar to what something like Deepgram would offer. Would that require a change to the pipeline? Are there models that can take an appended transcript with lots of repetition and trim it down to a clean transcript?

Please let me know if my questions are unclear. Just looking for some direction so that I can potentially put up a PR for this (if needed).

xenova (Owner) commented Jun 19, 2024

Hi there 👋 Indeed, that demo only considers the latest 30 seconds of audio, and was more to showcase the ability of the model to run in real-time with WebGPU. The rest of the pipeline should be implemented by the user, since this is out of scope for the transformers.js library (at least for now). I suggest you take a look at this paper, which details a nice way of doing this.
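(The linked paper isn't named in the thread, but a commonly cited streaming policy for this problem commits only the prefix on which two consecutive re-transcriptions of the same audio agree, holding back the still-unstable tail. A minimal sketch of that idea; `commitAgreedPrefix` is a hypothetical helper and assumes both word arrays start at the same audio position:)

```javascript
// "Local agreement" commit policy sketch: given the previous and
// current transcriptions of the same growing audio window, extend the
// committed prefix only while the two hypotheses produce the same word.
// Words beyond the returned index are considered unstable and withheld.
function commitAgreedPrefix(prevWords, currWords, committedCount) {
  let i = committedCount;
  while (
    i < prevWords.length &&
    i < currWords.length &&
    prevWords[i].toLowerCase() === currWords[i].toLowerCase()
  ) {
    i++;
  }
  return i; // words [0, i) are now considered final
}
```

A full implementation would also trim committed audio from the buffer so the window stays bounded, but that bookkeeping is omitted here.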

Hope that helps!
