
voxceleb data preparation for PLDA training #1242

Open
jsvir opened this issue Jan 11, 2022 · 6 comments

@jsvir
jsvir commented Jan 11, 2022

Thank you for your great code!

It looks like you use the same utterance ids both in enrollment and test sets.

In the loop here, you split each pair in the veri_test file into two parts: one for enrollment and one for test. But both columns of the veri_test file contain all the utterances from the test set.

So I think we should split it into two disjoint sets, since the enrollment set can contain the same speakers as the test set but not the same utterances.
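
One way the proposed disjoint split could look is sketched below. This is only an illustration, not the repository's code: the trial lines are made-up placeholders in the `<label> <utterance> <utterance>` layout of the veri_test file, the helper name `split_disjoint` is hypothetical, and the "first half of each speaker's utterances goes to enrollment" rule is an arbitrary assumption.

```python
from collections import defaultdict

# Hypothetical trial lines in the veri_test layout: "<label> <utt_a> <utt_b>".
trials = [
    "1 id001/vidA/00001.wav id001/vidB/00001.wav",
    "0 id001/vidB/00001.wav id002/vidC/00001.wav",
    "1 id002/vidC/00001.wav id002/vidD/00002.wav",
    "0 id002/vidD/00002.wav id001/vidA/00001.wav",
]


def split_disjoint(trial_lines):
    """Assign every utterance to exactly one of the enrollment or test sets."""
    by_speaker = defaultdict(set)
    for line in trial_lines:
        _, utt_a, utt_b = line.split()
        for utt in (utt_a, utt_b):
            # The speaker id is assumed to be the first path component.
            by_speaker[utt.split("/")[0]].add(utt)

    enrol, test = set(), set()
    for utts in by_speaker.values():
        ordered = sorted(utts)  # deterministic order
        half = (len(ordered) + 1) // 2  # assumption: first half is enrollment
        enrol.update(ordered[:half])
        test.update(ordered[half:])
    return enrol, test


enrol_ids, test_ids = split_disjoint(trials)
```

With a split like this, both sets can share speakers but never utterances. Trials whose two utterances land in the same set would still have to be dropped or re-paired, which this sketch does not attempt.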

@mravanelli
Collaborator

@nauman-daw, do you remember what was done here?

@nauman-daw
Collaborator

Thanks @jsvir!
@mravanelli, I think this was done a long time ago, and I don't remember exactly what was done back then. But just looking at the loop that @jsvir pointed to, it looks like it is taking two different ids, isn't it?

Also, @mravanelli, the code looks updated to me. But are we still using csv? I thought you had switched to json for this part, as was done later for diarization.

@mravanelli
Collaborator
mravanelli commented Jan 17, 2022 via email

@jsvir
Author
jsvir commented Jan 17, 2022

@nauman-daw, it takes ids from two different columns, but each column includes all the ids, so the set of unique ids is the same in both columns.
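
A quick way to check this claim is to collect the unique ids per column and compare the two sets. The sketch below uses made-up placeholder lines in the `<label> <utterance> <utterance>` layout of the veri_test file. The helper name `column_ids` is hypothetical, and for the real file the lines would be read from disk instead.

```python
# Hypothetical trial lines in the veri_test layout: "<label> <utt_a> <utt_b>".
sample_trials = [
    "1 id001/vidA/00001.wav id001/vidB/00001.wav",
    "0 id001/vidB/00001.wav id002/vidC/00001.wav",
    "1 id002/vidC/00001.wav id002/vidD/00002.wav",
    "0 id002/vidD/00002.wav id001/vidA/00001.wav",
]


def column_ids(trial_lines):
    """Return the sets of unique utterance ids in the 2nd and 3rd columns."""
    first_col, second_col = set(), set()
    for line in trial_lines:
        _, utt_a, utt_b = line.split()
        first_col.add(utt_a)
        second_col.add(utt_b)
    return first_col, second_col


first_col, second_col = column_ids(sample_trials)
shared = first_col & second_col  # ids that appear in both columns
print(f"{len(first_col)} ids in column 1, {len(second_col)} in column 2, "
      f"{len(shared)} shared")
```

If the shared count equals the size of both columns, taking one column as enrollment and the other as test yields the same utterance set on both sides.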

@nauman-daw
Collaborator
nauman-daw commented Jan 18, 2022

I see. @mravanelli, can you please look into this? I think you can quickly check the speaker verification results that you have for ECAPA.
Just for reference, this file (https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/veri_test2.txt) was used here:

verification_file: https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/veri_test2.txt

Also, it looks like this is general data preparation code for voxceleb and not only for PLDA.

@Adel-Moumen
Collaborator

Hello,

Is there any news on this issue, please? Thanks.

Best,
Adel
