voxceleb data preparation for PLDA training #1242

jsvir · 2022-01-11T17:07:12Z

Thank you for your great code!

It looks like you use the same utterance ids both in enrollment and test sets.

In the loop here, you split each pair in the veri_test file into two parts: one for enrollment and one for test. But both columns of the veri_test file contain all utterances from the test set.

So I think we should split it into two disjoint sets, since the enrollment set can contain the same speakers as the test set but not the same utterances.

mravanelli · 2022-01-16T03:10:54Z

@nauman-daw, do you remember what was done here?

nauman-daw · 2022-01-17T01:13:22Z

Thanks @jsvir !
@mravanelli I think this was done a long time ago. I don't remember exactly what was done at that time. But just looking at the loop that @jsvir pointed out, it looks like it is taking 2 different ids, isn't it?

Also @mravanelli , the code looks updated to me. But are we still using csv? I thought you shifted to json for this part as done later for diarization.

mravanelli · 2022-01-17T01:15:56Z

Both csv and json are supported now


jsvir · 2022-01-17T06:21:51Z

@nauman-daw, it takes ids from two different columns, but each column includes all ids, so the unique ids are the same in both columns.
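To make the point concrete, here is a minimal sketch of the check. It assumes each trial line has the form `<label> <first_utt> <second_utt>`; the helper name `split_trial_columns` and the toy trial lines are made up for illustration and are not taken from the SpeechBrain code or from the real veri_test file.

```python
def split_trial_columns(lines):
    """Collect the utterance ids found in each column of a trial list.

    Assumed line format: "<label> <first_utt> <second_utt>".
    """
    enrol_ids, test_ids = set(), set()
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        _, enrol_utt, test_utt = parts
        enrol_ids.add(enrol_utt)  # first column -> enrollment set
        test_ids.add(test_utt)    # second column -> test set
    return enrol_ids, test_ids


# Made-up trial lines (hypothetical, not from veri_test2.txt).
toy_trials = [
    "1 spkA/vid1/00001.wav spkA/vid2/00002.wav",
    "0 spkA/vid2/00002.wav spkB/vid1/00001.wav",
    "0 spkB/vid1/00001.wav spkA/vid1/00001.wav",
]

enrol_ids, test_ids = split_trial_columns(toy_trials)
shared = enrol_ids & test_ids
print(f"enrol: {len(enrol_ids)}, test: {len(test_ids)}, shared: {len(shared)}")
```

In this toy list every utterance appears in both columns, so splitting by column gives two identical sets. Running the same check on the lines of the real verification file would show how large the overlap actually is.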

nauman-daw · 2022-01-18T03:10:59Z

I see. @mravanelli can you please look into this? I think you can quickly check the speaker verification results that you have for ecapa?
Just for reference, this file (https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/veri_test2.txt) is used here:

speechbrain/recipes/VoxCeleb/SpeakerRec/hparams/train_x_vectors.yaml

Line 27 in e14d2c1

    verification_file: https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/veri_test2.txt

Also, it looks like this is general data preparation code for VoxCeleb, not only for PLDA.

Adel-Moumen · 2023-09-01T14:42:42Z

Hello,

Any news on this issue please? Thanks.

Best,
Adel

mravanelli assigned nauman-daw Jan 16, 2022

anautsch added this to To do in Speaker Recognition & Diarization (Voice Biometrics) via automation Apr 21, 2022
