voxceleb data preparation for PLDA training #1242
Comments
@nauman-daw, do you remember what was done here? |
Thanks @jsvir! @mravanelli, I think this was done a long time ago, and I don't remember exactly what was done at that time. But just looking at the loop that @jsvir pointed to, it looks like it is taking two different ids, isn't it?
Also @mravanelli, the code looks updated to me. But are we still using csv? I thought you had shifted to json for this part, as was done later for diarization. |
Both csv and json are supported now. |
@nauman-daw, it takes ids from two different columns, but each column includes all ids, so the unique ids are the same in both columns. |
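The point above can be checked with a few lines of Python. This is only a minimal sketch, not the recipe's actual code: the helper name `split_trial_ids` and the three trial lines are hypothetical, and it assumes each trial line has the form `label enrol_path test_path`.

```python
def split_trial_ids(lines):
    """Collect the unique utterance ids found in columns 2 and 3 of trial lines."""
    enrol_ids, test_ids = set(), set()
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        _label, enrol_path, test_path = parts
        # Drop the file extension to obtain the utterance id.
        enrol_ids.add(enrol_path.rsplit(".", 1)[0])
        test_ids.add(test_path.rsplit(".", 1)[0])
    return enrol_ids, test_ids


# Hypothetical trial lines (assumed format: "label enrol_path test_path").
trials = [
    "1 id001/vidA/00001.wav id001/vidB/00002.wav",
    "0 id001/vidB/00002.wav id002/vidC/00001.wav",
    "0 id002/vidC/00001.wav id001/vidA/00001.wav",
]

enrol_ids, test_ids = split_trial_ids(trials)
shared = enrol_ids & test_ids
print(f"enrol: {len(enrol_ids)}, test: {len(test_ids)}, shared: {len(shared)}")
```

If every utterance appears in both columns somewhere in the list, `shared` equals both sets, which is the situation described in this thread.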
I see. @mravanelli, can you please look into this? I think you can quickly check the speaker verification results that you have for ecapa.
Also, it looks like this is general data preparation code for voxceleb, and not only for PLDA. |
Hello, any news on this issue, please? Thanks. |
Thank you for your great code!
It looks like you use the same utterance ids in both the enrollment and test sets.
In the loop here, you split each pair in the veri_test file into two parts: one for enrollment and one for test. But both columns of the veri_test file contain all utterances from the test set.
So I think we should split it into two disjoint sets, since the enrollment set can contain the same speakers as the test set but not the same utterances.
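One possible reading of this proposal, sketched under stated assumptions: each utterance is assigned to a side the first time it is seen, and any trial that would put an utterance on both sides is dropped. The function name `disjoint_split`, the greedy first-come rule, and the sample trials are all hypothetical and are not taken from the recipe.

```python
def disjoint_split(trials):
    """Greedily build disjoint enrollment/test utterance sets from trial pairs.

    `trials` is a list of (label, enrol_id, test_id) tuples. A trial is kept
    only if it does not place an utterance on both sides.
    """
    enrol, test, kept = set(), set(), []
    for label, e_id, t_id in trials:
        # Skip trials that would make the two sets overlap.
        if e_id == t_id or e_id in test or t_id in enrol:
            continue
        enrol.add(e_id)
        test.add(t_id)
        kept.append((label, e_id, t_id))
    return enrol, test, kept


# Hypothetical trials: (label, enrollment utterance, test utterance).
trials = [
    ("1", "spkA/utt1", "spkA/utt2"),
    ("0", "spkA/utt2", "spkB/utt1"),
    ("0", "spkB/utt1", "spkA/utt1"),
    ("0", "spkA/utt1", "spkB/utt1"),
]

enrol, test, kept = disjoint_split(trials)
print(f"kept {len(kept)} of {len(trials)} trials")
```

Note that a split like this discards trials, so scores computed on the reduced list would not be directly comparable with results reported on the full trial list.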