[go: nahoru, domu]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH Added warning for ndcg_score when used w/ negative y_true values #23461

Merged
merged 31 commits into from
Jun 22, 2022

Conversation

Micky774
Copy link
Contributor

Reference Issues/PRs

Resolves #22710
Fixes #17639

What does this implement/fix? Explain your changes.

Adds warning for ndcg_score when used w/ negative y_true values

Any other comments?

sklearn/metrics/_ranking.py Show resolved Hide resolved
sklearn/metrics/_ranking.py Outdated Show resolved Hide resolved
sklearn/metrics/_ranking.py Outdated Show resolved Hide resolved
doc/whats_new/v1.2.rst Outdated Show resolved Hide resolved
Copy link
Member
@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not looked into the literature. Is there a valid use case for y_true < 0?

sklearn/metrics/_ranking.py Outdated Show resolved Hide resolved
doc/whats_new/v1.2.rst Outdated Show resolved Hide resolved
@Micky774
Copy link
Contributor Author
Micky774 commented Jun 7, 2022

I have not looked into the literature. Is there a valid use case for y_true < 0?

There was a bit of discussion about this in the original issue #17639. I'm largely basing this off of Wang et. al. 2013 which has ~250 citations (details) and affirms the expectation that NDCG is contained within $[0,1]$. More directly, this survey paper on information-retrieval methods also uses relevance values (y_true) no less than 0.

Indeed, with the idea of relevance in document retrieval where NDCG was first formulated, it makes more sense for an additional document to offer no information and have no relevance, than to remove information or relevance from the overall query. From what I've read so far, including this additional survey paper talking about IR methods overall, cumulative gain (CG) is assumed to be monotone non-decreasing, so I don't think that y_true<0 is a valid use case, at the very least not for what it was designed to do.

Copy link
Member
@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit, otherwise LGTM.

sklearn/metrics/_ranking.py Outdated Show resolved Hide resolved
@thomasjpfan thomasjpfan added the Quick Review For PRs that are quick to review label Jun 7, 2022
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
@glemaitre glemaitre self-requested a review June 13, 2022 18:35
Copy link
Member
@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I just fixed the what's new entry that was not in the appropriate section.
Thanks @Micky774

@jeremiedbb jeremiedbb merged commit 9bb2098 into scikit-learn:main Jun 22, 2022
ogrisel pushed a commit to ogrisel/scikit-learn that referenced this pull request Jul 11, 2022
…ues (scikit-learn#23461)

Co-authored-by: trinhcon <conroy.trinh@mail.utoronto.ca>
Co-authored-by: Victor Ko <vk07275@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:metrics Quick Review For PRs that are quick to review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ndcg_score fails for negative scores
5 participants