ENH Improve efficiency of chi2 by converting the input to float first #22235
Thanks!
doc/whats_new/v1.1.rst
- |Efficiency| Improve runtime performance of :func:`feature_selection.chi2`
  with boolean arrays. :pr:`xxxxx` by `Thomas Fan`_.
-  with boolean arrays. :pr:`xxxxx` by `Thomas Fan`_.
+  with boolean arrays. :pr:`22235` by `Thomas Fan`_.
@@ -210,7 +210,7 @@ def chi2(X, y):
     # XXX: we might want to do some of the following in logspace instead for
     # numerical stability.
-    X = check_array(X, accept_sparse="csr")
+    X = check_array(X, accept_sparse="csr", dtype=np.float64)
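A minimal sketch of what the new `check_array(..., dtype=np.float64)` call does to a boolean input. The random boolean matrix is hypothetical stand-in data, not taken from the PR: dense input comes back as a float64 array, and CSR input stays CSR with float64 data.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.utils import check_array

# Hypothetical stand-in data: a small random boolean feature matrix.
rng = np.random.RandomState(0)
X_bool = rng.rand(6, 4) > 0.5

# Dense input: the boolean array is cast to float64.
X_dense = check_array(X_bool, accept_sparse="csr", dtype=np.float64)
print(X_dense.dtype)  # float64

# Sparse input: the matrix stays in CSR format; only its dtype changes.
X_sparse = check_array(sp.csr_matrix(X_bool), accept_sparse="csr", dtype=np.float64)
print(X_sparse.format, X_sparse.dtype)  # csr float64
```

The values are unchanged by the cast (True becomes 1.0, False becomes 0.0), so the counts that `chi2` computes from `X` stay the same.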
Maybe
-    X = check_array(X, accept_sparse="csr", dtype=np.float64)
+    X = check_array(X, accept_sparse="csr", dtype=[np.float64, np.float32])
?
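A short sketch of how the suggested `dtype=[np.float64, np.float32]` behaves, based on the documented `check_array` rule that when `dtype` is a list, conversion to the first entry happens only if the input dtype is not already in the list. The two tiny arrays are illustrative only.

```python
import numpy as np
from sklearn.utils import check_array

# Illustrative inputs: the same values stored as bool and as float32.
X_bool = np.array([[True, False], [False, True]])
X_f32 = X_bool.astype(np.float32)

dtypes = [np.float64, np.float32]

# bool is not in the list, so it is converted to the first entry (float64).
print(check_array(X_bool, dtype=dtypes).dtype)  # float64

# float32 is already in the list, so it is kept as float32 (no upcast).
print(check_array(X_f32, dtype=dtypes).dtype)  # float32
```

The trade-off, as far as this sketch shows, is that float32 users avoid a float64 copy of the full `X`; `observed` (shape `n_classes x n_features`) is still converted to float64 later.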
LGTM.
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Reference Issues/PRs
Fixes #22231
What does this implement/fix? Explain your changes.
This PR converts the input of `chi2` to float so that `safe_sparse_dot` is faster. We will convert `observed` into `float64` later anyway (scikit-learn/sklearn/feature_selection/_univariate_selection.py, line 157 in 0c65bbf), so I think it makes sense to convert `X` ahead of time, which makes the matmul faster.
CC @glemaitre
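A minimal sketch of the reasoning above, assuming random boolean data and three classes (both hypothetical). It mirrors the `observed = Y.T @ X` step of `chi2` with `LabelBinarizer` and `safe_sparse_dot`, and checks that converting `X` first yields the same counts as converting `observed` afterwards. The speed-up itself is not measured here; our understanding is that NumPy's matmul dispatches to BLAS only for floating-point dtypes, while integer/boolean products use a slower non-BLAS loop.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer
from sklearn.utils.extmath import safe_sparse_dot

# Hypothetical data: 1000 samples, 50 boolean features, 3 classes.
rng = np.random.RandomState(0)
X_bool = rng.rand(1000, 50) > 0.5
y = rng.randint(0, 3, size=1000)

# One-hot encode the targets as chi2 does; Y has an integer dtype.
Y = LabelBinarizer().fit_transform(y)

# Before: integer Y.T times boolean X gives an integer product,
# which is only converted to float64 afterwards.
observed_before = np.asarray(safe_sparse_dot(Y.T, X_bool), dtype=np.float64)

# After: cast X up front so the product itself is computed in float64.
observed_after = safe_sparse_dot(Y.T, X_bool.astype(np.float64))

print(observed_after.dtype)  # float64
print(np.array_equal(observed_before, observed_after))  # True
```

Because the entries are small integer counts, float64 represents them exactly, so moving the conversion earlier changes only the dtype of the matmul, not its result.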