[go: nahoru, domu]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH Improve efficiency of chi2 by converting the input to float first #22235

Merged
merged 3 commits into from
Jan 21, 2022

Conversation

thomasjpfan
Copy link
Member

Reference Issues/PRs

Fixes #22231

What does this implement/fix? Explain your changes.

This PR converts the input of chi2 into float so safe_sparse_dot is faster. We will convert observed into float64 later:

f_obs = np.asarray(f_obs, dtype=np.float64)

so I think it makes sense to do it ahead of time with X, so that the matmul is faster.

CC @glemaitre

Copy link
Member
@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

................................

- |Efficiency| Improve runtime performance of :func:`feature_selection.chi2`
with boolean arrays. :pr:`xxxxx` by `Thomas Fan`_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
with boolean arrays. :pr:`xxxxx` by `Thomas Fan`_.
with boolean arrays. :pr:`22235` by `Thomas Fan`_.

@@ -210,7 +210,7 @@ def chi2(X, y):

# XXX: we might want to do some of the following in logspace instead for
# numerical stability.
X = check_array(X, accept_sparse="csr")
X = check_array(X, accept_sparse="csr", dtype=np.float64)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe

Suggested change
X = check_array(X, accept_sparse="csr", dtype=np.float64)
X = check_array(X, accept_sparse="csr", dtype=[np.float64, np.float32])

?

Copy link
Member
@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
@jjerphan jjerphan merged commit e1e8c66 into scikit-learn:main Jan 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

chi2 significantly slower for np.ndarray (of bools/ints) compared to pd.DataFrame (of bools/ints)
3 participants