[go: nahoru, domu]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] iforest chunks for score_samples #13283

Merged
merged 6 commits into from
Mar 18, 2019

Conversation

ngoix
Copy link
Contributor
@ngoix ngoix commented Feb 26, 2019

fixes #12040

is currently based on top of #13260 (so do not review yet)
needs to be rebased on #13251

@agramfort
Copy link
Member

@ngoix please put the PR title to MRG when ready on your side

@@ -325,3 +326,26 @@ def test_behaviour_param():
clf2 = IsolationForest(behaviour='new', contamination='auto').fit(X_train)
assert_array_equal(clf1.decision_function([[2., 2.]]),
clf2.decision_function([[2., 2.]]))


# mock get_chunk_n_rows to actually test more than one chunk (here one
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any idea how to merge these 2 tests ? they are the same, just with a different mocked chunk size

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use pytest's monkeypatch fixture rather than unittest.mock alone?

But for this to be a reasonable test, you also need to test that the Mock has been called (or does patch do that?)

@ngoix ngoix changed the title [WIP] iforest chunks for score_samples [MRG] iforest chunks for score_samples Mar 2, 2019
@ngoix
Copy link
Contributor Author
ngoix commented Mar 4, 2019

@agramfort this is ready for reviews

Copy link
Member
@agramfort agramfort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update what’s new needed

Copy link
Member
@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looking good

" be removed in 0.22.", DeprecationWarning)
return self._threshold_

def _compute_chunked_score_samples(self, X, working_memory=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If not for testing, why do we need the working_melody parameter here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need the working_memory parameter here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not useful indeed, thanks

@@ -325,3 +326,26 @@ def test_behaviour_param():
clf2 = IsolationForest(behaviour='new', contamination='auto').fit(X_train)
assert_array_equal(clf1.decision_function([[2., 2.]]),
clf2.decision_function([[2., 2.]]))


# mock get_chunk_n_rows to actually test more than one chunk (here one
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use pytest's monkeypatch fixture rather than unittest.mock alone?

But for this to be a reasonable test, you also need to test that the Mock has been called (or does patch do that?)

@jnothman
Copy link
Member

Tests failing

Copy link
Member
@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise lgtm

" be removed in 0.22.", DeprecationWarning)
return self._threshold_

def _compute_chunked_score_samples(self, X, working_memory=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need the working_memory parameter here?

@@ -144,6 +144,9 @@ Support for Python 3.4 and below has been officially dropped.
by avoiding keeping in memory each tree prediction. :issue:`13260` by
`Nicolas Goix`_.

- |Efficiency| :class:`ensemble.IsolationForest` now uses chunks of data at
prediction step. :issue:`13283` by `Nicolas Goix`_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps say something like "capping the memory usage"

@jnothman jnothman merged commit c22b871 into scikit-learn:master Mar 18, 2019
@jnothman
Copy link
Member

THanks @ngoix!

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Isolation forest - decision_function & average_path_length method are memory inefficient
3 participants