[MRG] iforest chunks for score_samples #13283

ngoix · 2019-02-26T14:44:46Z

fixes #12040

is currently based on top of #13260 (so do not review yet)
needs to be rebased on #13251

agramfort · 2019-03-02T08:05:49Z

@ngoix please set the PR title to MRG when it is ready on your side

tests fix flake8 fix ci

ngoix · 2019-03-02T14:49:53Z

sklearn/ensemble/tests/test_iforest.py

@@ -325,3 +326,26 @@ def test_behaviour_param():
    clf2 = IsolationForest(behaviour='new', contamination='auto').fit(X_train)
    assert_array_equal(clf1.decision_function([[2., 2.]]),
                       clf2.decision_function([[2., 2.]]))
+
+
+# mock get_chunk_n_rows to actually test more than one chunk (here one


Any idea how to merge these 2 tests? They are the same, just with a different mocked chunk size.

ngoix · 2019-03-04T10:22:32Z

@agramfort this is ready for reviews

agramfort

An update to what’s new is needed

jnothman

Mostly looking good

jnothman · 2019-03-06T18:11:08Z

sklearn/ensemble/iforest.py

+             " be removed in 0.22.", DeprecationWarning)
+        return self._threshold_
+
+    def _compute_chunked_score_samples(self, X, working_memory=None):


If not for testing, why do we need the working_memory parameter here?

Why do we need the working_memory parameter here?

it is not useful indeed, thanks
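
For context, a minimal sketch of why an explicit `working_memory` argument may not be needed: the chunk size can be derived from a global setting inside the helper itself. All names below (`WORKING_MEMORY_MIB`, `chunk_n_rows`, `gen_slices`, `compute_chunked_scores`) are hypothetical and pure Python; this is an illustration under those assumptions, not the implementation in `iforest.py`.

```python
# Hypothetical global memory budget in MiB (assumption for this sketch).
WORKING_MEMORY_MIB = 1024


def chunk_n_rows(row_bytes, max_n_rows):
    """Return how many rows fit in the memory budget, capped at max_n_rows."""
    budget_bytes = WORKING_MEMORY_MIB * 2 ** 20
    # Always process at least one row, even if a single row exceeds the budget.
    n_rows = max(1, budget_bytes // row_bytes)
    return min(n_rows, max_n_rows)


def gen_slices(n_samples, batch_size):
    """Yield consecutive slices of at most batch_size rows."""
    start = 0
    while start < n_samples:
        stop = min(start + batch_size, n_samples)
        yield slice(start, stop)
        start = stop


def compute_chunked_scores(score_chunk, X, row_bytes):
    """Score X chunk by chunk; no working_memory argument is required."""
    n_samples = len(X)
    scores = [0.0] * n_samples
    size = chunk_n_rows(row_bytes, max_n_rows=n_samples)
    for sl in gen_slices(n_samples, size):
        # Only one chunk of intermediate results is alive at a time.
        scores[sl] = score_chunk(X[sl])
    return scores
```

The result should not depend on the chunk size: only the peak amount of intermediate data changes.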

jnothman · 2019-03-06T19:57:46Z

sklearn/ensemble/tests/test_iforest.py

@@ -325,3 +326,26 @@ def test_behaviour_param():
    clf2 = IsolationForest(behaviour='new', contamination='auto').fit(X_train)
    assert_array_equal(clf1.decision_function([[2., 2.]]),
                       clf2.decision_function([[2., 2.]]))
+
+
+# mock get_chunk_n_rows to actually test more than one chunk (here one


Use pytest's monkeypatch fixture rather than unittest.mock alone?

But for this to be a reasonable test, you also need to test that the Mock has been called (or does patch do that?)
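
One way to address both points, as a minimal sketch rather than the code merged in this PR: drive a single check with several mocked chunk sizes and assert that the mock was actually called. `ChunkedScorer` and `check_chunked_scores` are hypothetical stand-ins so the snippet runs without scikit-learn; in the real test the patched target would be the `get_chunk_n_rows` helper mentioned in the diff, and `pytest.mark.parametrize` could replace the plain loop.

```python
from unittest.mock import patch


class ChunkedScorer:
    """Hypothetical stand-in for an estimator that scores rows in chunks."""

    @staticmethod
    def get_chunk_n_rows(n_samples):
        # Default behaviour: a single chunk covering all rows.
        return n_samples

    def score_samples(self, X):
        n_samples = len(X)
        chunk_n_rows = self.get_chunk_n_rows(n_samples)
        scores = []
        for start in range(0, n_samples, chunk_n_rows):
            chunk = X[start:start + chunk_n_rows]
            scores.extend(-sum(row) for row in chunk)
        return scores


def check_chunked_scores(chunk_n_rows):
    """Scores must match the unchunked result for any mocked chunk size."""
    X = [[2.0, 2.0], [1.0, 1.0], [0.5, 0.5]]
    expected = ChunkedScorer().score_samples(X)

    with patch.object(ChunkedScorer, "get_chunk_n_rows",
                      return_value=chunk_n_rows) as mocked:
        got = ChunkedScorer().score_samples(X)

    assert got == expected
    # Without this assertion the test could pass without ever chunking.
    mocked.assert_called_once_with(len(X))
    return mocked.call_count


# One check, several mocked chunk sizes, instead of two near-identical tests.
for size in (1, 2, 3):
    check_chunked_scores(size)
```

`patch` only replaces the target; it does not by itself verify that the replacement was used, hence the explicit `assert_called_once_with`.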

jnothman · 2019-03-11T06:23:29Z

Tests failing

jnothman

Otherwise lgtm

jnothman · 2019-03-17T22:17:18Z

sklearn/ensemble/iforest.py

+             " be removed in 0.22.", DeprecationWarning)
+        return self._threshold_
+
+    def _compute_chunked_score_samples(self, X, working_memory=None):


Why do we need the working_memory parameter here?

jnothman · 2019-03-17T22:17:59Z

doc/whats_new/v0.21.rst

@@ -144,6 +144,9 @@ Support for Python 3.4 and below has been officially dropped.
  by avoiding keeping in memory each tree prediction. :issue:`13260` by
  `Nicolas Goix`_.

+- |Efficiency| :class:`ensemble.IsolationForest` now uses chunks of data at
+  prediction step. :issue:`13283` by `Nicolas Goix`_.


Perhaps say something like "capping the memory usage"

jnothman · 2019-03-18T10:06:13Z

Thanks @ngoix!

ngoix mentioned this pull request Feb 27, 2019

Isolation forest final stage very slow and single threaded #13295

Closed

ngoix force-pushed the iforest_chunk branch from 75c358e to 591fd9e on March 1, 2019 13:49

ENH chunk data - iforest score_samples

40a5a4b

tests fix flake8 fix ci

ngoix force-pushed the iforest_chunk branch from 3aac238 to 40a5a4b on March 2, 2019 14:47

ngoix commented Mar 2, 2019

ngoix changed the title ~~[WIP] iforest chunks for score_samples~~ [MRG] iforest chunks for score_samples Mar 2, 2019

agramfort approved these changes Mar 4, 2019

jnothman reviewed Mar 6, 2019

assert mock called in tests

2306485

ngoix added 3 commits March 12, 2019 22:40

fix tests

d5638a7

pep8

c094a09

update whatsnew

d647498

jnothman approved these changes Mar 17, 2019

rm working_memory param

4463cc1

jnothman merged commit c22b871 into scikit-learn:master Mar 18, 2019

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

ENH iforest's score_samples uses chunks for fixed-memory computation (s…

bce9351

…cikit-learn#13283)

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

Revert "ENH iforest's score_samples uses chunks for fixed-memory comp…

33946b7

…utation (scikit-learn#13283)" This reverts commit bce9351.

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

Revert "ENH iforest's score_samples uses chunks for fixed-memory comp…

7e8d4a8

…utation (scikit-learn#13283)" This reverts commit bce9351.

koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019

ENH iforest's score_samples uses chunks for fixed-memory computation (s…

ae6dbfe

…cikit-learn#13283)
