
[MRG] Allow for refit=callable in *SearchCV to add flexibility in identifying the best estimator #11269 #11354

Merged: 55 commits, Jan 9, 2019

Conversation

@wenhaoz-fengcai (Author) commented Jun 25, 2018

Reference Issues/PRs

Fixes #11269. Fixes #12865. See also #9499

What does this implement/fix? Explain your changes.

Allow a callable to be passed to refit in *SearchCV to balance score and model complexity. This interface adds flexibility in identifying the "best" estimator. The function passed to the refit parameter can encode which metric to optimise, so users can also use multi-metric evaluation with this interface.
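
A minimal sketch of the proposed usage (the parameter grid and the selection rule below are illustrative, not taken from this PR): with multi-metric scoring, the callable receives cv_results_ and returns the index of the candidate to refit, so it decides which metric drives the choice.

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def best_by_precision(cv_results):
    # Illustrative rule: refit the candidate with the best macro precision.
    return cv_results['mean_test_precision_macro'].argmax()

X, y = load_iris(return_X_y=True)
search = GridSearchCV(SVC(gamma='scale'), {'C': [1, 10]},
                      scoring=['precision_macro', 'recall_macro'],
                      refit=best_by_precision)
search.fit(X, y)
print(search.best_index_, search.best_params_)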

Any other comments?

  1. Two test cases added:
  • test_refit_callable()
    • adds an example of choosing the model with the lowest mean_test_score
  • test_refit_callable_multi_metric()
    • tests the same example in a multi-metric evaluation setting
  2. Added documentation describing this feature to users (see additions in _search.py under the model_selection directory)
  3. Added an example (plot_grid_search_refit_callable.py) demonstrating the usage of this interface under examples/model_selection/
  4. This implementation passes the full test suite when running make

Checklist:

  • Rewrite test case for refit=callable using a simple dummy refit function.
  • Rewrite test case for refit=callable using a similar example in multi-metric evaluation settings.
  • Add functions in _search.py to pass the above tests
  • Documentation
  • Polishing

"refit should be set to False "
"explicitly. %r was passed"
% self.refit)
refit_metric = scorer_key
Member:
I don't get why you need this. You don't use refit_metric if refit is callable below. I also think making inferences from the name of the function is inappropriate.

Author:
I think refit_metric is needed to compute self.best_score_ as shown here in the original code base.

Member:
Ah, I see. I would just disable best_score_ when refit is callable. Please test and document that behaviour.
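
A sketch of the kind of test this suggests (the dataset, estimator and grid here are assumptions, not the PR's actual test code): when refit is a callable, best_index_ and best_params_ are still set, but best_score_ is simply not defined.

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

def refit_callable(cv_results):
    # Dummy rule: deliberately pick the lowest-scoring candidate.
    return cv_results['mean_test_score'].argmin()

X, y = make_classification(n_samples=100, random_state=42)
search = GridSearchCV(LinearSVC(random_state=42), {'C': [0.01, 0.1, 1]},
                      refit=refit_callable)
search.fit(X, y)
assert hasattr(search, 'best_index_')
assert not hasattr(search, 'best_score_')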

Refit an estimator using the best found parameters on the whole
dataset.

For multiple metric evaluation, this needs to be a string denoting the
scorer is used to find the best parameters for refitting the estimator
at the end.

Where there are considerations other than maximum model performance in
choosing a best estimator, ``refit`` can be set to a function which returns
thre selected ``best_index_`` given ``cv_results_``.
Member:
thre -> the

Refit an estimator using the best found parameters on the whole
dataset.

For multiple metric evaluation, this needs to be a string denoting the
scorer is used to find the best parameters for refitting the estimator
at the end.

Where there are considerations other than maximum model performance in
Member:
model performance -> score

Refit an estimator using the best found parameters on the whole
dataset.

For multiple metric evaluation, this needs to be a string denoting the
scorer that would be used to find the best parameters for refitting
the estimator at the end.

Where there are considerations other than maximum model performance in
Member:
Please use the same text and formatting in both places.

The refitted estimator is made available at the ``best_estimator_``
attribute and permits using ``predict`` directly on this
``GridSearchCV`` instance.

Also for multiple metric evaluation, the attributes ``best_index_``,
``best_score_`` and ``best_parameters_`` will only be available if
``refit`` is set and all of them will be determined w.r.t this specific
scorer.
scorer. If a callable is passed to parameter refit, the function's name
Member:
This is an unnecessary and unhelpful condition.

For multi-metric evaluation, the name of refit callable function must
end with a scorer key(`_<scorer_name>`).
"""
def refit_prec(cv_results):
Member:
We should have a realistic example in examples/model_selection/ rather than here.

As a simple example, I would consider maximising score while minimising the number of selected features or PCA components.

Here we should merely be testing the interface, and a dummy function (for instance, one that always chooses the lowest-scoring model) is sufficient / most appropriate, as it is then easy for us to be sure what correct behaviour is.

@wenhaoz-fengcai (Author), Jun 26, 2018:
@jnothman Would you say a dummy function like below is good enough to test our interface?

def refit_callable(cv_results):
    return cv_results['mean_test_score'].argmin()

It seems that you're suggesting two things here :(

Member:
Yes, that looks good. I might add to that an assertion that all the keys we expect to be in results are in there.

Yes, I am indeed suggesting a second thing here. An example in examples/model_selection will hugely increase the visibility and practical usability of this feature. The example gallery is how we advise users how to use the features described in technical detail in the docstrings (and before StackOverflow has all the answers).
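
A sketch of how the dummy callable above could absorb that suggestion (the list of keys checked is an assumption, not the PR's final test): assert the presence of the keys the callable relies on before using them.

def refit_callable(cv_results):
    # Sanity-check the cv_results keys this dummy rule depends on, then
    # pick the lowest-scoring candidate.
    for key in ('mean_test_score', 'params', 'rank_test_score'):
        assert key in cv_results, key
    return cv_results['mean_test_score'].argmin()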

@wenhaoz-fengcai (Author), Jun 26, 2018:
Thanks! I'm adding an example from examples/model_selection for this feature in the docstring.

Author:
@jnothman Is it appropriate to add one more example for refit=callable in the docstring of the GridSearchCV class, after this one?

Examples
--------
>>> from sklearn import svm, datasets
>>> from sklearn.model_selection import GridSearchCV
>>> iris = datasets.load_iris()
>>> parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]}
>>> svc = svm.SVC(gamma="scale")
>>> clf = GridSearchCV(svc, parameters)
>>> clf.fit(iris.data, iris.target)
... # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS
GridSearchCV(cv=None, error_score=...,
       estimator=SVC(C=1.0, cache_size=..., class_weight=..., coef0=...,
                     decision_function_shape='ovr', degree=..., gamma=...,
                     kernel='rbf', max_iter=-1, probability=False,
                     random_state=None, shrinking=True, tol=...,
                     verbose=False),
       fit_params=None, iid=..., n_jobs=1,
       param_grid=..., pre_dispatch=..., refit=..., return_train_score=...,
       scoring=..., verbose=...)
>>> sorted(clf.cv_results_.keys())
... # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS
['mean_fit_time', 'mean_score_time', 'mean_test_score',...
'mean_train_score', 'param_C', 'param_kernel', 'params',...
'rank_test_score', 'split0_test_score',...
'split0_train_score', 'split1_test_score', 'split1_train_score',...
'split2_test_score', 'split2_train_score',...
'std_fit_time', 'std_score_time', 'std_test_score', 'std_train_score'...]

Member:
I think a meaningful example is too large, and too much of a power-user feature, to be in the docstring.

@wenhaoz-fengcai (Author), Jun 27, 2018:
@jnothman It seems that we don't need to write test cases for our example under the examples directory, right? ;)

@jnothman changed the title from "Allow for refit=callable in *SearchCV to balance score and model complexity #11269" to "[WIP] Allow for refit=callable in *SearchCV to balance score and model complexity #11269" on Jun 25, 2018
@jnothman (Member):

Feel free to use GitHub's todo list feature in the PR description.

@wenhaoz-fengcai (Author):

@jnothman Thanks for your input! I'll improve my implementation based on your feedback.

enumerate(cv_results['mean_test_prec'])}
# Select models which have test precisions within 1 standard deviation
# of the best 'mean_test_prec'
candidates = dict(filter(lambda i: (i[1] >= test_prec_lower
Member:
btw, a dict comprehension is much easier to read than this

So is test_prec_upper > i[1] >= test_prec_lower
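
A sketch of the suggested rewrite (test_prec_lower and test_prec_upper come from the surrounding snippet and are not defined here): a dict comprehension with a chained comparison in place of the dict(filter(lambda ...)) construction.

candidates = {i: prec
              for i, prec in enumerate(cv_results['mean_test_prec'])
              if test_prec_upper > prec >= test_prec_lower}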

Author:
👍

enumerate(cv_results['mean_fit_time'])}
fit_time_rank = sorted(fit_time)
for i in fit_time_rank:
if fit_time[i] in candidates:
Member:
This isn't working in AppVeyor. The function is returning None there.

Author:
Yes, I'm replacing these two test cases with simpler ones.

@jnothman (Member) left a comment:
Circle CI should fail if the example does

@jnothman (Member) left a comment:
Please reference the example from doc/modules/grid_search.rst. You should probably put the motivation / use case there more than in the example.

@jnothman (Member):

Documentation is rendered at https://26300-843222-gh.circle-artifacts.com/0/doc/_changed.html

}
]

grid = GridSearchCV(pipe, cv=3, n_jobs=1, param_grid=param_grid,
Member:
I don't think we should be encouraging users to calculate a standard deviation over 3 samples. Make cv=10.

interface can also be used in multiple metrics evaluation.

This example balances model complexity and cross-validated score by
finding a decent accuracy within 1 standard deviation of the best accuracy
Member:
You might want to say that this is a rule of thumb for insignificant difference.

We could determine insignificant difference in a more proper way, such as with a Wilcoxon rank-sum test.
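
A sketch of that more principled alternative (the key-name pattern follows cv_results_ conventions and the 0.05 threshold is an assumption): compare each candidate's per-split scores against the best candidate's with a rank-sum test and keep those that are not significantly worse.

import numpy as np
from scipy.stats import ranksums

def not_significantly_worse(cv_results, alpha=0.05):
    # Per-split test scores, shaped (n_splits, n_candidates).
    split_keys = sorted(k for k in cv_results
                        if k.startswith('split') and k.endswith('_test_score'))
    per_split = np.array([cv_results[k] for k in split_keys])
    best = int(np.argmax(cv_results['mean_test_score']))
    pvalues = np.array([ranksums(per_split[:, best], per_split[:, i]).pvalue
                        for i in range(per_split.shape[1])])
    # Boolean mask of candidates not significantly worse than the best.
    return pvalues >= alpha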

@@ -0,0 +1,125 @@
"""
=======================================================================
Balance model complexity and cross-validated score using refit=callable
Member:
Drop "using *"

upper/lower bounds within 1 standard deviation of the
best `mean_test_scores`.
"""
std_test_score = np.std(scores)
Member:
Should be using std_test_score: you want the standard deviation across CV splits, not across parameter candidates.
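
A sketch of the point being made (names follow the example script's conventions; the final code in the PR may differ): take the per-candidate standard deviation across CV splits from cv_results['std_test_score'], rather than np.std over the candidates' mean scores.

import numpy as np

def lower_bound(cv_results):
    # Best mean test score minus one standard deviation of that
    # candidate's score across CV splits.
    best_idx = np.argmax(cv_results['mean_test_score'])
    return (cv_results['mean_test_score'][best_idx]
            - cv_results['std_test_score'][best_idx])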

@wenhaoz-fengcai (Author):

@jnothman @adrinjalali Probably need your help to fix the Travis CI issue... :-/

@adrinjalali (Member) left a comment:
Thanks @jiaowoshabi, LGTM!

@jnothman (Member) commented Jan 8, 2019:

Awesome, @jiaowoshabi!

Please add an entry to the change log at doc/whats_new/v0.21.rst. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:

@@ -144,6 +144,14 @@ Support for Python 3.4 and below has been officially dropped.
:func:`~model_selection.validation_curve` only the latter is required.
:issue:`12613` and :issue:`12669` by :user:`Marc Torrellas <marctorrellas>`.

- |Enhancement| :class:`~model_selection.BaseSearchCV` now allows for
Member:
BaseSearchCV is not listed in doc/modules/classes.rst so this link won't work. Ordinarily we'd reference GridSearchCV and RandomizedSearchCV. You could also consider referencing the user guide rather than the example?


See ``scoring`` parameter to know more about multiple metric
evaluation.

.. versionadded:: 0.20
Member:
I think versionchanged may be more appropriate, since the parameter was not added.


See ``scoring`` parameter to know more about multiple metric
evaluation.

.. versionadded:: 0.20
GridSearchCV supports ``refit`` = callable to add flexibility in
Member:
Don't mention GridSearchCV here. Simply say "Support for callable added." The rest is documented above.

@jnothman merged commit 5817520 into scikit-learn:master on Jan 9, 2019
@jnothman (Member) commented Jan 9, 2019:

Thanks @jiaowoshabi!!

self.best_index_ = self.refit(results)
if not isinstance(self.best_index_, (int, np.integer)):
    raise TypeError('best_index_ returned is not an integer')
if self.best_index_ < 0 or self.best_index_ >= len(results):
Member:
Pretty sure this is a bug: results is a dictionary of things, and each value is an array the size of the grid.

Member:
opened #13413
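
A sketch of the direction of that fix (the actual change was handled in the follow-up referenced above and may differ): compare best_index_ against the number of parameter candidates, i.e. len(results['params']), rather than the number of keys in the results dict.

# Rewrite of the bounds check quoted above; `self` and `results` refer to
# the surrounding BaseSearchCV.fit code.
n_candidates = len(results['params'])
if self.best_index_ < 0 or self.best_index_ >= n_candidates:
    raise IndexError('best_index_ index out of range')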

xhluca pushed commits to xhluca/scikit-learn referencing this pull request on Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn referencing this pull request on Jul 12, 2019
Labels: none yet
Projects: none yet
6 participants