GridSearchCV and HalvingGridSearchCV remove validation from init and set_params #21880

MrinalTyagi · 2021-12-04T20:00:26Z

Reference Issues/PRs

Fixes #21406

What does this implement/fix? Explain your changes.

All test working for HalvingGridSearchCV. Working on GridSearchCV

Any other comments?

MrinalTyagi · 2021-12-05T07:10:34Z

@glemaitre @ogrisel Could you please guide me on how to fix the failing tests?

ogrisel

Instead of doing this, I think we should adapt the ParamGrid.__init__ constructor to consolidate it with the missing checks of _check_param_grid and remove the _check_param_grid private helper entirely.

Note that the existing code in ParamGrid.__init__ tend to raise TypeError while approximately matching statements in _check_param_grid would instead raise ValueError. I am not sure if others agree but whenever the condition if checking with isinstance clauses, I think we should raise TypeError. For the clause that checks len(v) == 0, I think it's correct to raise ValueError.

This means that this code will introduce an additional behavior change for cases where we will raise TypeError instead of ValueError but I think this is fine because we are introducing a behavior change anyway.

sklearn/model_selection/tests/test_search.py

ogrisel · 2021-12-06T18:27:01Z

Forgot to reply to:

@glemaitre @ogrisel Could you please guide me on how to fix the failing tests?

https://dev.azure.com/scikit-learn/scikit-learn/_build/results?buildId=35734&view=logs&j=78a0bf4f-79e5-5387-94ec-13e67d216d6e&t=dbc16e8a-e84c-5cb6-14f6-590d872d6e0b

E           ValueError: 
E           
E           /home/vsts/work/1/s/sklearn/model_selection/_search.py
E           
E           RandomizedSearchCV.fit(self, X, y=None, *, groups=None, **fit_params)
E           
E           
E           
E           # Errors
E           
E            - GL08: The object does not have a docstring

This is because you introduced a new public function (RandomizedSearchCV.fit) without a docstring. But if you consider my suggestion above, this problem will go away because there would be no need to introduce a new public method on the base class.

This reverts commit 6b64d03.

This reverts commit 1f581ff.

This reverts commit b6abb12.

MrinalTyagi · 2021-12-06T18:57:53Z

Forgot to reply to:

@glemaitre @ogrisel Could you please guide me on how to fix the failing tests?

https://dev.azure.com/scikit-learn/scikit-learn/_build/results?buildId=35734&view=logs&j=78a0bf4f-79e5-5387-94ec-13e67d216d6e&t=dbc16e8a-e84c-5cb6-14f6-590d872d6e0b
E           ValueError: 
E           
E           /home/vsts/work/1/s/sklearn/model_selection/_search.py
E           
E           RandomizedSearchCV.fit(self, X, y=None, *, groups=None, **fit_params)
E           
E           
E           
E           # Errors
E           
E            - GL08: The object does not have a docstring
This is because you introduced a new public function (RandomizedSearchCV.fit) without a docstring. But if you consider my suggestion above, this problem will go away because there would be no need to introduce a new public method on the base class.

surely will be proceeding with your suggestion.

MrinalTyagi · 2021-12-06T19:39:36Z

@ogrisel I have tried to make changes to ParameterGrid in a way you asked. Let me know if I was able to replicate the behaviour you told me to. Also in case this works I think, there should be changes to tests for the same regarding the error message that its currently testing to compare using pytest.

ogrisel

This looks good to me but the change should be documented in the changelog as done for the other PRs related to #21406.

I am surprised the tests for HalvingGridSearchCV did not require a similar update but no big deal.

sklearn/model_selection/tests/test_search.py

ogrisel · 2021-12-07T00:00:15Z

sklearn/model_selection/tests/test_search.py

@@ -448,16 +449,18 @@ def test_grid_search_bad_param_grid():
        " Single values need to be wrapped in a list"
        " with one element."
    )
+    search = GridSearchCV(clf, param_dict)
    with pytest.raises(ValueError, match=error_msg):


I think this needs to be updated expect a TypeError with a new expected error message. Launch the test locally to see the failure of this specific test more quickly on your local machine before pushing:

pytest sklearn/model_selection/tests/test_search.py -k test_grid_search_bad_param_grid

MrinalTyagi · 2021-12-07T05:17:28Z

@ogrisel I was not able to figure out how to update the test despite several tries. Will it be possible for you to update the error messages?

ogrisel · 2021-12-07T09:41:29Z

@ogrisel I was not able to figure out how to update the test despite several tries. Will it be possible for you to update the error messages?

@MrinalTyagi I will have a more detailed look at the first case to get you started and then let you do the others.

As a side note, please used dedicated named branch (different from main) to create your pull request. Putting your local changes into a branch named main will be the cause of endless confusion when using git to synchronize your repo with upstream changes. No need to change anything for this PR, but next time use:

Assuming that upstream is your short name for the https://github.com/scikit-learn/scikit-learn remote repository (you can check with the git remote -v command), then you can sync-up your local repo and create a branch off of the upstream main branch using:

git fetch upstream main
git checkout -b branch-dedicated-to-some-bugfix upstream/main

You obviously need to replace branch-dedicated-to-some-bugfix by some meaningful short name that concisely though specifically relates to the purpose of your branch.

…grid

ogrisel · 2021-12-07T11:02:41Z

@MrinalTyagi I will have a more detailed look at the first case to get you started and then let you do the others.

I pushed 0f1e162 with what I think it is a good consolidation of the previous checks of _check_param_grid into the constructor of ParamGrid. I took the opportunity to improve the error messages to make them slightly more explicit and use TypeError or ValueError depending on whether the type or the value of a parameter is invalid.

I also fixed the first assertion. Please fetch this new commit in your local main branch (since this is unfortunately the one you used to work on this PR):

git pull origin main

and then run the tests as:

pytest -vl -k test_grid_search_bad_param_grid  sklearn/model_selection/tests/test_search.py

then fix the next failing assertion with an updated error message and/or exception type and iterate.

ogrisel · 2021-12-07T11:05:11Z

Once you are done with test_grid_search_bad_param_grid then move on to test_validate_parameter_input that will require similar updates.

MrinalTyagi · 2021-12-07T14:39:57Z

Once you are done with test_grid_search_bad_param_grid then move on to test_validate_parameter_input that will require similar updates.

Done with the test_grid_search_bad_param_grid but finding difficulty in test_validate_parameter_input

ogrisel · 2021-12-07T16:41:42Z

The test_validate_parameter_input test uses parametrized tests where the third argument (error_message) is the regular expression pattern passed to the pytest exception matcher in the body of the test function.

If you are not familiar with regexps, you at least need to know that the sub-pattern .* can match any substring of any length in a regexp. Parenthesis and angle brackets have a special meaning and need to be escaped with a \: for instance the r"\[\]" pattern will match the string representation of an empty list:"[]".

Note that, in Python, regular expressions patterns are typically written using "raw" Python string literals (r"this is a raw string" instead of "this is a regular string") to avoid having the to double \ characters that are often used in regexp patterns.

If you don't want, to write regexp patterns, you can alternatively copy the full expected message for the tuple entry that defines the error_message parameter by calling re.escape("The full error message here"). Rely on black to automatically reformat such complex multiline expressions.

MrinalTyagi · 2021-12-07T22:34:12Z

@ogrisel Updated the error message to give more info as well as in the third test for test_validate_parameter_input, there is very less similarity between parameter sampler and parameter grid so added a .*

MrinalTyagi · 2021-12-08T16:21:59Z

@ogrisel Please merge. Thanks for the great mentoring.

ogrisel · 2021-12-09T09:38:14Z

@ogrisel Please merge.

For such a complex PR we will need for a second reviewer.

Thanks for the great mentoring.

Thanks for your patience ;) I think that the error messages are much improved and more consistent, on top of the original purpose of the PR to perform the checks only in fit instead of __init__.

ogrisel · 2021-12-09T14:52:58Z

Just so that this does not get forgotten: there is a remaining comment to address:

#21880 (comment)

Other than that, LGTM.

MrinalTyagi · 2021-12-09T17:14:19Z

Just so that this does not get forgotten: there is a remaining comment to address:

#21880 (comment)

Other than that, LGTM.

Thank you for the reminder. Forgot that by mistake. Have updated that now. Hopefully, it looks as you expected it to be now.

thomasjpfan

Thank you for the PR @MrinalTyagi !

sklearn/model_selection/_search.py

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

thomasjpfan

Minor comments, otherwise LGTM!

sklearn/model_selection/_search.py

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

This reverts commit a2ae9b2.

This reverts commit 6d344fc.

MrinalTyagi · 2021-12-10T21:41:36Z

@ogrisel @thomasjpfan I have written that code due to which conflicts were happening. Do I remove that code now after these tests?

MrinalTyagi · 2021-12-11T15:21:23Z

@ogrisel @thomasjpfan Do merge

glemaitre · 2021-12-15T10:34:45Z

Merging since 2 approvals and after making a quick check. Thanks @MrinalTyagi

Base Version

b6abb12

github-actions bot added the module:model_selection label Dec 4, 2021

MrinalTyagi changed the title ~~Remove validation from __init__ and set_params from GridSearchCV and HalvingGridSearchCV~~ GridSearchCV and HalvingGridSearchCV remove validation from __init__ and set_params from Dec 4, 2021

MrinalTyagi added 2 commits December 5, 2021 11:31

Updates for GridSearchCV

1f581ff

Update _search_successive_halving.py

6b64d03

MrinalTyagi changed the title ~~GridSearchCV and HalvingGridSearchCV remove validation from __init__ and set_params from~~ GridSearchCV and HalvingGridSearchCV remove validation from __init__ and set_params Dec 5, 2021

ogrisel reviewed Dec 6, 2021

View reviewed changes

sklearn/model_selection/tests/test_search.py Outdated Show resolved Hide resolved

MrinalTyagi added 4 commits December 7, 2021 00:12

Merge branch 'scikit-learn:main' into main

46c5a5e

Revert "Update _search_successive_halving.py"

431bd7f

This reverts commit 6b64d03.

Revert "Updates for GridSearchCV"

b532260

This reverts commit 1f581ff.

Revert "Base Version"

4fe7f87

This reverts commit b6abb12.

Updated in ParameterGrid with asked changes

a4c65bf

Update _search.py

8e2db67

ogrisel reviewed Dec 6, 2021

View reviewed changes

sklearn/model_selection/tests/test_search.py Outdated Show resolved Hide resolved

ogrisel reviewed Dec 7, 2021

View reviewed changes

Updated changelogs

18722f8

Improve checks and fix first assertion in test_grid_search_bad_param_…

0f1e162

…grid

Update test_search.py

20a5cbd

Updated test_validate_parameter_input

c81103d

Update _search.py

1f4be54

ogrisel added the Waiting for Reviewer label Dec 9, 2021

Updated error messages to contain key

c2e6e0b

thomasjpfan reviewed Dec 9, 2021

View reviewed changes

sklearn/model_selection/_search.py Outdated Show resolved Hide resolved

MrinalTyagi and others added 2 commits December 10, 2021 03:58

Update sklearn/model_selection/_search.py

c21c5e3

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

Update _search.py

b8330d0

MrinalTyagi requested a review from thomasjpfan December 10, 2021 14:32

thomasjpfan approved these changes Dec 10, 2021

View reviewed changes

sklearn/model_selection/_search.py Outdated Show resolved Hide resolved

sklearn/model_selection/_search.py Outdated Show resolved Hide resolved

MrinalTyagi and others added 7 commits December 11, 2021 02:51

Update sklearn/model_selection/_search.py

a2ae9b2

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

Update sklearn/model_selection/_search.py

6d344fc

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

Revert "Update sklearn/model_selection/_search.py"

ba498be

This reverts commit a2ae9b2.

Revert "Update sklearn/model_selection/_search.py"

bc7cdb9

This reverts commit 6d344fc.

Updated with required changes

e490a11

Update test_common.py

57d1625

Update test_common.py

5a966fe

MrinalTyagi added 2 commits December 11, 2021 11:04

Updated test files

194939a

Merge branch 'main' into main

5b1b00e

glemaitre merged commit 7774697 into scikit-learn:main Dec 15, 2021

jrbourbeau mentioned this pull request Dec 17, 2021

⚠️ Upstream CI failed ⚠️ dask/dask-ml#894

Closed

This was referenced Dec 28, 2021

Add daily unit tests Quantco/glum#502

Merged

Add scikit-learn to nightlies Quantco/glum#506

Closed

Handle upstream removal of _check_param_grid Quantco/glum#507

Closed

charlesbluca mentioned this pull request Jan 18, 2022

New dask-ml release to resolve upstream breaking changes? dask/dask-ml#899

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GridSearchCV and HalvingGridSearchCV remove validation from init and set_params #21880

GridSearchCV and HalvingGridSearchCV remove validation from init and set_params #21880

GridSearchCV and HalvingGridSearchCV remove validation from __init__ and set_params #21880

GridSearchCV and HalvingGridSearchCV remove validation from __init__ and set_params #21880

Conversation

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Any other comments?

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

GridSearchCV and HalvingGridSearchCV remove validation from init and set_params #21880

GridSearchCV and HalvingGridSearchCV remove validation from init and set_params #21880