[go: nahoru, domu]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GridSearchCV and HalvingGridSearchCV remove validation from __init__ and set_params #21880

Merged
merged 39 commits into from
Dec 15, 2021
Merged

Conversation

MrinalTyagi
Copy link
Contributor

Reference Issues/PRs

Fixes #21406

What does this implement/fix? Explain your changes.

All test working for HalvingGridSearchCV. Working on GridSearchCV

Any other comments?

@MrinalTyagi MrinalTyagi changed the title Remove validation from __init__ and set_params from GridSearchCV and HalvingGridSearchCV GridSearchCV and HalvingGridSearchCV remove validation from __init__ and set_params from Dec 4, 2021
@MrinalTyagi MrinalTyagi changed the title GridSearchCV and HalvingGridSearchCV remove validation from __init__ and set_params from GridSearchCV and HalvingGridSearchCV remove validation from __init__ and set_params Dec 5, 2021
@MrinalTyagi
Copy link
Contributor Author

@glemaitre @ogrisel Could you please guide me on how to fix the failing tests?

Copy link
Member
@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing this, I think we should adapt the ParamGrid.__init__ constructor to consolidate it with the missing checks of _check_param_grid and remove the _check_param_grid private helper entirely.

Note that the existing code in ParamGrid.__init__ tend to raise TypeError while approximately matching statements in _check_param_grid would instead raise ValueError. I am not sure if others agree but whenever the condition if checking with isinstance clauses, I think we should raise TypeError. For the clause that checks len(v) == 0, I think it's correct to raise ValueError.

This means that this code will introduce an additional behavior change for cases where we will raise TypeError instead of ValueError but I think this is fine because we are introducing a behavior change anyway.

sklearn/model_selection/tests/test_search.py Outdated Show resolved Hide resolved
@ogrisel
Copy link
Member
ogrisel commented Dec 6, 2021

Forgot to reply to:

@glemaitre @ogrisel Could you please guide me on how to fix the failing tests?

https://dev.azure.com/scikit-learn/scikit-learn/_build/results?buildId=35734&view=logs&j=78a0bf4f-79e5-5387-94ec-13e67d216d6e&t=dbc16e8a-e84c-5cb6-14f6-590d872d6e0b

E           ValueError: 
E           
E           /home/vsts/work/1/s/sklearn/model_selection/_search.py
E           
E           RandomizedSearchCV.fit(self, X, y=None, *, groups=None, **fit_params)
E           
E           
E           
E           # Errors
E           
E            - GL08: The object does not have a docstring

This is because you introduced a new public function (RandomizedSearchCV.fit) without a docstring. But if you consider my suggestion above, this problem will go away because there would be no need to introduce a new public method on the base class.

@MrinalTyagi
Copy link
Contributor Author

Forgot to reply to:

@glemaitre @ogrisel Could you please guide me on how to fix the failing tests?

https://dev.azure.com/scikit-learn/scikit-learn/_build/results?buildId=35734&view=logs&j=78a0bf4f-79e5-5387-94ec-13e67d216d6e&t=dbc16e8a-e84c-5cb6-14f6-590d872d6e0b

E           ValueError: 
E           
E           /home/vsts/work/1/s/sklearn/model_selection/_search.py
E           
E           RandomizedSearchCV.fit(self, X, y=None, *, groups=None, **fit_params)
E           
E           
E           
E           # Errors
E           
E            - GL08: The object does not have a docstring

This is because you introduced a new public function (RandomizedSearchCV.fit) without a docstring. But if you consider my suggestion above, this problem will go away because there would be no need to introduce a new public method on the base class.

surely will be proceeding with your suggestion.

@MrinalTyagi
Copy link
Contributor Author

@ogrisel I have tried to make changes to ParameterGrid in a way you asked. Let me know if I was able to replicate the behaviour you told me to. Also in case this works I think, there should be changes to tests for the same regarding the error message that its currently testing to compare using pytest.

Copy link
Member
@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me but the change should be documented in the changelog as done for the other PRs related to #21406.

I am surprised the tests for HalvingGridSearchCV did not require a similar update but no big deal.

sklearn/model_selection/tests/test_search.py Outdated Show resolved Hide resolved
@@ -448,16 +449,18 @@ def test_grid_search_bad_param_grid():
" Single values need to be wrapped in a list"
" with one element."
)
search = GridSearchCV(clf, param_dict)
with pytest.raises(ValueError, match=error_msg):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be updated expect a TypeError with a new expected error message. Launch the test locally to see the failure of this specific test more quickly on your local machine before pushing:

pytest sklearn/model_selection/tests/test_search.py -k test_grid_search_bad_param_grid

@MrinalTyagi
Copy link
Contributor Author

@ogrisel I was not able to figure out how to update the test despite several tries. Will it be possible for you to update the error messages?

@ogrisel
Copy link
Member
ogrisel commented Dec 7, 2021

@ogrisel I was not able to figure out how to update the test despite several tries. Will it be possible for you to update the error messages?

@MrinalTyagi I will have a more detailed look at the first case to get you started and then let you do the others.

As a side note, please used dedicated named branch (different from main) to create your pull request. Putting your local changes into a branch named main will be the cause of endless confusion when using git to synchronize your repo with upstream changes. No need to change anything for this PR, but next time use:

Assuming that upstream is your short name for the https://github.com/scikit-learn/scikit-learn remote repository (you can check with the git remote -v command), then you can sync-up your local repo and create a branch off of the upstream main branch using:

git fetch upstream main
git checkout -b branch-dedicated-to-some-bugfix upstream/main

You obviously need to replace branch-dedicated-to-some-bugfix by some meaningful short name that concisely though specifically relates to the purpose of your branch.

@ogrisel
Copy link
Member
ogrisel commented Dec 7, 2021

@MrinalTyagi I will have a more detailed look at the first case to get you started and then let you do the others.

I pushed 0f1e162 with what I think it is a good consolidation of the previous checks of _check_param_grid into the constructor of ParamGrid. I took the opportunity to improve the error messages to make them slightly more explicit and use TypeError or ValueError depending on whether the type or the value of a parameter is invalid.

I also fixed the first assertion. Please fetch this new commit in your local main branch (since this is unfortunately the one you used to work on this PR):

git pull origin main

and then run the tests as:

pytest -vl -k test_grid_search_bad_param_grid  sklearn/model_selection/tests/test_search.py

then fix the next failing assertion with an updated error message and/or exception type and iterate.

@ogrisel
Copy link
Member
ogrisel commented Dec 7, 2021

Once you are done with test_grid_search_bad_param_grid then move on to test_validate_parameter_input that will require similar updates.

@MrinalTyagi
Copy link
Contributor Author

Once you are done with test_grid_search_bad_param_grid then move on to test_validate_parameter_input that will require similar updates.

Done with the test_grid_search_bad_param_grid but finding difficulty in test_validate_parameter_input

@ogrisel
Copy link
Member
ogrisel commented Dec 7, 2021

The test_validate_parameter_input test uses parametrized tests where the third argument (error_message) is the regular expression pattern passed to the pytest exception matcher in the body of the test function.

If you are not familiar with regexps, you at least need to know that the sub-pattern .* can match any substring of any length in a regexp. Parenthesis and angle brackets have a special meaning and need to be escaped with a \: for instance the r"\[\]" pattern will match the string representation of an empty list:"[]".

Note that, in Python, regular expressions patterns are typically written using "raw" Python string literals (r"this is a raw string" instead of "this is a regular string") to avoid having the to double \ characters that are often used in regexp patterns.

If you don't want, to write regexp patterns, you can alternatively copy the full expected message for the tuple entry that defines the error_message parameter by calling re.escape("The full error message here"). Rely on black to automatically reformat such complex multiline expressions.

@MrinalTyagi
Copy link
Contributor Author

@ogrisel Updated the error message to give more info as well as in the third test for test_validate_parameter_input, there is very less similarity between parameter sampler and parameter grid so added a .*

@MrinalTyagi
Copy link
Contributor Author

@ogrisel Please merge. Thanks for the great mentoring.

@ogrisel
Copy link
Member
ogrisel commented Dec 9, 2021

@ogrisel Please merge.

For such a complex PR we will need for a second reviewer.

Thanks for the great mentoring.

Thanks for your patience ;) I think that the error messages are much improved and more consistent, on top of the original purpose of the PR to perform the checks only in fit instead of __init__.

@ogrisel
Copy link
Member
ogrisel commented Dec 9, 2021

Just so that this does not get forgotten: there is a remaining comment to address:

#21880 (comment)

Other than that, LGTM.

@MrinalTyagi
Copy link
Contributor Author

Just so that this does not get forgotten: there is a remaining comment to address:

#21880 (comment)

Other than that, LGTM.

Thank you for the reminder. Forgot that by mistake. Have updated that now. Hopefully, it looks as you expected it to be now.

Copy link
Member
@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR @MrinalTyagi !

sklearn/model_selection/_search.py Outdated Show resolved Hide resolved
MrinalTyagi and others added 2 commits December 10, 2021 03:58
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Copy link
Member
@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments, otherwise LGTM!

sklearn/model_selection/_search.py Outdated Show resolved Hide resolved
sklearn/model_selection/_search.py Outdated Show resolved Hide resolved
@MrinalTyagi
Copy link
Contributor Author

@ogrisel @thomasjpfan I have written that code due to which conflicts were happening. Do I remove that code now after these tests?

@MrinalTyagi
Copy link
Contributor Author
MrinalTyagi commented Dec 11, 2021

@ogrisel @thomasjpfan Do merge

@glemaitre glemaitre merged commit 7774697 into scikit-learn:main Dec 15, 2021
@glemaitre
Copy link
Member

Merging since 2 approvals and after making a quick check. Thanks @MrinalTyagi

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove validation from __init__ and set_params
4 participants