
DEP loss_ attribute in gradient boosting #23079

Merged
4 commits merged into scikit-learn:main from lorentzenchr:dep_loss_attribute_gb on Apr 12, 2022

Conversation

lorentzenchr
Member

Reference Issues/PRs

None.

What does this implement/fix? Explain your changes.

This PR deprecates the attribute loss_ of GradientBoostingClassifier and GradientBoostingRegressor.

Any other comments?

This will greatly simplify using the common losses under sklearn._loss in (old) gradient boosting.
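
For context, the usual scikit-learn pattern for deprecating a public attribute looks roughly like the sketch below. The class name, message text, and `loss_` plumbing here are illustrative, not the exact diff of this PR:

```python
from sklearn.utils import deprecated


class GBDTSketch:
    """Illustrative stand-in for the gradient boosting estimators."""

    def __init__(self):
        # The loss object lives on under a private attribute.
        self._loss = object()

    # Reading the public attribute keeps working during the deprecation
    # window, but emits a FutureWarning that states the removal timeline.
    @deprecated(
        "Attribute `loss_` was deprecated in version 1.1 and will be "
        "removed in 1.3."
    )
    @property
    def loss_(self):
        return self._loss


GBDTSketch().loss_  # emits a FutureWarning
```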

@lorentzenchr lorentzenchr added this to the 1.1 milestone Apr 8, 2022
Member
@thomasjpfan thomasjpfan left a comment


Deprecating loss_ was mentioned once here: #15139 (comment). I am happy with deprecating it as well.

LGTM

doc/whats_new/v1.1.rst (review comment, resolved)
@lorentzenchr
Member Author

This will greatly simplify using the common losses under sklearn._loss in (old) gradient boosting.

Meaning, this improvement will have to wait until 1.3 (the PR will be large enough without also having to keep backwards compatibility for loss_). 😬

Member
@ogrisel ogrisel left a comment


LGTM.

@ogrisel ogrisel merged commit 0d669dc into scikit-learn:main Apr 12, 2022
@lorentzenchr lorentzenchr deleted the dep_loss_attribute_gb branch April 12, 2022 20:44
jjerphan pushed a commit to jjerphan/scikit-learn that referenced this pull request Apr 29, 2022
@sam-s
Contributor
sam-s commented May 31, 2022

@lorentzenchr
Member Author

For most of them, you can use the equivalent function under sklearn.metrics. It also depends on your use case. What do you want to achieve?
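
For the regressor with the default squared error loss, that replacement might look like this (a minimal sketch; the dataset and variable names are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbr = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

# Instead of calling the (now deprecated) gbr.loss_ object directly,
# use the public metric matching loss="squared_error"
# (possibly up to a constant factor, which is irrelevant for comparisons):
mse = mean_squared_error(y_test, gbr.predict(X_test))
```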

@sam-s
Contributor
sam-s commented Jun 13, 2022

For most of them, you can use the equivalent function under sklearn.metrics.

  1. It would be nice if this were clearly documented; e.g., figuring out the need for the sigmoid is not obvious.
  2. Which metrics correspond to deviance and exponential? (See the sketch after this comment.)

It also depends on your use case. What do you want to achieve?

Please see the link in the original comment for a code sample.
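
On the second question: for binary classification, the binomial deviance should correspond, up to a factor of 2, to sklearn.metrics.log_loss applied to sigmoid-transformed raw scores; there appears to be no dedicated sklearn.metrics function for the exponential loss. A sketch with placeholder data:

```python
import numpy as np
from scipy.special import expit  # the sigmoid / inverse logit
from sklearn.metrics import log_loss

# Placeholders standing in for clf.decision_function(X_test) output
# and the matching 0/1 labels.
rng = np.random.default_rng(0)
raw = rng.normal(size=100)
y = rng.integers(0, 2, size=100)

proba = expit(raw)  # the sigmoid turns raw scores into probabilities
# Binomial deviance equals 2 * log_loss on these probabilities;
# the constant factor does not matter when comparing models or stages.
deviance = 2.0 * log_loss(y, proba)
```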

@lorentzenchr
Member Author

I meant: What is your use case? In words, not in code.
And sure, improving the docs would be great. Any volunteers?

Note that HistGradientBoostingRegressor and HistGradientBoostingClassifier might be better suited than the old GBDTs for large sample sizes. There, the (inverse) link functions are documented.

@sam-s
Contributor
sam-s commented Jun 14, 2022

The code makes it possible to see the loss as a function of boosting stage, which lets me figure out the optimal number of estimators.
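
A sketch of that approach on synthetic data (assuming the binary log-loss case discussed above, so expit converts raw scores to probabilities):

```python
import numpy as np
from scipy.special import expit
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = GradientBoostingClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

# Held-out loss after each boosting stage; staged_decision_function
# yields one (n_samples, 1) array per stage in the binary case.
test_loss = [
    log_loss(y_test, expit(raw.ravel()))
    for raw in clf.staged_decision_function(X_test)
]
best_n_estimators = int(np.argmin(test_loss)) + 1  # stages are 1-based
```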

@sam-s
Contributor
sam-s commented Jun 14, 2022

Also, I see nothing like _loss in HistGradientBoostingClassifier, except that since the only supported loss there is log_loss, my code would presumably work there too.

@lorentzenchr
Member Author

The code makes it possible to see the loss as a function of boosting stage, which lets me figure out the optimal number of estimators.

Note that this would only give the training error. A better strategy is to look at the test/validation error, preferably with cross-validation, e.g. with GridSearchCV.
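
For reference, a sketch of that cross-validated alternative (the grid values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, random_state=0)

search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100, 200, 400]},
    scoring="neg_log_loss",
    cv=5,
)
search.fit(X, y)
best_n_estimators = search.best_params_["n_estimators"]
```

Note that each grid point refits the model from scratch, which is the cost objected to below; the staged evaluation sketched above reuses a single fitted model.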

@sam-s
Contributor
sam-s commented Jun 14, 2022

The code makes it possible to see the loss as a function of boosting stage, which lets me figure out the optimal number of estimators.

Note that this would only give the training error. A better strategy is to look at the test/validation error, preferably with cross-validation, e.g. with GridSearchCV.

I am passing test_X to staged_decision_function, not train_X.

@sam-s
Contributor
sam-s commented Jun 16, 2022

preferably with cross validation, e.g. with GridSearchCV.

GridSearchCV is way too expensive compared to my method.

sebp added a commit to sebp/scikit-survival that referenced this pull request Aug 2, 2022
sebp added a commit to sebp/scikit-survival that referenced this pull request Aug 7, 2022
sebp added a commit to sebp/scikit-survival that referenced this pull request Aug 13, 2022
sebp added a commit to sebp/scikit-survival that referenced this pull request Aug 13, 2022