API Config: change default display to "diagram" #22856

Merged on Mar 21, 2022 (6 commits).
This view shows the changes from 4 commits.

15 changes: 9 additions & 6 deletions doc/modules/compose.rst
@@ -567,14 +567,17 @@ will use the column names to select the columns::
 Visualizing Composite Estimators
 ================================

-Estimators can be displayed with a HTML representation when shown in a
-jupyter notebook. This can be useful to diagnose or visualize a Pipeline with
-many estimators. This visualization is activated by setting the
-`display` option in :func:`~sklearn.set_config`::
+Estimators are displayed with an HTML representation when shown in a
+jupyter notebook. This is useful to diagnose or visualize a Pipeline with
+many estimators. This visualization is activated by default::
+
+  >>> column_trans # doctest: +SKIP
+
+It can be deactivated by setting the `text` option in :func:`~sklearn.set_config`::

   >>> from sklearn import set_config
-  >>> set_config(display='diagram') # doctest: +SKIP
-  >>> # displays HTML representation in a jupyter context
+  >>> set_config(display='text') # doctest: +SKIP
+  >>> # displays text representation in a jupyter context
   >>> column_trans # doctest: +SKIP

 An example of the HTML output can be seen in the

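For readers of this hunk, here is a minimal sketch of how the documented option might be exercised outside a notebook. It assumes scikit-learn is installed (the function returns None otherwise); `offered_mime_types` is a hypothetical helper, not part of the PR. It relies on `config_context` and on the `_repr_mimebundle_` hook that the tests in this PR also call.

```python
import importlib.util


def offered_mime_types(display):
    """Return the MIME types a small pipeline offers under `display`.

    Returns None when scikit-learn is not installed.
    """
    if importlib.util.find_spec("sklearn") is None:
        return None
    from sklearn import config_context
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    pipe = make_pipeline(StandardScaler(), LogisticRegression())
    # config_context changes the option only inside the `with` block,
    # so the global setting is left untouched afterwards.
    with config_context(display=display):
        return sorted(pipe._repr_mimebundle_())


print(offered_mime_types("text"))
print(offered_mime_types("diagram"))
```

With scikit-learn available, the "text" call should report only a plain-text representation, while the "diagram" call should also offer an HTML one. The default itself is deliberately not checked here, since it depends on the installed version.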
5 changes: 5 additions & 0 deletions doc/whats_new/v1.1.rst
@@ -875,6 +875,11 @@ Changelog
 - |Fix| :func:`utils.estimator_html_repr` has an improved visualization for nested
   meta-estimators. :pr:`21310` by `Thomas Fan`_.

+- |API| Rich html representation of estimators is now enabled by default in Jupyter
+  notebooks. It can be deactivated by setting `display='text'` in
+  :func:`~sklearn.set_config`.
+  :pr:`22856` by `Jérémie du Boisberranger <jeremiedbb>`.
+
 Code and Documentation Contributors
 -----------------------------------

17 changes: 9 additions & 8 deletions examples/compose/plot_column_transformer_mixed_types.py
@@ -28,6 +28,7 @@
 #
 # License: BSD 3 clause

+# %%
 import numpy as np

 from sklearn.compose import ColumnTransformer
@@ -40,6 +41,7 @@

 np.random.seed(0)

+# %%
 # Load data from https://www.openml.org/d/40945
 X, y = fetch_openml("titanic", version=1, as_frame=True, return_X_y=True)

@@ -49,7 +51,7 @@

 # %%
 # Use ``ColumnTransformer`` by selecting column by names
-###############################################################################
+#
 # We will train our classifier with the following features:
 #
 # Numeric Features:
@@ -82,6 +84,7 @@
     ]
 )

+# %%
 # Append classifier to preprocessing pipeline.
 # Now we have a full prediction pipeline.
 clf = Pipeline(
@@ -95,17 +98,14 @@

 # %%
 # HTML representation of ``Pipeline`` (display diagram)
-###############################################################################
+#
 # When the ``Pipeline`` is printed out in a jupyter notebook an HTML
-# representation of the estimator is displayed as follows:
-from sklearn import set_config
-
-set_config(display="diagram")
+# representation of the estimator is displayed:
 clf

 # %%
 # Use ``ColumnTransformer`` by selecting column by data types
-###############################################################################
+#
 # When dealing with a cleaned dataset, the preprocessing can be automatic by
 # using the data types of the column to decide whether to treat a column as a
 # numerical or categorical feature.
@@ -150,6 +150,7 @@

 clf.fit(X_train, y_train)
 print("model score: %.3f" % clf.score(X_test, y_test))
+clf

 # %%
 # The resulting score is not exactly the same as the one from the previous
@@ -164,7 +165,7 @@

 # %%
 # Using the prediction pipeline in a grid search
-##############################################################################
+#
 # Grid search can also be performed on the different preprocessing steps
 # defined in the ``ColumnTransformer`` object, together with the classifier's
 # hyperparameters as part of the ``Pipeline``.

4 changes: 0 additions & 4 deletions examples/ensemble/plot_feature_transformation.py
@@ -25,10 +25,6 @@
 #
 # License: BSD 3 clause

-from sklearn import set_config
-
-set_config(display="diagram")
-
 # %%
 # First, we will create a large dataset and split it into three sets:
 #

4 changes: 0 additions & 4 deletions examples/ensemble/plot_stack_predictors.py
@@ -20,10 +20,6 @@
 # Maria Telenczuk <https://github.com/maikia>
 # License: BSD 3 clause

-from sklearn import set_config
-
-set_config(display="diagram")
-
 # %%
 # Download the dataset
 ##############################################################################

4 changes: 0 additions & 4 deletions examples/feature_selection/plot_feature_selection_pipeline.py
@@ -10,10 +10,6 @@

 """

-from sklearn import set_config
-
-set_config(display="diagram")
-
 # %%
 # We will start by generating a binary classification dataset. Subsequently, we
 # will divide the dataset into two subsets.

5 changes: 0 additions & 5 deletions examples/linear_model/plot_lasso_lars_ic.py
@@ -28,11 +28,6 @@
 # Guillaume Lemaitre
 # License: BSD 3 clause

-# %%
-import sklearn
-
-sklearn.set_config(display="diagram")
-
 # %%
 # We will use the diabetes dataset.
 from sklearn.datasets import load_diabetes

Review comment (Member), on the removed lines above: Note: it seems that we do not actually display the pipeline diagram in this example, but I think this is fine.

5 changes: 0 additions & 5 deletions examples/linear_model/plot_lasso_model_selection.py
@@ -19,11 +19,6 @@
 # Guillaume Lemaitre
 # License: BSD 3 clause

-# %%
-import sklearn
-
-sklearn.set_config(display="diagram")
-
 # %%
 # Dataset
 # -------

36 changes: 10 additions & 26 deletions examples/miscellaneous/plot_pipeline_display.py
@@ -3,9 +3,9 @@
 Displaying Pipelines
 =================================================================

-The default configuration for displaying a pipeline is `'text'` where
-`set_config(display='text')`. To visualize the diagram in Jupyter Notebook,
-use `set_config(display='diagram')` and then output the pipeline object.
+The default configuration for displaying a pipeline in a Jupyter Notebook is
+`'diagram'` where `set_config(display='diagram')`. To deactivate HTML representation,
+use `set_config(display='text')`.

 To see more detailed steps in the visualization of the pipeline, click on the
 steps in the pipeline.
@@ -31,14 +31,18 @@
 pipe = Pipeline(steps)

 # %%
-# To view the text pipeline, the default is `display='text'`.
+# To visualize the diagram, the default is `display='diagram'`.
+set_config(display="diagram")
+pipe  # click on the diagram below to see the details of each step
+
+# %%
+# To view the text pipeline, change to `display='text'`.
 set_config(display="text")
 pipe

 # %%
-# To visualize the diagram, change `display='diagram'`.
+# Put back the default display
 set_config(display="diagram")
-pipe  # click on the diagram below to see the details of each step

 # %%
 # Displaying a Pipeline Chaining Multiple Preprocessing Steps & Classifier
@@ -52,18 +56,13 @@
 from sklearn.pipeline import Pipeline
 from sklearn.preprocessing import StandardScaler, PolynomialFeatures
 from sklearn.linear_model import LogisticRegression
-from sklearn import set_config

 steps = [
     ("standard_scaler", StandardScaler()),
     ("polynomial", PolynomialFeatures(degree=3)),
     ("classifier", LogisticRegression(C=2.0)),
 ]
 pipe = Pipeline(steps)
-
-# %%
-# To visualize the diagram, change to display='diagram'
-set_config(display="diagram")
 pipe  # click on the diagram below to see the details of each step

 # %%
@@ -77,14 +76,9 @@
 from sklearn.pipeline import Pipeline
 from sklearn.svm import SVC
 from sklearn.decomposition import PCA
-from sklearn import set_config

 steps = [("reduce_dim", PCA(n_components=4)), ("classifier", SVC(kernel="linear"))]
 pipe = Pipeline(steps)
-
-# %%
-# To visualize the diagram, change to `display='diagram'`.
-set_config(display="diagram")
 pipe  # click on the diagram below to see the details of each step

 # %%
@@ -102,7 +96,6 @@
 from sklearn.compose import ColumnTransformer
 from sklearn.preprocessing import OneHotEncoder, StandardScaler
 from sklearn.linear_model import LogisticRegression
-from sklearn import set_config

 numeric_preprocessor = Pipeline(
     steps=[
@@ -129,10 +122,6 @@
 )

 pipe = make_pipeline(preprocessor, LogisticRegression(max_iter=500))
-
-# %%
-# To visualize the diagram, change to `display='diagram'`
-set_config(display="diagram")
 pipe  # click on the diagram below to see the details of each step

 # %%
@@ -151,7 +140,6 @@
 from sklearn.preprocessing import OneHotEncoder, StandardScaler
 from sklearn.ensemble import RandomForestClassifier
 from sklearn.model_selection import GridSearchCV
-from sklearn import set_config

 numeric_preprocessor = Pipeline(
     steps=[
@@ -189,8 +177,4 @@
 }

 grid_search = GridSearchCV(pipe, param_grid=param_grid, n_jobs=1)
-
-# %%
-# To visualize the diagram, change to `display='diagram'`.
-set_config(display="diagram")
 grid_search  # click on the diagram below to see the details of each step

6 changes: 3 additions & 3 deletions sklearn/_config.py
@@ -8,7 +8,7 @@
     "assume_finite": bool(os.environ.get("SKLEARN_ASSUME_FINITE", False)),
     "working_memory": int(os.environ.get("SKLEARN_WORKING_MEMORY", 1024)),
     "print_changed_only": True,
-    "display": "text",
+    "display": "diagram",
     "pairwise_dist_chunk_size": int(
         os.environ.get("SKLEARN_PAIRWISE_DIST_CHUNK_SIZE", 256)
     ),
@@ -85,7 +85,7 @@ def set_config(
     display : {'text', 'diagram'}, default=None
         If 'diagram', estimators will be displayed as a diagram in a Jupyter
         lab or notebook context. If 'text', estimators will be displayed as
-        text. Default is 'text'.
+        text. Default is 'diagram'.

         .. versionadded:: 0.23

@@ -173,7 +173,7 @@ def config_context(
         If 'diagram', estimators will be displayed as a diagram in a Jupyter
         lab or notebook context. If 'text', estimators will be displayed as
         text. If None, the existing value won't change.
-        The default value is 'text'.
+        The default value is 'diagram'.

         .. versionadded:: 0.23


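The docstrings touched above describe a global default plus a context manager in which `None` leaves the existing value unchanged. A simplified, standard-library-only sketch of that pattern follows. It is not the code in `sklearn/_config.py`: the module-level dictionary and the reduced set of options are assumptions made for illustration.

```python
from contextlib import contextmanager

# Hypothetical stand-in for the library's global configuration.
_config = {"display": "diagram", "print_changed_only": True}


def get_config():
    """Return a copy so callers cannot mutate the global state by accident."""
    return _config.copy()


def set_config(display=None, print_changed_only=None):
    """Update only the options that are not None."""
    if display is not None:
        if display not in ("text", "diagram"):
            raise ValueError(f"display must be 'text' or 'diagram', got {display!r}")
        _config["display"] = display
    if print_changed_only is not None:
        _config["print_changed_only"] = print_changed_only


@contextmanager
def config_context(**new_config):
    """Temporarily override options, restoring the old values on exit."""
    old_config = get_config()
    set_config(**new_config)
    try:
        yield
    finally:
        # Runs even if the body raises, so the override never leaks.
        set_config(**old_config)


with config_context(display="text"):
    inside = get_config()["display"]
after = get_config()["display"]
print(inside, after)
```

Restoring in a `finally` block is the design choice worth noting: a failing cell or test cannot leave the process stuck in the overridden display mode.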
19 changes: 10 additions & 9 deletions sklearn/tests/test_base.py
@@ -539,24 +539,25 @@ def test_repr_mimebundle_():
     tree = DecisionTreeClassifier()
     output = tree._repr_mimebundle_()
     assert "text/plain" in output
-    assert "text/html" not in output
+    assert "text/html" in output

-    with config_context(display="diagram"):
+    with config_context(display="text"):
         output = tree._repr_mimebundle_()
         assert "text/plain" in output
-        assert "text/html" in output
+        assert "text/html" not in output


 def test_repr_html_wraps():
     # Checks the display configuration flag controls the html output
     tree = DecisionTreeClassifier()
-    msg = "_repr_html_ is only defined when"
-    with pytest.raises(AttributeError, match=msg):
-        output = tree._repr_html_()

-    with config_context(display="diagram"):
-        output = tree._repr_html_()
-        assert "<style>" in output
+    output = tree._repr_html_()
+    assert "<style>" in output
+
+    with config_context(display="text"):
+        msg = "_repr_html_ is only defined when"
+        with pytest.raises(AttributeError, match=msg):
+            output = tree._repr_html_()


 def test_n_features_in_validation():

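The two tests above check that the display flag controls both the MIME bundle and whether `_repr_html_` is reachable at all. One way such gating can be built is sketched below with the standard library only. `TinyEstimator` and the `DISPLAY` dictionary are hypothetical names, and this is an illustration of the pattern under those assumptions, not the code in `sklearn/base.py`.

```python
import html

# Hypothetical stand-in for the global display option.
DISPLAY = {"mode": "diagram"}


class TinyEstimator:
    def __repr__(self):
        return "TinyEstimator()"

    @property
    def _repr_html_(self):
        # Raising AttributeError from a property makes hasattr() return False,
        # so a front end falls back to the plain-text repr in text mode.
        if DISPLAY["mode"] != "diagram":
            raise AttributeError(
                "_repr_html_ is only defined when the display mode is 'diagram'"
            )
        return self._repr_html_inner

    def _repr_html_inner(self):
        return "<style>pre {margin: 0}</style><pre>%s</pre>" % html.escape(repr(self))

    def _repr_mimebundle_(self, **kwargs):
        bundle = {"text/plain": repr(self)}
        if DISPLAY["mode"] == "diagram":
            bundle["text/html"] = self._repr_html_inner()
        return bundle


est = TinyEstimator()
diagram_bundle = est._repr_mimebundle_()

DISPLAY["mode"] = "text"
text_bundle = est._repr_mimebundle_()
html_available_in_text_mode = hasattr(est, "_repr_html_")
DISPLAY["mode"] = "diagram"  # put back the default
```

Using a property instead of a plain method is what lets the attribute appear or disappear depending on the configuration at lookup time.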
6 changes: 3 additions & 3 deletions sklearn/tests/test_config.py
@@ -13,7 +13,7 @@ def test_config_context():
         "assume_finite": False,
         "working_memory": 1024,
         "print_changed_only": True,
-        "display": "text",
+        "display": "diagram",
         "pairwise_dist_chunk_size": 256,
         "enable_cython_pairwise_dist": True,
     }
@@ -27,7 +27,7 @@ def test_config_context():
             "assume_finite": True,
             "working_memory": 1024,
             "print_changed_only": True,
-            "display": "text",
+            "display": "diagram",
             "pairwise_dist_chunk_size": 256,
             "enable_cython_pairwise_dist": True,
         }
@@ -58,7 +58,7 @@ def test_config_context():
         "assume_finite": False,
         "working_memory": 1024,
         "print_changed_only": True,
-        "display": "text",
+        "display": "diagram",
         "pairwise_dist_chunk_size": 256,
         "enable_cython_pairwise_dist": True,
     }

2 changes: 1 addition & 1 deletion sklearn/utils/tests/test_pprint.py
@@ -8,7 +8,7 @@
 from sklearn.pipeline import make_pipeline
 from sklearn.base import BaseEstimator, TransformerMixin
 from sklearn.feature_selection import SelectKBest, chi2
-from sklearn import set_config, config_context
+from sklearn import config_context


 # Ignore flake8 (lots of line too long issues)