Fix non-idempotent prediction due to in-place update to model-config #11014
Conversation
In the `predict()` method of `_TransformersWrapper`, `self.model_config` is modified in-place several times, e.g. via `update()` and `pop()`. The next prediction call then sees a different `model_config`, tweaked by the first call. Since we run multiple predictions during model saving, this also means the saved `model_config` is not equal to what the user specified. Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
```python
data=example,
model_config=model_config,
params=params,
flavor_config=flavor_config,
```
NB: This is a fix for another small bug - flavor_config needs to be passed for inferring signature (otherwise logging an InstructionalPipeline with input_example will raise NPE).
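A minimal sketch of the failure mode described above. `infer_signature` and its keys here are hypothetical stand-ins, not MLflow's actual helper; the point is only that subscripting an omitted `flavor_config` raises a `TypeError` (the "NPE" mentioned in the comment):

```python
def infer_signature(predict_fn, example, flavor_config):
    """Hypothetical stand-in for a signature-inference helper."""
    # The task type drives how the example output is interpreted; if the
    # caller forgets to forward flavor_config, subscripting None fails.
    task = flavor_config["task"]
    return {"task": task, "output": predict_fn(example)}

# Forwarding flavor_config works:
result = infer_signature(lambda x: x.upper(), "hi", {"task": "text-generation"})

# Omitting it reproduces the failure:
try:
    infer_signature(lambda x: x.upper(), "hi", None)
except TypeError as exc:
    err = str(exc)
```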
```python
# deep copy of the original model_config that was specified by the user, otherwise the
# prediction won't be idempotent. Hence we create an immutable dictionary of the original
# model config here and enforce creating a deep copy at every predict call.
self.model_config = MappingProxyType(model_config or {})
```
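A minimal sketch of the pattern this change adopts. `FixedWrapper` and the config keys are simplified stand-ins for `_TransformersWrapper` internals; only the `MappingProxyType` usage mirrors the actual diff:

```python
from copy import deepcopy
from types import MappingProxyType

class FixedWrapper:
    """Sketch of the fix: keep model_config read-only, copy it per call."""

    def __init__(self, model_config=None):
        # MappingProxyType makes any in-place mutation raise TypeError.
        self.model_config = MappingProxyType(model_config or {})

    def predict(self, data, params=None):
        # Work on a deep copy so the original config is never altered.
        config = deepcopy(dict(self.model_config))
        config.update(params or {})
        config.pop("include_prompt", None)
        return config

wrapper = FixedWrapper({"max_new_tokens": 10, "include_prompt": False})
first = wrapper.predict("hi", params={"temperature": 0.1})
second = wrapper.predict("hi")
# wrapper.model_config still matches exactly what the user specified.
```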
nice use of the read-only dict wrapper!
LGTM! Great fix for an insidious bug. Marked for inclusion in 2.10.1 patch
Signed-off-by: Ben Wilson <39283302+BenWilson2@users.noreply.github.com>
Co-authored-by: Ben Wilson <39283302+BenWilson2@users.noreply.github.com> Signed-off-by: Yuki Watanabe <31463517+B-Step62@users.noreply.github.com>
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
LGTM!
…lflow#11014) Signed-off-by: B-Step62 <yuki.watanabe@databricks.com> Signed-off-by: Ben Wilson <39283302+BenWilson2@users.noreply.github.com> Signed-off-by: Yuki Watanabe <31463517+B-Step62@users.noreply.github.com> Co-authored-by: Ben Wilson <39283302+BenWilson2@users.noreply.github.com>
…11014) Signed-off-by: B-Step62 <yuki.watanabe@databricks.com> Signed-off-by: Ben Wilson <39283302+BenWilson2@users.noreply.github.com> Signed-off-by: Yuki Watanabe <31463517+B-Step62@users.noreply.github.com> Co-authored-by: Ben Wilson <39283302+BenWilson2@users.noreply.github.com>
…lflow#11014) Signed-off-by: B-Step62 <yuki.watanabe@databricks.com> Signed-off-by: Ben Wilson <39283302+BenWilson2@users.noreply.github.com> Signed-off-by: Yuki Watanabe <31463517+B-Step62@users.noreply.github.com> Co-authored-by: Ben Wilson <39283302+BenWilson2@users.noreply.github.com> Signed-off-by: lu-wang-dl <lu.wang@databricks.com>
What changes are proposed in this pull request?

In the `predict()` method of `_TransformersWrapper`, `self.model_config` is modified in-place several times, e.g. (1) `params` overrides model config values, and (2) some `pop()` operations remove our custom keys like `include_prompt` before the actual pipeline inference. These updates directly change the original model config, so the next prediction uses a different model config, tweaked by the first call.
This also causes another issue in model logging, where an incorrect model config is saved. This is because we sometimes run multiple predictions during model saving, e.g. when the user specifies an `input_example`, we generate model output to infer the signature from it. The saved model config is then the one tweaked by these predictions.

Note: I hit this issue while developing for another issue and running the `test_instructional_pipeline_no_prompt_in_output` test, which is skipped in CI (due to the large model size). Other CI tests didn't catch this, because most of them specify the signature manually and don't give an input example, so there is only a single prediction call in the test case.

How is this PR tested?
Does this PR require documentation update?
We probably need to update some documentation to warn users about this model config issue, since we suggest users use this option (example). I can do that as a follow-up.
Release Notes
Is this a user-facing change?
Fix non-idempotent prediction caused by in-place updates to `model_config`, which could also cause incorrect metadata to be saved.
What component(s), interfaces, languages, and integrations does this PR affect?

Components
- area/artifacts: Artifact stores and artifact logging
- area/build: Build and test infrastructure for MLflow
- area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
- area/docs: MLflow documentation pages
- area/examples: Example code
- area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- area/models: MLmodel format, model serialization/deserialization, flavors
- area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
- area/projects: MLproject format, project running backends
- area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- area/server-infra: MLflow Tracking server backend
- area/tracking: Tracking Service, tracking client APIs, autologging

Interface
- area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
- area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- area/windows: Windows support

Language
- language/r: R APIs and clients
- language/java: Java APIs and clients
- language/new: Proposals for new client languages

Integrations
- integrations/azure: Azure and Azure ML integrations
- integrations/sagemaker: SageMaker integrations
- integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:
- rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
- rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
- rn/feature - A new user-facing feature worth mentioning in the release notes
- rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
- rn/documentation - A user-facing documentation change worth mentioning in the release notes