This page describes how to configure batch prediction job requests to include one-time Model Monitoring analysis. For batch predictions, Model Monitoring supports feature skew detection for categorical and numerical input features.
To create a batch prediction job with Model Monitoring skew analysis, you must include both your batch prediction input data and original training data for your model in the request. You can only add Model Monitoring analysis when creating new batch prediction jobs.
For more information about skew, see Introduction to Model Monitoring.
For instructions on how to set up Model Monitoring for online (real-time) predictions, see Using Model Monitoring.
Prerequisites
To use Model Monitoring with batch predictions, complete the following:
Have an available model in Vertex AI Model Registry that is either a tabular AutoML model or a tabular custom-trained model.
Upload your training data to Cloud Storage or BigQuery and obtain the URI link to the data. For a Cloud Storage upload, see the sketch after this list.
- For models trained with AutoML, you can use the dataset ID of your training dataset instead.
Model Monitoring compares the training data to the batch prediction output. Make sure you use supported file formats for the training data and batch prediction output:
Model type | Training data | Batch prediction output
--- | --- | ---
Custom-trained | CSV, JSONL, BigQuery, TfRecord (tf.train.Example) | JSONL
AutoML tabular | CSV, JSONL, BigQuery, TfRecord (tf.train.Example) | CSV, JSONL, BigQuery, TfRecord (Protobuf.Value)

Optional: For custom-trained models, upload the schema for your model to Cloud Storage. Model Monitoring requires the schema to calculate the baseline distribution for skew detection.
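If your training data is currently local, a minimal sketch like the following can upload it to Cloud Storage and print the URI to use as TRAINING_DATASET later on this page. The bucket name and file paths are placeholders, not resources referenced elsewhere in this guide.

# Hedged sketch: upload a local training CSV and print its Cloud Storage URI.
# "my-bucket" and the file paths are placeholders for illustration only.
from google.cloud import storage

bucket = storage.Client().bucket("my-bucket")
blob = bucket.blob("training/training_data.csv")
blob.upload_from_filename("training_data.csv")  # local file to upload

# This URI is what you pass as TRAINING_DATASET in the monitoring configuration.
print(f"gs://{bucket.name}/{blob.name}")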
Request a batch prediction
You can use the following methods to add Model Monitoring configurations to batch prediction jobs:
Console
Follow the instructions to make a batch prediction request with Model Monitoring enabled:
REST API
Follow the instructions to make a batch prediction request using the REST API:
When you create the batch prediction request, add the following Model Monitoring configuration to the request JSON body:
"modelMonitoringConfig": { "alertConfig": { "emailAlertConfig": { "userEmails": "EMAIL_ADDRESS" }, "notificationChannels": [NOTIFICATION_CHANNELS] }, "objectiveConfigs": [ { "trainingDataset": { "dataFormat": "csv", "gcsSource": { "uris": [ "TRAINING_DATASET" ] } }, "trainingPredictionSkewDetectionConfig": { "skewThresholds": { "FEATURE_1": { "value": VALUE_1 }, "FEATURE_2": { "value": VALUE_2 } } } } ] }
where:
- EMAIL_ADDRESS: the email address where you want to receive alerts from Model Monitoring. For example, example@example.com.
- NOTIFICATION_CHANNELS: a list of Cloud Monitoring notification channels where you want to receive alerts from Model Monitoring. Use the resource names of the notification channels, which you can retrieve by listing the notification channels in your project. For example, "projects/my-project/notificationChannels/1355376463305411567", "projects/my-project/notificationChannels/1355376463305411568".
- TRAINING_DATASET: the link to the training dataset stored in Cloud Storage.
  - To use a link to a BigQuery training dataset, replace the gcsSource field with the following:
    "bigquerySource": { "inputUri": "TRAINING_DATASET" }
  - To use the training dataset of an AutoML model, replace the gcsSource field with the following:
    "dataset": "TRAINING_DATASET"
- FEATURE_1:VALUE_1 and FEATURE_2:VALUE_2: the alerting threshold for each feature you want to monitor. For example, if you specify Age=0.4, Model Monitoring logs an alert when the statistical distance between the input and baseline distributions for the Age feature exceeds 0.4. By default, every categorical and numerical feature is monitored with a threshold value of 0.3.
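To make these thresholds concrete, the following sketch computes the kind of per-feature distance score that a threshold is compared against. It is illustrative only: Model Monitoring computes the scores server-side, using L-infinity distance for categorical features and Jensen-Shannon divergence for numerical features (see the skew calculation page linked under What's next), and the sample values below are made up.

# Illustrative sketch of a per-feature skew score for a categorical feature.
from collections import Counter

def distribution(values):
    # Normalize raw category counts into a probability distribution.
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}

def l_infinity_distance(baseline, current):
    # Largest absolute difference in probability across all categories.
    categories = set(baseline) | set(current)
    return max(abs(baseline.get(c, 0.0) - current.get(c, 0.0)) for c in categories)

training = distribution(["android", "android", "ios", "ios", "web"])
serving = distribution(["android", "ios", "ios", "ios", "ios"])

threshold = 0.3  # the default alerting threshold
score = l_infinity_distance(training, serving)
print(f"skew score: {score:.2f}, alert: {score > threshold}")  # skew score: 0.40, alert: True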
For more information about Model Monitoring configurations, see the Monitoring job reference.
Python
See the example notebook to run a batch prediction job with Model Monitoring for a custom tabular model.
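For orientation, the outline below shows roughly how such a job can be assembled with the Vertex AI SDK for Python. Treat it as a sketch rather than a verbatim recipe: the model_monitoring config classes come from the SDK, but the exact batch_predict parameter names for attaching them (shown here as model_monitoring_objective_configs and model_monitoring_alert_config), as well as all resource names, URIs, and feature names, are assumptions to verify against the notebook and the SDK reference for your version.

# Sketch only: resource names, URIs, and some parameter names are assumptions.
from google.cloud import aiplatform
from google.cloud.aiplatform import model_monitoring

aiplatform.init(project="my-project", location="us-central1")

# Where to send skew alerts.
alert_config = model_monitoring.EmailAlertConfig(user_emails=["example@example.com"])

# Compare the batch prediction input against the training data baseline.
skew_config = model_monitoring.SkewDetectionConfig(
    data_source="gs://my-bucket/training/training_data.csv",  # TRAINING_DATASET
    target_field="label",          # training label column, excluded from monitoring
    skew_thresholds={"Age": 0.4},  # per-feature alerting thresholds
)
objective_config = model_monitoring.ObjectiveConfig(skew_detection_config=skew_config)

model = aiplatform.Model("MODEL_ID")
batch_job = model.batch_predict(
    job_display_name="batch-prediction-with-monitoring",
    gcs_source="gs://my-bucket/batch_input.jsonl",
    gcs_destination_prefix="gs://my-bucket/batch_output/",
    model_monitoring_alert_config=alert_config,              # assumed parameter name
    model_monitoring_objective_configs=[objective_config],   # assumed parameter name
)
batch_job.wait()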
Model Monitoring automatically notifies you of job updates and alerts through email.
Access skew metrics
You can use the following methods to access skew metrics for batch prediction jobs:
Console (Histogram)
Use the Google Cloud console to view the feature distribution histograms for each monitored feature and learn which changes led to skew over time:
Go to the Batch predictions page:
On the Batch predictions page, click the batch prediction job you want to analyze.
Click the Model Monitoring Alerts tab to view a list of the model's input features, along with pertinent information, such as the alert threshold for each feature.
To analyze a feature, click the name of the feature. A page shows the feature distribution histograms for that feature.
Visualizing data distribution as histograms lets you quickly focus on the changes that occurred in the data. Afterward, you might decide to adjust your feature generation pipeline or retrain the model.
Console (JSON file)
Use the Google Cloud console to access the metrics in JSON format:
Go to the Batch predictions page:
Click on the name of the batch prediction monitoring job.
Click the Monitoring properties tab.
Click the Monitoring output directory link, which directs you to a Cloud Storage bucket.
Click the metrics/ folder.
Click the skew/ folder.
Click the feature_skew.json file, which directs you to the Object details page.
Open the JSON file using either option:
- Click Download and open the file in your local text editor.
- Use the gsutil URI path to run gcloud storage cat gsutil_URI in Cloud Shell or your local terminal.
The feature_skew.json file includes a dictionary where the key is the feature name and the value is the feature skew. For example:

{
  "cnt_ad_reward": 0.670936,
  "cnt_challenge_a_friend": 0.737924,
  "cnt_completed_5_levels": 0.549467,
  "month": 0.293332,
  "operating_system": 0.05758,
  "user_pseudo_id": 0.1
}
Python
See the example notebook to access skew metrics for a custom tabular model after running a batch prediction job with Model Monitoring.
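As a rough sketch of reading the results back in Python, the snippet below downloads feature_skew.json from the job's monitoring output directory and flags any feature above the default 0.3 threshold. The bucket name and object prefix are placeholders; substitute the Monitoring output directory shown for your job.

# Sketch only: "my-bucket" and "monitoring_output/" are placeholder values.
import json
from google.cloud import storage

blob = storage.Client().bucket("my-bucket").blob(
    "monitoring_output/metrics/skew/feature_skew.json"
)
feature_skew = json.loads(blob.download_as_text())

# Print features from most to least skewed, flagging any over the 0.3 default threshold.
for feature, skew in sorted(feature_skew.items(), key=lambda item: item[1], reverse=True):
    marker = "ALERT" if skew > 0.3 else "ok"
    print(f"{feature}: {skew:.6f} ({marker})")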
Debug batch prediction monitoring failures
If your batch prediction monitoring job fails, you can find debugging logs in the Google Cloud console:
Go to the Batch predictions page.
Click the name of the failed batch prediction monitoring job.
Click the Monitoring properties tab.
Click the Monitoring output directory link, which directs you to a Cloud Storage bucket.
Click the logs/ folder.
Click either of the .INFO files, which directs you to the Object details page.
Open the log file using either option:
- Click Download and open the file in your local text editor.
- Use the gsutil URI path to run gcloud storage cat gsutil_URI in Cloud Shell or your local terminal.
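If you prefer to pull the logs programmatically rather than through the console, a sketch along these lines lists and prints the .INFO files under the logs/ folder. Again, the bucket name and prefix are placeholders for your job's Monitoring output directory.

# Sketch only: substitute your own bucket and monitoring output prefix.
from google.cloud import storage

client = storage.Client()
for blob in client.list_blobs("my-bucket", prefix="monitoring_output/logs/"):
    if ".INFO" in blob.name:
        print(f"--- {blob.name} ---")
        print(blob.download_as_text())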
Notebook tutorials
Learn more about how to use Vertex AI Model Monitoring to get visualizations and statistics for models with these end-to-end tutorials.
AutoML
- Vertex AI Model Monitoring for AutoML tabular models
- Vertex AI Model Monitoring for batch prediction in AutoML image models
- Vertex AI Model Monitoring for online prediction in AutoML image models
Custom
- Vertex AI Model Monitoring for custom tabular models
- Vertex AI Model Monitoring for custom tabular models with TensorFlow Serving container
- XGBoost models
- Vertex Explainable AI feature attributions
- Batch prediction
- Setup for tabular models
What's next
- Learn how to use Model Monitoring.
- Learn how Model Monitoring calculates training-serving skew and prediction drift.