
Deploy RR-language-agnostic batch version to prod
Closed, Resolved · Public · 3 Estimated Story Points

Description

In T348536, we tested the kserve batcher on RR-language-agnostic in staging and confirmed its effectiveness. For this test, I used my branch of knowledge integrity, which adds new functions to RR-language-agnostic to support batch feature extraction and classification.

To deploy the RR-language-agnostic batch version to production, we need to merge these changes into the latest knowledge integrity release, v0.6.0. However, this is currently on hold due to a problem with Pydantic (see T355742). We will start this task once the issue is resolved.

Event Timeline

calbon set the point value for this task to 3.

I'm reposting what I previously wrote here, as the issue is more related to deployment.

@kevinbazira posed a question - how can end users switch between batch and non-batch requests?

First, to clarify: the batch model can also handle single requests. For example, given this input:

{
    "instances": [
      {
        "lang": "en",
        "rev_id": 123456
      }
    ]
}

The main differences between the base model (currently in production) and the batch model (the new one) are:

  • The batch model supports multiple predictions in a single request.
  • The batch model uses a different input/output schema, required by the KServe batcher.
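For illustration (the revision IDs here are just examples), a batch request wraps multiple revisions in the `instances` list; assuming the model follows the KServe v1 protocol, the response carries a parallel `predictions` list:

```json
{
    "instances": [
        {"lang": "en", "rev_id": 123456},
        {"lang": "de", "rev_id": 654321}
    ]
}
```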

Regarding how end users access the batch model, there are three options:

  1. Replace the current model with the batch model

I think this was the plan when we set the goal in T348153. The concern here is that the input/output schema is a breaking change that could impact downstream applications. Given that the RevertRisk language-agnostic model currently handles production traffic, we would need to notify downstream product owners and provide support as needed. This switch would also introduce some inconsistency among our Lift Wing models, as this model server would be the first to use a different input/output schema.

  2. Create a new endpoint for the batch model

We could add a new endpoint, such as /v1/models/revertrisk-language-agnostic-batch, and document the changed schema and usage examples on the model card, the API Gateway doc, and the Lift Wing doc. We would then inform end users that they can use this new endpoint to request multiple predictions. However, this would add maintenance work, as we would essentially be providing two different services for the same model.

  3. Find a way to support both schemas in one endpoint

We could make the batch model backwards compatible with the current schema for single requests, but this would complicate our code, and the distinction between the base model and the batch model would become blurred, which is not desirable. Alternatively, there may be a way to redirect batch requests to the batch isvc; I'm not sure of its feasibility, but that would be ideal.
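A minimal sketch of the third option, with hypothetical helper names (this is not the actual inference-services code): the endpoint could normalize incoming payloads so that a base-schema single request is wrapped into the batch schema before prediction.

```python
# Hypothetical sketch: accept both the base schema
# ({"lang": ..., "rev_id": ...}) and the batch schema
# ({"instances": [...]}) at a single endpoint.
def normalize_payload(payload: dict) -> dict:
    """Return a batch-style payload regardless of the input schema."""
    if "instances" in payload:
        # Already batch-style; pass through unchanged.
        return payload
    # Base-style single request: wrap the revision in an instances list.
    return {"instances": [{"lang": payload["lang"], "rev_id": payload["rev_id"]}]}
```

This keeps a single endpoint, at the cost of the base/batch blurring described above.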

At first I leaned towards the second option to avoid introducing a breaking change to our production service. However, upon further consideration, it seems excessive to create a new endpoint for the batch model.

What do people think about this?

Change #1020835 had a related patch set uploaded (by AikoChou; author: AikoChou):

[machinelearning/liftwing/inference-services@main] revertrisk: add support for base model's payloads in batch model

https://gerrit.wikimedia.org/r/1020835

Change #1020835 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] revertrisk: add support for base model's payloads in batch model

https://gerrit.wikimedia.org/r/1020835

Change #1021966 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/deployment-charts@master] ml-services: update batch revertrisk LA image in staging

https://gerrit.wikimedia.org/r/1021966

Change #1021966 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: update batch revertrisk LA image in staging

https://gerrit.wikimedia.org/r/1021966

I got an error when testing the batch model after deploying the new image with KServe 0.12.1 for the revert risk models:

aikochou@deploy1002:~$ curl "https://inference-staging.svc.codfw.wmnet:30443/v1/models/revertrisk-language-agnostic:predict" -d@./input_some_succeed.json -H "Host: revertrisk-language-agnostic-batcher.revertrisk.wikimedia.org" --http1.1 -k | jq '.'
{
  "error": "AttributeError : 'JSONResponse' object has no attribute 'encode'"
}

It worked before, so a change in KServe 0.12.1 may be causing the problem. I'll debug this.
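For context, the fix that was later merged (change #1035012) modifies the response to a dict type. A hedged sketch of that kind of change, with illustrative names rather than the actual model-server code: returning a plain dict lets KServe serialize the response itself, instead of handing it a pre-built `JSONResponse` object that KServe 0.12.x then tries to encode again.

```python
# Illustrative sketch (not the actual model code): in KServe 0.12.x the
# server serializes the value returned by predict(), so returning a
# pre-built JSONResponse fails with
# "'JSONResponse' object has no attribute 'encode'".
def predict(instances: list) -> dict:
    # A real model would run batch inference here; we just echo inputs.
    results = [{"lang": i["lang"], "rev_id": i["rev_id"]} for i in instances]
    # Return a plain dict and let KServe build the JSON response.
    return {"predictions": results}
```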

Change #1035012 had a related patch set uploaded (by AikoChou; author: AikoChou):

[machinelearning/liftwing/inference-services@main] revertrisk: modify the response to dict type in batch model

https://gerrit.wikimedia.org/r/1035012

Change #1035012 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] revertrisk: modify the response type in batch model

https://gerrit.wikimedia.org/r/1035012

Change #1038736 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/deployment-charts@master] ml-services: update RevertRisk LA/ML/Wikidata's images

https://gerrit.wikimedia.org/r/1038736

Change #1038736 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: update RevertRisk LA/ML/Wikidata's images

https://gerrit.wikimedia.org/r/1038736

The new revertrisk images have been deployed to production.

Next steps:

  • Update API documentation to inform users that they can request multiple predictions in a single request.