What would you like to be added?
A flow that links the specification of the node where Kepler is deployed to model selection from the Kepler model DB.
Why is this needed?
Problem description
Previously, we had only a single node_type in the pipeline, so we always appended _1 to the trainer name to get the model name. However, with SPECPower and AWS instances, we can now train models for multiple node_type values.
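To make the naming issue concrete, here is a minimal sketch of how a model name could be derived once node_type is no longer fixed to 1. The helper `model_name` and the trainer name `ExampleTrainer` are hypothetical and are not taken from kepler-model-server.

```python
def model_name(trainer_name: str, node_type: int = 1) -> str:
    """Join a trainer name and a node_type into a model name.

    Assumption: the model name is "<trainer name>_<node_type>", which
    generalizes the fixed "_1" suffix described above.
    """
    return f"{trainer_name}_{node_type}"


# With the default node_type this matches the old fixed suffix.
print(model_name("ExampleTrainer"))
# With several node types, each one gets its own model name.
print(model_name("ExampleTrainer", 3))
```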
Currently, the function generate_spec, which generates the machine spec, is implemented in Python in kepler-model-server.
Idea
The thing to do is to let Kepler determine its node_type.
The logic of generate_spec may not need to be merged into Kepler.
It can run in an init container that generates the spec and saves it to a file to be mounted. The server API may need to be updated to allow including the machine spec in the request used to select the model.
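As a rough illustration of the init container idea, the sketch below collects a tiny machine spec and writes it to a JSON file that another container could read from a shared mount. It does not reproduce the real generate_spec; the function names and the spec fields (`cores`, `architecture`) are assumptions made for this example.

```python
import json
import os
import platform


def collect_spec() -> dict:
    """Collect a minimal machine spec (hypothetical fields)."""
    return {
        "cores": os.cpu_count(),             # logical CPU count, may be None
        "architecture": platform.machine(),  # e.g. "x86_64"
    }


def write_spec(path: str) -> dict:
    """Write the spec as JSON to `path`, e.g. a file on a shared volume."""
    spec = collect_spec()
    with open(path, "w", encoding="utf-8") as f:
        json.dump(spec, f)
    return spec


def read_spec(path: str) -> dict:
    """Read the spec back, as the consuming container would."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

The init container would call `write_spec` once at startup, and the consumer would call `read_spec` on the mounted path before building its model request.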
Note that:
node_type is determined per pipeline by node_type_index.json inside the pipeline folder.
We can set the default pipeline to spec_benchmark for acpi values and aws_instance_pipeline for rapl values.
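The two notes above could be sketched as follows. The layout of node_type_index.json is not shown in this issue, so the example assumes it maps each node_type to a dictionary of spec attributes; `find_node_type`, `default_pipeline`, and the attribute names are hypothetical.

```python
from typing import Dict, Optional

# Default pipeline per power source, as proposed in the notes above.
DEFAULT_PIPELINES = {"acpi": "spec_benchmark", "rapl": "aws_instance_pipeline"}


def default_pipeline(power_source: str) -> str:
    """Return the default pipeline name for a power source."""
    return DEFAULT_PIPELINES[power_source]


def find_node_type(index: Dict[str, dict], spec: dict) -> Optional[str]:
    """Return the first node_type whose attributes all match the spec.

    Assumption: `index` is the parsed content of node_type_index.json
    (for example via json.load) in the form
    {"<node_type>": {"<attribute>": <value>, ...}, ...}.
    """
    for node_type, attrs in index.items():
        if all(spec.get(key) == value for key, value in attrs.items()):
            return node_type
    return None
```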
Now working on adding simple logic to the estimator to discover the number of cores and find the candidate models that were built on machines with the same number of cores. If none exist, list the candidates that have the largest number of cores.
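A minimal sketch of that selection rule, assuming each candidate model is described by a dictionary with a `cores` entry (the record layout and the function name `select_candidates` are hypothetical):

```python
from typing import Dict, List


def select_candidates(models: List[Dict], cores: int) -> List[Dict]:
    """Pick models built on machines with the same core count.

    If no model matches, fall back to the models built on the machines
    with the largest number of cores.
    """
    same = [m for m in models if m["cores"] == cores]
    if same:
        return same
    if not models:
        return []
    largest = max(m["cores"] for m in models)
    return [m for m in models if m["cores"] == largest]
```

For example, with models built on 4-core and 8-core machines, a 4-core node gets the 4-core models, while a 16-core node falls back to the 8-core models.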
The change needed is for ModelRequest to add a spec field to the request sent to server-api.
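One way to picture this change is a request object with an optional spec field that is serialized into the request body. The class below is a stand-in written for this example, not the actual ModelRequest; its other fields (`metrics`, `output_type`) and the sample values are assumptions.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass
class ModelRequestSketch:
    """Hypothetical stand-in for ModelRequest with an added spec field."""

    metrics: List[str]
    output_type: str
    # Machine spec of the requesting node; None keeps the old behavior.
    spec: Optional[Dict[str, object]] = None

    def to_json(self) -> str:
        """Serialize the request, including spec, as a JSON body."""
        return json.dumps(asdict(self))


request = ModelRequestSketch(["cpu_time"], "AbsPower", spec={"cores": 8})
print(request.to_json())
```

Keeping spec optional means existing clients that do not send a spec would still produce a valid request.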