Find spec/node_type of Kepler node for model selection #231

sunya-ch · 2024-02-22T07:15:51Z

What would you like to be added?

A flow that links the specification of the node where Kepler is deployed to model selection from the Kepler model DB.

Why is this needed?

Problem description

Previously, we had only a single node_type in the pipeline, so we always appended _1 to the trainer name to get the model name. However, with SPECPower and AWS instances, we can now train models for multiple node_type values.
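As a minimal sketch of the naming convention described above (the helper name `model_name` and the trainer name `ExampleTrainer` are hypothetical, not taken from kepler-model-server), the hard-coded `_1` suffix could become a parameter:

```python
def model_name(trainer: str, node_type: int = 1) -> str:
    """Build a model name as '<trainer>_<node_type>'.

    Hypothetical helper: the default of 1 mirrors the old behaviour,
    where _1 was always appended to the trainer name.
    """
    return f"{trainer}_{node_type}"


# Old behaviour (single node_type) versus a second trained node_type.
single = model_name("ExampleTrainer")
second = model_name("ExampleTrainer", node_type=2)
```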

Currently, we have a function, generate_spec, implemented in Python in kepler-model-server, that generates the machine spec.

Idea

What we need is to let Kepler determine its own node_type.
The generate_spec logic does not necessarily need to be merged into Kepler itself.
It can run in an init container that generates the spec and saves it to a file to be mounted. The server API may need to be updated to allow the machine spec to be included in the model selection request.
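The init-container idea above could look roughly like the following sketch. It is not the real generate_spec: the spec fields (`cores`, `architecture`), the function names, and the request keys are all assumptions chosen for illustration.

```python
import json
import os
import platform


def write_spec(path: str) -> dict:
    """Collect a tiny machine spec and save it as JSON (init-container step).

    The fields below are illustrative assumptions, not the real spec schema.
    """
    spec = {
        "cores": os.cpu_count() or 0,
        "architecture": platform.machine(),
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(spec, f)
    return spec


def build_request(spec_path: str, output_type: str) -> dict:
    """Read the mounted spec file and attach it to a model request.

    The request is a plain dict with hypothetical keys; the actual
    request type used by server-api may differ.
    """
    with open(spec_path, encoding="utf-8") as f:
        spec = json.load(f)
    return {"output_type": output_type, "spec": spec}
```

In this arrangement the init container would call `write_spec` on a shared volume, and the main container would only need to read the file when building its request.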

Note that:

node_type is defined per pipeline, by node_type_index.json inside the pipeline folder.
We can set the default pipeline to spec_benchmark for the acpi value and aws_instance_pipeline for the rapl value.
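The two notes above can be sketched as follows. The default-pipeline mapping comes from the text; the layout assumed for node_type_index.json (node_type index mapped to a flat dict of spec attributes) and both function names are hypothetical and may not match the real file.

```python
from typing import Optional

# Default pipeline per value, as proposed in the notes above.
DEFAULT_PIPELINE = {
    "acpi": "spec_benchmark",
    "rapl": "aws_instance_pipeline",
}


def default_pipeline(source: str) -> Optional[str]:
    """Return the default pipeline name for a value, or None if unknown."""
    return DEFAULT_PIPELINE.get(source)


def find_node_type(index: dict, spec: dict) -> Optional[int]:
    """Find the node_type whose attributes all match the given spec.

    `index` is assumed to look like {"1": {"cores": 4}, "2": {"cores": 8}};
    this layout is an assumption, not the documented format.
    """
    for node_type, attrs in index.items():
        if all(spec.get(key) == value for key, value in attrs.items()):
            return int(node_type)
    return None
```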


sunya-ch · 2024-03-28T12:34:24Z

I am now working on adding simple logic to the estimator that discovers the number of cores and finds the candidate models built on machines with the same number of cores. If none exist, it lists the candidates built with the largest number of cores.

The required change is for ModelRequest to also include a spec field in the request to server-api.
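A minimal sketch of the selection rule described in this comment, assuming each candidate model is represented as a dict with a `cores` entry (the representation, the model names, and the function name are hypothetical):

```python
def candidate_models(models: list, cores: int) -> list:
    """Select candidates by core count.

    Prefer models built on machines with exactly `cores` cores; if there
    are none, fall back to the models built with the largest core count.
    """
    if not models:
        return []
    exact = [m for m in models if m["cores"] == cores]
    if exact:
        return exact
    largest = max(m["cores"] for m in models)
    return [m for m in models if m["cores"] == largest]


# Illustrative data only; these are not real models from the model DB.
models = [
    {"name": "ExampleTrainer_1", "cores": 4},
    {"name": "ExampleTrainer_2", "cores": 16},
    {"name": "ExampleTrainer_3", "cores": 16},
]
exact_match = candidate_models(models, cores=4)
fallback = candidate_models(models, cores=8)
```

Here a 4-core node gets the exact match, while an 8-core node has no exact match and falls back to both 16-core models.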

sunya-ch added the kind/feature New feature or request label Feb 22, 2024

sunya-ch self-assigned this Mar 28, 2024

sunya-ch mentioned this issue Jun 20, 2024

Documenting Kepler Metrics Validation framework sustainable-computing-io/kepler#1533

Open
