Find spec/node_type of Kepler node for model selection #231

Open
sunya-ch opened this issue Feb 22, 2024 · 1 comment
Labels
kind/feature New feature or request


@sunya-ch
Contributor

What would you like to be added?

A flow that links the specification of the node where Kepler is deployed to model selection from the Kepler model DB.

Why is this needed?

Problem description

Previously, each pipeline had only a single node_type, so we always appended _1 to the trainer name to get the model name. However, with SPECPower and AWS instance data, we can now train models for multiple node_types.

Currently, machine specs are generated by generate_spec, a Python function in kepler-model-server.
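For illustration, a spec generator of this kind might derive a coarse machine spec from /proc/cpuinfo. This is a hypothetical sketch, not the actual generate_spec: the function name spec_from_cpuinfo and the output keys (processor, cores, chips) are assumptions. It takes the cpuinfo text as an argument so it can be tested without real hardware.

```python
from collections import Counter


def spec_from_cpuinfo(cpuinfo_text: str) -> dict:
    """Hypothetical: build a coarse machine spec from /proc/cpuinfo text."""
    processors = 0
    model_names = Counter()
    physical_ids = set()
    for line in cpuinfo_text.splitlines():
        if ":" not in line:
            continue  # blank separator lines between CPU entries
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            processors += 1  # one entry per logical CPU
        elif key == "model name":
            model_names[value] += 1
        elif key == "physical id":
            physical_ids.add(value)  # distinct sockets
    return {
        "processor": model_names.most_common(1)[0][0] if model_names else "unknown",
        "cores": processors,  # counts logical CPUs, not physical cores
        "chips": max(len(physical_ids), 1),
    }
```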

Idea

The goal is to let Kepler determine its own node_type.
The generate_spec logic may not need to be merged into Kepler itself.
Instead, it can run in an init container that generates the spec and saves it to a file mounted into the Kepler container. The server API may need to be updated to accept the machine spec in the request used to select the model.
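A minimal sketch of the init-container hand-off, assuming the spec is serialized as JSON onto a shared volume. The mount path /etc/kepler/models/machine_spec.json and the helpers save_spec and load_spec are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional

# Hypothetical path on a volume shared by the init container and Kepler.
DEFAULT_SPEC_PATH = Path("/etc/kepler/models/machine_spec.json")


def save_spec(spec: dict, path: Path = DEFAULT_SPEC_PATH) -> None:
    # Run in the init container: persist the generated spec for Kepler.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(spec, sort_keys=True))


def load_spec(path: Path = DEFAULT_SPEC_PATH) -> Optional[dict]:
    # Run by Kepler at startup: None means no spec was mounted.
    if not path.exists():
        return None
    return json.loads(path.read_text())
```

If load_spec returns None, the request could simply omit the spec and fall back to the current default selection.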

Note that:

  • node_type is determined per pipeline by node_type_index.json inside the pipeline folder.
  • we can set the default pipeline to spec_benchmark for acpi values and aws_instance_pipeline for rapl values.
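The two notes above could be expressed roughly as follows. This sketch assumes, hypothetically, that node_type_index.json maps each node_type (as a string key) to the spec of the machine it was trained on; the real file layout may differ. The helper names are illustrative, and model_name generalizes the fixed _1 suffix to the node's actual node_type.

```python
import json
from pathlib import Path
from typing import Dict, Optional

# Default pipeline per energy source, as proposed in the issue.
DEFAULT_PIPELINES: Dict[str, str] = {
    "acpi": "spec_benchmark",
    "rapl": "aws_instance_pipeline",
}


def load_node_type_index(pipeline_dir: Path) -> Dict[str, dict]:
    # Hypothetical layout: {"<node_type>": {<spec fields>}, ...}
    return json.loads((pipeline_dir / "node_type_index.json").read_text())


def find_node_type(index: Dict[str, dict], spec: dict) -> Optional[int]:
    # Return the node_type whose recorded spec matches this node's spec exactly.
    for node_type, known_spec in index.items():
        if known_spec == spec:
            return int(node_type)
    return None


def model_name(trainer: str, node_type: int) -> str:
    # Replaces the old fixed "_1" suffix with the node's actual node_type.
    return f"{trainer}_{node_type}"
```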
@sunya-ch sunya-ch added the kind/feature New feature or request label Feb 22, 2024
@sunya-ch sunya-ch self-assigned this Mar 28, 2024
@sunya-ch
Contributor Author

Now working on adding simple logic to the estimator that discovers the node's core count and finds candidate models built on machines with the same number of cores. If none exist, it lists the candidates with the largest number of cores.
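The fallback rule can be sketched as a small filter. CandidateModel and select_candidates are hypothetical names, and the sketch assumes the core count of the training machine is recorded with each model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidateModel:
    name: str
    cores: int  # core count of the machine the model was built on


def select_candidates(models: List[CandidateModel], node_cores: int) -> List[CandidateModel]:
    """Prefer models built on machines with the same core count;
    otherwise fall back to the models with the largest core count."""
    if not models:
        return []
    same = [m for m in models if m.cores == node_cores]
    if same:
        return same
    max_cores = max(m.cores for m in models)
    return [m for m in models if m.cores == max_cores]
```

On the estimator side, node_cores could come from os.cpu_count(), which reports logical CPUs, so it would need to match how the training machines' core counts were recorded.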

The required change is for ModelRequest to also include a spec field in the request sent to server-api.
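A sketch of what the extended request might look like. Apart from spec, the fields shown (metrics, output_type, energy_source) are illustrative placeholders rather than the actual ModelRequest definition, and MachineSpec and to_payload are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class MachineSpec:
    processor: str
    cores: int
    chips: int = 1


@dataclass
class ModelRequest:
    metrics: List[str]  # placeholder fields for illustration
    output_type: str
    energy_source: str
    spec: Optional[MachineSpec] = None  # new: spec of the node running Kepler


def to_payload(req: ModelRequest) -> str:
    body = asdict(req)  # converts the nested MachineSpec to a dict as well
    if body["spec"] is None:
        del body["spec"]  # clients without a spec send the same payload as before
    return json.dumps(body, sort_keys=True)
```

Dropping spec when it is None keeps the change backward compatible for callers that never set it.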
