Dataproc: add support for the spark autotuning feature preview #12789

Open
1 task done
khaledh opened this issue Jun 10, 2024 · 0 comments
Labels
api: dataproc Issues related to the Dataproc API. triage me I really want to be triaged. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

khaledh commented Jun 10, 2024

Determine this is the right repository

  • I determined this is the correct repository in which to report this feature request.

Summary of the feature request

The Dataproc API seems to support the Spark autotuning feature preview, as documented here:
https://cloud.google.com/dataproc-serverless/docs/concepts/autotuning#dataproc_serverless_autotuning-api

We'd like to make use of this feature through the Dataproc Python SDK.

Desired code experience

file: main.py

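    # Assumed import alias; the original snippet does not show its imports.
    from google.cloud import dataproc_v1 as dataproc

    pyspark_batch = dataproc.types.PySparkBatch(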
        main_python_file_uri=...,
        args=...,
        python_file_uris=...,
        jar_file_uris=...,
        archive_uris=[...],
    )
    batch = dataproc.types.Batch(
        pyspark_batch=pyspark_batch,
        runtime_config=dataproc.types.RuntimeConfig(
            version=...,
            properties=...,
            autotuning_config=dataproc.types.AutotuningConfig(    ###
                cohort="my-cohort",                               #  new
                scenarios=["SCALING", "BHJ", "MEMORY"],           #  config
            ),                                                    ###
        ),
    )

    request = dataproc.CreateBatchRequest(
        parent=...,
        batch=batch,
    )

    client = dataproc.BatchControllerClient()
    op = client.create_batch(request=request)

Expected results

A Dataproc Serverless batch is created with autotuning enabled, and the cohort and scenarios set as indicated.
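
To make the expected result concrete, here is a minimal standard-library sketch of the request body such a batch might carry. It mirrors the shape proposed in this issue: the field names (`autotuningConfig`, `cohort`, `scenarios`) and their nesting are assumptions taken from the snippet above, not a confirmed API surface, and the documented REST layout may differ. The helper `build_batch_body`, the set `KNOWN_SCENARIOS`, and the `gs://example-bucket/main.py` URI are hypothetical and exist only for illustration.

```python
import json

# Scenario names come from the snippet in this issue; treating them as the
# complete allowed set is an assumption made for this sketch.
KNOWN_SCENARIOS = {"SCALING", "BHJ", "MEMORY"}


def build_batch_body(main_python_file_uri, cohort, scenarios):
    """Build a dict shaped like the Batch proposed in this issue.

    Field names and nesting are hypothetical and follow the proposal above.
    """
    unknown = sorted(set(scenarios) - KNOWN_SCENARIOS)
    if unknown:
        raise ValueError(f"unknown autotuning scenarios: {unknown}")
    return {
        "pysparkBatch": {"mainPythonFileUri": main_python_file_uri},
        "runtimeConfig": {
            "autotuningConfig": {
                "cohort": cohort,
                "scenarios": list(scenarios),
            },
        },
    }


# Hypothetical bucket and file name, used only to exercise the helper.
body = build_batch_body(
    "gs://example-bucket/main.py",
    "my-cohort",
    ["SCALING", "BHJ", "MEMORY"],
)
print(json.dumps(body, indent=2))
```

Rejecting unknown scenario names before sending anything keeps a typo from silently disabling a tuning scenario; whether the service itself rejects such values is not stated in this issue.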

API client name and version

google-cloud-dataproc

@khaledh khaledh added triage me I really want to be triaged. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. labels Jun 10, 2024
@product-auto-label product-auto-label bot added the api: dataproc Issues related to the Dataproc API. label Jun 10, 2024