
TextToSpeechLongAudioSynthesizeClient AudioConfig does not honour speakingRate nor any settings #4148

Closed
nickaws opened this issue Apr 3, 2023 · 4 comments
Labels
priority: p3 Desirable enhancement or fix. May not be included in next release. type: question Request for information or clarification. Not an issue.

nickaws commented Apr 3, 2023


Environment details

  • Product (packages/*): @google-cloud/text-to-speech
  • OS: macOS Ventura 13.0.1
  • Node.js version: v19.8.1
  • npm version: 9.5.1
  • google-cloud-node version: 4.2.1

Steps to reproduce




// Copyright 2023 Google LLC
// (adapted from the Google examples)


'use strict';

function main() {

    const input = { text: "Hey there! I'd be delighted to share with you the latest advancements in car navigation systems. There's so much to talk about, so buckle up and let's get started!" }
    /**
     *  Required. The configuration of the synthesized audio.
     */
    const audioConfig = {
        audioEncoding: "MP3",
        speakingRate: "0.5"
    }
    /**
     *  Specifies a Cloud Storage URI for the synthesis results. Must be
     *  specified in the format: `gs://bucket_name/object_name`, and the bucket
     *  must already exist.
     */
    // const outputGcsUri = 'abc123'
    /**
     *  The desired voice of the synthesized audio.
     */
    const voice = {
        name: "en-US-Wavenet-J",
        languageCode: 'en-US'
    }

    // Imports the Texttospeech library
    const { TextToSpeechLongAudioSynthesizeClient } = require('@google-cloud/text-to-speech').v1;

    // Instantiates a client
    const texttospeechClient = new TextToSpeechLongAudioSynthesizeClient();

    async function callSynthesizeLongAudio() {
        // Construct request
        const request = {
            voice,
            input,
            audioConfig,
            outputGcsUri: "gs://somebucket/file.mp3"
        };

        // Run request
        const [operation] = await texttospeechClient.synthesizeLongAudio(request);
        const [response] = await operation.promise();
        console.log(response);
    }

    callSynthesizeLongAudio();
}

process.on('unhandledRejection', err => {
    console.error(err.message);
    process.exitCode = 1;
});
main();

Neither speaking rate nor pitch is honoured.


Thanks!

@danielbankhead danielbankhead self-assigned this Apr 8, 2023
@danielbankhead danielbankhead added type: question Request for information or clarification. Not an issue. priority: p3 Desirable enhancement or fix. May not be included in next release. labels Apr 8, 2023
danielbankhead (Member) commented

Hey @nickaws,

speakingRate and pitch should be numbers, not strings:

/** AudioConfig speakingRate. */
public speakingRate: number;
/** AudioConfig pitch. */
public pitch: number;
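As a defensive measure, the numeric fields can be coerced before the request is built. The sketch below is a hypothetical helper, `normalizeAudioConfig`, which is not part of `@google-cloud/text-to-speech`. It converts `speakingRate` and `pitch` with `Number()` and rejects values outside the ranges listed in the public `AudioConfig` reference (speakingRate 0.25 to 4.0, pitch -20.0 to 20.0). It only addresses the string-versus-number point and does not by itself explain the `INVALID_ARGUMENT` errors reported later in this thread.

```javascript
'use strict';

// Hypothetical helper, not part of @google-cloud/text-to-speech.
// Ranges as given in the public AudioConfig reference.
const NUMERIC_RANGES = {
  speakingRate: [0.25, 4.0],
  pitch: [-20.0, 20.0],
};

function normalizeAudioConfig(audioConfig) {
  // Shallow copy so the caller's object is not mutated.
  const normalized = { ...audioConfig };
  for (const [field, [min, max]] of Object.entries(NUMERIC_RANGES)) {
    // Field not set: leave it out so the server default applies.
    if (normalized[field] === undefined) continue;
    // Coerce strings such as "0.5" to the number 0.5.
    const value = Number(normalized[field]);
    if (!Number.isFinite(value) || value < min || value > max) {
      throw new RangeError(
        `${field} must be a number in [${min}, ${max}], got ${JSON.stringify(audioConfig[field])}`
      );
    }
    normalized[field] = value;
  }
  return normalized;
}
```

For example, `normalizeAudioConfig({ audioEncoding: 'MP3', speakingRate: '0.5' })` returns a config whose `speakingRate` is the number `0.5`, which can then be passed as `audioConfig` in the request.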

nickaws (Author) commented Apr 10, 2023

Hi @danielbankhead! This actually made no difference, and it even throws 'Request contains an invalid argument.' with the code above (which worked sometimes).

danielbankhead (Member) commented

@nickaws to clarify, the request should look like this:

const [operation] = await texttospeechClient.synthesizeLongAudio({
  voice: {
    name: "en-US-Wavenet-J",
    languageCode: 'en-US'
  },
  input,
  audioConfig: {
    audioEncoding: "MP3",
    speakingRate: 0.5,
    // pitch: 1,
  },
  outputGcsUri: "gs://somebucket/file.mp3"
});

Does this request not work for you?

> which worked sometimes

I'm not sure how a request parameter would fail sometimes; are there any additional details here?
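One way to gather the additional details asked for here is to record at which step each run fails. The sketch below wraps the same two calls used in this thread (`synthesizeLongAudio` followed by `operation.promise()`) in a hypothetical `synthesizeAndReport` function that is not part of the library. The client is passed in, so a real `TextToSpeechLongAudioSynthesizeClient` or a stub can be used. A `stage` of `submit` means the initial call was rejected; `wait` means the long-running operation was accepted but failed afterwards.

```javascript
'use strict';

// Hypothetical diagnostic wrapper, not part of @google-cloud/text-to-speech.
async function synthesizeAndReport(client, request) {
  let stage = 'submit';
  try {
    // synthesizeLongAudio resolves to an array; the first element is the operation.
    const [operation] = await client.synthesizeLongAudio(request);
    stage = 'wait';
    // Wait for the long-running operation to finish.
    const [response] = await operation.promise();
    return { ok: true, stage: 'done', operationName: operation.name, response };
  } catch (err) {
    // gRPC errors carry a numeric `code` (3 = INVALID_ARGUMENT) and `details`;
    // fall back to `message` for other error types.
    return { ok: false, stage, code: err.code, details: err.details ?? err.message };
  }
}
```

With the real client this would be called as `synthesizeAndReport(new TextToSpeechLongAudioSynthesizeClient(), request)` and the returned object logged once per run, so successful and failing runs can be compared side by side. Awaiting the call also avoids relying on the global `unhandledRejection` handler.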

nickolivera commented Apr 10, 2023

@danielbankhead I am very sorry for the cryptic message.

The code I posted when I opened the ticket generated an MP3 but crashed with "Request contains an invalid argument." Right now, the same code with the new request works only sometimes: on some runs it does generate the MP3 (I am varying the name of the MP3 file each time, of course), and on others it fails with the error below. This is with the latest SDK.

[
  {
    voice: { name: 'en-US-Wavenet-J', languageCode: 'en-US' },
    input: {
      text: "Hey there! I'd be delighted to share with you the latest advancements in car navigation systems. There's so much to talk about, so buckle up and let's get started!"
    },
    audioConfig: { audioEncoding: 'MP3', speakingRate: 0.5 },
    outputGcsUri: 'gs://xxx/file2.mp3'
  },
  Metadata {
    internalRepr: Map(2) {
      'x-goog-api-client' => [Array],
      'x-goog-request-params' => [Array]
    },
    options: {}
  },
  { deadline: 2023-04-10T19:06:05.200Z },
  [Function (anonymous)]
]
3 INVALID_ARGUMENT: Request contains an invalid argument.

    at callErrorFromStatus (/Users/nick/Development/text-to-speech-node/node_modules/@grpc/grpc-js/build/src/call.js:31:19)
    at Object.onReceiveStatus (/Users/nick/Development/text-to-speech-node/node_modules/@grpc/grpc-js/build/src/client.js:192:76)
    at Object.onReceiveStatus (/Users/nick/Development/text-to-speech-node/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:360:141)
    at Object.onReceiveStatus (/Users/nick/Development/text-to-speech-node/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:323:181)
    at /Users/nick/Development/text-to-speech-node/node_modules/@grpc/grpc-js/build/src/resolving-call.js:94:78
    at process.processTicksAndRejections (node:internal/process/task_queues:77:11)
for call at
    at ServiceClientImpl.makeUnaryRequest (/Users/nick/Development/text-to-speech-node/node_modules/@grpc/grpc-js/build/src/client.js:160:34)
    at ServiceClientImpl.<anonymous> (/Users/nick/Development/text-to-speech-node/node_modules/@grpc/grpc-js/build/src/make-client.js:105:19)
    at /Users/nick/Development/text-to-speech-node/node_modules/@google-cloud/text-to-speech/build/src/v1/text_to_speech_long_audio_synthesize_client.js:200:29
    at /Users/nick/Development/text-to-speech-node/node_modules/google-gax/build/src/normalCalls/timeout.js:44:16
    at LongrunningApiCaller._wrapOperation (/Users/nick/Development/text-to-speech-node/node_modules/google-gax/build/src/longRunningCalls/longRunningApiCaller.js:55:16)
    at /Users/nick/Development/text-to-speech-node/node_modules/google-gax/build/src/longRunningCalls/longRunningApiCaller.js:46:25
    at OngoingCall.call (/Users/nick/Development/text-to-speech-node/node_modules/google-gax/build/src/call.js:67:27)
    at LongrunningApiCaller.call (/Users/nick/Development/text-to-speech-node/node_modules/google-gax/build/src/longRunningCalls/longRunningApiCaller.js:45:19)
    at /Users/nick/Development/text-to-speech-node/node_modules/google-gax/build/src/createApiCall.js:84:30
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
  code: 3,
  details: 'Request contains an invalid argument.',
  metadata: Metadata { internalRepr: Map(0) {}, options: {} }

We were also looking at https://github.com/googleapis/google-api-nodejs-client/blob/main/discovery/texttospeech-v1.json#L175 and it seems a parameter is missing; what is odd is that sometimes it does generate the MP3.
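The public v1 `SynthesizeLongAudioRequest` reference lists a `parent` field of the form `projects/*/locations/*`, which may be the parameter referred to above; this thread does not confirm that omitting it explains the intermittent `INVALID_ARGUMENT`. The sketch below is a hypothetical `buildLongAudioRequest` helper that sets `parent` and checks that `outputGcsUri` has the `gs://bucket_name/object_name` shape described in the sample's comments. `projectId` and `location` are placeholders the caller must supply.

```javascript
'use strict';

// Hypothetical request builder, not part of @google-cloud/text-to-speech.
// ASSUMPTION: the `parent` format is taken from the public v1 reference;
// it is not confirmed in this thread as the cause of the error.
function buildLongAudioRequest({ projectId, location, text, voice, audioConfig, outputGcsUri }) {
  // The sample's comments require gs://bucket_name/object_name.
  if (!/^gs:\/\/[^/]+\/.+/.test(outputGcsUri)) {
    throw new Error(`outputGcsUri must look like gs://bucket_name/object_name, got ${outputGcsUri}`);
  }
  return {
    parent: `projects/${projectId}/locations/${location}`,
    input: { text },
    voice,
    audioConfig,
    outputGcsUri,
  };
}
```

Which `location` values the long-audio endpoint accepts is not established in this thread and should be checked against the API reference before relying on this.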

Full code

nick@nick text-to-speech-node % node -v
v19.8.1
nick@nick text-to-speech-node % cat package.json| grep speech
"@google-cloud/text-to-speech": "^4.2.1"


'use strict';

function main() {

    const input = { text: "Hey there! I'd be delighted to share with you the latest advancements in car navigation systems. There's so much to talk about, so buckle up and let's get started!" }

    const audioConfig = {
        audioEncoding: "MP3",
        // speakingRate: 1.0
    }

    const voice = {
        name: "en-US-Wavenet-J",
        languageCode: 'en-US'
    }

    // Imports the Texttospeech library
    const { TextToSpeechLongAudioSynthesizeClient } = require('@google-cloud/text-to-speech').v1;

    // Instantiates a client
    const texttospeechClient = new TextToSpeechLongAudioSynthesizeClient();

    async function callSynthesizeLongAudio() {
        // Construct request
        const request = {

            voice,
            input,
            audioConfig,
            outputGcsUri: "gs://xxx/file32222.mp3"
        };


        // Run request
        // const [operation] = await texttospeechClient.synthesizeLongAudio(request);
        const [operation] = await texttospeechClient.synthesizeLongAudio({
            voice: {
                name: "en-US-Wavenet-J",
                languageCode: 'en-US'
            },
            input,
            audioConfig: {
                audioEncoding: "MP3",
                speakingRate: 0.5,
                // pitch: 1,
            },
            outputGcsUri: "gs://xxx/filexxxxxxxx2.mp3"
        });


        const [response] = await operation.promise();
        console.log(response);
    }

    callSynthesizeLongAudio();
}

process.on('unhandledRejection', err => {
    console.error(err.message);
    process.exitCode = 1;
});
main();

4 participants