
Q: How do I concatenate an array of audio buffers (text-to-speech results) in Node.js?

Stack Overflow user

Asked 2021-11-03 08:05:11

Answers: 1 | Views: 655 | Followers: 0 | Votes: 0

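I want to convert multiple texts into a single audio file, but I'm confused about how to concatenate multiple audio results into one file (because of the 5K characters-per-request limit, a long text cannot be converted to audio in a single request).

My current code is below. It generates multiple audio byte arrays, but it fails to merge the MP3 audio because it ignores the header/meta information. Is LINEAR16 the recommended encoding in the TTS field? I'd be glad to hear any suggestions. Thanks.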

  const client = new textToSpeech.TextToSpeechClient();
  const promises = ['hi','world'].map(text => {
    const requestBody = {
      audioConfig: {
        audioEncoding: 'MP3'
      },
      input: {
        text: text,
      },
      voice: {
        languageCode: 'en-US',
        ssmlGender: 'NEUTRAL'
      },
    };
    return client.synthesizeSpeech(requestBody)
  })
  const responses = await Promise.all(promises)
  console.log(responses)
  const audioContents = responses.map(res => res[0].audioContent)
  const audioContent = audioContents.join() // this line has a problem

Standard output

[
  [
    {
      audioContent: <Buffer ff f3 44 c4 00 12 a0 01 24 01 40 00 01 7c 06 43 fa 7f 80 38 46 63 fe 1f 00 33 3f c7 f0 03 03 33 1f c1 f0 0c eb fa 3f 03 20 7e 63 f3 78 03 ba 64 73 e0 ... 2638 more bytes>
    },
    null,
    null
  ],
  [
    {
      audioContent: <Buffer ff f3 44 c4 00 12 58 05 24 01 41 00 01 1e 02 23 9e 1f e0 1f 83 83 df ef 80 e8 ff 99 f0 0c 00 e8 7f c3 68 03 cf fd f8 8f ff 0f 3c 7f 88 f8 8c 87 e0 23 ... 2926 more bytes>
    },
    null,
    null
  ]
]

node.js

google-text-to-speech

1 Answer

Stack Overflow user

Accepted answer

Answered 2021-11-10 17:37:20

Workaround-1

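As I mentioned in the comments, there is a Node package, google-text-to-speech-concat, that does what you need. It is not an official Google package. It automatically issues multiple requests to stay within the API's 5K character limit and concatenates the resulting audio into a single audio file. Before running the code, install the following client libraries: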

npm install @google-cloud/text-to-speech
npm install google-text-to-speech-concat --save

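Try the code below, putting fewer than 5K characters between each pair of paragraph tags, since the text inside each pair is sent to the API as its own request. For example, if you have 9K characters, you need to split them into two or more requests: put the first 5K characters between one pair of tags and the remaining 4K characters between a new pair of tags. The google-text-to-speech-concat package then concatenates the audio returned by the API into a single audio file.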

const textToSpeech =require('@google-cloud/text-to-speech');
const testSynthesize =require('google-text-to-speech-concat');
const fs = require('fs');
const path= require('path');
(async () => {
 const request = {
   voice: {
     languageCode: 'en-US',
     ssmlGender: 'FEMALE'
   },
   input: {
     ssml: `
     <speak>
     <p>add less than 5k chars between paragraph tags</p>
     <p>add less than 5k chars between paragraph tags</p>
     </speak>`
   },
   audioConfig: {
     audioEncoding: 'MP3'
   }
 };
 try {
   // Create your Text To Speech client
   // More on that here: https://cloud.google.com/docs/authentication/production#providing_credentials_to_your_application
   const textToSpeechClient = new textToSpeech.TextToSpeechClient({
     keyFilename: path.join(__dirname, 'google-cloud-credentials.json')
   });
   // Synthesize the text, resulting in an audio buffer
   const buffer = await testSynthesize.synthesize(textToSpeechClient, request);
   // Handle the buffer
   // For example write it to a file or directly upload it to storage, like S3 or Google Cloud Storage
   const outputFile = path.join(__dirname, 'Output.mp3');
   // Write the file; awaiting the promise lets the surrounding try/catch handle write errors
   await fs.promises.writeFile(outputFile, buffer);
   console.log('Got audio!', outputFile);
 } catch (err) {
   console.log(err);
 }
})();
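
The SSML in the request above is written by hand. As a minimal sketch, assuming the long text has already been split into chunks that each fit within the limit, the SSML string could be built like this. `escapeXml` and `buildSsml` are hypothetical helper names, not part of google-text-to-speech-concat; escaping is included because characters such as `&` and `<` are not valid inside SSML text.

```javascript
// Hypothetical helpers (not part of any package used in this answer).
// Escape the characters that have special meaning in XML/SSML.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Wrap each chunk in its own <p> element inside a single <speak> document.
function buildSsml(chunks) {
  const paragraphs = chunks.map(chunk => `<p>${escapeXml(chunk)}</p>`);
  return `<speak>${paragraphs.join('')}</speak>`;
}

const ssml = buildSsml(['Tom & Jerry', 'a < b']);
console.log(ssml);
// prints: <speak><p>Tom &amp; Jerry</p><p>a &lt; b</p></speak>
```

The result would be passed as `input.ssml` in the request object.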

Workaround-2

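Try the code below, which splits the whole text into groups of at most about 5K characters and sends each group to the API for conversion. As you know, this creates multiple audio files. Before running the code, create a folder in the current working directory to hold the output audio files (the code below writes to a folder named result).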

const textToSpeech = require('@google-cloud/text-to-speech');
const fs = require('fs');
const util = require('util');
 
// Creates a client
const client = new textToSpeech.TextToSpeechClient();
 
(async function () {
 
 // The text to synthesize
 var text = fs.readFileSync('./text.txt', 'utf8');
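 // Split into sentences; keep trailing text that lacks a final period, and fall back to [] when nothing matches
 var newArr = text.match(/[^.]+(?:\.|$)/g) || [];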
 
 var charCount = 0;
 var textChunk = "";
 var index = 0;
 
 for (var n = 0; n < newArr.length; n++) {
 
   charCount += newArr[n].length;
   textChunk = textChunk + newArr[n];
 
   console.log(charCount);
 
   if (charCount > 4600 || n == newArr.length - 1) {
 
     console.log(textChunk);
 
     // Construct the request
     const request = {
       input: {
         text: textChunk
       },
       // Select the language and SSML voice gender (optional)
       voice: {
         languageCode: 'en-US',
         ssmlGender: 'MALE',
         name: "en-US-Wavenet-B"
       },
       // select the type of audio encoding
       audioConfig: {
         effectsProfileId: [
           "headphone-class-device"
         ],
         pitch: -2,
         speakingRate: 1.1,
         audioEncoding: "MP3"
       },
     };
 
     // Performs the text-to-speech request
     const [response] = await client.synthesizeSpeech(request);
 
     console.log(response);
 
     // Write the binary audio content to a local file
     const writeFile = util.promisify(fs.writeFile);
     const outputPath = 'result/Output' + index + '.mp3';
     await writeFile(outputPath, response.audioContent, 'binary');
     console.log('Audio content written to file: ' + outputPath);
 
     index++;
 
     charCount = 0;
     textChunk = "";
   }
 }
}());
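
The chunking logic in the loop above can also be pulled out into a pure function, which makes it easier to test without calling the API. This is a minimal sketch, not the code the answer was tested with: `splitIntoChunks` is a hypothetical name, it checks the length before adding a sentence rather than after, and it assumes no single sentence is longer than `maxLen`. It also measures length with JavaScript's `String.length`, which may not match how the API counts characters.

```javascript
// Hypothetical helper: group sentences into chunks of at most maxLen characters.
// Assumes no single sentence is longer than maxLen.
function splitIntoChunks(text, maxLen) {
  // Sentences end at a period; trailing text without a period is kept too
  const sentences = text.match(/[^.]+(?:\.|$)/g) || [];
  const chunks = [];
  let current = '';
  for (const sentence of sentences) {
    // Start a new chunk if adding this sentence would exceed the limit
    if (current.length > 0 && current.length + sentence.length > maxLen) {
      chunks.push(current);
      current = '';
    }
    current += sentence;
  }
  // Flush the last, partially filled chunk
  if (current.length > 0) chunks.push(current);
  return chunks;
}

const chunks = splitIntoChunks('One. Two. Three.', 10);
console.log(chunks);
// prints: [ 'One. Two.', ' Three.' ]
```

Each element of the returned array would then be sent as `input.text` in its own request.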

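To merge the output audio files into a single audio file, you can use the audioconcat package, which is not an official Google package. You can also use other similar packages to concatenate the audio files.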

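To use the audioconcat library, the ffmpeg application must already be installed (the application itself, not the ffmpeg NPM package). So before running the code that concatenates the audio files, install the ffmpeg tool for your operating system (a build with libmp3lame enabled), then install the following client library:

npm install audioconcat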

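Try the code below, which concatenates all the audio files in the output directory and stores the single concatenated output.mp3 audio file in the current working directory.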

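const audioconcat = require('audioconcat')
const fs = require('fs');
const path = require('path');
const testFolder = 'result/';
// Collect the chunk files in numeric order (Output2.mp3 before Output10.mp3),
// since readdirSync does not guarantee the order in which the chunks were written
const array = fs.readdirSync(testFolder)
 .filter(name => name.endsWith('.mp3'))
 .sort((a, b) => a.localeCompare(b, undefined, { numeric: true }))
 .map(name => path.join(testFolder, name));
console.log(array);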
 
audioconcat(array)
 .concat('output.mp3')
 .on('start', function (command) {
   console.log('ffmpeg process started:', command)
 })
 .on('error', function (err, stdout, stderr) {
   console.error('Error:', err)
   console.error('ffmpeg stderr:', stderr)
 })
 .on('end', function (output) {
   console.log('Audio successfully created', output)
 })
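
If the audio chunks are already in memory as Buffers, as in the question's `audioContents` array, they can be joined without going through files. The sketch below uses Node's built-in `Buffer.concat` with short stand-in buffers in place of real API responses. Whether a byte-for-byte join of MP3 chunks plays cleanly in every player is an assumption that was not verified here, so the ffmpeg-based approach above remains the safer route.

```javascript
// Stand-ins for the audioContent buffers returned by synthesizeSpeech
const audioContents = [
  Buffer.from([0xff, 0xf3, 0x44, 0xc4]),
  Buffer.from([0xff, 0xf3, 0x44, 0xc4, 0x00]),
];

// Buffer.concat joins the raw bytes in order. Array.prototype.join would
// instead convert each buffer to a string and corrupt the audio data.
const combined = Buffer.concat(audioContents);
console.log(combined.length);
// prints: 9
```

The combined buffer could then be written out with `fs.writeFileSync('output.mp3', combined)`.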

For both workarounds, I tested code from various GitHub links and modified it to fit your requirements. Here are the links for your reference.

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/69821441
