How do I decode binary audio data in JavaScript?

javascript audio

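I'm still new to web development. I'm building a chatbot, and I want to run its responses through Google's text-to-speech first and then play the sound on the client. So the flow is: client sends a message to the server -> server creates a response -> server sends the message to Google -> gets the audio data back -> sends it to the client -> client plays it. I've made it all the way to the last step, but now I'm out of my depth.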

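I've been googling, and there seems to be a lot of information about playing audio from binary data, audio contexts and so on. I wrote a function, but it doesn't work. Here is what I did: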

export const SendMessage: Client.Common.Footer.API.SendMessage = async message => {
    const baseRoute = process.env.REACT_APP_BASE_ROUTE;
    const port = process.env.REACT_APP_SERVER_PORT;
    const audioContext = new AudioContext();
    let audio: any;
    const url = baseRoute + ":" + port + "/ChatBot";
    console.log("%c Sending post request...", "background: #1fa67f; color: white", url, JSON.stringify(message));
    let responseJson = await fetch(url, {
        method: "POST",
        mode: "cors",
        headers: {
            Accept: "application/json",
            "Content-Type": "application/json"
        },
        body: JSON.stringify(message)
    });
    let response = await responseJson.json();
    await audioContext.decodeAudioData(
        new ArrayBuffer(response.data.audio.data),
        buffer => {
            audio = buffer;
        },
        error => console.log("===ERROR===\n", error)
    );
    const source = audioContext.createBufferSource();
    source.buffer = audio;
    source.connect(audioContext.destination);
    source.start(0);
    console.log("%c Post response:", "background: #1fa67f; color: white", url, response);
};

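This function sends the message to the server and gets back a response message and the audio data. My response.data.audio.data does contain some binary data, but I get an error saying the audio data could not be decoded (the error callback of decodeAudioData fires). I know the data is valid, because on my server I use the following code to turn it into an mp3 file, which plays fine: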

const writeFile = util.promisify(fs.writeFile);
await writeFile("output/TTS.mp3", response.audioContent, "binary");

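I have almost no idea how binary data is handled here or what could be going wrong. Do I need to specify more parameters to decode the binary data correctly? How would I know which ones? I'd like to understand what is actually happening here rather than just copy-paste some solution.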

EDIT:

So it seems the array buffer isn't being created correctly. If I run this code:

    console.log(response);
    const audioBuffer = new ArrayBuffer(response.data.audio.data);
    console.log("===audioBuffer===", audioBuffer);
    audio = await audioContext.decodeAudioData(audioBuffer);

The response is as follows:

{message: "Message successfully sent.", status: 1, data: {…}}
    message: "Message successfully sent."
    status: 1
    data:
        message: "Sorry, I didn't understand your question, try rephrasing."
        audio:
            type: "Buffer"
            data: Array(14304)
                [0 … 9999]
                [10000 … 14303]
                length: 14304
            __proto__: Array(0)
        __proto__: Object
    __proto__: Object
__proto__: Object

But the buffer logs as follows:

===audioBuffer=== 
ArrayBuffer(0) {}
    [[Int8Array]]: Int8Array []
    [[Uint8Array]]: Uint8Array []
    [[Int16Array]]: Int16Array []
    [[Int32Array]]: Int32Array []
    byteLength: 0
__proto__: ArrayBuffer

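Clearly JS doesn't understand the format in my response object, but this is what I get from Google's text-to-speech API. Maybe I'm sending it from the server incorrectly? As mentioned, on my server the following code turns that array into an mp3 file: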

    const writeFile = util.promisify(fs.writeFile);
    await writeFile("output/TTS.mp3", response.audioContent, "binary");
    return response.audioContent;

where response.audioContent is also sent to the client, like so:


//in index.ts
...
const app = express();
app.use(bodyParser.json());
app.use(cors(corsOptions));

app.post("/TextToSpeech", TextToSpeechController);
...
//textToSpeech.ts
export const TextToSpeechController = async (req: Req<Server.API.TextToSpeech.RequestQuery>, res: Response) => {
    let response: Server.API.TextToSpeech.ResponseBody = {
        message: null,
        status: CONSTANTS.STATUS.ERROR,
        data: undefined
    };
    try {
        console.log("===req.body===", req.body);
        if (!req.body) throw new Error("No message recieved");
        const audio = await TextToSpeech({ message: req.body.message });
        response = {
            message: "Audio file successfully created!",
            status: CONSTANTS.STATUS.SUCCESS,
            data: audio
        };
        res.send(response);
    } catch (error) {
        response = {
            message: "Error converting text to speech: " + error.message,
            status: CONSTANTS.STATUS.ERROR,
            data: undefined
        };
        res.json(response);
    }
};
...

I've tried passing response.data, response.data.audio and response.data.audio.data to new ArrayBuffer(), but all of them result in the same empty buffer.

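There are a couple of things in your code. You can't populate an ArrayBuffer via that constructor: it takes a byte length, not the data. Your call to decodeAudioData is asynchronous, which will result in audio being undefined when it is assigned to the source. I'd recommend updating your call to decodeAudioData to the newer promise-based function.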
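To make those two points concrete, here is a minimal sketch rather than a drop-in fix. The helper names toArrayBuffer and playSerializedAudio and the sample byte values are made up for illustration; it assumes the server sends the audio as a JSON-serialized Buffer of the shape { type: "Buffer", data: [...] }, as in the logged response.

```javascript
// Convert a JSON-serialized Node.js Buffer ({ type: "Buffer", data: [...] })
// into an ArrayBuffer that decodeAudioData can accept.
// (toArrayBuffer is a hypothetical helper name.)
function toArrayBuffer(serialized) {
    // Uint8Array.from copies the byte values; .buffer is the backing ArrayBuffer.
    return Uint8Array.from(serialized.data).buffer;
}

// Decode and play. The AudioContext is passed in by the caller.
// (playSerializedAudio is a hypothetical helper name.)
async function playSerializedAudio(audioContext, serialized) {
    // Promise form: the await resumes only after decoding has finished,
    // so `decoded` is defined before it is assigned to the source.
    const decoded = await audioContext.decodeAudioData(toArrayBuffer(serialized));
    const source = audioContext.createBufferSource();
    source.buffer = decoded;
    source.connect(audioContext.destination);
    source.start(0);
    return source;
}

// Made-up bytes, only to show the difference in size.
const sample = { type: "Buffer", data: [73, 68, 51, 4, 0] };

// The constructor argument is a byte length; an array of several numbers
// coerces to NaN, which is treated as 0, hence the empty buffer.
console.log(new ArrayBuffer(sample.data).byteLength); // 0
console.log(toArrayBuffer(sample).byteLength);        // 5
```

In a browser this would be called as playSerializedAudio(new AudioContext(), response.data.audio), ideally from a click handler, since browsers may keep an AudioContext suspended until the user interacts with the page.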
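EDIT:
There must be something odd about the result of your call to Google Text to Speech compared with the previous example I posted, because whether I use an mp3 or Google's response, both work as long as the correct reference to the buffer is passed.
The fact that you can get it working with an mp3 file but not with text to speech is probably because you aren't referencing the correct property on the result returned from the Google API call. The response from the API call is an array, so make sure you reference index 0 of the result array (see textToSpeech.js below).
The complete application is given below.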

Maybe try:
const audioBuffer = Buffer.from(response.data.audio);
console.log("===audioBuffer===", audioBuffer);

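Thank you very much, I'll try it as soon as I can.
Updated the answer, since there's no need to convert to hex.
As you said, I was converting the array buffer incorrectly, but that wasn't the problem. Whatever the problem is, it looks like an encoding type issue. Your example shows how to convert an mp3 file to a buffer and send it to the client, which I can do. I can even create an mp3 from the Google API data, then create a buffer from that mp3 file and send it to the client. That works, but it's a dumb solution that uses memory and processing power to create a pointless file. I should be able to send the data straight to the client and use it there, but I don't know how.
I've updated my answer with a working example that calls Google's Text to Speech API.
I finally got it working. The example didn't even work for me, but it gave me enough to go on, and I found where I made my mistake. It was a typo in my server code that messed up the conversion. Thanks a lot for your help; I would never have found this without a counterexample.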
// textToSpeech.js
const textToSpeech = require('@google-cloud/text-to-speech');
const client = new textToSpeech.TextToSpeechClient();

module.exports = {
    say: async function(text) {
        const request = {
            input: { text },
            voice: { languageCode: 'en-US', ssmlGender: 'NEUTRAL' },
            audioConfig: { audioEncoding: 'MP3' },
          };
        const response = await client.synthesizeSpeech(request);
        return response[0].audioContent    
    }
}

// server.js
const express = require('express');
const path = require('path');
const app = express();
const textToSpeechService = require('./textToSpeech');

app.get('/', (req, res) => {
    res.sendFile(path.join(__dirname + '/index.html'));
});

app.get('/speech', async (req, res) => {
    const buffer = await textToSpeechService.say('hello world');
    res.json({
        status: `y'all good :)`,
        data: buffer
    })
});

app.listen(3000);

// index.html
<!DOCTYPE html>
<html>
    <script>
        async function play() {
            const audioContext = new AudioContext();
            const request = await fetch('/speech');
            const response = await request.json();
            const arr = Uint8Array.from(response.data.data)
            const audio = await audioContext.decodeAudioData(arr.buffer);
            const source = audioContext.createBufferSource();
            source.buffer = audio;
            source.connect(audioContext.destination);
            source.start(0);
        }
    </script>
    <body>
        <h1>Hello Audio</h1>
        <button onclick="play()">play</button>
    </body>
</html>
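The client above reads response.data.data rather than response.data because of how a Node.js Buffer is turned into JSON. The sketch below can be run in Node to see that shape; the byte values and the status string are made up, and audioContent stands in for what the text-to-speech call returns.

```javascript
// Stand-in for the audioContent Buffer returned by the text-to-speech call
// (the byte values are made up for illustration).
const audioContent = Buffer.from([0xff, 0xf3, 0x44, 0xc4]);

// res.json(body) sends JSON.stringify(body). Buffer defines toJSON(), so it
// is serialized as { type: "Buffer", data: [...byte values] }.
const wire = JSON.stringify({ status: "ok", data: audioContent });

// What the client gets back from `await request.json()`.
const response = JSON.parse(wire);
console.log(response.data.type); // Buffer
console.log(response.data.data); // [ 255, 243, 68, 196 ]

// Rebuilding the bytes on the client side.
const bytes = Uint8Array.from(response.data.data);
console.log(bytes.byteLength); // 4
```

So the plain array of byte values sits one level down, under the data key of the serialized Buffer, and that array is what Uint8Array.from needs.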
