JavaScript: how to access the audio stream recorded by the Microsoft Speech SDK
I am transcribing a microphone stream with Microsoft's JavaScript Speech SDK. Both the recording and the transcription are done by the Speech SDK, and I cannot find a way to access and save the recorded audio file once recording has finished.

Code used to create the recognizer and record:
recognizer = new SpeechSDK.SpeechRecognizer(speechConfig, audioConfig);

// to start the recording
recognizer.startContinuousRecognitionAsync(
    () => {
        portFromCS.postMessage({ type: "started", data: "" });
    },
    err => {
        recognizer.close();
    },
);

// used after user input to stop the recording
recognizer.stopContinuousRecognitionAsync(
    () => {
        window.console.log("successfully stopped");
        // TODO: somehow need to save the file
    },
    err => {
        window.console.log("error on stop", err);
    },
);
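It is quite bad that I cannot find a built-in way to access the raw audio using their SDK. Is my only option to record with two audio streams and save the file from the separate recording stream? What are the implications of doing that?

The SDK does not save the audio, and it has no built-in functionality for doing so.

In version 1.11.0, a new API was added to the Connection object that lets you see the messages being sent to the service. From those messages you can extract the audio and assemble a wave file yourself.

Here is some TypeScript that does this: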
import * as SpeechSdk from "microsoft-cognitiveservices-speech-sdk";
import * as fs from "fs";

const filename: string = "input.wav";
const outputFileName: string = "out.wav";

const subscriptionKey: string = "<SUBSCRIPTION_KEY>";
const region: string = "<SUBSCRIPTION_REGION>";

const speechConfig: SpeechSdk.SpeechConfig = SpeechSdk.SpeechConfig.fromSubscription(subscriptionKey, region);

// Load the audio from a file. Alternatively, in a browser you could use:
// const audioConfig: SpeechSdk.AudioConfig = SpeechSdk.AudioConfig.fromDefaultMicrophoneInput();
const fileContents: Buffer = fs.readFileSync(filename);
const inputStream: SpeechSdk.PushAudioInputStream = SpeechSdk.AudioInputStream.createPushStream();
const audioConfig: SpeechSdk.AudioConfig = SpeechSdk.AudioConfig.fromStreamInput(inputStream);
inputStream.write(fileContents);
inputStream.close();

const r: SpeechSdk.SpeechRecognizer = new SpeechSdk.SpeechRecognizer(speechConfig, audioConfig);
const con: SpeechSdk.Connection = SpeechSdk.Connection.fromRecognizer(r);

let wavFragmentCount: number = 0;
const wavFragments: { [id: number]: ArrayBuffer; } = {};

con.messageSent = (args: SpeechSdk.ConnectionMessageEventArgs): void => {
    // Only record outbound audio messages that have data in them.
    if (args.message.path === "audio" && args.message.isBinaryMessage && args.message.binaryMessage !== null) {
        wavFragments[wavFragmentCount++] = args.message.binaryMessage;
    }
};

r.recognizeOnceAsync((result: SpeechSdk.SpeechRecognitionResult) => {
    // Find the length of the audio sent.
    let byteCount: number = 0;
    for (let i: number = 0; i < wavFragmentCount; i++) {
        byteCount += wavFragments[i].byteLength;
    }

    // Output array.
    const sentAudio: Uint8Array = new Uint8Array(byteCount);

    // Copy every fragment into the output array, in the order it was sent.
    byteCount = 0;
    for (let i: number = 0; i < wavFragmentCount; i++) {
        sentAudio.set(new Uint8Array(wavFragments[i]), byteCount);
        byteCount += wavFragments[i].byteLength;
    }

    // Set the sizes in the wave header. This assumes the captured audio starts
    // with a canonical 44-byte PCM header: the RIFF chunk size at offset 4
    // excludes the first 8 bytes, and the data chunk size at offset 40
    // excludes the whole header.
    const view = new DataView(sentAudio.buffer);
    view.setUint32(4, byteCount - 8, true);
    view.setUint32(40, byteCount - 44, true);

    // Write the audio back to disk.
    fs.writeFileSync(outputFileName, sentAudio);
    r.close();
});
It loads from a file so that I could test in NodeJS rather than in a browser, but the core part is the same.
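
Since the core part is the same in both environments, the concatenate-and-patch step can be pulled out into a function that does not depend on `fs` or on the SDK. The following is only a minimal sketch: `assembleWav` and `WAV_HEADER_BYTES` are hypothetical names, not part of the Speech SDK, and it assumes the first captured fragment begins with a canonical 44-byte PCM WAV header. It writes the RIFF chunk size as the total length minus 8 and the data chunk size as the total length minus 44, which is what the RIFF layout calls for under that assumption.

```typescript
// Hypothetical helper, not part of the Speech SDK.
// Assumption: the captured stream starts with a canonical 44-byte PCM WAV header.
const WAV_HEADER_BYTES = 44;

function assembleWav(fragments: ArrayBuffer[]): Uint8Array {
    // Total size of everything that was captured, header included.
    const totalBytes = fragments.reduce((sum, f) => sum + f.byteLength, 0);
    if (totalBytes < WAV_HEADER_BYTES) {
        throw new Error("captured audio is shorter than a WAV header");
    }

    // Copy the fragments back to back, in the order they were sent.
    const wav = new Uint8Array(totalBytes);
    let offset = 0;
    for (const fragment of fragments) {
        wav.set(new Uint8Array(fragment), offset);
        offset += fragment.byteLength;
    }

    // Patch the two little-endian size fields in the header.
    const view = new DataView(wav.buffer);
    view.setUint32(4, totalBytes - 8, true);                  // RIFF chunk size
    view.setUint32(40, totalBytes - WAV_HEADER_BYTES, true);  // data chunk size
    return wav;
}
```

With the fragments collected into an array inside `messageSent`, the result could be passed to `fs.writeFileSync` in NodeJS, or wrapped in a `Blob` in a browser once recognition has stopped.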