Clicking noise at the start and end of playback when using Microsoft Azure Text-to-Speech with Unity
I am using Microsoft Azure Text-to-Speech with Unity, but there is a burst of broken sound at the beginning and end of playback. Is this normal, or is result.AudioData corrupted? Here is the code:
public AudioSource audioSource;

void Start()
{
    SynthesisToSpeaker("你好世界");
}

public void SynthesisToSpeaker(string text)
{
    var config = SpeechConfig.FromSubscription("[redacted]", "southeastasia");
    config.SpeechSynthesisLanguage = "zh-CN";
    config.SpeechSynthesisVoiceName = "zh-CN-XiaoxiaoNeural";

    // Creates a speech synthesizer.
    // Make sure to dispose the synthesizer after use!
    SpeechSynthesizer synthesizer = new SpeechSynthesizer(config, null);
    Task<SpeechSynthesisResult> task = synthesizer.SpeakTextAsync(text);
    StartCoroutine(CheckSynthesizer(task, config, synthesizer));
}

private IEnumerator CheckSynthesizer(Task<SpeechSynthesisResult> task,
    SpeechConfig config,
    SpeechSynthesizer synthesizer)
{
    yield return new WaitUntil(() => task.IsCompleted);

    var result = task.Result;

    // Checks result.
    string newMessage = string.Empty;
    if (result.Reason == ResultReason.SynthesizingAudioCompleted)
    {
        var sampleCount = result.AudioData.Length / 2;
        var audioData = new float[sampleCount];
        for (var i = 0; i < sampleCount; ++i)
        {
            audioData[i] = (short)(result.AudioData[i * 2 + 1] << 8
                | result.AudioData[i * 2]) / 32768.0F;
        }

        // The default output audio format is 16K 16bit mono
        var audioClip = AudioClip.Create("SynthesizedAudio", sampleCount,
            1, 16000, false);
        audioClip.SetData(audioData, 0);
        audioSource.clip = audioClip;
        audioSource.Play();
    }
    else if (result.Reason == ResultReason.Canceled)
    {
        var cancellation = SpeechSynthesisCancellationDetails.FromResult(result);
    }

    synthesizer.Dispose();
}
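The default audio format is Riff16Khz16BitMonoPcm, which puts a RIFF header at the beginning of result.AudioData. If you pass AudioData to the AudioClip as it is, the header bytes are played as if they were samples, and you hear some noise.
You can set the format to a raw format without a header with speechConfig.SetSpeechSynthesisOutputFormat(SpeechSynthesisOutputFormat.Raw16Khz16BitMonoPcm); (speechConfig is the variable named config in the question's code). For details, see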
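As a rough illustration of the conversion step once the output is headerless, the sketch below turns raw 16-bit little-endian mono PCM bytes into the float samples that AudioClip.SetData expects. The class name PcmConverter and its method are hypothetical helpers written for this example, not part of the Speech SDK or Unity, and the code assumes the byte array really is raw PCM (that is, the raw output format has been requested before synthesizing).

```csharp
using System;

// Hypothetical helper for illustration; not part of the Speech SDK or Unity.
public static class PcmConverter
{
    // Converts raw (headerless) 16-bit little-endian mono PCM bytes
    // to float samples in the range [-1, 1).
    public static float[] ToFloatSamples(byte[] pcm)
    {
        if (pcm == null) throw new ArgumentNullException(nameof(pcm));

        int sampleCount = pcm.Length / 2; // two bytes per sample; a trailing odd byte is ignored
        var samples = new float[sampleCount];
        for (int i = 0; i < sampleCount; ++i)
        {
            // Low byte first, then high byte; the cast to short restores the sign.
            short value = unchecked((short)((pcm[i * 2 + 1] << 8) | pcm[i * 2]));
            samples[i] = value / 32768.0f;
        }
        return samples;
    }
}
```

In the question's coroutine, this would be used roughly as var audioData = PcmConverter.ToFloatSamples(result.AudioData); after calling config.SetSpeechSynthesisOutputFormat(SpeechSynthesisOutputFormat.Raw16Khz16BitMonoPcm); and before creating the synthesizer. The rest of the AudioClip code can stay unchanged, since the sample rate is still 16000 Hz mono.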