Java: harsh white noise when playing audio as a raw stream


java audio ffmpeg


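I. Background

I am trying to build an application that helps match subtitles to the audio waveform very accurately: at the waveform level, the word level, or even the character level.

The audio is expected to be Sanskrit chanting (yoga, rituals, etc.), which consists of very long compound words [example: aṅganyā-sokta-mātaro-bījam is traditionally one word, broken up only to aid reading].

The input text/subtitles may be roughly in sync at the sentence/verse level, but certainly not at the word level.

The application should be able to find the points of silence in the audio waveform, so that it can guess the start and end of each word (or even each letter/consonant/vowel within a word). That way the chanted audio and the video subtitles match exactly at the word level (or even the letter/consonant/vowel level), and the corresponding UI simply highlights or animates the exact word (or letter) being chanted at that moment in the subtitle line and shows it in a larger font. The purpose of this app is to help people learn Sanskrit chanting.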
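Finding the silence points described above could start from something as simple as windowed RMS energy: split the signal into short windows and flag those whose level falls below a threshold. The sketch below assumes the audio has already been decoded to 16-bit mono samples in a short[]; SilenceFinder, silentWindowStarts, and the example values (10 ms windows, threshold 0.01) are hypothetical choices for illustration, not tuned settings.

```java
import java.util.ArrayList;
import java.util.List;

class SilenceFinder {

    /**
     * Returns the start index (in samples) of every non-overlapping window
     * whose RMS level, normalized to [0, 1], is below rmsThreshold.
     * Hypothetical helper: windowSize and rmsThreshold must be tuned.
     */
    static List<Integer> silentWindowStarts(short[] samples, int windowSize, double rmsThreshold) {
        List<Integer> starts = new ArrayList<>();
        for (int start = 0; start + windowSize <= samples.length; start += windowSize) {
            double sumSquares = 0;
            for (int i = start; i < start + windowSize; i++) {
                double s = samples[i] / 32768.0; // normalize 16-bit sample to [-1, 1)
                sumSquares += s * s;
            }
            double rms = Math.sqrt(sumSquares / windowSize);
            if (rms < rmsThreshold) {
                starts.add(start);
            }
        }
        return starts;
    }
}
```

At 44100 Hz a 10 ms window is 441 samples, so a call would look like silentWindowStarts(samples, 441, 0.01). Chanting often sustains sound between words, so in practice a threshold relative to the recording's average level may work better than an absolute one.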

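This is neither a 100% automated nor a 100% manual process, but a hybrid one in which the application should help the human as much as possible.

II. Below is the first code I wrote for this, in which:

First, I open an mp3 (or any other audio format) file.

I seek to an arbitrary point in the audio file's timeline (for now I play from offset zero).

I get the audio data in raw format for two purposes: (1) playing it and (2) drawing the waveform.

I play the raw audio data using the standard Java audio library.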
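For purpose (2), drawing the waveform, a common approach is to reduce the samples to one min/max pair per pixel column instead of plotting every sample. A minimal sketch, assuming the audio is already decoded into a mono short[]; WaveformPeaks and peaks() are hypothetical names, not part of any library used here.

```java
class WaveformPeaks {

    /**
     * Reduces samples to one {min, max} pair per pixel column. Drawing a
     * vertical line from min to max in each column gives the usual
     * waveform outline. Columns that receive no samples stay {0, 0}.
     */
    static short[][] peaks(short[] samples, int columns) {
        short[][] result = new short[columns][2];
        for (int c = 0; c < columns; c++) {
            // sample range [from, to) that maps onto column c
            int from = (int) ((long) c * samples.length / columns);
            int to = (int) ((long) (c + 1) * samples.length / columns);
            short min = 0;
            short max = 0;
            for (int i = from; i < to; i++) {
                if (i == from || samples[i] < min) min = samples[i];
                if (i == from || samples[i] > max) max = samples[i];
            }
            result[c][0] = min;
            result[c][1] = max;
        }
        return result;
    }
}
```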

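III. The problem I am facing is that there is a harsh screeching sound between every cycle.

Maybe I need to close the line between cycles? That sounds simple enough; I can try it.
But I also wonder whether this overall approach is correct in the first place. Any tips, guides, suggestions, and links would be very helpful.
Also, I simply hard-coded the sample rate etc. (44100 Hz and so on). Are these fine as default presets, or should they depend on the input format?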
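On the hard-coded settings: the raw -f format given to ffmpeg and the AudioFormat given to the SourceDataLine have to describe exactly the same bytes. For example, s16be is signed 16-bit big-endian, while new AudioFormat(44100, 16, 1, true, false) expects little-endian data; if the two disagree, every sample is read with its bytes swapped, which is heard as loud noise. One way to keep them in sync, sketched here under the assumption that you choose the AudioFormat first, is to derive ffmpeg's -f, -ar and -ac arguments from it. ffmpeg then resamples and downmixes the input to that fixed format, so 44100 Hz can stay a default. RawFormatArgs and its methods are hypothetical helpers.

```java
import java.util.List;
import javax.sound.sampled.AudioFormat;

class RawFormatArgs {

    /** Maps a PCM AudioFormat to the matching ffmpeg raw sample format name. */
    static String ffmpegFormatName(AudioFormat af) {
        String endian = af.isBigEndian() ? "be" : "le";
        int bits = af.getSampleSizeInBits();
        AudioFormat.Encoding enc = af.getEncoding();
        if (bits == 8) {
            // 8-bit formats have no byte order
            if (enc.equals(AudioFormat.Encoding.PCM_UNSIGNED)) return "u8";
            if (enc.equals(AudioFormat.Encoding.PCM_SIGNED)) return "s8";
        } else if (enc.equals(AudioFormat.Encoding.PCM_SIGNED)) {
            return "s" + bits + endian; // e.g. s16le, s24be, s32le
        } else if (enc.equals(AudioFormat.Encoding.PCM_FLOAT)) {
            return "f" + bits + endian; // e.g. f32le, f64be
        }
        throw new IllegalArgumentException("Unsupported format: " + af);
    }

    /** Output options telling ffmpeg to produce exactly this AudioFormat. */
    static List<String> ffmpegOutputArgs(AudioFormat af) {
        return List.of(
                "-f", ffmpegFormatName(af),
                "-ar", String.valueOf(Math.round(af.getSampleRate())),
                "-ac", String.valueOf(af.getChannels()));
    }
}
```

With Jaffree, these key/value pairs could be passed to the PipeOutput through addArguments(key, value), which the question code already uses for -ac.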

IV. Here is the code

import com.github.kokorin.jaffree.StreamType;
import com.github.kokorin.jaffree.ffmpeg.FFmpeg;
import com.github.kokorin.jaffree.ffmpeg.FFmpegProgress;
import com.github.kokorin.jaffree.ffmpeg.FFmpegResult;
import com.github.kokorin.jaffree.ffmpeg.NullOutput;
import com.github.kokorin.jaffree.ffmpeg.PipeOutput;
import com.github.kokorin.jaffree.ffmpeg.ProgressListener;
import com.github.kokorin.jaffree.ffprobe.Stream;
import com.github.kokorin.jaffree.ffmpeg.UrlInput;
import com.github.kokorin.jaffree.ffprobe.FFprobe;
import com.github.kokorin.jaffree.ffprobe.FFprobeResult;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.SourceDataLine;


public class FFMpegToRaw {
    Path BIN = Paths.get("f:\\utilities\\ffmpeg-20190413-0ad0533-win64-static\\bin");
    String VIDEO_MP4 = "f:\\org\\TEMPLE\\DeviMahatmyamRecitationAudio\\03_01_Devi Kavacham.mp3";
    FFprobe ffprobe;
    FFmpeg ffmpeg;

    public void basicCheck() throws Exception {
        if (BIN != null) {
            ffprobe = FFprobe.atPath(BIN);
        } else {
            ffprobe = FFprobe.atPath();
        }
        FFprobeResult result = ffprobe
                .setShowStreams(true)
                .setInput(VIDEO_MP4)
                .execute();

        for (Stream stream : result.getStreams()) {
            System.out.println("Stream " + stream.getIndex()
                    + " type " + stream.getCodecType()
                    + " duration " + stream.getDuration(TimeUnit.SECONDS));
        }    
        if (BIN != null) {
            ffmpeg = FFmpeg.atPath(BIN);
        } else {
            ffmpeg = FFmpeg.atPath();
        }

        //Sometimes ffprobe can't show exact duration, use ffmpeg transcoding to NULL output to get it
        final AtomicLong durationMillis = new AtomicLong();
        FFmpegResult fFmpegResult = ffmpeg
                .addInput(
                        UrlInput.fromUrl(VIDEO_MP4)
                )
                .addOutput(new NullOutput())
                .setProgressListener(new ProgressListener() {
                    @Override
                    public void onProgress(FFmpegProgress progress) {
                        durationMillis.set(progress.getTimeMillis());
                    }
                })
                .execute();
        System.out.println("audio size - "+fFmpegResult.getAudioSize());
        System.out.println("Exact duration: " + durationMillis.get() + " milliseconds");
    }

    public void toRawAndPlay() throws Exception {
        ProgressListener listener = new ProgressListener() {
            @Override
            public void onProgress(FFmpegProgress progress) {
                System.out.println(progress.getFrame());
            }
        };

        // code derived from : https://stackoverflow.com/questions/32873596/play-raw-pcm-audio-received-in-udp-packets

        int sampleRate = 44100;//24000;//Hz
        int sampleSize = 16;//Bits
        int channels   = 1;
        boolean signed = true;
        boolean bigEnd = false;
        String format  = "s16be"; //"f32le"

        //https://trac.ffmpeg.org/wiki/audio types
        final AudioFormat af = new AudioFormat(sampleRate, sampleSize, channels, signed, bigEnd);
        final DataLine.Info info = new DataLine.Info(SourceDataLine.class, af);
        final SourceDataLine line = (SourceDataLine) AudioSystem.getLine(info);

        line.open(af, 4096); // format , buffer size
        line.start();

        OutputStream destination = new OutputStream() {
            @Override public void write(int b) throws IOException {
                throw new UnsupportedOperationException("Nobody uses this.");
            }
            @Override public void write(byte[] b, int off, int len) throws IOException {
                String o = new String(b);
                boolean showString = false;
                System.out.println("New output ("+ len
                        + ", off="+off + ") -> "+(showString?o:"")); 
                // output wave form repeatedly

                if(len%2!=0) {
                    len -= 1;
                    System.out.println("");
                }
                line.write(b, off, len);
                System.out.println("done round");
            }
        };

        // src : http://blog.wudilabs.org/entry/c3d357ed/?lang=en-US
        FFmpegResult result = FFmpeg.atPath(BIN).
            addInput(UrlInput.fromPath(Paths.get(VIDEO_MP4))).
            addOutput(PipeOutput.pumpTo(destination).
                disableStream(StreamType.VIDEO). //.addArgument("-vn")
                setFrameRate(sampleRate).            //.addArguments("-ar", sampleRate)
                addArguments("-ac", "1").
                setFormat(format)              //.addArguments("-f", format)
            ).
            setProgressListener(listener).
            execute();

        // shut down audio
        line.drain();
        line.stop();
        line.close();

        System.out.println("result = "+result.toString());
    }

    public static void main(String[] args) throws Exception {
        FFMpegToRaw raw = new FFMpegToRaw();
        raw.basicCheck();
        raw.toRawAndPlay();
    }
}

Thanks.

I suspect your screeching comes from half-full buffers being handed to the sound system.
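If you want to keep the ffmpeg pipe, one way to make sure the line only ever receives whole frames is to wrap it in an OutputStream that carries incomplete frames over to the next write. Simply dropping an odd trailing byte, as the question's write() does, loses that byte and shifts the pairing of low and high bytes for every 16-bit sample that follows. A sketch, assuming a fixed frame size (2 bytes for 16-bit mono); FrameAlignedOutputStream and FrameSink are hypothetical names.

```java
import java.io.OutputStream;

/**
 * Forwards only whole audio frames to a sink. Bytes that do not fill a
 * complete frame are kept and prepended to the next write instead of
 * being dropped.
 */
class FrameAlignedOutputStream extends OutputStream {

    /** Hypothetical callback; with javax.sound it could be line::write. */
    interface FrameSink {
        void write(byte[] b, int off, int len);
    }

    private final FrameSink sink;
    private final byte[] pending; // partial frame carried between calls
    private int pendingLen = 0;

    FrameAlignedOutputStream(FrameSink sink, int frameSize) {
        this.sink = sink;
        this.pending = new byte[frameSize];
    }

    @Override
    public void write(int b) {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] b, int off, int len) {
        // 1. Top up a partial frame left over from the previous call.
        while (pendingLen > 0 && len > 0) {
            pending[pendingLen++] = b[off++];
            len--;
            if (pendingLen == pending.length) {
                // the sink must consume or copy the bytes before returning,
                // because the pending buffer is reused
                sink.write(pending, 0, pendingLen);
                pendingLen = 0;
            }
        }
        if (pendingLen > 0) {
            return; // input ran out before the frame was complete
        }
        // 2. Forward all whole frames in a single call.
        int whole = len - (len % pending.length);
        if (whole > 0) {
            sink.write(b, off, whole);
        }
        // 3. Keep the remainder (less than one frame) for the next call.
        pendingLen = len - whole;
        System.arraycopy(b, off + whole, pending, 0, pendingLen);
    }
}
```

In the question code this could be used as PipeOutput.pumpTo(new FrameAlignedOutputStream(line::write, af.getFrameSize())); SourceDataLine.write blocks until it has taken all the bytes, so reusing the pending buffer is safe there.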

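As pointed out in the comment above, I would use something like FFSampledSP (if you are on macOS or Windows) and then code like the following, which is much more Java-style.

Just make sure the FFSampledSP complete jar is on your classpath and you should be good to go.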

import javax.sound.sampled.*;
import java.io.File;
import java.io.IOException;

public class PlayerDemo {

    /**
     * Derive a PCM format.
     */
    private static AudioFormat toSignedPCM(final AudioFormat format) {
        final int sampleSizeInBits = format.getSampleSizeInBits() <= 0 ? 16 : format.getSampleSizeInBits();
        final int channels = format.getChannels() <= 0 ? 2 : format.getChannels();
        final float sampleRate = format.getSampleRate() <= 0 ? 44100f : format.getSampleRate();
        return new AudioFormat(AudioFormat.Encoding.PCM_SIGNED,
                sampleRate,
                sampleSizeInBits,
                channels,
                (sampleSizeInBits > 0 && channels > 0) ? (sampleSizeInBits/8)*channels : AudioSystem.NOT_SPECIFIED,
                sampleRate,
                format.isBigEndian()
        );
    }


    public static void main(final String[] args) throws IOException, UnsupportedAudioFileException, LineUnavailableException {
        final File audioFile = new File(args[0]);
        // how long is the file? use the AudioFileFormat properties
        final Long durationInMicroseconds = (Long)AudioSystem.getAudioFileFormat(audioFile).getProperty("duration");
        System.out.println("Duration in microseconds (not millis!): " + durationInMicroseconds);
        // open the mp3 (or whatever) stream, not yet decoded
        final AudioInputStream mp3In = AudioSystem.getAudioInputStream(audioFile);
        // derive a suitable PCM format that can be played by the AudioSystem
        final AudioFormat desiredFormat = toSignedPCM(mp3In.getFormat());
        // ask the AudioSystem for a source line for playback
        // that corresponds to the derived PCM format
        final SourceDataLine line = AudioSystem.getSourceDataLine(desiredFormat);

        // now play, typically in separate thread
        new Thread(() -> {
            final byte[] buf = new byte[4096];
            int justRead;
            // convert to raw PCM samples with the AudioSystem
            try (final AudioInputStream rawIn = AudioSystem.getAudioInputStream(desiredFormat, mp3In)) {
                line.open();
                line.start();
                while ((justRead = rawIn.read(buf)) >= 0) {
                    // only write bytes we really read, not more!
                    line.write(buf, 0, justRead);
                    final long microsecondPosition = line.getMicrosecondPosition();
                    System.out.println("Current position in microseconds: " + microsecondPosition);
                }
            } catch (IOException | LineUnavailableException e) {
                e.printStackTrace();
            } finally {
                line.drain();
                line.stop();
                line.close(); // release the line once playback is done
            }
        }).start();
    }
}
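The demo above reports the file duration and the playback position in microseconds. For subtitle alignment the reverse mapping is also useful: given a time, where does it fall in the decoded PCM data? For uncompressed PCM this is plain arithmetic (frames = time × frame rate, bytes = frames × frame size). A sketch, assuming the data is already decoded PCM in the given AudioFormat; it does not apply to the compressed mp3 bytes. PcmSeek is a hypothetical helper.

```java
import javax.sound.sampled.AudioFormat;

class PcmSeek {

    /**
     * Byte offset of the frame at (or just before) timeMicros in a raw PCM
     * stream of the given format. Rounding down to a whole frame keeps the
     * offset aligned, so playback never starts in the middle of a sample.
     */
    static long byteOffsetForMicros(AudioFormat format, long timeMicros) {
        long frameIndex = (long) Math.floor((double) format.getFrameRate() * timeMicros / 1_000_000.0);
        return frameIndex * format.getFrameSize();
    }
}
```

With a decoded AudioInputStream, the resulting offset could be passed to skip(), which can only move forward.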


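The regular Java API does not allow jumping to arbitrary positions. However, FFSampledSP contains an extension for this, the seek() method. To use it, simply cast rawIn from the example above to FFAudioInputStream and call seek() with a time and a TimeUnit.

If you are on macOS or Windows, you might want to consider using … to make this more elegant.

@Hendrik - Any links to example code? That would help. Thanks for the comment.

Simplify by reading a file with known audio (e.g. a 100 Hz tone) and confirm that your code works by printing the raw audio curve in PCM format (just the points on the audio curve), so you can see the data points go up and down following a sine curve. This will let you confirm that your code is rock solid.

@ScottStensland - Thanks for the comment. I can hear the audio fine: it plays OK, then there is a screech, then the next cycle plays OK, then another screech. Still working on it.

Thanks, let me try this. I'll be back.

It plays fine; now let me try to understand the code. How do I seek to and play audio from an arbitrary point in time? As I explained, I need to sync audio and subtitles very accurately, so the user should be able to do exactly that. I left precise seeking out of the scope of this question, but how would one do even basic seeking? This is important. Many thanks. Aside: I may now need the audio data in graphical form to see the silence points, or perhaps that can be done computationally. That would be a different topic, not this question.

No, you are absolutely right. You cannot get the waveform with MediaPlayer. You need to convert the bytes to sample values, i.e. ints or floats, and then plot them. For sample code on the conversion, see …; if that does not help, please post a new question.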
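Converting the bytes to sample values before plotting, as suggested in the last comment, could look like this for 16-bit signed PCM. It is a sketch assuming 16-bit samples; for stereo data the returned values are interleaved (left, right, left, ...). PcmDecoder is a hypothetical name. Decoding a file with a known 100 Hz tone this way and printing the values, as suggested in the comments, should show them rising and falling along a sine curve.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class PcmDecoder {

    /**
     * Decodes the first len bytes of 16-bit signed PCM into floats in
     * [-1, 1). A trailing incomplete sample is ignored here; when decoding
     * a stream chunk by chunk, carry that byte over to the next chunk.
     */
    static float[] toFloats(byte[] pcm, int len, boolean bigEndian) {
        ByteBuffer buf = ByteBuffer.wrap(pcm, 0, len - (len % 2))
                .order(bigEndian ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN);
        float[] out = new float[buf.remaining() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getShort() / 32768f; // scale short range to [-1, 1)
        }
        return out;
    }
}
```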