Java: fastest way to extract longitude and latitude from a .JPG
Tags: java, metadata, bufferedreader, exif


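I have a large number of .JPG files (>100,000) and want to extract the longitude and latitude from each one. My current setup gets the job done, but I would like to speed up the process. Here is what I have: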

public static void digFolder( File[] files ) {
    totalCount += files.length;       // Total files to be processed
    JPGFile jpgFile = new JPGFile( ); // Holds the extracted longitude and latitude
    int progress;                     // How many files have been processed

    for (File file : files) {
        if (file.isDirectory( )) {
            // Updates a JLabel with the current directory
            label.setText( "Currently working on " + file.getName( ) );
            digFolder( file.listFiles( ) );    // Recursive call
        } else {
            // Sets the path to the .jpg file
            jpgFile.setPath( file.getAbsolutePath( ) );

            if (!jpgFile.initialize( )) continue; // Code for .initialize( ) is below

            // Grabs the longitude and latitude
            String record = jpgFile.getLongitude( ) + ", " + jpgFile.getLatitude( );

            // Writes the record to the .CSV through a buffered writer (buffer size 8192)
            output.writeRecord( record );

            // Updates the progress of a JProgressBar, and sets the text
            progress = ( int )Math.round( ( ++processedCount / ( double )totalCount ) * 100 );
            progressBar.setValue( progress );
            progressBar.setString( progress + "% (" + processedCount + "/" + totalCount + ")" );
        }
    }
}

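Below is the code for .initialize() in the class JPGFile. It grabs the coordinates from the EXIF data in the .JPG. The longitude and latitude can then be retrieved with _location.getLongitude() and _location.getLatitude().

The library I am using is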

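When I checked the run time, it took 437 seconds to write the data from 33,000 .JPG files to the .CSV file. (If I run it again on exactly the same files it drops to 8 seconds, but I assume that is because they are already in memory. It would be great if the first run took only 8 seconds!)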

Metadata Extractor takes a long time to grab the data, and it seems like overkill (more than 100 classes across 20 packages).

Is there a simpler way to get at the data? Does anyone have suggestions for cutting the processing time? Thanks.

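Update: here is what I have now. I am now using a different library (the code below calls its JpegGeoTagReader class). The only change I made to that library was to make all the necessary objects and methods static, so that readMetadata can be used without creating new objects.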

public static void walkFileSystem( File[] files ) {
    totalCount += files.length;

    for (int i = 0; i < files.length; i++) {
        if (files[i].getAbsolutePath( ).endsWith( ".jpg" )) {
            try {
                GeoTag current = JpegGeoTagReader.readMetadata( files[i] );

                // Uses a BufferedWriter to write to the file
                writer.writeRecord( current.getLongitude( ) + ", " +
                                    current.getLatitude( ) + ", " +
                                    files[i].getAbsolutePath( ) + "," +
                                    files[i].getName( ) );
            } catch (Exception e) {
                e.printStackTrace( );
            }

            if (++processedCount % 100 == 0) {
                int progress = ( int )Math.round( ( processedCount / ( double )totalCount ) * 100 );

                if (progressBar.getValue( ) != progress) progressBar.setValue( progress );
                progressBar.setString( progress + "%" + " (" + processedCount + "/" + totalCount + ")" );
            }
        } else if (files[i].isDirectory( )) {
            label.setText( "Currently working on " + files[i].getName( ) );
            walkFileSystem( files[i].listFiles( ) );
        }
    }
}

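I found that when it first enters a new folder, the script is relatively fast. But as it works through more files (about 50% of the way through the folder), it slows to a crawl. Is something being created on every iteration? The index should not affect the speed, I think.

You first need to identify the bottleneck in your application. If the bottleneck is the CPU, you need to parallelize the calls to jpgFile.getLongitude() and jpgFile.getLatitude(). If the bottleneck is the hard drive or the network, you are out of luck.

Since Java 7 you can use nio to speed this up; it handles files and directories faster. As you probably know, appending strings in Java is slow, as is casting, so try to avoid both. The slowest part seems to be the reader. I don't know how it is created, but when I browsed the source it appeared to read the entire JPEG file, which is very slow. So maybe you could look into other ways of reading the metadata yourself? Or use parallelization as Seb suggested.

The difference in timing may be due to JVM initialization. I used to do informal benchmarks too, until I was taught here to use JMH. (Update: lol, I read that as 437 ms. If we are talking seconds, it is definitely not JVM initialization, but use JMH anyway.)

I tried Files.walkFileTree from nio, with very little effect. I also switched from the EXIF extractor to a different reader, which cut the number of extra class files I need down to 2. How would you go about parallelizing this? I have never done anything like that before.

Your speed is already impressive. Seeking on the drive is the bottleneck. You would have to use an SSD, but that would be expensive. Consider: (1) skipping the status messages for most iterations, for example updating the status text and the progress bar only when i mod 100 = 0; (2) designing a caching mechanism that remembers the coordinates for each path/file in a database or even a text file.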
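Since the question of how to parallelize came up, here is a minimal sketch of one way to do it with java.nio and a parallel stream. It is not the asker's code: the class name ParallelGeoScan, its methods, and the reader function are hypothetical, and the reader is only a stand-in for whatever EXIF library is actually used. As noted above, this is likely to help only if the CPU is the bottleneck; if the disk is, the gain may be small.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of any library mentioned in the thread.
final class ParallelGeoScan {

    /** Case-insensitive check, so both photo.jpg and PHOTO.JPG match. */
    static boolean isJpeg(Path p) {
        String name = p.getFileName().toString().toLowerCase(Locale.ROOT);
        return name.endsWith(".jpg") || name.endsWith(".jpeg");
    }

    /** Walks root recursively and returns every regular JPEG file, sorted. */
    static List<Path> listJpegs(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile)
                       .filter(ParallelGeoScan::isJpeg)
                       .sorted()
                       .collect(Collectors.toList());
        }
    }

    /**
     * Applies the reader to all files in parallel. The reader is assumed to
     * return Optional.empty() for files without GPS data; those are skipped.
     */
    static List<String> extractAll(List<Path> files,
                                   Function<Path, Optional<String>> reader) {
        return files.parallelStream()
                    .map(reader)
                    .filter(Optional::isPresent)
                    .map(Optional::get)
                    .collect(Collectors.toList()); // keeps the order of the input list
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        // Placeholder reader: a real one would parse the EXIF block of the file.
        Function<Path, Optional<String>> reader = p -> Optional.of("0.0, 0.0, " + p);
        List<String> records = extractAll(listJpegs(root), reader);
        System.out.println(records.size() + " records");
    }
}
```

Collecting the records first and writing them to the .CSV afterwards from a single thread keeps the writer free of synchronization. Listing the files up front also gives the total count before processing starts, so a progress bar would not need to adjust its total while running.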
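The caching idea from the comments (remembering the coordinates for each path in a text file) could look roughly like the sketch below. Everything here is an assumption for illustration: the class name CoordinateCache, the tab-separated file layout, and the use of the last-modified time to detect changed files are not from the thread. It would not speed up the very first run, only later runs over the same files.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical text-file cache: one "path<TAB>lastModified<TAB>record" line per image.
final class CoordinateCache {
    // Key: absolute path. Value: { last-modified millis, record }.
    private final Map<String, String[]> entries = new HashMap<>();

    /** Loads the cache file; a missing file simply yields an empty cache. */
    static CoordinateCache load(Path cacheFile) throws IOException {
        CoordinateCache cache = new CoordinateCache();
        if (Files.exists(cacheFile)) {
            for (String line : Files.readAllLines(cacheFile, StandardCharsets.UTF_8)) {
                String[] parts = line.split("\t", 3);
                if (parts.length == 3) {
                    cache.entries.put(parts[0], new String[] { parts[1], parts[2] });
                }
            }
        }
        return cache;
    }

    /** Returns the cached record only if the file is unchanged since it was cached. */
    Optional<String> lookup(Path image) throws IOException {
        String[] hit = entries.get(image.toAbsolutePath().toString());
        if (hit == null) {
            return Optional.empty();
        }
        String stamp = Long.toString(Files.getLastModifiedTime(image).toMillis());
        return stamp.equals(hit[0]) ? Optional.of(hit[1]) : Optional.empty();
    }

    /** Remembers the record together with the file's current last-modified time. */
    void put(Path image, String record) throws IOException {
        String stamp = Long.toString(Files.getLastModifiedTime(image).toMillis());
        entries.put(image.toAbsolutePath().toString(), new String[] { stamp, record });
    }

    /** Rewrites the whole cache file from the in-memory entries. */
    void save(Path cacheFile) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String[]> e : entries.entrySet()) {
            lines.add(e.getKey() + "\t" + e.getValue()[0] + "\t" + e.getValue()[1]);
        }
        Files.write(cacheFile, lines, StandardCharsets.UTF_8);
    }
}
```

In the processing loop one would call lookup first and fall back to the EXIF reader (followed by put) only on a miss, then call save once at the end. The sketch assumes paths contain no tab or newline characters.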