
Java: slowing trend when SAX-parsing a 1.2 terabyte file


I would like to parse the OSM planet file (planet-200309.xml, ~1.26 TB).

To find out the minimum time this takes in Java, I created a small SAX parsing application:

    final File f = new File("f:/planet-200309.xml");
    SAXParserFactory newInstance = SAXParserFactory.newInstance();
    final long start = System.currentTimeMillis();
    final CountingInputStream cif = new CountingInputStream(new FileInputStream(f));
    Thread t = new Thread() {
        public void run() {
            try {
                while (cif.available() > 0) {
                    Thread.sleep(10000);
                    long stop = System.currentTimeMillis();
                    long seconds = (stop - start) / 1000;
                    long bytesRead = cif.getBytesRead();
                    float bytePerSecond = bytesRead / seconds;
                    int expectedSeconds = (int) (f.length() / bytePerSecond);
                    System.out.println("Expected minutes: " + expectedSeconds / 60 + ", bytes per second:" + (int) bytePerSecond + " (reat: " + bytesRead
                            + ", took: " + seconds + ")");
                }
            } catch (IOException e) {
                e.printStackTrace();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        };
    };
    t.start();

    newInstance.newSAXParser().parse(cif, new DefaultHandler() {
        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
            if (!qName.equals("changeset") && !qName.equals("tag")) {
                System.out.println(qName);
            }
        }
    });

    long stop = System.currentTimeMillis();
    long took = stop - start;
    System.out.println(took / 1000);

Every 10 seconds I estimate how long the whole parse will take. This is my output:

    Expected minutes: 174, bytes per second:122728576 (reat: 1227285768, took: 10)
    Expected minutes: 173, bytes per second:123918400 (reat: 2478367973, took: 20)
    Expected minutes: 172, bytes per second:124213368 (reat: 3726401103, took: 30)
    Expected minutes: 175, bytes per second:122289280 (reat: 4891571271, took: 40)
    Expected minutes: 186, bytes per second:115111152 (reat: 5755557747, took: 50)
    Expected minutes: 197, bytes per second:108455448 (reat: 6507327092, took: 60)
    Expected minutes: 212, bytes per second:100975920 (reat: 7068314710, took: 70)
    Expected minutes: 224, bytes per second:95568256 (reat: 7645460576, took: 80)
    Expected minutes: 236, bytes per second:90757120 (reat: 8168140838, took: 90)
    Expected minutes: 237, bytes per second:90276424 (reat: 9027642303, took: 100)
    Expected minutes: 240, bytes per second:89072968 (reat: 9798026674, took: 110)
    Expected minutes: 250, bytes per second:85678456 (reat: 10281415054, took: 120)
    Expected minutes: 257, bytes per second:83305712 (reat: 10829743052, took: 130)
    Expected minutes: 256, bytes per second:83664016 (reat: 11712962690, took: 140)
    Expected minutes: 250, bytes per second:85531576 (reat: 12829736785, took: 150)

After the first 10 seconds, the estimated total time was about 3 hours. Two minutes later, the estimate had grown to about 4 hours.

The interesting thing is that there seems to be a trend of the SAX parsing slowing down.

I also plotted these values in a chart; the image is not reproduced here.

What is the problem here? Why does the parsing keep getting slower?
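
A note on reading the numbers: the program prints `bytesRead / seconds`, which is the average rate since the start, not the rate of the last 10 seconds. The throughput per window can be recovered by differencing consecutive cumulative byte counts. The following is a minimal sketch of that arithmetic; the class and method names are made up for illustration, and the input values are the first six byte counts from the output.

```java
public class IntervalThroughput {
    /** Bytes per second within each sampling window, derived from cumulative byte counts. */
    public static long[] intervalRates(long[] cumulativeBytes, long windowSeconds) {
        long[] rates = new long[cumulativeBytes.length];
        long previous = 0;
        for (int i = 0; i < cumulativeBytes.length; i++) {
            // Bytes consumed in this window only, divided by the window length.
            rates[i] = (cumulativeBytes[i] - previous) / windowSeconds;
            previous = cumulativeBytes[i];
        }
        return rates;
    }

    public static void main(String[] args) {
        // Cumulative byte counts taken from the first six lines of the posted output.
        long[] read = {1227285768L, 2478367973L, 3726401103L,
                       4891571271L, 5755557747L, 6507327092L};
        long[] rates = intervalRates(read, 10);
        for (int i = 0; i < rates.length; i++) {
            System.out.println("window ending at " + (i + 1) * 10 + " s: " + rates[i] + " bytes/s");
        }
    }
}
```

Computed this way, the window from 50 s to 60 s runs at roughly 75 MB/s while the cumulative average still reports about 108 MB/s. The window from 140 s to 150 s is back at roughly 111 MB/s. So the throughput fluctuates over the file rather than declining steadily.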

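There can be many reasons for this, starting with the specific content of the file: the density of tags and attributes does not have to be uniform across the whole XML. Clearly we don't have enough input to give an exact answer, and an experiment with a 1 TB file isn't easy to reproduce. Try subsets taken from different parts of the file; try different hardware, operating systems, etc.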
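
One way to compare subsets is to count how often each element name occurs in each of them and to time the parse. The sketch below assumes each subset is itself a well-formed XML document. The class name `ElementDensityProbe` and the tiny inline sample are made up for illustration. The handler only counts elements and prints nothing, so console I/O does not enter the measurement.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class ElementDensityProbe {
    /** Parses the stream with SAX and returns how often each element name occurs. */
    public static Map<String, Long> countElements(InputStream in) throws Exception {
        final Map<String, Long> counts = new TreeMap<>();
        SAXParserFactory.newInstance().newSAXParser().parse(in, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                // Count instead of printing, so the handler itself stays cheap.
                counts.merge(qName, 1L, Long::sum);
            }
        });
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical miniature sample; replace with a stream over a real subset file.
        String sample = "<osm><node id=\"1\"><tag k=\"a\" v=\"b\"/></node>"
                + "<node id=\"2\"/><way id=\"3\"><nd ref=\"1\"/></way></osm>";
        long start = System.nanoTime();
        Map<String, Long> counts =
                countElements(new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8)));
        long elapsedMicros = (System.nanoTime() - start) / 1_000;
        System.out.println(counts + " parsed in " + elapsedMicros + " microseconds");
    }
}
```

If subsets of similar byte size show very different element counts and parse times, that would support the idea that the content of the file, not the parser, explains the changing throughput.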