High latency of Java BlockingQueue on Linux

java linux multithreading

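I am using BlockingQueues (I have tried both ArrayBlockingQueue and LinkedBlockingQueue) to pass objects between different threads in an application I'm currently working on. Performance and latency are relatively important in this application, so I was curious how much time it takes to pass an object between two threads using a BlockingQueue. To measure this, I wrote a simple program with two threads (one consumer and one producer), in which the producer passes a timestamp (taken with System.nanoTime()) to the consumer; see the code below.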

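I recall reading somewhere on some forum that it took about 10 microseconds for someone else who tried this (I don't know on what OS and hardware), so I was not too surprised when it took ~30 microseconds on my Windows 7 box (Intel E7500 Core 2 Duo CPU, 2.93 GHz) while a lot of other applications were running in the background. However, I was quite surprised when I ran the same test on our much faster Linux server (two Intel X5677 3.46 GHz quad-core CPUs, running Debian 5 with kernel 2.6.26-2-amd64). I expected the latency to be lower than on my Windows box, but on the contrary it was much higher: ~75–100 microseconds! Both tests were run with Sun's Hotspot JVM version 1.6.0-23.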

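Has anyone else done similar tests on Linux with similar results? Or does anyone know why it is so much slower on Linux (with better hardware)? Could it be that thread switching is simply this much slower on Linux than on Windows? If so, it seems Windows is actually better suited for some kinds of applications. Any help in understanding the relatively high figures is much appreciated.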

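Edit:
After a comment from DaveC, I also ran a test where I restricted the JVM (on the Linux machine) to a single core, i.e. all threads running on the same core. This changed the results dramatically: the latency went down to below 20 microseconds, which is better than the results on the Windows machine. I also ran some tests where I restricted the producer thread to one core and the consumer thread to another (trying them both on the same socket and on different sockets), but this did not seem to help: the latency was still ~75 microseconds. By the way, this test application is pretty much the only thing running on the machine while I perform the test.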

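Does anyone know whether these results make sense? Should it really be that much slower when the producer and the consumer run on different cores? Any input is really appreciated.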

Edited again (January 6):
I experimented with different changes to the code and to the running environment:

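I upgraded the Linux kernel to 2.6.36.2 (from 2.6.26.2). After the upgrade, the measured time went from 75-100 microseconds to 60 microseconds, with very small variations. Setting CPU affinity for the producer and consumer threads had no effect, except when restricting both to the same core. When they ran on the same core, the measured latency was 13 microseconds.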

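In the original code, the producer slept for 1 second between iterations, to give the consumer enough time to calculate the elapsed time and print it to the console. If I remove the call to Thread.sleep() and instead let both the producer and the consumer call barrier.await() in every iteration (the consumer calls it after printing the elapsed time to the console), the measured latency drops from 60 microseconds to below 10 microseconds. If the threads run on the same core, the latency drops below 1 microsecond. Can anyone explain why this reduces the latency so significantly? My first guess was that the change caused the producer to call queue.put() before the consumer called queue.take(), so the consumer never had to block. However, after experimenting with a modified version of ArrayBlockingQueue, I found this guess to be wrong: the consumer does in fact block. If you have another guess, please let me know. (By the way, if I let the producer call both Thread.sleep() and barrier.await(), the latency stays at 60 microseconds.)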
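A minimal sketch of the lockstep variant described above, assuming Java 8+ (for lambdas). The class name `BarrierLockstep` and its `measure` method are hypothetical. The consumer stores each latency in an array instead of printing it, and both threads meet at a `CyclicBarrier` once per iteration in place of the producer's `Thread.sleep()`. This sketch mirrors the structure of the experiment, not the asker's exact code, so the numbers will depend on the machine.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

public class BarrierLockstep {

    /** Runs `runs` hand-offs in lockstep and returns each latency in nanoseconds. */
    public static long[] measure(int runs) throws InterruptedException {
        BlockingQueue<Long> queue = new ArrayBlockingQueue<>(10);
        CyclicBarrier barrier = new CyclicBarrier(2);
        long[] latencies = new long[runs];

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < runs; i++) {
                    long sent = queue.take();                // blocks until the producer puts
                    latencies[i] = System.nanoTime() - sent;
                    barrier.await();                         // "done with this round"
                }
            } catch (InterruptedException | BrokenBarrierException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            for (int i = 0; i < runs; i++) {
                queue.put(System.nanoTime());
                barrier.await();                             // wait until the consumer has recorded it
            }
        } catch (BrokenBarrierException e) {
            throw new IllegalStateException(e);
        }
        consumer.join();                                     // join() makes latencies[] visible here
        return latencies;
    }

    public static void main(String[] args) throws InterruptedException {
        long[] latencies = measure(10_000);
        long sum = 0;
        for (long l : latencies) sum += l;
        System.out.printf("Avg: %.3f micros%n", sum / 1000.0 / latencies.length);
    }
}
```

Because the barrier releases both threads at the same moment, the consumer's `take()` may still find the queue empty and block, which is consistent with the asker's observation that the consumer does block in this variant.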

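I also tried another approach: instead of calling queue.take(), I called queue.poll() with a timeout of 100 ms. This reduced the average latency to below 10 microseconds, but it is of course much more CPU intensive (though probably less CPU intensive than busy waiting?).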
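A sketch of the two consumer variants mentioned above, assuming Java 9+ (for `Thread.onSpinWait()`). The class `PollingConsumer` and its methods are hypothetical. `consumeWithTimeout` replaces `take()` with `poll(100, TimeUnit.MILLISECONDS)`, which returns `null` when the timeout expires. `consumeSpinning` is the busy-waiting alternative: it calls the non-blocking `poll()` in a loop, so the consumer thread never parks. The producer here sleeps 1 ms between sends, which is an assumption made only to keep the demo short.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class PollingConsumer {

    /** Consumer using a timed poll() instead of take(). */
    static long[] consumeWithTimeout(BlockingQueue<Long> queue, int count, long timeoutMillis)
            throws InterruptedException {
        long[] latencies = new long[count];
        int received = 0;
        while (received < count) {
            Long sent = queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            if (sent != null) {                       // null means the timeout expired
                latencies[received++] = System.nanoTime() - sent;
            }
        }
        return latencies;
    }

    /** Busy-waiting consumer: non-blocking poll() in a loop, never parks. */
    static long[] consumeSpinning(BlockingQueue<Long> queue, int count) {
        long[] latencies = new long[count];
        int received = 0;
        while (received < count) {
            Long sent = queue.poll();                 // returns null immediately if empty
            if (sent != null) {
                latencies[received++] = System.nanoTime() - sent;
            } else {
                Thread.onSpinWait();                  // Java 9+ hint that this is a spin loop
            }
        }
        return latencies;
    }

    /** Starts a producer that sends `count` timestamps, roughly one per millisecond. */
    static Thread startProducer(BlockingQueue<Long> queue, int count) {
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    Thread.sleep(1);
                    queue.put(System.nanoTime());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        return producer;
    }

    static long[] run(boolean spin, int count) throws InterruptedException {
        BlockingQueue<Long> queue = new ArrayBlockingQueue<>(10);
        Thread producer = startProducer(queue, count);
        long[] latencies = spin ? consumeSpinning(queue, count)
                                : consumeWithTimeout(queue, count, 100);
        producer.join();
        return latencies;
    }

    public static void main(String[] args) throws InterruptedException {
        for (boolean spin : new boolean[] {false, true}) {
            long[] latencies = run(spin, 1_000);
            long sum = 0;
            for (long l : latencies) sum += l;
            System.out.printf("%s: avg %.3f micros%n",
                    spin ? "spin" : "poll(100 ms)", sum / 1000.0 / latencies.length);
        }
    }
}
```

The spinning version keeps one core fully busy for the whole run, trading CPU time for never having to wake a parked thread. The timed `poll()` still blocks the consumer while the queue is empty, so whether it helps is something to measure on the target machine.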

Edited again (January 10), problem solved:
ninjalj suggested that the latency of ~60 microseconds was due to the CPU having to wake up from deeper sleep states, and he was completely right! After disabling C-states in the BIOS, the latency was reduced to …

I would use just an ArrayBlockingQueue if you can. When I have used it, the latency on Linux is between 8 and 18 microseconds. Some points of note:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CyclicBarrier;

public class QueueTest {

    ArrayBlockingQueue<Long> queue = new ArrayBlockingQueue<Long>(10);
    Thread consumerThread;
    CyclicBarrier barrier = new CyclicBarrier(2);
    static final int RUNS = 500000;
    volatile int sleep = 1000;

    public void start() {
        consumerThread = new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    barrier.await();
                    for(int i = 0; i < RUNS; i++) {
                        consume();

                    }
                } catch (Exception e) {
                    e.printStackTrace();
                } 
            }
        });
        consumerThread.start();

        try {
            barrier.await();
        } catch (Exception e) { e.printStackTrace(); }

        for(int i = 0; i < RUNS; i++) {
            try {
                if(sleep > 0)
                    Thread.sleep(sleep);
                produce();

            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

    public void produce() {
        try {
            queue.put(System.nanoTime());
        } catch (InterruptedException e) {
        }
    }

    public void consume() {
        try {
            long t = queue.take();
            long now = System.nanoTime();
            long time = (now - t) / 1000; // Divide by 1000 to get result in microseconds
            if(sleep > 0) {
                System.out.println("Time: " + time);
            }

        } catch (Exception e) {
            e.printStackTrace();
        }

    }

    public static void main(String[] args) {
        QueueTest test = new QueueTest();
        System.out.println("Starting...");
        // Run first once, ignoring results
        test.sleep = 0;
        test.start();
        // Run again, printing the results
        System.out.println("Starting again...");
        test.sleep = 1000;
        test.start();
    }
}

...
consumerThread.setPriority(Thread.MAX_PRIORITY);
consumerThread.start();

package t1;

import java.math.BigDecimal;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;

public class QueueTest {

    static final int RUNS = 250000;

    final SynchronousQueue<Long> queue = new SynchronousQueue<Long>();

    int sleep = 1000;

    long[] results  = new long[0];
    public void start(final int runs) throws Exception {
        results = new long[runs];
        final CountDownLatch barrier = new CountDownLatch(1);
        Thread consumerThread = new Thread(new Runnable() {
            @Override
            public void run() {
                barrier.countDown();
                try {

                    for(int i = 0; i < runs; i++) {                        
                        results[i] = consume(); 

                    }
                } catch (Exception e) {
                    return;
                } 
            }
        });
        consumerThread.setPriority(Thread.MAX_PRIORITY);
        consumerThread.start();


        barrier.await();
        final long sleep = this.sleep;
        for(int i = 0; i < runs; i++) {
            try {                
                doProduce(sleep);

            } catch (Exception e) {
                return;
            }
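        }
        // Wait for the consumer to finish so that every results[i] has been
        // written and is visible to the calling thread before it reads them.
        consumerThread.join();
    }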

    private void doProduce(final long sleep) throws InterruptedException {
        // The sleep argument is ignored in this version: the next value is sent immediately.
        produce();
    }

    public void produce() throws InterruptedException {
        queue.put(new Long(System.nanoTime()));// new Long() is faster than valueOf (the Long(long) constructor is deprecated since Java 9)
    }

    public long consume() throws InterruptedException {
        long t = queue.take();
        long now = System.nanoTime();
        return now-t;
    }

    public static void main(String[] args) throws Throwable {           
        QueueTest test = new QueueTest();
        System.out.println("Starting + warming up...");
        // Run first once, ignoring results
        test.sleep = 0;
        test.start(15000);//10k is the normal warm-up for -server hotspot
        // Run again, printing the results
        System.gc();
        System.out.println("Starting again...");
        test.sleep = 1000;//ignored now
        Thread.yield();
        test.start(RUNS);
        long sum = 0;
        for (long elapsed: test.results){
            sum+=elapsed;
        }
        // BigDecimal.valueOf(sum, 3) equals sum / 1000, i.e. total nanoseconds converted to microseconds
        BigDecimal elapsed = BigDecimal.valueOf(sum, 3).divide(BigDecimal.valueOf(test.results.length), BigDecimal.ROUND_HALF_UP);
        System.out.printf("Avg: %1.3f micros%n", elapsed); 
    }
}