Java JVM crash with no frame specified, only "[ timer expired, abort... ]"

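I'm running a Java job under Hadoop, and it is crashing the JVM. I suspect this is caused by some JNI code (the job uses JBLAS with a multithreaded native BLAS implementation). However, although I expected the crash log to give me the "problematic frame" for debugging, the log looks like this: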

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f204dd6fb27, pid=19570, tid=139776470402816
#
# JRE version: 6.0_38-b05
# Java VM: Java HotSpot(TM) 64-Bit Server VM (20.13-b02 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# # [ timer expired, abort... ]

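Does the JVM have a timer that limits how long it spends writing this crash dump? If so, is there a way to increase that time so that I can get more useful information? I don't think the timer comes from Hadoop, because I have seen (unhelpful) references to this error in many places that don't mention Hadoop at all.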

Googling suggests that the string "timer expired, abort" appears only in these JVM error messages, so it is unlikely to come from the OS.

Edit: It looks like I may be out of luck. From /hotspot/src/share/vm/runtime/thread.cpp in the OpenJDK version of the JVM source:

 if (is_error_reported()) {
   // A fatal error has happened, the error handler(VMError::report_and_die)
   // should abort JVM after creating an error log file. However in some
   // rare cases, the error handler itself might deadlock. Here we try to
   // kill JVM if the fatal error handler fails to abort in 2 minutes.
   //
   // This code is in WatcherThread because WatcherThread wakes up
   // periodically so the fatal error handler doesn't need to do anything;
   // also because the WatcherThread is less likely to crash than other
   // threads.

   for (;;) {
     if (!ShowMessageBoxOnError
      && (OnError == NULL || OnError[0] == '\0')
      && Arguments::abort_hook() == NULL) {
          os::sleep(this, 2 * 60 * 1000, false);
          fdStream err(defaultStream::output_fd());
          err.print_raw_cr("# [ timer expired, abort... ]");
          // skip atexit/vm_exit/vm_abort hooks
          os::die();
     }

     // Wake up 5 seconds later, the fatal handler may reset OnError or
     // ShowMessageBoxOnError when it is ready to abort.
     os::sleep(this, 5 * 1000, false);
   }
 }
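Reading the guard condition of that loop in isolation shows which settings disarm the two-minute abort. The sketch below is a hypothetical stand-alone copy of that condition: the function name watchdog_armed and its parameters are illustrative and not HotSpot API, and it assumes the three inputs correspond to the ShowMessageBoxOnError flag, the OnError flag and Arguments::abort_hook() as in the quoted code.

```cpp
// Hypothetical stand-alone copy of the condition that guards the
// 2 * 60 * 1000 ms sleep in the WatcherThread code quoted above.
// In HotSpot the inputs come from the ShowMessageBoxOnError flag, the
// OnError flag and Arguments::abort_hook(); here they are plain parameters.
bool watchdog_armed(bool show_message_box_on_error,
                    const char* on_error,
                    bool has_abort_hook) {
  // OnError counts as unset when it is missing or an empty string,
  // mirroring (OnError == NULL || OnError[0] == '\0').
  const bool on_error_unset = (on_error == nullptr || on_error[0] == '\0');
  // The hard abort is armed only if none of the escape hatches is in use.
  return !show_message_box_on_error && on_error_unset && !has_abort_hook;
}
```

Read this way, the timed abort only fires when ShowMessageBoxOnError is off, OnError is unset or empty, and no abort hook is installed; in every other case the watcher just re-checks every 5 seconds.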

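It appears to be hardcoded to wait two minutes. I don't know why my job takes longer than that to write the crash report, but I think the question has at least been answered.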
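The hardcoded wait is 2 * 60 * 1000 = 120,000 ms. The race it enforces can be modelled in a few lines: one thread stands in for the fatal error handler writing the crash log, and the waiting thread stands in for the WatcherThread. This is a scaled-down, hypothetical model (run_crash_report is an illustrative name and the durations are arbitrary), and it swaps HotSpot's plain sleep for a std::condition_variable timed wait so that it returns as soon as the report finishes.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical model of the error handler vs. WatcherThread race.
// report_duration: how long the "error handler" needs to finish its log.
// watchdog_timeout: how long the "watcher" waits before giving up
// (120,000 ms in the HotSpot code quoted above).
std::string run_crash_report(std::chrono::milliseconds report_duration,
                             std::chrono::milliseconds watchdog_timeout) {
  std::mutex m;
  std::condition_variable cv;
  bool report_done = false;

  // "Error handler": spends report_duration writing the log, then signals.
  std::thread reporter([&] {
    std::this_thread::sleep_for(report_duration);
    std::lock_guard<std::mutex> lock(m);
    report_done = true;
    cv.notify_one();
  });

  std::string outcome;
  {
    // "Watcher": waits up to watchdog_timeout for the report to finish.
    std::unique_lock<std::mutex> lock(m);
    const bool finished =
        cv.wait_for(lock, watchdog_timeout, [&] { return report_done; });
    outcome = finished ? "report complete" : "# [ timer expired, abort... ]";
  }

  // The real VM would call os::die() here and truncate the log;
  // the model simply waits for the reporter thread before returning.
  reporter.join();
  return outcome;
}
```

With a quick report and a long timeout this returns "report complete"; when the report takes longer than the timeout it returns the same marker line seen in the log, which matches the situation in the question: the handler was still working (or stuck) when the timer fired.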

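The workaround is to specify -XX:+ShowMessageBoxOnError on the command line and attach a debugger to the process from another terminal. With that flag set, the watcher thread in the code above does not arm the two-minute abort, so the JVM waits instead of being killed.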

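Looks like you're on Linux... did you get a core dump?

I don't think so. Certainly nothing in the log mentions one, and I couldn't find one. The job runs under Hadoop, so I can't run the usual shell commands to request a core dump, assuming those are needed.

If it only happens on one server, you may want to check the RAM on that machine. Otherwise, you could try to get a dump by setting the Hadoop child configuration mapred.child.java.opts, e.g. to -Xdump:java+heap+system+snap:events=user (note that -Xdump is an IBM J9 option; the HotSpot VM that produced the log above does not support it).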
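The "usual shell commands" for enabling core dumps presumably mean something like ulimit -c unlimited, which raises the core-file size limit. The sketch below shows a programmatic equivalent using the POSIX getrlimit/setrlimit calls with RLIMIT_CORE; the function name raise_core_limit is hypothetical. It raises the soft limit only up to the hard limit, which an unprivileged process is allowed to do. Because resource limits are inherited by child processes, this would presumably have to happen in whatever process launches the task JVMs, which under Hadoop the job itself does not control.

```cpp
#include <sys/resource.h>

// Hypothetical helper: programmatic analogue of `ulimit -c unlimited`,
// capped at the hard limit. Processes started afterwards inherit the new
// soft limit. Returns true if the soft core-file limit now equals the hard one.
bool raise_core_limit() {
  rlimit lim{};
  if (getrlimit(RLIMIT_CORE, &lim) != 0) return false;

  lim.rlim_cur = lim.rlim_max;  // soft limit -> hard limit
  if (setrlimit(RLIMIT_CORE, &lim) != 0) return false;

  // Read the limit back to confirm the change took effect.
  rlimit check{};
  if (getrlimit(RLIMIT_CORE, &check) != 0) return false;
  return check.rlim_cur == check.rlim_max;
}
```

Even with the limit raised, whether a core file actually appears also depends on the kernel's core_pattern setting (/proc/sys/kernel/core_pattern) and on the process being able to write to the target directory.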