Java: preferred way to set/return an array

java arrays performance jvm

Please compare these two ways of setting/returning an array:

static public float[] test_arr_speeds_1( int a ) {
  return new float[]{ a, a + 1, a + 2, a + 3, a + 4, a + 5,
                      a + 6, a + 7, a + 8, a + 9 };
} // or e.g. field = new float... in method

static public float[] test_arr_speeds_2( int a ) {
  float[] ret = new float[10];
  ret[0] = a;
  ret[1] = a + 1;
  ret[2] = a + 2;
  ret[3] = a + 3;
  ret[4] = a + 4;
  ret[5] = a + 5;
  ret[6] = a + 6;
  ret[7] = a + 7;
  ret[8] = a + 8;
  ret[9] = a + 9;
  return ret;
} // or e.g. field[0] = ... in method

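The two methods compile to different bytecode, and each can be decompiled back to its original form. After measuring execution time with a profiler (100M iterations, unbiased, different environments), method _1 takes about 4/3 of the time of method _2, even though both create a new array and set every element to a given value. Most of the time the difference is negligible, but it still bugs me: why is _1 noticeably slower? Can anyone check, confirm, or explain it in a reasonable way that is backed by how the JVM works?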

Here is the difference in the bytecode (shown only for the first two elements). First method:

bipush  10
newarray float      //creating an array with reference on operand stack

dup
iconst_0
iload_0
i2f
fastore             //setting first element

dup
iconst_1
iload_0
iconst_1
iadd
i2f
fastore             //setting second element

//...
areturn             //returning the top of the operand stack

Second method:

bipush  10
newarray float
astore_1            //creating an array and storing it in local variable

aload_1
iconst_0
iload_0
i2f
fastore             //setting first element

aload_1
iconst_1
iload_0
iconst_1
iadd
i2f
fastore             //setting second element

//...
aload_1
areturn

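As you can see, the only difference is where the array reference lives. In the first variant it stays on the operand stack, which is why dup appears so many times: each fastore pops the arrayref, so it has to be duplicated first to avoid losing the reference. In the second variant the reference is stored in a local variable slot of the frame (where method arguments and local variables are kept). There it has to be loaded again before every store (aload_1), because fastore requires the arrayref to be on the operand stack.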
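To reproduce listings like the ones above, a two-element version of both variants is enough. The following is only a sketch: the class name BytecodeDemo and the method names viaInitializer and viaLocal are made up for illustration and do not come from the question. After compiling it with javac, running javap -c BytecodeDemo prints the bytecode of each method, which should show the dup pattern in one and the astore_1/aload_1 pattern in the other.

```java
import java.util.Arrays;

// Hypothetical demo class: two-element versions of the two variants,
// small enough to compare side by side in "javap -c" output.
public class BytecodeDemo {

    // Array initializer: javac keeps the array reference on the operand stack.
    public static float[] viaInitializer(int a) {
        return new float[]{ a, a + 1 };
    }

    // Explicit stores: the array reference is kept in a local variable slot.
    public static float[] viaLocal(int a) {
        float[] ret = new float[2];
        ret[0] = a;
        ret[1] = a + 1;
        return ret;
    }

    public static void main(String[] args) {
        // Both variants produce the same contents; only the bytecode differs.
        System.out.println(Arrays.toString(viaInitializer(5)));
        System.out.println(Arrays.toString(viaLocal(5)));
    }
}
```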

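We shouldn't draw conclusions from this bytecode alone. After all, it is translated into CPU instructions, and most likely the array reference ends up in a CPU register in both cases. Otherwise the performance difference would be huge.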

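If you can measure a difference and you are doing low-level optimization, choose the faster version. But I doubt the difference is "portable": depending on the architecture and the JVM version/implementation, you will observe different timing behavior. That said, I would choose the more readable version over the one that runs faster on your computer.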
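For anyone who wants to check the timing on their own machine, a rough harness could look like the sketch below. It is an assumption-laden illustration, not the setup used in the question: the class name ArrayInitTiming, the helper timeNanos, the iteration count, and the number of rounds are all invented here. It repeats the measurement several times so the early rounds act as JIT warm-up, and it consumes one element of every returned array so the calls cannot be optimized away entirely. Routing both variants through one shared helper can itself distort the numbers, so for results worth trusting a dedicated benchmark framework such as JMH is the better tool.

```java
import java.util.function.IntFunction;

// Hypothetical timing harness; all names here are illustrative.
public class ArrayInitTiming {

    // Variant 1: array initializer.
    public static float[] viaInitializer(int a) {
        return new float[]{ a, a + 1, a + 2, a + 3, a + 4,
                            a + 5, a + 6, a + 7, a + 8, a + 9 };
    }

    // Variant 2: explicit indexed stores through a local variable.
    public static float[] viaIndexedStores(int a) {
        float[] ret = new float[10];
        ret[0] = a;     ret[1] = a + 1; ret[2] = a + 2; ret[3] = a + 3;
        ret[4] = a + 4; ret[5] = a + 5; ret[6] = a + 6; ret[7] = a + 7;
        ret[8] = a + 8; ret[9] = a + 9;
        return ret;
    }

    // Written after each run so the computed values count as used.
    static volatile float sink;

    // Calls f the given number of times and returns the elapsed nanoseconds.
    public static long timeNanos(IntFunction<float[]> f, int iterations) {
        float acc = 0f;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            acc += f.apply(i)[9];
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return elapsed;
    }

    public static void main(String[] args) {
        int iterations = 5_000_000; // assumed value, far below the question's 100M
        for (int round = 0; round < 5; round++) { // early rounds serve as warm-up
            long t1 = timeNanos(ArrayInitTiming::viaInitializer, iterations);
            long t2 = timeNanos(ArrayInitTiming::viaIndexedStores, iterations);
            System.out.printf("round %d: initializer %d ms, indexed stores %d ms%n",
                    round, t1 / 1_000_000, t2 / 1_000_000);
        }
    }
}
```

Only the later rounds are worth reading, and even those will vary between JVM versions and machines, which is the portability concern raised above.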

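Is the corresponding bytecode short enough to post here?

The average time over 1000*1000000 calls is the same for both methods, at least for me.

This is exactly what I was looking for. Indeed, the difference seems to lie only in how the operand stack and the local variables are used. I agree that in most cases this is too low-level to bother with; I just wanted to know the reason. I also prefer readable code over "optimized" code.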