Storing the order of an enum in Java

Tags: java, algorithm, enums, guava


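In Java, an EnumSet stores the items it contains in a bitmask / bit vector, using a long (RegularEnumSet) or a long[] (JumboEnumSet). I have now come across a use case where I have many thousands of domain objects (let's call them Node), each of which will show all items of an enum (let's call it Flag) in an order that varies per object.

Currently I store the order as a Guava ImmutableSet, because that is guaranteed to retain insertion order. However, I have compared the memory usage of an EnumSet, an ImmutableSet and a Flag[]. Here are the results when a) Flag has 64 enum items and b) all three variants contain all 64 items: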

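EnumSet: 32 bytes
ImmutableSet: 832 bytes
Array: 272 bytes

So my question is: is there a clever way to pack the enum ordering into a numeric value to get a smaller memory footprint than that of the array? If it makes a difference: in my use case I would assume that the ordering always contains all enum items.

To clarify: my enum is much smaller than that, I don't have any memory problems so far, and this situation is unlikely to ever give me memory problems. It's just that this inefficiency bugs me, even at this microscopic level.

Update:

Following suggestions from the various answers and comments, I came up with this data structure that uses a byte array. Caveat: it doesn't implement the Set interface (it doesn't check for unique values) and it won't scale to large enums beyond what a byte can hold. Also, the complexity is pretty awful, because Enum.values() has to be queried repeatedly, but here goes: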

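import java.util.Collection;
import java.util.Iterator;

import com.google.common.collect.AbstractIterator; // Guava

public class EnumOrdering<E extends Enum<E>> implements Iterable<E> {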
    private final Class<E> type;
    private final byte[] order;

    public EnumOrdering(final Class<E> type, final Collection<E> order) {
        this.type = type;

        this.order = new byte[order.size()];

        int offset = 0;
        for (final E item : order) {
            this.order[offset++] = (byte) item.ordinal();
        }

    }

    @Override
    public Iterator<E> iterator() {
        return new AbstractIterator<E>() {
            private int offset = -1;
            private final E[] enumConstants = type.getEnumConstants();

            @Override
            protected E computeNext() {
                if (offset < order.length - 1) {
                    return enumConstants[order[++offset]];
                }
                return endOfData();
            }
        };
    }
}


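The memory footprint is:

EnumOrdering: 104 bytes

That's a pretty good result so far, thanks to bestsss and JB Nizet!

Update: I have changed the code to implement only Iterable, because anything else would require sensible implementations of equals / hashCode / contains etc.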

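If you have 64 enum values, you could use a byte array in which each byte contains the ordinal of one enum item. This would need

3 * (64 + 16) = 240

bytes for 3 arrays of 64 bytes (16 bytes is the cost of a byte array, regardless of its length).

This still wastes space, since each byte can store 8 bits but you only need 6 bits to store a number from 0 to 63. So you could apply a clever packing algorithm that uses 3 bytes (24 bits) to store 4 enum ordinals. This would lead to

3 * (64 * 3 / 4 + 16) = 192 bytes.

I'm not good at byte manipulation, so I'll leave the implementation as an exercise for you.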
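The 6-bit packing described above could be sketched roughly as follows. This is only an illustration, not the answerer's implementation: the class name SixBitPacking and the methods pack and unpack are hypothetical, and the sketch assumes every ordinal is in the range 0..63.

```java
/** Hypothetical sketch: stores ordinals 0..63 in 6 bits each, so 4 ordinals fit in 3 bytes. */
final class SixBitPacking {
    private SixBitPacking() {}

    /** Packs each ordinal into 6 consecutive bits, most significant bit first. */
    static byte[] pack(int[] ordinals) {
        byte[] packed = new byte[(ordinals.length * 6 + 7) / 8];
        for (int i = 0; i < ordinals.length; i++) {
            int value = ordinals[i];
            if (value < 0 || value > 63) {
                throw new IllegalArgumentException("ordinal out of range: " + value);
            }
            for (int bit = 0; bit < 6; bit++) {
                if ((value & (1 << (5 - bit))) != 0) {
                    int pos = i * 6 + bit;                 // absolute bit position
                    packed[pos >> 3] |= (byte) (1 << (7 - (pos & 7)));
                }
            }
        }
        return packed;
    }

    /** Reads back count ordinals from an array produced by pack. */
    static int[] unpack(byte[] packed, int count) {
        int[] ordinals = new int[count];
        for (int i = 0; i < count; i++) {
            int value = 0;
            for (int bit = 0; bit < 6; bit++) {
                int pos = i * 6 + bit;
                int b = (packed[pos >> 3] >> (7 - (pos & 7))) & 1;
                value = (value << 1) | b;
            }
            ordinals[i] = value;
        }
        return ordinals;
    }
}
```

With 64 ordinals, pack returns a 48-byte array (64 * 6 / 8), which matches the 64 * 3 / 4 term in the estimate above. The bit-by-bit loop favours readability over speed.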
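"Is there a clever way to pack the enum ordering into a numeric value?"

Yes, you can represent an ordering as a numeric value, although to use it you need to convert it back to a byte/int array. Since there are 64! possible orderings of 64 values, and 64! is greater than Long.MAX_VALUE, you would need to store the number in a BigInteger. I guess this would be the most memory-efficient way of storing the ordering, although what you gain in memory you lose in time, because the number has to be converted to an array.

There are standard algorithms for converting between the number and array representations. Here is one alternative. I don't know whether it is as efficient as the others, and you would have to convert the code from int to BigInteger, but it should be enough to give you the idea: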
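/**
   * Returns the i-th permutation of the n numbers [from, ..., to]
   * (note that n == to - from + 1).
   * Permutations are numbered from 0 to n!-1; if i is outside this
   * range it is treated as i % n!.
   * @param i    index of the permutation
   * @param from smallest number in the range
   * @param to   largest number in the range
   * @return the i-th permutation as an array
   */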
  public static int[] perm(long i, int from, int to)
  {
    // method specification numbers permutations from 0 to n!-1.
    // If you wanted them numbered from 1 to n!, uncomment this line.
    //  i -= 1;
    int n = to - from + 1;

    int[] initArr  = new int[n];             // numbers [from, ..., to]
    int[] finalArr = new int[n];             // permutation of numbers [from, ..., to]

    // populate initial array
    for (int k=0; k<n; k++)
      initArr[k] = k+from;

    // compute return array, element by element
    for (int k=0; k<n; k++) {
      int index = (int) ((i%factorial(n-k)) / factorial(n-k-1));

      // find the index_th element from the initial array, and
      // "remove" it by setting its value to -1
      int m = convertIndex(initArr, index);
      finalArr[k] = initArr[m];
      initArr[m] = -1;
    }

    return finalArr;
  }


  /** 
   * Helper method used by perm.
   * Find the index of the index_th element of arr, when values equal to -1 are skipped.
   * e.g. if arr = [20, 18, -1, 19], then convertIndex(arr, 2) returns 3.
   */
  private static int convertIndex(int[] arr, int index)
  {
    int m=-1;
    while (index>=0) {
      m++;
      if (arr[m] != -1)
        index--;
    }

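    return m;
  }

  /**
   * Helper method used by perm: computes n! as a long.
   * (Assumed implementation; the answer calls factorial without showing it.)
   * Overflows for n > 20, hence the need for BigInteger with 64 elements.
   */
  private static long factorial(int n)
  {
    long result = 1;
    for (int k = 2; k <= n; k++)
      result *= k;
    return result;
  }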

Calling it for the 24 permutations of [1, 2, 3, 4] gives the following output:
0: [1, 2, 3, 4]
1: [1, 2, 4, 3]
2: [1, 3, 2, 4]
3: [1, 3, 4, 2]
4: [1, 4, 2, 3]
5: [1, 4, 3, 2]
6: [2, 1, 3, 4]
7: [2, 1, 4, 3]
8: [2, 3, 1, 4]
9: [2, 3, 4, 1]
10: [2, 4, 1, 3]
11: [2, 4, 3, 1]
12: [3, 1, 2, 4]
13: [3, 1, 4, 2]
14: [3, 2, 1, 4]
15: [3, 2, 4, 1]
16: [3, 4, 1, 2]
17: [3, 4, 2, 1]
18: [4, 1, 2, 3]
19: [4, 1, 3, 2]
20: [4, 2, 1, 3]
21: [4, 2, 3, 1]
22: [4, 3, 1, 2]
23: [4, 3, 2, 1]


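Judging by BigInteger.bitLength(), it should be possible to store the ordering of 64 elements in no more than 37 bytes (plus the overhead of using a BigInteger instance). I don't know whether it is worth the trouble, but it makes a nice exercise!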
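As a rough sketch of the BigInteger conversion left as an exercise above, the following encodes a permutation of 0..n-1 as its lexicographic rank and decodes it again. The names PermutationCodec, encode and decode are hypothetical, and the list-based bookkeeping is chosen for clarity rather than speed; it is one possible approach, not the answerer's code.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: permutation of 0..n-1 <-> lexicographic rank in [0, n!). */
final class PermutationCodec {
    private PermutationCodec() {}

    static BigInteger encode(int[] perm) {
        int n = perm.length;
        List<Integer> remaining = new ArrayList<>();
        for (int v = 0; v < n; v++) remaining.add(v);
        BigInteger rank = BigInteger.ZERO;
        for (int k = 0; k < n; k++) {
            // digit k = position of perm[k] among the values not used yet
            int index = remaining.indexOf(perm[k]);
            if (index < 0) throw new IllegalArgumentException("not a permutation");
            // Horner scheme in the factorial number system: digit k has radix n - k
            rank = rank.multiply(BigInteger.valueOf(n - k)).add(BigInteger.valueOf(index));
            remaining.remove(index);
        }
        return rank;
    }

    static int[] decode(BigInteger rank, int n) {
        int[] digits = new int[n];
        // extract the digits from least significant (radix 1) to most significant (radix n)
        for (int k = n - 1; k >= 0; k--) {
            BigInteger[] qr = rank.divideAndRemainder(BigInteger.valueOf(n - k));
            digits[k] = qr[1].intValue();
            rank = qr[0];
        }
        List<Integer> remaining = new ArrayList<>();
        for (int v = 0; v < n; v++) remaining.add(v);
        int[] perm = new int[n];
        for (int k = 0; k < n; k++) {
            perm[k] = remaining.remove(digits[k]); // remove(int index) returns the removed value
        }
        return perm;
    }
}
```

For an enum, perm[k] would hold the ordinal of the k-th item, and only the rank needs to be stored per object. Both directions are quadratic in n because of the list operations, which is the time cost mentioned above.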
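A simple byte[] will do, with the byte[] containing the enum ordinals. If you have more than 256 items you can use short[]/int[]. Alternatively, you can pack the items into fewer than 8 bits each. You may need to take extra care with serialization; either way the code will be under 200 lines and quite simple.

If you don't need insertion order, just use a single long. It can hold an enum with up to 64 elements, just as it is done in C.

@bestsss if I didn't need insertion order I'd use an EnumSet, which does exactly that.

Then use a byte[] to represent the order of addition and one long for a fast contains (i.e. no need to iterate); once the set is made immutable, trim the byte[] to size. So a set of 64 items will have a total memory footprint of 64 + 8 + 2 * object_header (~40).

On the edit: you can "cache" the values(), and instead of the Class type use the values array to obtain the class, so at least they don't need to be created for each iterator. Then go further and create a static WeakHashMap. WeakHashMap sucks a bit, but it will do here. That way you get almost the same thing as SharedSecrets.

Performing contains still needs an extra 8 bytes (or you have to scan the byte[] every time).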
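The combination suggested in the comments, a byte[] for insertion order plus one long for fast membership tests, might look something like this minimal sketch. The class OrderedEnumMask and its methods are hypothetical names, and the sketch assumes the enum has at most 64 constants.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: insertion order in a byte[], membership in a long bitmask. */
final class OrderedEnumMask<E extends Enum<E>> {
    private final byte[] order; // ordinals in insertion order, trimmed to size
    private final long mask;    // bit i is set when the constant with ordinal i is present

    OrderedEnumMask(List<E> items) {
        byte[] tmp = new byte[items.size()];
        long bits = 0L;
        int size = 0;
        for (E item : items) {
            int ordinal = item.ordinal();
            if (ordinal > 63) {
                throw new IllegalArgumentException("more than 64 enum constants");
            }
            long bit = 1L << ordinal;
            if ((bits & bit) == 0) {          // ignore duplicates, keep first position
                bits |= bit;
                tmp[size++] = (byte) ordinal;
            }
        }
        this.order = Arrays.copyOf(tmp, size); // trim the array to the actual size
        this.mask = bits;
    }

    /** Constant-time membership test; no scan of the byte[] is needed. */
    boolean contains(E item) {
        return (mask & (1L << item.ordinal())) != 0;
    }

    int size() {
        return order.length;
    }

    /** Ordinal of the item at the given insertion position. */
    int ordinalAt(int position) {
        return order[position];
    }
}
```

The long costs 8 extra bytes per instance, which is the trade-off mentioned in the last comment: without it, contains would have to scan the byte[] each time.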