How can I quickly get the top N occurring items from an unsorted array in Java?

java algorithm performance sorting collections

I have tried two approaches.

First approach: count each item with a HashMap, then iterate over the map.

HashMap<Integer, Integer> doc_counts = new HashMap<Integer, Integer>();
for (int i = 0; i < p; ++i) {
    int doc = alld[i];
    Integer count = doc_counts.get(doc);
    if (null == count)
        count = 0;
    doc_counts.put(doc, count + 1);
}
// up to this point it has already cost 200 ms
for (Entry<Integer, Integer> item : doc_counts.entrySet()) {
    heapCheck(h, hsize, item.getKey(), item.getValue());    // keep the top hsize items in a heap
}


Second approach: sort the array first, then use the heap to get the top N.
Arrays.sort(alld, 0, p); // the sort costs about 160ms
int curr = alld[0];
int count = 0;
for(int i = 0; i < p; i++) {
    int doc = alld[i];
    if(doc == curr) {
        ++count;
    } else {
        ++nHits;
        //curr += base;
        heapCheck(h, hsize, curr, count);
        curr = doc;
        count = 1;
    }
}
//
// Handle the last document that was collected.
heapCheck(h, hsize, curr, count);


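A test on an array of 1,600,000 items shows that the second approach takes about 170 ms, most of it spent in the sort (about 160 ms), while the first approach takes 200 ms just to add all the items to the HashMap. How can I improve the performance: find a faster map or sort function, or change it to run in parallel on multiple threads?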
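Heap sort is O(n log n), while adding everything to a HashMap is O(n), so the constant-factor performance is most likely suffering from HashMap resizing and rehashing. Try specifying a large initial capacity to avoid too many resize operations.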
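As a rough sketch of the pre-sizing suggestion above, assuming the worst case where every element is distinct: a HashMap rehashes once its size exceeds capacity times load factor, so with the default load factor of 0.75 an initial capacity of about p / 0.75 avoids resizing during the counting loop. The class and method names here (`PresizedCounter`, `countOccurrences`) are made up for illustration; `alld` and `p` follow the question.

```java
import java.util.HashMap;
import java.util.Map;

public class PresizedCounter {
    /** Counts occurrences of each value in alld[0..p) using a map sized up front. */
    public static Map<Integer, Integer> countOccurrences(int[] alld, int p) {
        // Assumption: at most p distinct keys. With the default load factor (0.75),
        // a capacity of p / 0.75 + 1 means the table never has to grow.
        int capacity = (int) (p / 0.75f) + 1;
        Map<Integer, Integer> counts = new HashMap<>(capacity);
        for (int i = 0; i < p; i++) {
            counts.merge(alld[i], 1, Integer::sum); // Map.merge requires Java 8+
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] alld = {7, 3, 7, 9, 3, 7};
        System.out.println(countOccurrences(alld, alld.length));
    }
}
```

Pre-sizing removes the rehashing cost but not the boxing of every key and count into `Integer` objects, so the gain may be modest.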
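This task is well suited to parallelization, and a divide-and-conquer algorithm fits it naturally. For example, you could sort the array with a parallel sort algorithm and cut down the 160 ms.
Alternatively, if you want to experiment with Java 8, it has a built-in Arrays.parallelSort() method.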
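A minimal sketch of the sort-based approach with the sort swapped for `Arrays.parallelSort(int[], int, int)` (Java 8+). The run-counting loop mirrors the one in the question; because `heapCheck` is not shown there, this version simply collects one {value, count} pair per run. `SortedRunCounter` and `countRuns` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SortedRunCounter {
    /** Sorts alld[0..p) in place, then returns one {value, count} pair per run of equal values. */
    public static List<int[]> countRuns(int[] alld, int p) {
        List<int[]> runs = new ArrayList<>();
        if (p == 0) {
            return runs;
        }
        Arrays.parallelSort(alld, 0, p); // parallel merge sort on the common fork/join pool
        int curr = alld[0];
        int count = 1;
        for (int i = 1; i < p; i++) {
            if (alld[i] == curr) {
                count++;
            } else {
                runs.add(new int[] {curr, count}); // a run just ended
                curr = alld[i];
                count = 1;
            }
        }
        runs.add(new int[] {curr, count}); // handle the last run
        return runs;
    }

    public static void main(String[] args) {
        int[] alld = {5, 1, 5, 2, 1, 5};
        for (int[] run : countRuns(alld, alld.length)) {
            System.out.println(run[0] + " occurs " + run[1] + " time(s)");
        }
    }
}
```

In the question's setting each finished run would be handed to `heapCheck` instead of being stored in a list. Whether the parallel sort actually beats `Arrays.sort` depends on the core count and array size, so it is worth measuring.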
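The Collections framework is very expensive with primitive types. For the first approach, i.e. the counting map, try using TIntIntHashMap (a primitive int-to-int map from the Trove library) instead.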
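To illustrate why a primitive map can help, here is a hand-rolled stand-in. It is not Trove's API, just a minimal open-addressing counter, assuming the caller knows an upper bound on the number of distinct keys. Keys and counts live in plain `int[]` arrays, so nothing is boxed. All names (`IntCounter`, `increment`, `get`) are hypothetical.

```java
public class IntCounter {
    private final int[] keys;
    private final int[] counts; // counts[i] == 0 marks an empty slot
    private final int mask;
    private int size;

    /** expectedKeys is an upper bound on distinct keys; the table is never resized. */
    public IntCounter(int expectedKeys) {
        if (expectedKeys < 0 || expectedKeys > (1 << 29)) {
            throw new IllegalArgumentException("expectedKeys out of range");
        }
        int cap = 16;
        while (cap < 2L * expectedKeys) {
            cap <<= 1; // power-of-two capacity, load factor at most 0.5
        }
        keys = new int[cap];
        counts = new int[cap];
        mask = cap - 1;
    }

    /** Adds one to the count stored for key. */
    public void increment(int key) {
        int i = mix(key) & mask;
        while (counts[i] != 0 && keys[i] != key) {
            i = (i + 1) & mask; // linear probing
        }
        if (counts[i] == 0) { // first time this key is seen
            if (++size > keys.length / 2) {
                throw new IllegalStateException("more distinct keys than expected");
            }
            keys[i] = key;
        }
        counts[i]++;
    }

    /** Returns the count for key, or 0 if it was never incremented. */
    public int get(int key) {
        int i = mix(key) & mask;
        while (counts[i] != 0) {
            if (keys[i] == key) {
                return counts[i];
            }
            i = (i + 1) & mask;
        }
        return 0;
    }

    // Cheap multiplicative hash to spread clustered keys across the table.
    private static int mix(int x) {
        x *= 0x9E3779B9;
        return x ^ (x >>> 16);
    }

    public static void main(String[] args) {
        IntCounter counter = new IntCounter(4);
        for (int doc : new int[] {7, 3, 7, 9, 3, 7}) {
            counter.increment(doc);
        }
        System.out.println("7 occurs " + counter.get(7) + " time(s)");
    }
}
```

A library map would also handle resizing and iteration; this sketch leaves those out to keep the no-boxing idea visible.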
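In my opinion and experience, the second approach should be faster, especially if you already have the data in memory and can sort primitives, which is much faster than sorting objects.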
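Don't sort: that is O(n log n). There is an O(n) + O(n log N) solution:

Create a Map<Integer, Integer> to hold the count of each number.
Make one pass over the array, creating/updating the counts: O(n).
Make one pass over the map, keeping the top N largest counts, possibly using a NavigableMap: O(n log N).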
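The steps above can be sketched as follows. The answer suggests a NavigableMap for the last step; this sketch substitutes a `PriorityQueue` used as a bounded min-heap, which gives the same O(log N) cost per map entry. `TopNOccurrences` and `topN` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class TopNOccurrences {
    /** Returns up to n {value, count} pairs, most frequent first. */
    public static List<int[]> topN(int[] data, int n) {
        List<int[]> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        // One pass over the array to build the counts: O(n).
        Map<Integer, Integer> counts = new HashMap<>();
        for (int v : data) {
            counts.merge(v, 1, Integer::sum);
        }
        // One pass over the map; the min-heap never holds more than n entries,
        // so each update costs O(log n).
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[1], b[1]));
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            if (heap.size() < n) {
                heap.add(new int[] {e.getKey(), e.getValue()});
            } else if (e.getValue() > heap.peek()[1]) {
                heap.poll(); // evict the current smallest count
                heap.add(new int[] {e.getKey(), e.getValue()});
            }
        }
        result.addAll(heap);
        result.sort((a, b) -> Integer.compare(b[1], a[1])); // highest count first
        return result;
    }

    public static void main(String[] args) {
        int[] data = {4, 1, 4, 2, 1, 4, 3};
        for (int[] pair : topN(data, 2)) {
            System.out.println(pair[0] + " occurs " + pair[1] + " time(s)");
        }
    }
}
```

Ties between equal counts are broken arbitrarily here; a real implementation would need to decide which of the tied values to keep.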

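If N ...

@assylias The 3 links you mentioned are about a different problem, "top N", not "top N occurrences". heapCheck is indeed the best solution for the "top N" problem, but that is only part of the whole problem.
Oh, sorry, I misread your question. Did you try the first approach with HashMap<Integer, Integer> doc_counts = new HashMap<Integer, Integer>(alld.length, 1.0f)?
@assylias I tried that; performance barely improved, to about 170 ms.
Have you tried using IntMap or Multiset? They are not in the Java standard library, but they may be faster.
I tried that; performance barely improved, it still takes about 170 ms.
What if, instead of constantly replacing the Integer in the map, you created a mutable class Count { public int value; } and then, once you have found the right instance, incremented the count it holds? That would halve the number of map lookups.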
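A minimal sketch of the mutable-counter idea from the last comment, assuming the same `alld` and `p` as in the question. `MutableCountDemo` and `countOccurrences` are hypothetical names; `Count` follows the comment.

```java
import java.util.HashMap;
import java.util.Map;

public class MutableCountDemo {
    /** Mutable holder so a count can be bumped in place, without writing back to the map. */
    public static final class Count {
        public int value;
    }

    /** Counts occurrences of each value in alld[0..p). */
    public static Map<Integer, Count> countOccurrences(int[] alld, int p) {
        Map<Integer, Count> counts = new HashMap<>();
        for (int i = 0; i < p; i++) {
            Count c = counts.get(alld[i]); // the only lookup for keys seen before
            if (c == null) {
                c = new Count();
                counts.put(alld[i], c); // a second lookup happens only on first sight
            }
            c.value++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] alld = {7, 3, 7, 9, 3, 7};
        Map<Integer, Count> counts = countOccurrences(alld, alld.length);
        System.out.println("7 occurs " + counts.get(7).value + " time(s)");
    }
}
```

The keys are still boxed into `Integer` on every lookup, so this removes the repeated `put` and the allocation of new count objects, but not all of the boxing overhead.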