Java: What is a better way to get the median of a probability map?

Background

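I have written a program that calculates the probability of each outcome of a die roll or a combination of die rolls. For a roll such as "two six-sided dice added together" (a.k.a. Catan dice), the probabilities are computed by keeping a mapping from each individual outcome to the number of possible rolls that produce that outcome. The data structure is a TreeMap<Integer, BigInteger>, which for this roll looks like this: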

{
[2, 1],
[3, 2],
[4, 3],
[5, 4],
[6, 5],
[7, 6],
[8, 5],
[9, 4],
[10, 3],
[11, 2],
[12, 1]
}
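The question does not show how the map is filled. A minimal sketch of one way to build such a count map follows: convolve one die at a time, so each new total accumulates the counts of every way to reach it. The class name DiceDistribution and the method sumOfDice are hypothetical and are not the asker's code.

```java
import java.math.BigInteger;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not from the question: builds outcome -> number-of-rolls for NdS.
public class DiceDistribution {
    public static TreeMap<Integer, BigInteger> sumOfDice(int dice, int sides) {
        TreeMap<Integer, BigInteger> counts = new TreeMap<>();
        counts.put(0, BigInteger.ONE); // zero dice: exactly one way to total 0
        for (int d = 0; d < dice; d++) {
            TreeMap<Integer, BigInteger> next = new TreeMap<>();
            for (Map.Entry<Integer, BigInteger> e : counts.entrySet()) {
                // Every existing total can be extended by each face of the new die
                for (int face = 1; face <= sides; face++) {
                    next.merge(e.getKey() + face, e.getValue(), BigInteger::add);
                }
            }
            counts = next;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {2=1, 3=2, 4=3, 5=4, 6=5, 7=6, 8=5, 9=4, 10=3, 11=2, 12=1}
        System.out.println(sumOfDice(2, 6));
    }
}
```

The work grows with (number of dice) × (distinct totals) × (sides), which stays small even for 100d6. The counts themselves, however, grow exponentially.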

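Before anyone asks: using BigInteger here is not overkill. The program is designed to handle any roll that might come up, and rolls such as 100d6 (the sum of one hundred six-sided dice) quickly produce very large numbers that I don't want to approximate with a double.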
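To put a number on that: 100d6 has 6^100 equally likely outcomes. The quick check below uses only java.math; the class RollCountSize and its helpers are illustrative names. It shows that the count overflows long by a wide margin and cannot survive a round trip through double.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class RollCountSize {
    // Number of equally likely outcomes for `dice` dice with `sides` sides: sides^dice
    public static BigInteger outcomes(int dice, int sides) {
        return BigInteger.valueOf(sides).pow(dice);
    }

    // True if n is unchanged after converting to double and back
    // (assumes n is within double's range, which 6^100 is)
    public static boolean exactAsDouble(BigInteger n) {
        return new BigDecimal(n.doubleValue()).toBigInteger().equals(n);
    }

    public static void main(String[] args) {
        BigInteger total = outcomes(100, 6);
        System.out.println(total.bitLength());          // 259 (a long holds 63 magnitude bits)
        System.out.println(total.toString().length());  // 78 decimal digits
        System.out.println(exactAsDouble(total));       // false: a double keeps only 53 significant bits
    }
}
```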

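As part of the program's interface, I want statistics about these rolls to be queryable, and one of those statistics is the median of the roll. My current algorithm starts from the lowest outcome and adds up the number of trials for each outcome in turn. It reports as the median the first outcome at which the running total reaches 50% of all trials.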

This is how I wrote the code:

//Is filled with values during object construction
TreeMap<Integer, BigInteger> probabilityMap = new TreeMap<>();

//Memoization to at least make sure we only make this calculation once
private Integer memoizedMedian = null;
public int getMedian() {
    if(memoizedMedian == null) {
        BigInteger trials = BigInteger.ZERO;
        BigInteger totalTrials = numOfTrials();
        for(Map.Entry<Integer, BigInteger> entry : probabilityMap.entrySet()) {
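            //TreeMap iterates its entries in ascending key order, so outcomes are visited from lowest to highest
            trials = trials.add(entry.getValue());
            //Exact test for trials / totalTrials >= 0.5 (i.e. 2 * trials >= totalTrials); converting huge counts to double can round
            if(trials.shiftLeft(1).compareTo(totalTrials) >= 0) {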
                memoizedMedian = entry.getKey();
                break;
            }
        }
        //If we didn't find it, something went wrong with the object initialization
        if(memoizedMedian == null)
            throw new RuntimeException("Probability Map was not properly Initialized");
    }
    return memoizedMedian;
}

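Obviously, 7 is the median of this data set, but the algorithm can't determine that until it has scanned every entry from the lowest outcome up to 7. For larger, more complex probability maps, this could take a while.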
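For the 2d6 map above, the scan can be checked by hand. Write c_k for the number of rolls that give outcome k (the value stored under key k). The 36 trials put the 50% threshold at 18, and the running total first reaches it at outcome 7:

$$
\sum_{k=2}^{12} c_k = 36, \qquad \sum_{k=2}^{6} c_k = 15 < 18 \le 21 = \sum_{k=2}^{7} c_k \quad\Longrightarrow\quad \text{median} = 7
$$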

So I want to improve the algorithm to handle these kinds of data sets more intelligently, but I'm not sure which approach to take.

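What changes should I make to the algorithm to compute the median of this data set more efficiently? I'm also open to changing the underlying data structure, as long as there is a good justification for it.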

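I have no experience with the number of possibilities in your 100d6 example, so this may or may not be the best optimization. Using a pair of buckets (one for small values, one for large values) while you build the probability map front-loads the expensive work. The approach is also order-dependent: as written, values must be added in ascending order. An order-independent version could be built with a rebalancing step that moves entries in both directions. I stuck with Integer just to keep the arithmetic simple.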

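The first entries will be very unstable and need a lot of rebalancing. The obvious downside is that construction gets slower, but retrieving the median becomes O(1).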

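The small bucket always contains the median, which is max(smallBucket.keySet()). The large bucket holds every entry above that key and is only used for rebalancing. Note that if the true median falls between two rolls, this is not strictly the median. For example, the median of 1d2 is 1.5, which cannot be returned if you only use integers for the median.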

import java.util.Map;
import java.util.TreeMap;

public class MedianMap {
    //outcomes at or below the current median
    TreeMap<Integer, Integer> smallBucket = new TreeMap<>();
    //outcomes above the current median; only used for rebalancing
    TreeMap<Integer, Integer> largeBucket = new TreeMap<>();

    int smallBucketSize = 0;
    int largeBucketSize = 0;
    int median = 0;

    //values must be added in ascending order for the median to stay correct
    public void add(int value, int trials) {
        //new values always go into the large bucket first
        largeBucket.put(value, trials);
        largeBucketSize += trials;

        //invariant: smallBucket should hold at least as many trials as largeBucket
        if(largeBucketSize > smallBucketSize) {
            rebalance();
        }
    }

    private void rebalance() {
        while(largeBucketSize > smallBucketSize) {
            //remove the smallest entry from the large bucket (TreeMap keeps its keys sorted)
            Map.Entry<Integer, Integer> smallest = largeBucket.pollFirstEntry();

            //move it into the small bucket
            smallBucket.put(smallest.getKey(), smallest.getValue());

            //update bucket sizes
            largeBucketSize -= smallest.getValue();
            smallBucketSize += smallest.getValue();

            //the largest item in the small bucket is the new median
            median = smallest.getKey();
        }
    }

    public int getMedian() {
        return median;
    }
}
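The answer mentions that an order-independent version could be built by rebalancing in both directions. Below is one possible sketch of that idea; it is not the answerer's code. Assumptions: counts are BigInteger as in the question, the class name TwoWayMedianMap is hypothetical, and the median follows the question's definition (the lowest outcome whose running total reaches half of all trials).

```java
import java.math.BigInteger;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical order-independent variant of the two-bucket idea (not the answerer's code).
public class TwoWayMedianMap {
    // Outcomes at or below the median; the largest key here is the median.
    private final TreeMap<Integer, BigInteger> small = new TreeMap<>();
    // Outcomes strictly above the median.
    private final TreeMap<Integer, BigInteger> large = new TreeMap<>();
    private BigInteger smallSize = BigInteger.ZERO;
    private BigInteger largeSize = BigInteger.ZERO;

    // Outcomes may arrive in any order; a repeated outcome has its counts combined.
    public void add(int value, BigInteger trials) {
        if (!small.isEmpty() && value <= small.lastKey()) {
            small.merge(value, trials, BigInteger::add);
            smallSize = smallSize.add(trials);
        } else {
            large.merge(value, trials, BigInteger::add);
            largeSize = largeSize.add(trials);
        }
        rebalance();
    }

    private void rebalance() {
        // Direction 1: small holds less than half the trials, so pull the lowest large entries down.
        while (largeSize.compareTo(smallSize) > 0) {
            Map.Entry<Integer, BigInteger> low = large.pollFirstEntry();
            small.put(low.getKey(), low.getValue());
            largeSize = largeSize.subtract(low.getValue());
            smallSize = smallSize.add(low.getValue());
        }
        // Direction 2: push the top small entry up while small would still hold at least half.
        while (!small.isEmpty()) {
            Map.Entry<Integer, BigInteger> top = small.lastEntry();
            BigInteger rest = smallSize.subtract(top.getValue());
            if (rest.compareTo(largeSize.add(top.getValue())) < 0) {
                break; // top is the lowest outcome whose running total reaches half
            }
            small.pollLastEntry();
            large.put(top.getKey(), top.getValue());
            smallSize = rest;
            largeSize = largeSize.add(top.getValue());
        }
    }

    public int getMedian() {
        if (small.isEmpty()) {
            throw new IllegalStateException("no outcomes added");
        }
        return small.lastKey();
    }
}
```

Entries can now move in both directions, so insertion order no longer matters. Adding the 2d6 outcomes in a shuffled order still yields 7, and reading the median is a single lastKey() lookup.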

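Do you mean the mean rather than the median? – forpas

@forpas I definitely mean the median. Computing the mean always requires me to iterate over the whole data set, so there isn't much optimization I could do there (other than memoizing the result, obviously).

I would just flatten the data set (or create a flat version of the data at the start), sort it, and look at the middle element. If the set is dynamic, finding where to insert a new element in a sorted list takes O(log n), and finding the median takes constant time. If the original data set is large, this link describes how to find the median in linear time. In my view, the key is to have a flat copy of the data.

@jrook You may want to look at the parameters of this problem. To do that, I would have to create a flat version of the data that could be larger than the maximum value of BigInteger, i.e.... very large, to say the least. Flattening the data set is not a sensible solution unless the original data structure changes in some dramatic way, and you haven't said how I would do that.

Larger than the maximum value of BigInteger??? You mean there would be an array with (2^32)^Integer.MAX_VALUE elements? I imagine even the most efficient algorithm would take a very long time on a data set that large. As explained in the link I provided, the median of medians is usually very close to the actual median, so if you don't need an exact median, that might help speed things up.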
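One way to reconcile jrook's "sort it and look at the middle element" with the asker's objection: the sorted flat list never has to exist. The element at any position can be located from running totals of the counts. Below is a minimal sketch under that assumption. The class VirtualFlatMedian and its methods are hypothetical and not from the thread. It also still needs one linear pass over the distinct outcomes to build, just like the question's scan.

```java
import java.math.BigInteger;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical helper: answers "what is at position i of the sorted, flattened data"
// using only the (outcome -> count) map, so the flat list is never built.
public class VirtualFlatMedian {
    // Key: number of flattened elements up to and including this outcome; value: the outcome.
    private final NavigableMap<BigInteger, Integer> endToOutcome = new TreeMap<>();
    private final BigInteger total;

    public VirtualFlatMedian(TreeMap<Integer, BigInteger> counts) {
        BigInteger running = BigInteger.ZERO;
        for (Map.Entry<Integer, BigInteger> e : counts.entrySet()) {
            if (e.getValue().signum() == 0) {
                continue; // an outcome that never occurs takes up no positions
            }
            running = running.add(e.getValue());
            endToOutcome.put(running, e.getKey());
        }
        total = running;
    }

    // Element at 0-based position i of the sorted flattened list: O(log n) in distinct outcomes.
    public int elementAt(BigInteger i) {
        if (i.signum() < 0 || i.compareTo(total) >= 0) {
            throw new IndexOutOfBoundsException(i.toString());
        }
        return endToOutcome.higherEntry(i).getValue();
    }

    // Lower middle element; matches the question's "first outcome reaching 50%" rule.
    public int lowerMedian() {
        return elementAt(total.subtract(BigInteger.ONE).shiftRight(1));
    }
}
```

For the 2d6 map, the lower middle position is 17. Its running-total key is 21, so lowerMedian() returns 7. Any other percentile is a single higherEntry lookup in the same way.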
