Algorithm: weighted selection of S items from M buckets

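I want to draw a total of S samples from M buckets. Each bucket has a weight W that describes how strongly items from that bucket should be represented in the final sample. For example, given buckets A, B and C with weights 0.5, 0.2 and 0.3, and more than enough items in each bucket, a final sample of size S = 10 should contain 5 items from A, 2 from B and 3 from C. The problem gets harder once a bucket may hold fewer items than its weight and the total sample size call for. In that case the other weights need to be adjusted so that the delivered sample is as close as possible to the desired weighted representation. Does anyone know an algorithm for this?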

If I understand the problem correctly, I would simply take the floor of each bucket's exact share and then distribute the remainder.

Take the same three buckets A, B and C with weights 0.5, 0.2 and 0.3, but this time with S = 13:

A: floor(13 * 0.5) = floor(6.5) = 6 
B: floor(13 * 0.2) = floor(2.6) = 2
C: floor(13 * 0.3) = floor(3.9) = 3
We start with 6 samples from A, 2 from B and 3 from C, which leaves 2 samples unassigned.

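To decide which buckets receive the leftover samples, sort the buckets by the fractional part of their exact share, in descending order. The fractional part is 0.5 for A, 0.6 for B and 0.9 for C, so the two leftover samples go to C and B, and the final result is: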

A: 6
B: 3
C: 4
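
The floor-then-distribute steps above could be sketched in Java roughly as follows. This is only an illustration: the class name `LargestRemainder` and the method `allocate` are made up for this example, the weights are assumed to sum to 1, and bucket capacities are ignored entirely.

```java
import java.util.Arrays;
import java.util.Comparator;

class LargestRemainder {
    /**
     * Splits `total` samples across buckets in proportion to `weights`
     * (assumed to sum to 1). Bucket capacities are NOT considered here.
     */
    static int[] allocate(int total, double[] weights) {
        int n = weights.length;
        int[] counts = new int[n];
        double[] fractions = new double[n];
        Integer[] order = new Integer[n];
        int assigned = 0;

        for (int i = 0; i < n; i++) {
            double exact = total * weights[i];
            counts[i] = (int) Math.floor(exact);   // guaranteed part
            fractions[i] = exact - counts[i];      // fractional part used for ranking
            order[i] = i;
            assigned += counts[i];
        }

        // Rank the buckets by fractional part, largest first.
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> -fractions[i]));

        // Hand out the leftover samples one by one in that order.
        int leftover = total - assigned;
        for (int k = 0; k < leftover; k++) {
            counts[order[k % n]]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(allocate(13, new double[] {0.5, 0.2, 0.3})));
    }
}
```

For S = 13 and weights 0.5, 0.2 and 0.3 this reproduces the 6 / 3 / 4 split worked out above.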

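I wrote a solution in Java. Because of rounding, it may return one or two more samples than requested, but that is fine for my application. If you see a way to improve the algorithm, feel free to post a solution.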

SampleNode.java

public abstract class SampleNode {
    protected double weight;

    protected abstract int getNumSamplesAvailable();
    protected abstract boolean hasSamples();
    protected abstract int takeAllSamples();
    protected abstract void sample(int target);
    public abstract boolean takeOneSample();
}
LeafSampleNode.java

public class LeafSampleNode extends SampleNode {
    private int numselected;
    private int numsamplesavailable;

    public LeafSampleNode(double weight, int numsamplesavailable) {
        this.weight = weight;
        this.numsamplesavailable = numsamplesavailable;
        this.numselected = 0;
    }

    protected void sample(int target) {
        if(target >= numsamplesavailable) {
            takeAllSamples();
        }
        else {
            numselected += target;
            numsamplesavailable -= target;
        }
    }

    @Override
    protected int getNumSamplesAvailable() {
        return numsamplesavailable;     
    }

    protected boolean hasSamples() {
        return numsamplesavailable > 0;
    }

    protected int getNumselected() {
        return numselected;
    }

    protected int takeAllSamples() {
        int samplestaken = numsamplesavailable;
        numselected += numsamplesavailable;
        numsamplesavailable = 0;
        return samplestaken;
    }
    @Override
    public boolean takeOneSample() {
        if(hasSamples()) {
            numsamplesavailable--;
            numselected++;
            return true;
        }
        return false;
    }
}
RootSampleNode.java:

import java.util.ArrayList;
import java.util.List;

public class RootSampleNode extends SampleNode {    
    private List<SampleNode> children;

    public RootSampleNode(double weight) {
        this.children = new ArrayList<SampleNode>();
        this.weight = weight;
    }

    public void selectSample(int target) {
        int totalsamples = getNumSamplesAvailable();
        if(totalsamples < target) { 
            //not enough samples to meet target, simply take everything
            for(int i = 0; i < children.size(); i++) {
                children.get(i).takeAllSamples();
            }
        }
        else {
            //there are enough samples to meet target, distribute to meet quotas as closely as possible
            sample(target);
        }
    }

    protected void sample(int target) {
        int samplestaken = 0;
        double totalweight = getTotalWeight(children);
        samplestaken +=  sample(totalweight, target, children);
        if(samplestaken < target) {
            sample(target - samplestaken);
        }
    }

    private int sample(double totalweight, int target, List<SampleNode> children) {
        int samplestaken = 0;
        for(int i = 0; i < children.size(); i++) {
            SampleNode child = children.get(i);
            if(child.hasSamples()) {
                int desired = (int) (target * (child.weight / totalweight) + 0.5);
                if(desired >= child.getNumSamplesAvailable()) {
                    samplestaken += child.takeAllSamples();
                }
                else {
                    child.sample(desired);
                    samplestaken += desired;
                }
            }           
        }
        if(samplestaken == 0) { //avoid deadlock / stack overflow...someone just take a sample
            for(int i = 0; i < children.size(); i++) {
                if(children.get(i).takeOneSample()) {
                    samplestaken++;
                    break;
                }
            }
        }
        return samplestaken;
    }

    @Override
    public boolean takeOneSample() {
        if(hasSamples()) {
            for(int i = 0; i < children.size(); i++) {
                if(children.get(i).takeOneSample()) {
                    return true;
                }
            }
        }
        return false;
    }

    protected double getTotalWeight(List<SampleNode> children) {
        double totalweight = 0;
        for(int i = 0; i < children.size(); i++) {
            SampleNode child = children.get(i);
            if(child.hasSamples()) {
                totalweight += child.weight;
            }
        }
        return totalweight;
    }

    protected boolean hasSamples() {
        for(int i = 0; i < children.size(); i++) {
            if(children.get(i).hasSamples()) {
                return true;
            }
        }
        return false;
    }

    protected int takeAllSamples() {
        int samplestaken = 0;
        for(int i = 0; i < children.size(); i++) {
            samplestaken += children.get(i).takeAllSamples();
        }
        return samplestaken;
    }

    protected int getNumSamplesAvailable() {
        int numsamplesavailable = 0;
        for(int i = 0; i < children.size(); i++) {
            numsamplesavailable += children.get(i).getNumSamplesAvailable();
        }
        return numsamplesavailable;
    }

    public void addChild(SampleNode sn) {
        this.children.add(sn);
    }
}

I hope someone finds this useful.

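My suggestion is to write a loop that always samples from the non-empty bucket whose current weight is furthest from its desired weight. Below is some pseudocode. You will obviously want to generalize it to more buckets, but it should give you the idea.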

set buckets[] = { /* original items */ };
double weights[] = { 0.5, 0.2, 0.3 }; // the desired weights
int counts[] = { 0, 0, 0 };  // number of items sampled so far

for (i = 0; i < n; i++) {    // n = total number of samples wanted
  double errors[] = { 0.0, 0.0, 0.0 };
  for (j = 0; j < 3; j++) {
    if (!empty(buckets[j]))
      errors[j] = abs(weights[j] - ((double) counts[j] / n));  // floating-point division
    else
      errors[j] = -1;  // an empty bucket must never win the argmax
  }
  // choose the non-empty bucket whose current weight is
  // furthest from the desired weight
  k = argmax(errors);
  sample(buckets[k]);  // take an item out of that bucket
  counts[k]++;         // increment count
}

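If you need this converted into valid Java, I might be persuaded :). It will always produce n samples (assuming there are at least n items; otherwise every item is sampled), with a distribution as close as possible to the desired weights.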
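
One way the loop could look in actual Java is sketched below. The names `GreedySampler`, `sample` and the `available` array (item counts standing in for real buckets) are placeholders invented for this example. It also departs from the pseudocode in one respect: it ranks buckets by the signed shortfall `weights[j] - counts[j] / n` rather than the absolute difference, so that a bucket already above its share is not preferred over one that still lags behind.

```java
import java.util.Arrays;

class GreedySampler {
    /**
     * Picks up to n items, one at a time, always from the non-empty bucket
     * that is furthest below its desired weight.
     * available[j] is the number of items in bucket j (a stand-in for a real bucket).
     */
    static int[] sample(int n, double[] weights, int[] available) {
        int[] counts = new int[weights.length];
        int[] left = available.clone();          // do not modify the caller's array

        for (int i = 0; i < n; i++) {
            int best = -1;
            double bestShortfall = Double.NEGATIVE_INFINITY;
            for (int j = 0; j < weights.length; j++) {
                if (left[j] == 0) {
                    continue;                    // empty buckets are never chosen
                }
                double shortfall = weights[j] - (double) counts[j] / n;
                if (shortfall > bestShortfall) {
                    bestShortfall = shortfall;
                    best = j;
                }
            }
            if (best < 0) {
                break;                           // every bucket is empty
            }
            left[best]--;                        // take one item out of that bucket
            counts[best]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = sample(10, new double[] {0.5, 0.2, 0.3}, new int[] {100, 100, 100});
        System.out.println(Arrays.toString(counts));
    }
}
```

With plenty of items in every bucket, n = 10 and weights 0.5, 0.2 and 0.3, this ends at 5, 2 and 3 items. When a bucket runs dry, the remaining picks simply go to the buckets that still have items.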

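What happens if S = 13 but A only contains 3 items? How do I pick a sample that is as close as possible to the original weights? Obviously A can only reach 3/13 = 0.23 instead of the desired 0.5. In that case, if possible, B should be sampled to get floor(10*0.2/0.5) items and C to get floor(10*0.3/0.5) items.

I think I see the algorithm... If every bucket has enough samples, do as described above. Otherwise, start with the most constrained bucket, take all of its samples, reweight the other buckets, and sample again until done.

@486DX2-66 If S = 13 and A contains 3 items, then A's weight is not 0.5. So which comes first, the weights or the number of samples? You can either have target weights and distribute the samples accordingly, or distribute the samples and compute the weights from that.

The weights express the relative proportion/representation of each type of element in the final sample. So if I have two buckets with equal weight and want to select 10 items, but bucket A has only 3 items while bucket B has 50, the best weighted sample takes all 3 items from A and then makes up the rest from B to reach a total of 10 (7 of B's 50 items). I'm sorry if I did not describe the problem well.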
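
The idea discussed in these comments (empty any bucket that cannot cover its share, reweight the remaining buckets, repeat) might be sketched like this. It is one possible reading, not a definitive implementation: `CappedAllocator`, `allocate` and the `available` array are names made up for the example, and the final split among the unconstrained buckets uses the floor-plus-largest-fraction rule from the first answer.

```java
import java.util.Arrays;

class CappedAllocator {
    /** available[i] is the number of items bucket i holds (hypothetical input). */
    static int[] allocate(int total, double[] weights, int[] available) {
        int n = weights.length;
        int[] counts = new int[n];
        boolean[] done = new boolean[n];          // bucket was emptied in an earlier pass
        int remaining = total;

        while (remaining > 0) {
            double activeWeight = 0;
            for (int i = 0; i < n; i++) {
                if (!done[i]) activeWeight += weights[i];
            }
            if (activeWeight <= 0) break;         // nothing left to draw from

            // Empty every bucket that cannot cover its proportional share.
            boolean cappedAny = false;
            int taken = 0;
            for (int i = 0; i < n; i++) {
                if (done[i]) continue;
                double share = remaining * weights[i] / activeWeight;
                if (share >= available[i]) {
                    counts[i] = available[i];
                    taken += available[i];
                    done[i] = true;
                    cappedAny = true;
                }
            }
            if (cappedAny) {
                remaining -= taken;               // reweight the rest on the next pass
                continue;
            }

            // No bucket is constrained: floor each share, then hand the
            // leftovers to the largest fractional parts.
            double[] fraction = new double[n];
            int assigned = 0;
            for (int i = 0; i < n; i++) {
                if (done[i]) continue;
                double share = remaining * weights[i] / activeWeight;
                counts[i] = (int) Math.floor(share);
                fraction[i] = share - counts[i];
                assigned += counts[i];
            }
            for (int left = remaining - assigned; left > 0; left--) {
                int best = -1;
                for (int i = 0; i < n; i++) {
                    if (done[i] || counts[i] >= available[i]) continue;
                    if (best < 0 || fraction[i] > fraction[best]) best = i;
                }
                if (best < 0) break;
                counts[best]++;
                fraction[best] = -1;              // at most one extra item per bucket
            }
            remaining = 0;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = allocate(13, new double[] {0.5, 0.2, 0.3}, new int[] {3, 100, 100});
        System.out.println(Arrays.toString(counts));
    }
}
```

For S = 13 with only 3 items in A, this takes all 3 from A and splits the remaining 10 between B and C as 4 and 6, matching floor(10*0.2/0.5) and floor(10*0.3/0.5). For the two equal-weight buckets holding 3 and 50 items with S = 10, it yields 3 and 7.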