R: Round the contents of each row so that the row total equals a number I specify


I have 170 rows of decimals that I need to round to whole numbers. However, each row's total must equal a number I specify.

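As a very basic illustration, suppose I have a matrix (1x4) with the cell contents (1.2, 3.4, 7.7, 5.3). Suppose these numbers represent individuals, so I need to round them to whole numbers such that the group populations add up to a total population of 18 individuals. If I simply round the matrix contents (1, 3, 8, 5), my total population is 17, and I need it to equal 18 (see the R commands below).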
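A minimal sketch of the arithmetic behind this example, using the four values from the question (the variable name x is our own):

```r
# The 1x4 example from the question
x <- c(1.2, 3.4, 7.7, 5.3)

sum(x)          # about 17.6
round(x)        # 1 3 8 5
sum(round(x))   # 17, one short of the required 18
```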


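I came up with a relatively straightforward but lazy way to solve your problem. The basic idea is: 1. check how many extra numbers need a second rounding; 2. dynamically sort the numbers to find the ones best suited for that second rounding.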
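The same two-step idea (count the missing units, then give them to the rounded-down values that came closest to rounding up) can be sketched in vectorised R. This is a hypothetical helper, second_round(), not the answer's code; like the answer, it only handles a rounded sum that falls short of the target:

```r
# Hypothetical helper: vectorised sketch of the two-pass rounding idea
second_round <- function(raw, target) {
  r <- round(raw)
  need <- target - sum(r)            # step 1: how many values need a second rounding
  if (need > 0) {
    # step 2: rounded-down values, largest residual (closest to rounding up) first
    cand <- which(r < raw)
    cand <- cand[order(raw[cand] - r[cand], decreasing = TRUE)]
    if (need > length(cand)) stop("not enough rounded-down values to reach target")
    pick <- cand[seq_len(need)]
    r[pick] <- r[pick] + 1
  }
  r
}

second_round(c(1.2, 3.4, 7.7, 5.3), 18)  # 1 4 8 5
```

Here 3.4 is bumped because its residual (0.4) is the largest among the values that were rounded down.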

I used the dataset "B" you quoted above; its rounded sum is 58701. I set the designated total for the rounded output to 58711.

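# B is the numeric vector of unrounded values from the question
raw <- B
rounded <- round(B)
data <- data.frame(raw, rounded)
calc_sum <- sum(data$rounded)
desig_sum <- 58711
data$residual <- abs(data$raw - data$rounded)
data$above <- ifelse(data$rounded > data$raw, 1, 0)  # 1 = value was rounded up
data$round2 <- 0                                      # second-pass value, 0 = not adjusted
data1 <- data[order(data$residual), ]

# Only the case where the rounded sum falls short of the target is handled
if (calc_sum < desig_sum) {
    diff <- desig_sum - calc_sum
    count <- 0
    while (count < diff) {
        if (!any(data1$round2 == 0 & data1$above == 0))
            stop("not enough rounded-down values left to reach desig_sum")
        for (i in 1:nrow(data1)) {
            if (count >= diff) break  # stop as soon as the designated sum is reached
            data_tmp <- subset(data1, round2 == 0 & above == 0)
            # Find the next number for its second rounding: a not-yet-adjusted,
            # rounded-down value with the largest residual
            if (data1$round2[i] == 0 && data1$above[i] == 0 &&
                data1$residual[i] == max(data_tmp$residual)) {
                data1$round2[i] <- data1$rounded[i] + 1
                count <- count + 1
            }
        }
    }
}

# Revert to the original order
data2 <- data1[order(as.numeric(rownames(data1))), ]

# Use the second-pass value where there is one, otherwise the first-pass rounding
data2$output <- ifelse(data2$round2 != 0, data2$round2, data2$rounded)

data_final <- data2$output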
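Whichever method produces data_final, a quick check is worthwhile: every value should be a whole number, the values should add up to the designated total, and no value should end up a full unit or more away from its raw value. A minimal sketch; check_rounding() is a hypothetical helper name:

```r
# Hypothetical helper: verify a rounded vector against its raw values and target
check_rounding <- function(raw, out, target) {
  stopifnot(length(raw) == length(out))
  all(out == round(out)) &&      # whole numbers only
    sum(out) == target &&        # hits the designated total
    all(abs(out - raw) < 1)      # no value moved by a full unit or more
}

check_rounding(c(1.2, 3.4, 7.7, 5.3), c(1, 4, 8, 5), 18)  # TRUE
check_rounding(c(1.2, 3.4, 7.7, 5.3), c(1, 3, 8, 5), 18)  # FALSE: sums to 17
```

With the objects from the answer above, the call would be check_rounding(B, data_final, desig_sum).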

There are several ways to do this, but see my comment above:

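Round <- function(x, target) {
  r.x <- round(x)
  diff.x <- round(x) - x   # > 0: value was rounded up, < 0: value was rounded down
  if ((s <- sum(r.x)) == target) {
    return(r.x)
  } else if (s > target) {
    # Sum too large: round down the value that was rounded up the most
    select <- seq(along = x)[diff.x > 0]
    if (length(select) == 0) stop("target cannot be reached")
    which <- which.max(diff.x[select])
    x[select[which]] <- r.x[select[which]] - 1
    Round(x, target)
  } else {
    # Sum too small: round up the value that was rounded down the most
    select <- seq(along = x)[diff.x < 0]
    if (length(select) == 0) stop("target cannot be reached")
    which <- which.min(diff.x[select])
    x[select[which]] <- r.x[select[which]] + 1
    Round(x, target)
  }
}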

dat <- read.table(header = TRUE, row.names = paste0('place', 1:4),
                  text="race1 race2 total
                        1.2  2.1  3.4
                        3.4  3.6  7.0
                        7.7  0.8  8.5
                        5.3  1.4  6.7")

totals <- c(4.0, 7.0, 8.0, 7.0)
The output of apply is transposed relative to what you want, so we t() the result:

dat[3] <- totals

t(apply(dat, 1, function(x) Round(x[1:2], x[3])))

#        race1 race2
# place1     2     2
# place2     3     4
# place3     7     1
# place4     5     2
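Why the t() is needed: apply(X, 1, FUN) calls FUN once per row, and when FUN returns a vector of length n, the results are stacked as the columns of an n-by-nrow(X) matrix. A small illustration with a made-up 4 x 2 matrix standing in for the two race columns:

```r
m <- matrix(1:8, nrow = 4)                   # 4 rows, 2 columns, like dat[, 1:2]
res <- apply(m, 1, function(row) row * 10)   # FUN is called once per row
dim(res)      # 2 4: each row's result became a column
dim(t(res))   # 4 2: t() restores one row per input row
```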
Another way to round values so that their total equals a given value. This method also works for the cases shown in …

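You can choose whether the adjustment is made on:

  • the closest numbers
  • the largest numbers
  • a random distribution
and choose the number of decimal places.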

#Round to given total
#x..numeric vector
#target..sum of rounded x, if not given target = round(sum(x), digits)
#digits..number of decimal places
#closest..Make adjustment by changing closest number
#ref..reference level to calculate probability of adjustment, if ref==NA the probability of an adjustment is equal for all values of x
#random..should the adjustment be done stochastically (TRUE, by weighted random sampling) or deterministically (FALSE)
roundt <- function(x, target=NA, digits = 0, closest=TRUE, ref=0, random=FALSE) {
  if(is.na(target)) {target <- round(sum(x), digits)}
  if(all(x == 0)) {
    if(target == 0) {return(x)}
    x <- x + 1
  }
  xr <- round(x, digits)
  if(target == sum(xr)) {return(xr)}
  if(is.na(ref)) {
    wgt <- rep(1/length(x), length(x))
  } else {
    if(closest) {
      tt <- (x - xr) * sign(target - sum(xr)) + 10^-digits / 2
      wgt <- tt / sum(tt)
    } else {wgt <- abs(x-ref)/sum(abs(x-ref))}
  }
  if(random) {adj <- table(sample(factor(1:length(x)), size=round(abs(target - sum(xr))*10^digits), replace = TRUE, prob=wgt))*sign(target - sum(xr))*10^-digits
  } else {adj <- diff(c(0,round(cumsum((target - sum(xr)) * wgt), digits)))}
  xr + adj
}
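The deterministic branch of roundt (random = FALSE) hands out the missing units with diff(c(0, round(cumsum(k * wgt)))). Because the rounded cumulative sums telescope, the adjustments always add up to the last rounded cumulative sum, which is k itself when the weights sum to 1. A small illustration with made-up weights (k and wgt are illustrative values, not taken from the answer):

```r
k   <- 3                     # units still missing after the first rounding
wgt <- c(0.5, 0.3, 0.2)      # illustrative weights that sum to 1

cum <- round(cumsum(k * wgt))  # 2 2 3
adj <- diff(c(0, cum))         # 2 0 1: per-element adjustment
sum(adj)                       # 3, exactly k
```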

dat <- read.table(text="
race1 race2 total
1.2  2.1  4
3.4  3.6  7
7.7  0.8  8
5.3  1.4  7
3.4  3.6  5
7.7  0.8  12
-5  5  1
0    0    3
0    0    0
", header=T)

apply(dat, 1, function(x) roundt(x[1:2], x[3])) #Default round to target
apply(dat[1:6,], 1, function(x) roundt(x[1:2]*x[3]/sum(x[1:2]))) #Preadjust to target by multiplication
apply(dat, 1, function(x) roundt(x[1:2] + (x[3]-sum(x[1:2]))/2)) #Preadjust to target by addition
apply(dat, 1, function(x) roundt(x[1:2], x[3], closest=FALSE)) #Prefer adjustment on large numbers
apply(dat, 1, function(x) roundt(x[1:2], x[3], ref=NA)) #Give all values the same probability of adjustment
apply(dat, 1, function(x) roundt(x[1:2], x[3], digits=1)) #Use one decimal place
apply(dat, 1, function(x) roundt(x[1:2], x[3], digits=1, random=TRUE)) #Make the adjustment by random sampling

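Did you copy/paste the output of paste(STATA[['b']], collapse = ",") to store it as b and round it? I don't understand: if your function works as needed, why can't you apply it to the whole dataset?

This is a very useful function, especially for percentages.

@rawr I am copying the output of paste(STATA[['b']], collapse = ",") to store it as b. The function works for a single column, but I don't want to rerun it for every column, since I have 170. I want to be able to call all the columns that need rounding, along with the last row, which holds the totals. The sum of the columns should equal …

The function basically defines one or more specific entries as a "plug", correct? That way the sum equals the specified number. Does it matter which entry plays this role? A very simple way to solve this is to designate a specific index in the row, e.g. the first entry, as the value that absorbs the difference.

Why is it so important to end up with 18 rather than 17? Your analysis may have gone astray in the first place!

That is, why not round elements 1 to (N-1) and set the Nth element to 18 minus the sum of the rest? If you want to fudge the numbers, do it the simple way.

That isn't really useful. The OP posted a round function in the original question that works.

This seems promising, but I'm having trouble applying it to my dataset. I am trying to apply the do.call command to my dataset (170 rows x 60 columns), but I get the following error: Error in data.frame(value, row.names = rn, check.names = FALSE, check.rows = FALSE): 'row.names' should specify one of the variables. I didn't enter the data manually with a command the way you did with dat, so my command is: dat2 …

You don't have to change row.names; I only did that to make it look like your example. You also don't need the -3 in dat2[x,-3]: I used it to drop the third column in the example, which is the sum of the first two. If all 60 columns need to be rounded, you can just keep dat2[x,]. I'm not quite sure what the error is saying; if you look at the example, the error comes from the select line.

If anyone can help with this related question: …

I found a bug in the original code...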
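The "plug" suggestion from the comments (round elements 1 to N-1 and let the last element absorb the difference) is short enough to sketch directly; round_with_plug() is a hypothetical name. Note that the plug absorbs all of the accumulated rounding error, so it can drift well away from its raw value when N is large:

```r
# Hypothetical helper: the last element is the "plug"
round_with_plug <- function(x, target) {
  n <- length(x)
  head_rounded <- round(x[-n])                 # ordinary rounding for elements 1..N-1
  c(head_rounded, target - sum(head_rounded))  # element N absorbs the difference
}

round_with_plug(c(1.2, 3.4, 7.7, 5.3), 18)     # 1 3 8 6
```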
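The row-by-row do.call version of the Round answer, which the comments above refer to (it applies Round to each row of dat, minus the total column, with totals as the row targets):

do.call(rbind, lapply(1:nrow(dat), function(x) Round(dat[x, -3], totals[x])))

#        race1 race2
# place1     2     2
# place2     3     4
# place3     7     1
# place4     5     2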
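For a whole table, such as the 170 x 60 data in the comments, one option is to work on a plain numeric matrix and call a rounding function row by row with a separate target per row, which avoids building data.frame rows at all. The sketch below swaps in the largest remainder method (floor everything, then add 1 to the values with the largest fractional parts) instead of the Round answer; round_to_total() is a hypothetical helper, and it assumes each target lies between sum(floor(row)) and sum(floor(row)) + ncol:

```r
# Largest remainder rounding: a swapped-in technique, not the Round() answer above.
# round_to_total() is a hypothetical helper name.
round_to_total <- function(x, target) {
  fl <- floor(x)
  k <- target - sum(fl)                 # whole units still to hand out
  # assumption: the target is reachable by adding at most 1 to each value
  stopifnot(k >= 0, k <= length(x))
  idx <- order(x - fl, decreasing = TRUE)[seq_len(k)]  # largest fractional parts first
  fl[idx] <- fl[idx] + 1
  fl
}

# The race1/race2 example as a plain numeric matrix
m <- matrix(c(1.2, 3.4, 7.7, 5.3,
              2.1, 3.6, 0.8, 1.4), ncol = 2,
            dimnames = list(paste0("place", 1:4), c("race1", "race2")))
totals <- c(4, 7, 8, 7)

# One call per row, each with its own target; t() turns sapply's columns back into rows
res <- t(sapply(seq_len(nrow(m)), function(i) round_to_total(m[i, ], totals[i])))
rownames(res) <- rownames(m)
res
#        race1 race2
# place1     2     2
# place2     3     4
# place3     7     1
# place4     5     2
```

If, as the comments describe, each column should be rounded and the totals sit in the last row of a numeric matrix x, the same helper could be applied per column instead, e.g. sapply(seq_len(ncol(x)), function(j) round_to_total(x[-nrow(x), j], x[nrow(x), j])).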