Frequency tables by group with weighted data in R


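I want to compute two kinds of frequency tables by group, using weighted data.

You can generate reproducible data with the following code: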

Data <- data.frame(
     country = sample(c("France", "USA", "UK"), 100, replace = TRUE),
     migrant = sample(c("Native", "Foreign-born"), 100, replace = TRUE),
     gender = sample (c("men", "women"), 100, replace = TRUE),
     wgt = sample(100),
     year = sample(2006:2007)
     )
In my real database I have 10 years of data, so applying this code year by year takes a lot of time. Does anyone know a faster way?

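I would also like to compute the share of men and women within each migrant status, by country and year. I am looking for something like: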

Var1            Var2     Var3     y2006   y2007
Foreign born    France   men        52     55
Foreign born    France   women      48     45
Native          France   men        51     52
Native          France   women      49     48
Foreign born    UK       men        60     65
Foreign born    UK       women      40     35
Native          UK       men        48     50
Native          UK       women      52     50

Does anyone know how I can get these results?

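You can do it like this: turn the code you have already written into a function; use lapply to iterate that function over all years in the data; then use Reduce and merge to collapse the resulting list into a single data frame. Like this: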

# let's make your code into a function called 'tallyho'
library(dplyr)      # filter() and %>%
library(questionr)  # wtd.table() and cprop()

tallyho <- function(yr, data) {

  # keep only the rows for the requested year
  DF <- filter(data, year == yr)

  # weighted migrant-by-country table, converted to column percentages
  result <- with(DF, as.data.frame(cprop(wtd.table(migrant, country, weights = wgt), total = FALSE)))

  # rename the last column by year (use the argument 'yr', not the column 'year')
  names(result)[length(names(result))] <- sprintf("y%s", yr)

  return(result)

}

# now iterate that function over all years in your original data set, then
# use Reduce and merge to collapse the resulting list into a data frame
NewData <- lapply(unique(Data$year), function(x) tallyho(x, Data)) %>%
  Reduce(function(...) merge(..., all = TRUE), .)
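To see what the Reduce and merge step does on its own, here is a small base R sketch. The three data frames and their numbers are made up for illustration; they only imitate the shape of what tallyho() is expected to return (two key columns plus one column per year).

```r
# Toy per-year tables (hypothetical numbers), shaped like the per-year results
tabs <- list(
  data.frame(Var1 = c("Foreign-born", "Native"), Var2 = "France", y2006 = c(40, 60)),
  data.frame(Var1 = c("Foreign-born", "Native"), Var2 = "France", y2007 = c(45, 55)),
  data.frame(Var1 = c("Foreign-born", "Native"), Var2 = "France", y2008 = c(50, 50))
)

# merge() joins two data frames on the columns they share (here Var1 and Var2);
# Reduce() applies it pairwise: merge(merge(tabs[[1]], tabs[[2]]), tabs[[3]])
wide <- Reduce(function(x, y) merge(x, y, all = TRUE), tabs)

print(wide)
```

Because each table carries a differently named value column, the merge adds one column per year instead of stacking rows, which is why the rename inside the function matters.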
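TIL about Reduce()

Thanks a lot for the answer, @ulfeld, but I ran into some trouble. When I run the code, I get exactly the same results for 2006 and 2007, which is not correct... Do you know how I can fix it? And do you know how I can add the gender information?

Sorry, try the edited version I just posted. I think I gave the function input the same name as the column, which confused things. Unfortunately, I don't think gender can be added with this approach, because wtd.table only allows two-way cross-tabulations. I don't know enough about what these weights do to suggest an alternative.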
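One possible way around the two-way limit is base R's xtabs(), which accepts any number of grouping variables and sums the left-hand side of the formula within each cell. The sketch below is not from the original answer, and it assumes that summing wgt per cell is an appropriate way to apply these weights; the set.seed() call is added only to make the run repeatable.

```r
set.seed(1)  # added for repeatability; not part of the question's code

# Same example data as in the question
Data <- data.frame(
  country = sample(c("France", "USA", "UK"), 100, replace = TRUE),
  migrant = sample(c("Native", "Foreign-born"), 100, replace = TRUE),
  gender  = sample(c("men", "women"), 100, replace = TRUE),
  wgt     = sample(100),
  year    = sample(2006:2007)  # recycled to 100 rows by data.frame()
)

# Weighted counts: xtabs() sums wgt within each gender x migrant x country x year cell
counts <- xtabs(wgt ~ gender + migrant + country + year, data = Data)

# Share of men and women (in percent) within each migrant x country x year group;
# margins 2, 3, 4 are migrant, country and year
shares <- 100 * prop.table(counts, margin = c(2, 3, 4))

# Flat layout: one row per migrant/country/gender, one column per year
result <- ftable(round(shares, 1),
                 row.vars = c("migrant", "country", "gender"),
                 col.vars = "year")
print(result)
```

This handles all years in one pass, so no per-year loop is needed. A group with no observations in a given year would show NaN, since its weighted total is zero.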