Conditional matching and counting with dplyr (tags: r, dplyr, match)


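Imagine my friends recommend cars to me based on my budget. For each budget, I want to count how many times all of my friends recommended the same make, and how many times all of them recommended the same make and model.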

budget <- c(rep(c("broke", "modest", "dreaming"), each = 3), rep("broke", 3))
friend <- (rep(c("mark", "mary", "monelle"), 4))
make <- c(rep("ford", 3), rep("honda", 3), "porche", rep("bmw",2), rep("bicycle", 3))
model <- c(rep("fiesta", 3), rep("civic", 2), "tacoma", "911", "i3", "Z4", rep("used", 3))

df <- data.frame(budget, friend, make, model)

     budget  friend    make  model
1     broke    mark    ford fiesta
2     broke    mary    ford fiesta
3     broke monelle    ford fiesta
4    modest    mark   honda  civic
5    modest    mary   honda  civic
6    modest monelle   honda tacoma
7  dreaming    mark  porche    911
8  dreaming    mary     bmw     i3
9  dreaming monelle     bmw     Z4
10    broke    mark bicycle   used
11    broke    mary bicycle   used
12    broke monelle bicycle   used

Here is one approach using dplyr (spread() comes from tidyr):

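library(dplyr)
library(tidyr)  # provides spread()

df %>% 
  # one row per budget/make, one column per friend holding the model they named
  spread(friend, model) %>% 
  mutate(
    # columns 3:5 are the three friend columns (mark, mary, monelle)
    # TRUE when every friend named this make
    matchMake = apply(.[3:5], 1, function(x) !anyNA(x)),
    # TRUE when every friend named the same model; NA or FALSE when a friend is missing
    matchMake_Model = apply(.[3:5], 1, function(x) all(x[1] == x))
  ) %>% 
  group_by(budget) %>% 
  summarise(
    matchMake = sum(matchMake, na.rm = TRUE),
    matchMake_Model = sum(matchMake_Model, na.rm = TRUE)
  )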

# A tibble: 3 x 3
  budget   matchMake matchMake_Model
  <fct>        <int>           <int>
1 broke            2               2
2 dreaming         0               0
3 modest           1               0
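
The same "all friends agree" counts can also be sketched without reshaping, which avoids relying on column positions such as `.[3:5]`. This is only an illustrative alternative, not part of the original answer. It assumes that "all friends" means every distinct friend present in `df`, and it uses the `.groups` argument of `summarise()`, which needs dplyr 1.0.0 or later. The names `n_friends`, `all_same_make`, `all_same_model`, and `result` are made up for this example.

```r
library(dplyr)

# Rebuild the example data from the question so the block is self-contained.
budget <- c(rep(c("broke", "modest", "dreaming"), each = 3), rep("broke", 3))
friend <- rep(c("mark", "mary", "monelle"), 4)
make   <- c(rep("ford", 3), rep("honda", 3), "porche", rep("bmw", 2), rep("bicycle", 3))
model  <- c(rep("fiesta", 3), rep("civic", 2), "tacoma", "911", "i3", "Z4", rep("used", 3))
df <- data.frame(budget, friend, make, model)

# Assumption: "all friends" means every distinct friend that appears in df.
n_friends <- n_distinct(df$friend)

result <- df %>%
  group_by(budget, make) %>%
  summarise(
    # every friend named this make for this budget
    all_same_make  = n_distinct(friend) == n_friends,
    # ... and they all named a single model of that make
    all_same_model = all_same_make && n_distinct(model) == 1,
    .groups = "drop"
  ) %>%
  group_by(budget) %>%
  summarise(
    matchMake       = sum(all_same_make),
    matchMake_Model = sum(all_same_model)
  )

print(result)
```

On the example data this should give the same counts as the table above: 2 and 2 for broke, 0 and 0 for dreaming, 1 and 0 for modest.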


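Using plyr's split-apply-combine (splitting on the variable budget), with count to test how many times the same make, or the same make/model, is specified: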
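library(plyr)  # if dplyr is also loaded, count() is masked, so call plyr::count explicitly

# for each budget, count the makes (and make/model pairs) named more than once
ddply(df, .(budget), function(df_budget) 
      c(matchMake = sum(plyr::count(df_budget, "make")$freq > 1), 
        matchMakeModel = sum(plyr::count(df_budget, c("make", "model"))$freq > 1)))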
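
For comparison, here is a sketch of the same "named more than once" count written with `dplyr::count()`. It is an illustration rather than the answerer's code. The names `make_counts`, `model_counts`, and `repeat_counts` are hypothetical, and the `name` argument of `count()` assumes a reasonably recent dplyr.

```r
library(dplyr)

# Rebuild the example data from the question so the block is self-contained.
budget <- c(rep(c("broke", "modest", "dreaming"), each = 3), rep("broke", 3))
friend <- rep(c("mark", "mary", "monelle"), 4)
make   <- c(rep("ford", 3), rep("honda", 3), "porche", rep("bmw", 2), rep("bicycle", 3))
model  <- c(rep("fiesta", 3), rep("civic", 2), "tacoma", "911", "i3", "Z4", rep("used", 3))
df <- data.frame(budget, friend, make, model)

# Makes named more than once within each budget.
make_counts <- df %>%
  count(budget, make, name = "freq") %>%
  group_by(budget) %>%
  summarise(matchMake = sum(freq > 1))

# Make/model pairs named more than once within each budget.
model_counts <- df %>%
  count(budget, make, model, name = "freq") %>%
  group_by(budget) %>%
  summarise(matchMakeModel = sum(freq > 1))

repeat_counts <- left_join(make_counts, model_counts, by = "budget")
print(repeat_counts)
```

Note that "more than once" is a looser rule than "all friends agree". On the example data, modest gets a make/model count of 1 because two of the three friends named the civic, and dreaming gets a make count of 1 because two friends named bmw.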

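Very helpful. I want numbers because we can imagine scenarios where friends recommend more than one car per budget type. I then tried to count the T/F frequency per budget... I have updated the question to extend the reprex.

Could you also add the desired output for the updated example?

Hope this makes sense? Thanks for your help!

Your expected result doesn't match your reprex input. If you are looking for more answers, make sure your reprex makes sense end to end. As for the question itself, take a look at dplyr::count.

Your question is ambiguous. What do you mean by "how often my friends tell me to buy the same car"? If someone tells you to buy the same make once, is that 0 or 1? If they tell you twice, is it 1 or 2?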