Conditional matching and counting with dplyr


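Imagine my friends recommend cars to me based on my budget. For each budget, I want to determine how many times all of my friends recommended the same make, and how many times they all recommended the same make and model.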

budget <- c(rep(c("broke", "modest", "dreaming"), each = 3), rep("broke", 3))
friend <- (rep(c("mark", "mary", "monelle"), 4))
make <- c(rep("ford", 3), rep("honda", 3), "porche", rep("bmw",2), rep("bicycle", 3))
model <- c(rep("fiesta", 3), rep("civic", 2), "tacoma", "911", "i3", "Z4", rep("used", 3))

df <- data.frame(budget, friend, make, model)

     budget  friend    make  model
1     broke    mark    ford fiesta
2     broke    mary    ford fiesta
3     broke monelle    ford fiesta
4    modest    mark   honda  civic
5    modest    mary   honda  civic
6    modest monelle   honda tacoma
7  dreaming    mark  porche    911
8  dreaming    mary     bmw     i3
9  dreaming monelle     bmw     Z4
10    broke    mark bicycle   used
11    broke    mary bicycle   used
12    broke monelle bicycle   used

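Here is one approach using dplyr (spread() comes from tidyr):

library(dplyr)
library(tidyr)

df %>%
  # one row per budget/make, one column per friend holding the model they named
  spread(friend, model) %>%
  mutate(
    # columns 3:5 are the three friends; no NA means every friend named this make
    matchMake = apply(.[3:5], 1, function(x) !anyNA(x)),
    # TRUE only when every friend named the same model
    matchMake_Model = apply(.[3:5], 1, function(x) all(x[1] == x))
  ) %>%
  group_by(budget) %>%
  summarise(
    matchMake = sum(matchMake, na.rm = TRUE),
    matchMake_Model = sum(matchMake_Model, na.rm = TRUE)
  )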

# A tibble: 3 x 3
  budget   matchMake matchMake_Model
  <fct>        <int>           <int>
1 broke            2               2
2 dreaming         0               0
3 modest           1               0
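
The same counts can also be sketched without reshaping, by grouping on budget and make and checking how many distinct friends and models each group contains. This is only an illustration under stated assumptions: "all friends" is taken to mean every friend that appears in df, the names n_friends, all_make and all_model are made up for this example, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Same example data as in the question
budget <- c(rep(c("broke", "modest", "dreaming"), each = 3), rep("broke", 3))
friend <- rep(c("mark", "mary", "monelle"), 4)
make <- c(rep("ford", 3), rep("honda", 3), "porche", rep("bmw", 2), rep("bicycle", 3))
model <- c(rep("fiesta", 3), rep("civic", 2), "tacoma", "911", "i3", "Z4", rep("used", 3))
df <- data.frame(budget, friend, make, model)

# Assumption: "all friends" means every friend present in the data
n_friends <- n_distinct(df$friend)

res <- df %>%
  group_by(budget, make) %>%
  summarise(
    # every friend named this make for this budget
    all_make = n_distinct(friend) == n_friends,
    # ... and they all named one and the same model
    all_model = all_make && n_distinct(model) == 1,
    .groups = "drop"
  ) %>%
  group_by(budget) %>%
  summarise(
    matchMake = sum(all_make),
    matchMake_Model = sum(all_model)
  )

res
```

If a friend can recommend more than one car per budget, the model check above would need a stricter definition of "the same model", so treat it as a starting point rather than a general solution.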

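Use plyr's split-apply-combine (splitting on the variable budget) and count() to test how many times the same make, or the same make/model, was specified:

library(plyr)  # count() is called as plyr::count below in case dplyr is also attached

# freq > 1 counts a make (or make/model) named by more than one friend within a budget
ddply(df, .(budget), function(df_budget)
      c(matchMake = sum(plyr::count(df_budget, "make")$freq > 1),
        matchMakeModel = sum(plyr::count(df_budget, c("make", "model"))$freq > 1)))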

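Very useful. I want the numbers, because we can imagine scenarios where friends recommend more than one car for each budget type. I then tried counting the TRUE/FALSE frequency per budget... I updated the question to extend the reprex.

Could you also add the desired output for the updated example? I hope this makes sense. Thanks for your help!

Your expected result doesn't match your reprex input. If you are looking for more answers, make sure your reprex makes sense from end to end. As for the question itself, take a look at dplyr::count.

Your question is ambiguous. What do you mean by "how often my friends tell me to buy the same car"? If they tell you to buy the same make once, is that 0 or 1? If they tell you twice, is it 1 or 2?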
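
To make the dplyr::count suggestion and the ambiguity raised above concrete, here is a minimal sketch. It first tabulates how many recommendations each make received per budget, then applies an explicit threshold. The names make_counts, min_friends and res_count are hypothetical, and the threshold of 2 is only an assumption about what "the same make" should mean.

```r
library(dplyr)

# Same example data as in the question
budget <- c(rep(c("broke", "modest", "dreaming"), each = 3), rep("broke", 3))
friend <- rep(c("mark", "mary", "monelle"), 4)
make <- c(rep("ford", 3), rep("honda", 3), "porche", rep("bmw", 2), rep("bicycle", 3))
model <- c(rep("fiesta", 3), rep("civic", 2), "tacoma", "911", "i3", "Z4", rep("used", 3))
df <- data.frame(budget, friend, make, model)

# Number of recommendations per budget and make
make_counts <- df %>% count(budget, make, name = "n_make")

# Hypothetical threshold: a make "matches" when at least this many friends named it
min_friends <- 2

res_count <- make_counts %>%
  group_by(budget) %>%
  summarise(matchMake = sum(n_make >= min_friends))

res_count
```

Setting min_friends to the total number of friends instead would count only makes that everyone agreed on, which is why the definition needs to be pinned down before choosing between the answers.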