Conditional matching and counting with dplyr
Imagine my friends recommend cars to me based on my budget. For each budget, I want to determine how many times all of my friends recommended the same make, and how many times they all recommended the same make and model.
budget <- c(rep(c("broke", "modest", "dreaming"), each = 3), rep("broke", 3))
friend <- (rep(c("mark", "mary", "monelle"), 4))
make <- c(rep("ford", 3), rep("honda", 3), "porche", rep("bmw",2), rep("bicycle", 3))
model <- c(rep("fiesta", 3), rep("civic", 2), "tacoma", "911", "i3", "Z4", rep("used", 3))
df <- data.frame(budget, friend, make, model)
budget friend make model
1 broke mark ford fiesta
2 broke mary ford fiesta
3 broke monelle ford fiesta
4 modest mark honda civic
5 modest mary honda civic
6 modest monelle honda tacoma
7 dreaming mark porche 911
8 dreaming mary bmw i3
9 dreaming monelle bmw Z4
10 broke mark bicycle used
11 broke mary bicycle used
12 broke monelle bicycle used
Here is one approach using dplyr (spread() comes from tidyr):
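library(dplyr)
library(tidyr)  # provides spread()

df %>%
  # one column per friend, holding the model recommended for each budget/make
  spread(friend, model) %>%
  mutate(
    # TRUE when every friend recommended this make (no missing model)
    matchMake = apply(.[3:5], 1, function(x) !anyNA(x)),
    # TRUE when every friend recommended the same model
    matchMake_Model = apply(.[3:5], 1, function(x) all(x[1] == x))
  ) %>%
  group_by(budget) %>%
  summarise(
    matchMake = sum(matchMake, na.rm = TRUE),
    matchMake_Model = sum(matchMake_Model, na.rm = TRUE)
  )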
# A tibble: 3 x 3
budget matchMake matchMake_Model
<fct> <int> <int>
1 broke 2 2
2 dreaming 0 0
3 modest 1 0
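For comparison, the same counts can probably be obtained without reshaping. The following is only a sketch, not the answerer's method. It assumes that "all friends" means every distinct value in the friend column, and the names n_friends, per_make, all_same_make, all_same_model and result are made up for illustration. It rebuilds the example data so that it runs on its own.

```r
library(dplyr)

# Example data from the question
budget <- c(rep(c("broke", "modest", "dreaming"), each = 3), rep("broke", 3))
friend <- rep(c("mark", "mary", "monelle"), 4)
make   <- c(rep("ford", 3), rep("honda", 3), "porche", rep("bmw", 2), rep("bicycle", 3))
model  <- c(rep("fiesta", 3), rep("civic", 2), "tacoma", "911", "i3", "Z4", rep("used", 3))
df <- data.frame(budget, friend, make, model)

# Assumption: "all friends" means every distinct friend in the data
n_friends <- n_distinct(df$friend)

# One row per budget/make: did every friend name this make, and the same model?
per_make <- df %>%
  group_by(budget, make) %>%
  summarise(
    all_same_make  = n_distinct(friend) == n_friends,
    all_same_model = n_distinct(friend) == n_friends && n_distinct(model) == 1,
    .groups = "drop"
  )

# Count the unanimous makes and make/model pairs per budget
result <- per_make %>%
  group_by(budget) %>%
  summarise(
    matchMake       = sum(all_same_make),
    matchMake_Model = sum(all_same_model)
  )

print(result)
```

On the example data this should give the same counts as the output above. Unlike the apply() version, it does not depend on the friend columns sitting in positions 3 to 5.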
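Use plyr's split-apply-combine approach (splitting on the variable budget) and use count to tally how many makes, or make/model combinations, were recommended more than once (freq > 1):
library(plyr)

# Split by budget, then count the makes (and make/model pairs) that occur more than once
ddply(df, .(budget), function(df_budget)
  c(matchMake = sum(count(df_budget, "make")$freq > 1),
    matchMakeModel = sum(count(df_budget, c("make", "model"))$freq > 1)))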
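The same "recommended more than once" tally can be sketched with dplyr::count instead of plyr. This is an illustration only: the names make_counts, model_counts and result_count are made up here, and the dplyr:: prefixes are a precaution in case plyr is attached in the same session and masks count() or summarise(). The example data is rebuilt so the block runs on its own.

```r
library(dplyr)

# Example data from the question
budget <- c(rep(c("broke", "modest", "dreaming"), each = 3), rep("broke", 3))
friend <- rep(c("mark", "mary", "monelle"), 4)
make   <- c(rep("ford", 3), rep("honda", 3), "porche", rep("bmw", 2), rep("bicycle", 3))
model  <- c(rep("fiesta", 3), rep("civic", 2), "tacoma", "911", "i3", "Z4", rep("used", 3))
df <- data.frame(budget, friend, make, model)

# Makes named more than once within each budget
make_counts <- df %>%
  dplyr::count(budget, make) %>%
  dplyr::group_by(budget) %>%
  dplyr::summarise(matchMake = sum(n > 1))

# Make/model pairs named more than once within each budget
model_counts <- df %>%
  dplyr::count(budget, make, model) %>%
  dplyr::group_by(budget) %>%
  dplyr::summarise(matchMakeModel = sum(n > 1))

result_count <- dplyr::left_join(make_counts, model_counts, by = "budget")

print(result_count)
```

Note that this counts a make as soon as two friends agree on it, so the "dreaming" budget gets matchMake = 1 (two friends said bmw), whereas an "all friends agree" rule would give 0.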
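Very helpful. I want numbers, because we can imagine scenarios where friends recommend more than one car per budget type. I then tried to count the frequency of T/F per budget... I have updated the question to extend the reprex.
Could you also add the desired output for the updated example?
Hope this makes sense? Thanks for your help!
Your expected result does not match your reprex input. If you are looking for more answers, make sure your reprex makes sense from end to end. As for the question itself, take a look at dplyr::count.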
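Your question is ambiguous. What do you mean by "how often my friends tell me to buy the same car"? If someone tells you to buy the same make once, is that 0 or 1? If they tell you twice, is that 1 or 2?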