R: Multiple cleaning operations in one pipeline
This is more of a code-cleanup exercise I'm working on right now. My initial data looks like this:
Year County Town ... Funding Received ... (90+ Variables total)
2016 a x Yes
2015 a y No
2014 a x Yes
2016 b z Yes
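I couldn't figure out how to get the number of submitted and approved applications from this, so I converted the county into indicator variables and counted them with the following code: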
library(dplyr)
counties <- original_data %>%
  select(county, funded, year) %>%
  mutate(
    a = ifelse(county == "a", 1, 0),
    b = ifelse(county == "b", 1, 0),
    c = ifelse(county == "c", 1, 0)
    # ... etc. for the remaining counties ...
  )
countysum <- counties %>%
  select(-funded) %>%
  group_by(county, year) %>%
  summarise_all(sum, na.rm = TRUE)
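I then turned this data into two data frames (submitted and funded) to get the number of applications submitted and funded per county per year, using the following code: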
counties <- original_data %>%
  select(county, funded, year) %>%
  mutate(
    a = ifelse(county == "a", 1, 0),
    b = ifelse(county == "b", 1, 0),
    c = ifelse(county == "c", 1, 0)
    # ... etc. for the remaining counties ...
  )
countysum <- counties %>%
  select(-funded) %>%
  group_by(county, year) %>%
  summarise_all(sum, na.rm = TRUE)
But to get the data into a tidier format, I used a few more commands:
countysum$submitted <- rowSums(countysum[, 3:15], na.rm = TRUE)  # 3:15 are county indicator vars
countysum <- countysum[, -c(3:19)]
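countysum$submitted

I'm not quite sure what you want the final output to look like, but I think you can take advantage of the fact that logical values are coerced to integers and skip creating the dummy columns: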
library(dplyr)
byyear <- original_data %>%
  group_by(county, year) %>%
  summarize(
    wasfunded = any(funded == "Yes", na.rm = TRUE),
    # I'm assuming did/didn't submit is one of the other variables
    submittedapplication = any(submittedapp == "Yes", na.rm = TRUE)
  )
# if you don't need the byyear data for something else (I always seem to),
# you can pipe that straight into this next line
yrs_funded_by_county <- byyear %>%
  summarize(
    n_yrs_funded = sum(wasfunded),
    n_yrs_submitted = sum(submittedapplication),
    # maybe you don't need an award rate, but I threw it in b/c it's the kind of stuff my grant person cares about
    pct_awarded = n_yrs_funded / n_yrs_submitted
  )
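If what you actually want is the number of applications (rather than the number of years with at least one), the same coercion trick works with sum() on the comparison itself. Here is a minimal sketch, assuming each row of the data is one submitted application; the toy original_data and the column names n_submitted and n_funded are made up for illustration and are not from the original post:

```r
library(dplyr)

# Hypothetical toy data standing in for original_data:
# one row per application, `funded` is "Yes", "No", or NA.
original_data <- tibble(
  year   = c(2016, 2015, 2014, 2016, 2016),
  county = c("a", "a", "a", "b", "a"),
  funded = c("Yes", "No", "Yes", "Yes", NA)
)

app_counts <- original_data %>%
  group_by(county, year) %>%
  summarize(
    n_submitted = n(),                                 # assumption: every row is a submission
    n_funded    = sum(funded == "Yes", na.rm = TRUE),  # TRUE counts as 1, FALSE as 0
    .groups     = "drop"                               # return an ungrouped result
  )

print(app_counts)
```

This gives one row per county and year in a single pipeline, so the indicator columns and the rowSums() step are not needed.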
Take a look at tidyr::spread - I think that's what you were trying to do in the first section. Please show a small reproducible example. Your code uses funded, but it isn't shown in the example.
@akrun My mistake, funded corresponds to "Funding Received" in the original post.
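For reference, the tidyr::spread suggestion could look roughly like the sketch below. It counts applications per year and county, then spreads the counties into columns, which is approximately what the hand-written indicator columns produce. The toy data reuses the four rows from the question; treating each row as one application is an assumption, and wide_counts is just an illustrative name:

```r
library(dplyr)
library(tidyr)

# The four example rows from the question (columns renamed to match the code)
original_data <- tibble(
  year   = c(2016, 2015, 2014, 2016),
  county = c("a", "a", "a", "b"),
  funded = c("Yes", "No", "Yes", "Yes")
)

wide_counts <- original_data %>%
  count(year, county) %>%                    # n = number of rows per year/county
  spread(key = county, value = n, fill = 0)  # one column per county, 0 where no rows exist

print(wide_counts)
```

In current tidyr, spread() is superseded by pivot_wider(), which does the same job with names_from and values_from arguments.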