R: Multiple cleaning operations in one pipeline

Tags: r, dplyr, pipeline

This is more of a code-cleanup exercise that I'm working on right now. My initial data looks like this:

Year    County    Town  ...  Funding Received ... (90+ Variables total)
2016      a        x               Yes
2015      a        y               No
2014      a        x               Yes
2016      b        z               Yes
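I couldn't figure out how to get the number of submitted and approved applications from this, so I converted it into indicator variables and counted them with the following code: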

counties <- original_data %>%
  select(county, funded, year) %>%
  mutate(
    a=ifelse(county == "a", 1,0),
    b=ifelse(county == "b", 1,0),
    c=ifelse(county == "c", 1,0),
    # ... etc ...
  )
countysum <- counties %>%
  select(-funded) %>%
  group_by(county, year) %>%
  summarise_all(sum, na.rm = T)
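These data were then turned into two data frames (submitted and funded) to get the number of applications submitted and funded per county per year, using the following code: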

counties <- original_data %>%
  select(county, funded, year) %>%
  mutate(
    a=ifelse(county == "a", 1,0),
    b=ifelse(county == "b", 1,0),
    c=ifelse(county == "c", 1,0),
    # ... etc ...
  )
countysum <- counties %>%
  select(-funded) %>%
  group_by(county, year) %>%
  summarise_all(sum, na.rm = T)
But to get the data into a tidier format, I then used a few more commands:

countysum$submitted <- rowSums(countysum[, 3:15], na.rm = TRUE) # 3:15 are county indicator vars
countysum <- countysum[,-c(3:19)]

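countysum$submitted

I'm not quite sure what you want your final output to look like, but I think you can take advantage of the fact that logical values are coerced to integers and skip creating the dummy columns.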

library(dplyr)

byyear  <- original_data %>% 
   group_by(county, year) %>% 
   summarize(
       wasfunded = any(funded == "Yes", na.rm = TRUE)
     , submittedapplication = any(submittedapp == "Yes", na.rm = TRUE) # I'm assuming did/didn't submit is one of the other variables
   ) 

# if you don't need the byyear data for something else (I always seem to), 
# you can pipe that straight into this next line
yrs_funded_by_county  <- byyear %>% 
  summarize(
      n_yrs_funded = sum(wasfunded)
    , n_yrs_submitted = sum(submittedapplication)
    , pct_awarded = n_yrs_funded/n_yrs_submitted  # maybe you don't need an award rate, but I threw it in b/c it's the kind of stuff my grant person cares about
  )
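
To make the coercion point concrete, here is a minimal sketch on made-up data shaped like the sample in the question. The data frame `toy` and the output names `submitted` and `n_funded` are illustrative only, and it assumes that every row represents one submitted application, which the question does not confirm.

```r
library(dplyr)

# Hypothetical toy data; column names follow the question's code
toy <- data.frame(
  year   = c(2016, 2015, 2014, 2016),
  county = c("a", "a", "a", "b"),
  town   = c("x", "y", "x", "z"),
  funded = c("Yes", "No", "Yes", "Yes"),
  stringsAsFactors = FALSE
)

app_counts <- toy %>%
  group_by(county, year) %>%
  summarize(
    # assumption: one row = one submitted application
    submitted = n(),
    # funded == "Yes" is logical; sum() counts each TRUE as 1
    n_funded = sum(funded == "Yes", na.rm = TRUE)
  ) %>%
  ungroup()

print(app_counts)
```

Grouping by `county` and `year` does the work that the per-county indicator columns did, so no dummy columns or `rowSums()` call are needed.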

Take a look at tidyr::spread - I think that is what you were trying to do in the first section.

Please show a small reproducible example. In your code there is funded, but it isn't shown in the example.

@akrun My mistake, funded corresponds to "Funding Received" in the original post.
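
As a rough sketch of the tidyr::spread suggestion, the following counts rows per year and county and then spreads the counties into columns. The data frame `toy` is made up to resemble the sample in the question, and treating each row as one application is an assumption.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data; column names follow the question's code
toy <- data.frame(
  year   = c(2016, 2015, 2014, 2016),
  county = c("a", "a", "a", "b"),
  funded = c("Yes", "No", "Yes", "Yes"),
  stringsAsFactors = FALSE
)

wide <- toy %>%
  count(year, county) %>%                     # one row per year/county, count in column n
  spread(key = county, value = n, fill = 0)   # one column per county, 0 where no rows exist

print(wide)
```

In newer tidyr versions `pivot_wider()` is the recommended replacement for `spread()`, but `spread()` still works.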