Can variables in an R dataset be used as arguments in a function?
Let the data be:
> dput(df)
structure(list(NAME.x = c("ANNE", "BOB", "CATHY", "DIANNE", "EMILY"
), NAME.y = c(NA, "BOB", "CATHY", "DIANNE", NA), AGE.x = c("81",
"47", "47", "47", "37"), AGE.y = c(NA, "47", "47", "47", NA),
ADMISSIONDATE.x = structure(c(1380751296, 1382088000, 1382088000,
1382088000, 1383207720), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
ADMISSIONDATE.y = structure(c(NA, 1382088000, 1382088000,
1382088000, NA), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
DISCHARGEDDATE.x = structure(c(1381172735, 1382189165, 1382189165,
1382189165, 1383250549), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
DISCHARGEDDATE.y = structure(c(NA, 1382189165, 1382189165,
1382189165, NA), class = c("POSIXct", "POSIXt"), tzone = "UTC")), row.names = c(NA,
-5L), .Names = c("NAME.x", "NAME.y", "AGE.x", "AGE.y", "ADMISSIONDATE.x",
"ADMISSIONDATE.y", "DISCHARGEDDATE.x", "DISCHARGEDDATE.y"), class = "data.frame")
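I want to check the similarities and differences between the common variables in this dataset. I tried to write a function that takes 3 arguments: the dataset and 2 variables from the dataset.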
check <- function(data, var1, var2) {
  # X1: x and y are equal
  # X2: x and y are not equal
  # Y1: x and y are non-empty
  # Y2: x and y are empty
  # Z1: x is non-empty and y is empty
  # Z2: x is empty and y is non-empty
  cnt_each <- data %>%
    mutate(X1 = (var1 == var2),
           X2 = (var1 != var2),
           Y1 = (!is.na(var1) & !is.na(var2)),
           Y2 = (is.na(var1) & is.na(var2)),
           Z1 = (!is.na(var1) & is.na(var2)),
           Z2 = (is.na(var1) & !is.na(var2))) %>%
    summarise_at("X1:Z2", funs(sum(.))) %>%
    mutate(sum_all = sum(., na.rm = TRUE))
  return(cnt_each)
}
Calling it with bare column names fails with:
Error in mutate_impl(.data, dots) : object 'NAME.x' not found
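We can use the devel version of dplyr (soon to be released as 0.6.0) to do this. enquo takes the input argument and converts it to a quosure. Inside mutate/summarise/group_by, the quosures are unquoted (with !! or UQ) so that they get evaluated.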
library(dplyr)

check <- function(data, var1, var2) {
  # Capture the bare column names as quosures
  var1 <- enquo(var1)
  var2 <- enquo(var2)
  data %>%
    mutate(X1 = UQ(var1) == UQ(var2),
           X2 = UQ(var1) != UQ(var2),
           Y1 = !is.na(UQ(var1)) & !is.na(UQ(var2)),
           Y2 = is.na(UQ(var1)) & is.na(UQ(var2)),
           Z1 = !is.na(UQ(var1)) & is.na(UQ(var2)),  # x is non-empty and y is empty
           Z2 = is.na(UQ(var1)) & !is.na(UQ(var2))) %>%
    # Count the TRUE values in each indicator column
    summarise_at(vars(X1:Z2), funs(sum(., na.rm = TRUE))) %>%
    mutate(sum_all = rowSums(., na.rm = TRUE))
}
check(df, NAME.x, NAME.y)
#   X1 X2 Y1 Y2 Z1 Z2 sum_all
# 1  3  0  3  0  2  0       8
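For readers on current dplyr releases (1.0 or later), where UQ() and funs() are deprecated, the same idea can be sketched with the embrace operator {{ }} and across(). This is only an illustration and not part of the original answer: the name check_curly and the two-column stand-in for df are assumptions made here, and the indicators follow the X1 to Z2 definitions from the question's comments, so Z1 counts rows where x is present and y is missing.

```r
library(dplyr)

# Stand-in for the question's data: only the two NAME columns (assumption).
df <- data.frame(
  NAME.x = c("ANNE", "BOB", "CATHY", "DIANNE", "EMILY"),
  NAME.y = c(NA, "BOB", "CATHY", "DIANNE", NA),
  stringsAsFactors = FALSE
)

# check_curly is a hypothetical name; {{ }} quotes and unquotes in one step,
# replacing the enquo() + UQ() pair.
check_curly <- function(data, var1, var2) {
  data %>%
    mutate(
      X1 = {{ var1 }} == {{ var2 }},
      X2 = {{ var1 }} != {{ var2 }},
      Y1 = !is.na({{ var1 }}) & !is.na({{ var2 }}),
      Y2 = is.na({{ var1 }}) & is.na({{ var2 }}),
      Z1 = !is.na({{ var1 }}) & is.na({{ var2 }}),
      Z2 = is.na({{ var1 }}) & !is.na({{ var2 }})
    ) %>%
    # Count TRUE values per indicator; summarise() drops all other columns.
    summarise(across(X1:Z2, ~ sum(.x, na.rm = TRUE))) %>%
    # Total across the six counts.
    mutate(sum_all = rowSums(across(X1:Z2)))
}

res <- check_curly(df, NAME.x, NAME.y)
print(res)
```

The call site is unchanged: the columns are still passed as bare names, and only the body of the function differs.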
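See mutate_ for passing strings as column names.
So enquo and UQ will both be in the new version of dplyr?
@HNSKD Yes, along with quo_name and other similar things such as quo, quos, etc.
It looks like the xxx_ type functions will be retired in the next release?
@zx8754, these functions are already marked as "Deprecated SE versions of main verbs" in the devel version 0.5.0.9004.
@zx8754 It seems so.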
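On the string case raised in the comments: in current dplyr, one way to accept column names as character strings without the underscore verbs is the .data pronoun. The sketch below is an assumption-laden illustration rather than something from the thread: check_str is a hypothetical name, the data is a two-column stand-in for df, and only two of the six indicators are computed to keep it short.

```r
library(dplyr)

# Stand-in for the question's data: only the two NAME columns (assumption).
df <- data.frame(
  NAME.x = c("ANNE", "BOB", "CATHY", "DIANNE", "EMILY"),
  NAME.y = c(NA, "BOB", "CATHY", "DIANNE", NA),
  stringsAsFactors = FALSE
)

# check_str is a hypothetical name; var1 and var2 are character strings here,
# and .data[[...]] looks each one up as a column of the data frame.
check_str <- function(data, var1, var2) {
  data %>%
    summarise(
      X1 = sum(.data[[var1]] == .data[[var2]], na.rm = TRUE),
      Z1 = sum(!is.na(.data[[var1]]) & is.na(.data[[var2]]))
    )
}

res_str <- check_str(df, "NAME.x", "NAME.y")
print(res_str)
```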