R: creating subgroups as a new variable
Is there a convenient way to generate a new variable that contains subtypes (for analysis)? For example, we have smoker status, sex and quality of life. Suppose we want to test whether female smokers differ from female non-smokers in quality of life. Is there a quick, convenient, general way to get the subgroups I want (for example femaleSmoker and maleSmoker)?
You can try case_when from dplyr:
library(dplyr)
set.seed(1337)
df <- data.frame(smoker = sample(c("yes", "no"), 10, replace = TRUE),
                 sex = sample(c("male", "female"), 10, replace = TRUE),
                 lifeQuality = rnorm(10))
# Label each row with its sex-by-smoking subgroup
df %>%
  mutate(subcat = case_when(
    smoker == "yes" & sex == "male"   ~ "maleSmoker",
    smoker == "no"  & sex == "male"   ~ "maleNonSmoker",
    smoker == "yes" & sex == "female" ~ "femaleSmoker",
    smoker == "no"  & sex == "female" ~ "femaleNonSmoker"))
   smoker    sex lifeQuality          subcat
 1     no   male    1.969426   maleNonSmoker
 2    yes   male    1.192345      maleSmoker
 3    yes   male   -0.762863      maleSmoker
 4     no   male   -1.259429   maleNonSmoker
 5    yes female   -2.423066    femaleSmoker
 6     no   male    0.249120   maleNonSmoker
 7     no female   -0.455351 femaleNonSmoker
 8    yes female   -1.623958    femaleSmoker
 9     no   male    0.680503   maleNonSmoker
10    yes   male   -1.374085      maleSmoker
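Since the question is ultimately about comparing female smokers with female non-smokers, a base R sketch of that comparison may help. It is not part of the original answer: it assumes a small, balanced, made-up data frame (so its values differ from the output above), builds the subgroup factor with interaction() instead of case_when, and uses a Welch t-test only as one possible choice of test.

```r
# Made-up, balanced example data: 5 rows per sex/smoker combination
set.seed(1337)
df <- data.frame(
  smoker      = rep(c("yes", "no"), times = 10),
  sex         = rep(c("male", "female"), each = 10),
  lifeQuality = rnorm(20)
)

# interaction() creates one factor level per sex/smoker combination,
# e.g. "female.yes" for female smokers
df$subcat <- interaction(df$sex, df$smoker)

# Keep only the female rows and drop the now-unused male levels
females <- droplevels(subset(df, sex == "female"))

# Welch two-sample t-test of lifeQuality between the two female subgroups
res <- t.test(lifeQuality ~ subcat, data = females)
print(res)
```

The same pattern works with the subcat column produced by case_when, as long as the data passed to t.test contains exactly two subgroup levels.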
General solution
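fast.subgroups <- function(x, groups) {
  # Each element of `groups` has the form "var1+var2"
  groupsList <- strsplit(groups, "\\+")
  for (i in rev(seq_along(groupsList))) {
    var <- groupsList[[i]]
    lvl1 <- levels(factor(x[[var[1]]]))
    for (ii in rev(seq_along(lvl1))) {
      # Label of the form <level of var1>_<var2>_<level of var2>
      tmp <- paste(x[[var[1]]], var[2], x[[var[2]]], sep = "_")
      # Keep the label only for rows at the current level of var1
      tmp[!(x[[var[1]]] == lvl1[ii])] <- NA
      newCol <- data.frame(tmp, stringsAsFactors = FALSE)
      names(newCol) <- paste(var[1], lvl1[ii], var[2], sep = "_")
      # Prepend the new column, without building code strings for eval()
      x <- cbind(newCol, x)
    }
  }
  return(x)
}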
Result:
   sex_female_smoker sex_male_smoker    ill_mild_sex    ill_moderate_sex    ill_severe_sex smoker    sex      ill lifeQuality
 1              <NA>  male_smoker_no            <NA>                <NA>   severe_sex_male     no   male   severe -1.32964336
 2  female_smoker_no            <NA> mild_sex_female                <NA>              <NA>     no female     mild -0.18078626
 3 female_smoker_yes            <NA>            <NA>                <NA> severe_sex_female    yes female   severe -0.32265873
 4              <NA> male_smoker_yes   mild_sex_male                <NA>              <NA>    yes   male     mild  0.55766293
 5              <NA> male_smoker_yes            <NA>                <NA>   severe_sex_male    yes   male   severe -0.23733258
 6 female_smoker_yes            <NA>            <NA> moderate_sex_female              <NA>    yes female moderate -0.58239712
 7  female_smoker_no            <NA>            <NA>                <NA> severe_sex_female     no female   severe  0.22477526
 8              <NA> male_smoker_yes            <NA>                <NA>   severe_sex_male    yes   male   severe  0.42577251
 9              <NA> male_smoker_yes   mild_sex_male                <NA>              <NA>    yes   male     mild -0.66224169
10 female_smoker_yes            <NA> mild_sex_female                <NA>              <NA>    yes female     mild  1.49037322
11  female_smoker_no            <NA>            <NA>                <NA> severe_sex_female     no female   severe -1.11923261
12  female_smoker_no            <NA>            <NA>                <NA> severe_sex_female     no female   severe  0.06867219
13  female_smoker_no            <NA>            <NA> moderate_sex_female              <NA>     no female moderate  0.12729929
14              <NA> male_smoker_yes            <NA>   moderate_sex_male              <NA>    yes   male moderate  0.83248241
15  female_smoker_no            <NA> mild_sex_female                <NA>              <NA>     no female     mild -1.51970610
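If a single combined label column is enough, rather than one column per level as fast.subgroups produces, a shorter helper is possible. The sketch below is an assumption and not something from the original thread: add_subgroup and its name and sep arguments are hypothetical, and the small data frame is invented for illustration.

```r
# Hypothetical helper (not from the original thread): adds one combined
# subgroup column built from any number of grouping variables
add_subgroup <- function(x, vars, name = "subgroup", sep = "_") {
  # paste() the selected columns row by row, e.g. "female_yes";
  # unname() keeps column names from being matched to paste() arguments
  x[[name]] <- do.call(paste, c(unname(as.list(x[vars])), sep = sep))
  x
}

# Invented example data
df <- data.frame(
  smoker = c("no", "no", "yes", "yes"),
  sex    = c("male", "female", "female", "male"),
  ill    = c("severe", "mild", "severe", "mild"),
  stringsAsFactors = FALSE
)

df <- add_subgroup(df, c("sex", "smoker"))
print(df$subgroup)
# values: male_no, female_no, female_yes, male_yes

# The combined column can then be used for grouping, e.g. counting rows
print(table(df$subgroup))
```

Passing more variables, for example c("sex", "smoker", "ill"), gives three-way labels such as "male_no_severe" without any change to the helper.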
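I don't understand the question. What is wrong with the current output?
There is no problem. I am looking for a quick way to generate new columns with subtypes, as in the example above. I wanted to know whether a solution to this already exists. I wrote a function that does the above with minimal effort.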