R: Remove data in parentheses from a data frame
I have a dataset with 49 rows and 109 features, where the data is formatted so that each entry holds a mean and an sd value, with the sd in parentheses. I want to remove the part in parentheses. Here is an example:
> head(score_data[,1:4])
# A tibble: 6 x 4
Variable Overall `18 to 29` `30 to 39`
<chr> <chr> <chr> <chr>
1 ts.tsmart_partisan_score (mean (sd)) 94.01 (9.73) 92.56 (10.82) 94.14 (9.55)
2 ts.tsmart_presidential_general_turnout_score (mean (sd)) 66.23 (24.38) 51.56 (20.02) 58.44 (24.36)
3 ts.tsmart_midterm_general_turnout_score (mean (sd)) 50.29 (29.05) 31.09 (18.81) 34.82 (22.15)
4 ts.tsmart_offyear_general_turnout_score (mean (sd)) 20.71 (15.08) 25.38 (17.36) 18.84 (14.35)
5 ts.tsmart_presidential_primary_turnout_score (mean (sd)) 48.34 (28.12) 38.26 (22.26) 36.19 (22.72)
6 ts.tsmart_non_presidential_primary_turnout_score (mean (sd)) 40.21 (29.00) 27.03 (20.14) 23.52 (19.32)
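Something like this, which takes the substring up to the first space in each entry and converts it to numeric: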
means <- sapply(score_data[, -1], function(x) as.numeric(substr(x, 1,
regexpr(" ", x) - 1)))
means
# Overall 18 to 29 30 to 39
# [1,] 94.01 92.56 94.14
# [2,] 66.23 51.56 58.44
# [3,] 50.29 31.09 34.82
# [4,] 20.71 25.38 18.84
# [5,] 48.34 38.26 36.19
# [6,] 40.21 27.03 23.52
A simple regular expression should do it:
for (i in names(score_data)[-(1)]) {
score_data[[i]] <- as.numeric(gsub( " .*$", "", score_data[[i]] ))
}
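To try the same idea without the full dataset, here is a minimal base R sketch. The data frame `toy` and its variable names are made up for illustration (the values are copied from the first rows shown in the question), and the pattern is an assumption about what is wanted: drop the parenthesised sd together with the space in front of it, then convert the remainder to numeric.

```r
# Hypothetical toy data mimicking the "mean (sd)" layout of score_data
toy <- data.frame(
  Variable   = c("score_a (mean (sd))", "score_b (mean (sd))"),
  Overall    = c("94.01 (9.73)", "66.23 (24.38)"),
  `18 to 29` = c("92.56 (10.82)", "51.56 (20.02)"),
  check.names = FALSE,        # keep the column name "18 to 29" as-is
  stringsAsFactors = FALSE
)

# Every column except the label column holds "mean (sd)" strings
value_cols <- setdiff(names(toy), "Variable")

# Remove the optional whitespace plus the trailing "(...)", then convert
toy[value_cols] <- lapply(toy[value_cols], function(x) {
  as.numeric(sub("\\s*\\(.*\\)\\s*$", "", x))
})

str(toy)
```

Using `lapply()` over the selected columns avoids the explicit `for` loop and leaves the `Variable` column untouched.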
You can use gsub() with a regular expression to remove any characters inside the parentheses, like so:
test <- score_data %>% mutate_at(vars(-Variable),funs(gsub("\\([^\\)]+\\)", "", ., perl = T)))
Variable Overall X18.to.29 X30.to.39
1 ts.tsmart_partisan_score (mean (sd)) 94.01 92.56 94.14
2 ts.tsmart_presidential_general_turnout_score (mean (sd)) 66.23 51.56 58.44
3 ts.tsmart_midterm_general_turnout_score (mean (sd)) 50.29 31.09 34.82
4 ts.tsmart_offyear_general_turnout_score (mean (sd)) 20.71 25.38 18.84
5 ts.tsmart_presidential_primary_turnout_score (mean (sd)) 48.34 38.26 36.19
6 ts.tsmart_non_presidential_primary_turnout_score (mean (sd)) 40.21 27.03 23.52
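Note that the call above leaves the columns as character strings, with the space before the parenthesis still attached, and that `funs()` has been deprecated since dplyr 0.8.0. A possible variant using `across()` (available from dplyr 1.0.0) is sketched below. The tibble `toy` is a made-up stand-in for `score_data`, and the added `\\s*` and `as.numeric()` are assumptions about the desired result rather than part of the original answer.

```r
library(dplyr)

# Hypothetical stand-in for score_data, using values from the question
toy <- tibble(
  Variable   = c("score_a (mean (sd))", "score_b (mean (sd))"),
  Overall    = c("94.01 (9.73)", "66.23 (24.38)"),
  `18 to 29` = c("92.56 (10.82)", "51.56 (20.02)")
)

# For every column except Variable: drop the leading space and the
# parenthesised sd, then convert what is left to numeric
cleaned <- toy %>%
  mutate(across(-Variable, ~ as.numeric(gsub("\\s*\\([^)]*\\)", "", .x))))

cleaned
```

Because the conversion happens inside `mutate()`, the result keeps the `Variable` labels and yields numeric columns that can be summarised or plotted directly.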
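Do you want to separate them, or to remove what is in the parentheses?
@Onyambu I mean remove what is in the parentheses, along with the leading space before the parenthesis.
Can you post dput(head(score_data[,1:4]))?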
Just edited the post to include it.
One thing to note is that the resulting columns are numeric, so you can work with them however you like.
read.table(stringsAsFactors = F,text=gsub("\\(.*?\\)|\\)","",do.call(paste,dat)))
V1 V3 V4 V5
1 ts.tsmart_partisan_score 94.01 92.56 94.14
2 ts.tsmart_presidential_general_turnout_score 66.23 51.56 58.44
3 ts.tsmart_midterm_general_turnout_score 50.29 31.09 34.82
4 ts.tsmart_offyear_general_turnout_score 20.71 25.38 18.84
5 ts.tsmart_presidential_primary_turnout_score 48.34 38.26 36.19
6 ts.tsmart_non_presidential_primary_turnout_score 40.21 27.03 23.52