Extracting the home and away teams, separated by "at", in R
I have a vector of college basketball games:
c("#34 Colorado at #36 California", "#31 Utah at #87 Stanford",
"#26 USC at #112 Wash State", "#56 UCLA at #134 Washington",
"#187 W Illinois at #116 Neb Omaha", "#222 Denver at #58 S Dakota St",
"#245 IUPUI at #170 South Dakota", "#268 Rice at #208 TX El Paso",
"#274 North Texas at #344 TX-San Ant", "#14 Iowa at #3 Purdue"
)
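I want two separate vectors: one for the teams that appear before "at", and another for the teams that appear after "at". For example, the first vector would contain Colorado, Utah, USC, and so on, and the second vector would contain California, Stanford, Wash State, and so on.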
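Note that I don't want the rankings; I only want the team names. I tried str_split, but it didn't work well because the spacing is inconsistent.
We can use strsplit to split on "at", which gives us the two parts of each string. We then remove the "#" followed by digits from each part and put the result in a data frame (here the vector above is assumed to be stored in string):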
data.frame(t(sapply(strsplit(string, "\\bat\\b"),
                    function(x) trimws(sub("#[0-9]+", "", x)))))
#            X1           X2
#1     Colorado   California
#2         Utah     Stanford
#3          USC   Wash State
#4         UCLA   Washington
#5   W Illinois    Neb Omaha
#6       Denver  S Dakota St
#7        IUPUI South Dakota
#8         Rice   TX El Paso
#9  North Texas   TX-San Ant
#10        Iowa       Purdue
Or use tidyr::separate:
# Split on "at" together with its surrounding whitespace, so that no
# stray spaces are left at the edges of T1 and T2
tidyr::separate(data.frame(col = trimws(gsub("#[0-9]+", "", string))),
                col, into = c("T1", "T2"), sep = "\\s+at\\s+")
#            T1           T2
#1     Colorado   California
#2         Utah     Stanford
#3          USC   Wash State
#4         UCLA   Washington
#5   W Illinois    Neb Omaha
#6       Denver  S Dakota St
#7        IUPUI South Dakota
#8         Rice   TX El Paso
#9  North Texas   TX-San Ant
#10        Iowa       Purdue
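The question asks for two separate vectors, not a data frame. The same split-then-clean idea can return them directly. The following is a minimal base R sketch, not part of the original answer. The names games, clean, away and home are chosen here for illustration, and it assumes the team listed before "at" is the visiting side.

```r
# The vector from the question (stored here under the illustrative name `games`)
games <- c("#34 Colorado at #36 California", "#31 Utah at #87 Stanford",
           "#26 USC at #112 Wash State", "#56 UCLA at #134 Washington",
           "#187 W Illinois at #116 Neb Omaha", "#222 Denver at #58 S Dakota St",
           "#245 IUPUI at #170 South Dakota", "#268 Rice at #208 TX El Paso",
           "#274 North Texas at #344 TX-San Ant", "#14 Iowa at #3 Purdue")

# Split each string on the standalone word "at" and the whitespace around it
parts <- strsplit(games, "\\s+at\\s+")

# Helper (hypothetical name): drop the leading "#<rank>" and trim whitespace
clean <- function(x) trimws(sub("^#[0-9]+\\s*", "", x))

# First piece of every split: team before "at"; second piece: team after "at"
away <- clean(vapply(parts, `[`, character(1), 1))
home <- clean(vapply(parts, `[`, character(1), 2))
```

Because the split pattern requires whitespace on both sides of "at", the "at" inside names such as "Wash State" is left alone.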
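Another solution is to use str_extract_all():
df %>%
  mutate(team_a = str_extract_all(text), (?
We can also do this in base R by removing the rank substrings from the "text" column of df, replacing "at" with a comma, and reading the result with read.csv: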
# strip.white = TRUE removes the space left in front of the second team
# after its "#<rank>" prefix has been deleted
read.csv(text = trimws(gsub("#\\d+", "", gsub("\\s+at\\s+", ",", df$text))),
         header = FALSE, col.names = c("T1", "T2"), stringsAsFactors = FALSE,
         strip.white = TRUE)
#            T1           T2
#1     Colorado   California
#2         Utah     Stanford
#3          USC   Wash State
#4         UCLA   Washington
#5   W Illinois    Neb Omaha
#6       Denver  S Dakota St
#7        IUPUI South Dakota
#8         Rice   TX El Paso
#9  North Texas   TX-San Ant
#10        Iowa       Purdue
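For readers who prefer stringr, one option is to capture both team names in a single pattern with str_match(), which returns the full match in the first column and each capture group in the following columns. This is only a sketch under the assumption that every entry has the form "#rank team at #rank team". It is not the code from the str_extract_all() suggestion, and the names games, m and teams are illustrative.

```r
library(stringr)

# A few entries from the question, stored under the illustrative name `games`
games <- c("#34 Colorado at #36 California",
           "#187 W Illinois at #116 Neb Omaha",
           "#26 USC at #112 Wash State")

# Group 1: team before "at"; group 2: team after "at".
# The "#<digits>" ranks sit outside the groups, so they are dropped.
m <- str_match(games, "^#\\d+\\s+(.*?)\\s+at\\s+#\\d+\\s+(.*)$")

# Column 1 of `m` is the whole match; columns 2 and 3 are the capture groups
teams <- data.frame(T1 = trimws(m[, 2]), T2 = trimws(m[, 3]),
                    stringsAsFactors = FALSE)
```

Entries that do not fit the assumed pattern come back as NA in both columns, which makes malformed rows easy to spot.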