Extract the home and away teams, separated by "at", in R


I have a vector of college basketball games:

c("#34 Colorado  at  #36 California", "#31 Utah  at  #87 Stanford", 
"#26 USC  at  #112 Wash State", "#56 UCLA  at  #134 Washington", 
"#187 W Illinois  at  #116 Neb Omaha", "#222 Denver  at  #58 S Dakota St", 
"#245 IUPUI  at  #170 South Dakota", "#268 Rice  at  #208 TX El Paso", 
"#274 North Texas  at  #344 TX-San Ant", "#14 Iowa  at  #3 Purdue"
)
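I want two separate vectors: one for the teams that appear before "at", and another for the teams that appear after "at". For example, the first vector would contain Colorado, Utah, USC, and so on, and the second would contain California, Stanford, Wash State, and so on.

Note that I don't want the rankings, only the team names. I tried str_split, but it didn't work well because the spacing is inconsistent.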

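We can use strsplit to split on "at", which gives the two parts of each string. From each part we remove the "#" followed by digits, and then put the result in a data frame.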

data.frame(t(sapply(strsplit(string, "\\bat\\b"), 
             function(x) trimws(sub("#[0-9]+", "", x)))))


#            X1           X2
#1     Colorado   California
#2         Utah     Stanford
#3          USC   Wash State
#4         UCLA   Washington
#5    W Illinois    Neb Omaha
#6       Denver  S Dakota St
#7        IUPUI South Dakota
#8         Rice   TX El Paso
#9  North Texas   TX-San Ant
#10        Iowa       Purdue
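
Since the question asks for two separate vectors rather than a data frame, here is a minimal base R sketch along the same lines. It assumes the vector is stored in a variable named string (as in the answer), that every element has the form "#rank team at #rank team", and that the team listed before "at" is the visiting side. The names away, home, and strip_rank are made up for illustration.

```r
string <- c("#34 Colorado  at  #36 California", "#31 Utah  at  #87 Stanford",
            "#26 USC  at  #112 Wash State", "#56 UCLA  at  #134 Washington",
            "#187 W Illinois  at  #116 Neb Omaha", "#222 Denver  at  #58 S Dakota St",
            "#245 IUPUI  at  #170 South Dakota", "#268 Rice  at  #208 TX El Paso",
            "#274 North Texas  at  #344 TX-San Ant", "#14 Iowa  at  #3 Purdue")

# Split on "at" surrounded by any amount of whitespace, which also
# absorbs the inconsistent spacing around it.
parts <- strsplit(string, "\\s+at\\s+")

# Hypothetical helper: drop the leading "#<digits>" ranking and trim.
strip_rank <- function(x) trimws(sub("^#[0-9]+", "", x))

# Assumption: the team before "at" is the away team, the one after is home.
away <- strip_rank(vapply(parts, `[`, character(1), 1))
home <- strip_rank(vapply(parts, `[`, character(1), 2))
```

With the sample data, away starts with Colorado, Utah, USC and home starts with California, Stanford, Wash State.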

Or use tidyr::separate:

tidyr::separate(data.frame(col = trimws(gsub("#[0-9]+", "", string))),
        col, into = c("T1", "T2"), sep = "\\bat\\b")


#            T1                T2
#1     Colorado        California
#2         Utah          Stanford
#3          USC        Wash State
#4         UCLA        Washington
#5   W Illinois         Neb Omaha
#6       Denver       S Dakota St
#7        IUPUI      South Dakota
#8         Rice        TX El Paso
#9  North Texas        TX-San Ant
#10        Iowa            Purdue
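
One thing to watch for with a separator of "\\bat\\b": the spaces on either side of "at" stay attached to the team names, because trimws only trims the ends of the full string. A hedged variant, assuming the same input shape, is to let the separator consume the surrounding whitespace. The variable name teams and the shortened sample vector are illustrative only.

```r
library(tidyr)

string <- c("#34 Colorado  at  #36 California",
            "#26 USC  at  #112 Wash State",
            "#274 North Texas  at  #344 TX-San Ant")

teams <- separate(
  # Remove each "#<digits>" ranking together with the spaces that follow it.
  data.frame(col = trimws(gsub("#[0-9]+\\s*", "", string))),
  col, into = c("T1", "T2"),
  # Whitespace on both sides of "at" is part of the separator,
  # so neither column keeps stray spaces.
  sep = "\\s+at\\s+"
)
```

The two vectors are then available as teams$T1 and teams$T2.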


Another solution is to use str_extract_all():

df %>%
  mutate(team_a=str_extract_all(text),(?

We can also do this in base R by replacing "at" with a comma and removing the rankings in the "text" column, then reading the result with read.csv:

read.csv(text = trimws(gsub("#\\d+", "", gsub("\\s+at\\s+", ",", df$text))),
        header = FALSE, col.names = c("T1", "T2"), stringsAsFactors = FALSE)
#            T1            T2
#1     Colorado    California
#2         Utah      Stanford
#3          USC    Wash State
#4         UCLA    Washington
#5   W Illinois     Neb Omaha
#6       Denver   S Dakota St
#7        IUPUI  South Dakota
#8         Rice    TX El Paso
#9  North Texas    TX-San Ant
#10        Iowa        Purdue
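
The read.csv approach relies on no team name containing a comma. As an alternative sketch that skips the intermediate CSV text, both names can be captured with a single regular expression via regexec and regmatches. This is not taken from the answers above; the pattern and the names games, pattern, and teams are assumptions for illustration, and every element is assumed to follow the "#rank team at #rank team" shape.

```r
games <- c("#34 Colorado  at  #36 California",
           "#222 Denver  at  #58 S Dakota St",
           "#274 North Texas  at  #344 TX-San Ant")

# Group 1: team before "at"; group 2: team after "at".
# Rankings and the whitespace around "at" sit outside the groups.
pattern <- "^#[0-9]+\\s+(.*?)\\s+at\\s+#[0-9]+\\s+(.*)$"

# Each list element is c(full match, group 1, group 2).
m <- regmatches(games, regexec(pattern, games, perl = TRUE))
mat <- do.call(rbind, m)

teams <- data.frame(T1 = trimws(mat[, 2]), T2 = trimws(mat[, 3]),
                    stringsAsFactors = FALSE)
```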
