R: joining on inexact (non-identical) values
I have two tibbles that I want to join on the Batsman column. However, the values in the two columns are not exactly the same, e.g. "V Kohli" versus "Virat Kohli (IND)". How can I join the tibbles based on these inexact matches? Thanks, everyone!
x1 <- tibble(Batsman=c("V Kohli (INDIA)","RG Sharma (INDIA)","Babar Azam (PAK)","GJ Maxwell (AUS)"),
             Runs=c(500,400,300,200),
             Matches=c(67,54,47,23))
x2 <- tibble(Rank=c(1,2,3,4),
             Batsman=c("Virat Kohli", "Rohit Sharma", "Glenn Maxwell","Babar Azam"),
             Rating=c(853,820,640,500))
So you want to join on two columns of text strings:
> x1$Batsman
[1] "V Kohli (INDIA)" "RG Sharma (INDIA)" "Babar Azam (PAK)" "GJ Maxwell (AUS)"
> x2$Batsman
[1] "Virat Kohli" "Rohit Sharma" "Glenn Maxwell" "Babar Azam"
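I'm guessing you have many more names than these four?
This is definitely a tricky task, and computers are notoriously bad at this kind of thing. (There are well-known examples of very long functions written just to read phone numbers.) From the strings you provided, I can see that both columns always contain a similar name part: the surname.
I would use a regexp to extract the names.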
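As a rough illustration of the regexp idea, here is a minimal base R sketch. The helper name `surname_key` is hypothetical (it is not part of the original answer), and it assumes that the surname is always the last word once an optional trailing "(COUNTRY)" tag has been removed.

```r
# Hypothetical helper: reduce a player label to a lowercase surname key.
# Assumption: the surname is the last word after removing a "(COUNTRY)" suffix.
surname_key <- function(x) {
  x <- sub("\\s*\\(.*\\)\\s*$", "", x)  # drop a trailing "(INDIA)"-style tag, if present
  x <- sub("^.*\\s", "", x)             # keep only the text after the last space
  tolower(x)
}

keys1 <- surname_key(c("V Kohli (INDIA)", "RG Sharma (INDIA)",
                       "Babar Azam (PAK)", "GJ Maxwell (AUS)"))
keys2 <- surname_key(c("Virat Kohli", "Rohit Sharma",
                       "Glenn Maxwell", "Babar Azam"))
keys1
keys2
```

Both vectors then hold comparable keys such as "kohli" and "sharma". This breaks down as soon as two players share a surname, so it is worth checking the keys for duplicates on the real data.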
Full code:
library(tibble)
library(stringr)
x1 <- tibble(Batsman=c("V Kohli (INDIA)","RG Sharma (INDIA)","Babar Azam (PAK)","GJ Maxwell (AUS)"),
Runs=c(500,400,300,200),
Matches=c(67,54,47,23) )
x2 <- tibble(Rank=c(1,2,3,4),
Batsman=c("Virat Kohli", "Rohit Sharma", "Glenn Maxwell","Babar Azam"),
Rating=c(853,820,640,500))
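# Drop everything up to the first space: "V Kohli (INDIA)" -> "Kohli (INDIA)"
AA <- str_sub(x1$Batsman, start = str_locate(x1$Batsman, " ")[, 1] + 1, end = -1)
# Keep the text before the next space and lower-case it: "Kohli (INDIA)" -> "kohli"
AA <- str_sub(AA, start = 1, end = str_locate(AA, " ")[, 1] - 1) %>%
  str_to_lower()
# Same idea for x2, which has no country suffix: "Virat Kohli" -> "kohli"
BB <- str_sub(x2$Batsman, start = str_locate(x2$Batsman, " ")[, 1] + 1, end = -1) %>%
  str_to_lower()
# For each row of x1, the index of the corresponding row in x2
match(AA, BB)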
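The code above stops at the index vector returned by `match()`. One possible way to finish the job is to store the extracted surname as a key column in both tibbles and join on it. The sketch below assumes dplyr is available and uses a hypothetical helper `surname_key` (base R regexps instead of the `str_sub` calls above); the `_x1`/`_x2` suffixes are also just illustrative choices.

```r
library(tibble)
library(dplyr)

x1 <- tibble(
  Batsman = c("V Kohli (INDIA)", "RG Sharma (INDIA)", "Babar Azam (PAK)", "GJ Maxwell (AUS)"),
  Runs    = c(500, 400, 300, 200),
  Matches = c(67, 54, 47, 23)
)
x2 <- tibble(
  Rank    = c(1, 2, 3, 4),
  Batsman = c("Virat Kohli", "Rohit Sharma", "Glenn Maxwell", "Babar Azam"),
  Rating  = c(853, 820, 640, 500)
)

# Hypothetical helper (not from the original answer): lowercase surname key.
# Assumption: the surname is the last word once a "(COUNTRY)" suffix is removed.
surname_key <- function(x) {
  x <- sub("\\s*\\(.*\\)\\s*$", "", x)
  tolower(sub("^.*\\s", "", x))
}

x1k <- mutate(x1, key = surname_key(Batsman))
x2k <- mutate(x2, key = surname_key(Batsman))

# Stop early if two players would collapse onto the same key.
stopifnot(!anyDuplicated(x1k$key), !anyDuplicated(x2k$key))

# Join on the key, keep both original name columns, then drop the key.
joined <- inner_join(x1k, x2k, by = "key", suffix = c("_x1", "_x2")) %>%
  select(-key)
joined
```

With the four sample rows every player finds a partner. On a longer list, an `anti_join()` on the same key would show which names failed to match and need manual attention.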
Haven't you asked this before?