Regex extraction from scraped data using lookaround not working
I'm trying to figure out why one of my regex commands works while the other doesn't. Here are two examples of the strings it will be extracting from. The newline junk left over from scraping is consistent, so I'm taking advantage of that where I can:
"\n\tMenghe a'Nyam\n\t\n\n \n\n \n\n \n\n \n Position:\n \n Forward\n\n\n\n 6-5, 215lb (196cm, 97kg) \n \n\n \n\n \n \n \n\n School: Canisius\n\n\n\n\n\n More player info\n\n\n\n\n\n"
"\n\tJordan Aaberg\n\t\n\n \n\n \n\n \n\n \n Position:\n \n Guard\n\n\n\n 6-9, 225lb (206cm, 102kg) \n \n\n Hometown: Rothsay, MN\n\n\n\n \n\n High School: Rothsay\n\n\n\n \n \n \n\n School: North Dakota State\n\n\n\n\n\n More player info\n\n\n\n\n\n"
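My goal is to extract the data I need from these strings, such as the position (Forward and Guard, respectively) and, most importantly, the height (6-5 and 6-9, respectively). I managed to get the position with the following: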
test <- df %>%
mutate(position = str_extract(player, "(?<=Position:\n \n ).*?(?=\n\n\n\n \\d-\\d)"))
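Here is an approach using stringr and tidyr. First, I removed all the \n and \t, because they really annoyed me:
test <- df %>%
mutate(player = str_replace_all(player, "\n|\r|\t", ""),
position = str_extract(player, "(?…

You can remove the + after \w, since the ICU regex engine does not support patterns of unbounded length inside lookbehinds, and use \s to match any whitespace: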
test <- df %>%
mutate(position = str_extract(player, "(?<=Position:\n \n ).*?(?=\n\n\n\n \\d-\\d)")) %>%
mutate(height = str_extract(player, "(?<=\\w\n{4}\\s{2}).*?(?=,\\s+\\d{3}lb)"))
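To make the lookbehind restriction concrete, here is a minimal sketch on a made-up, simplified string. The value `x` is hypothetical (it is not one of the scraped values) and assumes exactly two spaces before the height. With stringr, which uses the ICU engine, the `+` version is expected to raise an error rather than return a match, while the versions with a known maximum length extract the height:

```r
library(stringr)

# Hypothetical, simplified input (an assumption, not the scraped string itself)
x <- "Position:\n \n Forward\n\n\n\n  6-5, 215lb (196cm, 97kg)"

# Unbounded quantifier (+) inside the lookbehind: ICU is expected to reject it,
# so the error is caught here instead of stopping the script
unbounded <- tryCatch(
  str_extract(x, "(?<=\\w+\n{4}\\s{2}).*?(?=,\\s+\\d{3}lb)"),
  error = function(e) paste("error:", conditionMessage(e))
)

# Fixed-length lookbehind: a single \w, four newlines, two whitespace characters
bounded <- str_extract(x, "(?<=\\w\n{4}\\s{2}).*?(?=,\\s+\\d{3}lb)")

# Variable but bounded lookbehind: tolerates 1 to 20 whitespace characters
flexible <- str_extract(x, "(?<=\\w\\s{1,20})\\d+-\\d+(?=,\\s+\\d+lb)")

print(unbounded)
print(bounded)
print(flexible)
```

The lookahead has no such limit, which is why `\\s+` is fine there. If the amount of whitespace before the height varies between rows, the bounded `{1,20}` form is probably the more forgiving choice.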
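test <- df %>%
mutate(position = str_extract(player, "(?…

This is close, and more than enough for me to learn from and improve. Thanks so much!

You're still my hero, Wiktor. Thanks, as always! I know I'm a regex novice, but every time you generously provide such a detailed and thoughtful answer, you help me understand some new nuance. Hopefully the wording of my questions has at least improved! Cheers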
df <- structure(list(player = c("\n\tMenghe a'Nyam\n\t\n\n \n\n \n\n \n\n \n Position:\n \n Forward\n\n\n\n 6-5, 215lb (196cm, 97kg) \n \n\n \n\n \n \n \n\n School: Canisius\n\n\n\n\n\n More player info\n\n\n\n\n\n" ,
"\n\tJordan Aaberg\n\t\n\n \n\n \n\n \n\n \n Position:\n \n Forward\n\n\n\n 6-9, 225lb (206cm, 102kg) \n \n\n Hometown: Rothsay, MN\n\n\n\n \n\n High School: Rothsay\n\n\n\n \n \n \n\n School: North Dakota State\n\n\n\n\n\n More player info\n\n\n\n\n\n" ,
"\n\tKarl Aaker\n\t\n\n \n\n \n\n \n\n \n Position:\n \n Forward\n\n\n\n 6-5, 210lb (196cm, 95kg) \n \n\n Hometown: Reno, NV\n\n\n\n \n\n \n \n \n\n School: Portland\n\n\n\n\n\n More player info\n\n\n\n\n\n"
), position = c("Forward", "Forward", "Forward"), height = c(NA_character_,
NA_character_, NA_character_)), row.names = c(NA, 3L), class = "data.frame")
test <- df %>%
mutate(position = str_extract(player, "(?<=Position:\n \n ).*?(?=\n\n\n\n \\d-\\d)")) %>%
mutate(height = str_extract(player, "(?<=\\w\n{4}\\s{2}).*?(?=,\\s+\\d{3}lb)"))
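The whitespace-cleaning idea from the first answer can also be sketched without any newline counting. This is only an illustration under assumptions: `x` is a hypothetical, shortened version of one scraped value, and `str_squish()` is swapped in for the answer's `str_replace_all()` call so that runs of whitespace collapse to single spaces instead of disappearing entirely:

```r
library(stringr)

# Hypothetical, shortened version of one scraped value (an assumption)
x <- "\n\tMenghe a'Nyam\n\t\n\n \n Position:\n \n Forward\n\n\n\n 6-5, 215lb (196cm, 97kg) \n \n\n School: Canisius\n\n"

# Collapse every run of whitespace to a single space and trim both ends
clean <- str_squish(x)

# With predictable single spaces, both lookarounds become short and fixed
position <- str_extract(clean, "(?<=Position: )\\w+")
height <- str_extract(clean, "\\d+-\\d+(?=, \\d+lb)")

print(clean)
print(position)
print(height)
```

Inside a pipeline, the same calls could go in `mutate()`, for example by first overwriting `player` with its squished version and then extracting from it.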