
Regex extraction with lookarounds on scraped data not working

Tags: r, regex

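I'm trying to figure out why one of my regex commands works while the other doesn't. Here is a sample of two of the strings it will be extracting from. The newline junk left over from scraping is consistent, so I'm taking advantage of that wherever I can: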

"\n\tMenghe a'Nyam\n\t\n\n  \n\n  \n\n  \n\n  \n  Position:\n  \n  Forward\n\n\n\n  6-5, 215lb (196cm, 97kg) \n  \n\n  \n\n  \n  \n  \n\n  School: Canisius\n\n\n\n\n\n  More player info\n\n\n\n\n\n"

"\n\tJordan Aaberg\n\t\n\n  \n\n  \n\n  \n\n  \n  Position:\n  \n  Guard\n\n\n\n  6-9, 225lb (206cm, 102kg) \n  \n\n  Hometown: Rothsay, MN\n\n\n\n  \n\n  High School: Rothsay\n\n\n\n  \n  \n  \n\n  School: North Dakota State\n\n\n\n\n\n  More player info\n\n\n\n\n\n"

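My goal is to extract the data I need from these strings, such as the position (Forward and Guard, respectively) and, most importantly, the height (6-5 and 6-9, respectively). I managed to get the position with the following: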

test <- df %>%
  mutate(position = str_extract(player, "(?<=Position:\n  \n  ).*?(?=\n\n\n\n  \\d-\\d)")) 
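
As a quick check of why this pattern works, here is a minimal sketch that runs it on the two sample strings above. The vector name `players` is only illustrative, and the sketch assumes the stringr package is installed. Both lookarounds here are fixed-width literals, which is why the regex engine accepts them.

```r
library(stringr)

# The two sample strings from the question (name `players` is illustrative).
players <- c(
  "\n\tMenghe a'Nyam\n\t\n\n  \n\n  \n\n  \n\n  \n  Position:\n  \n  Forward\n\n\n\n  6-5, 215lb (196cm, 97kg) \n  \n\n  \n\n  \n  \n  \n\n  School: Canisius\n\n\n\n\n\n  More player info\n\n\n\n\n\n",
  "\n\tJordan Aaberg\n\t\n\n  \n\n  \n\n  \n\n  \n  Position:\n  \n  Guard\n\n\n\n  6-9, 225lb (206cm, 102kg) \n  \n\n  Hometown: Rothsay, MN\n\n\n\n  \n\n  High School: Rothsay\n\n\n\n  \n  \n  \n\n  School: North Dakota State\n\n\n\n\n\n  More player info\n\n\n\n\n\n"
)

# Lookbehind: the literal "Position:" label plus the whitespace that follows it.
# Lookahead: four newlines, two spaces, and the start of a height such as "6-5".
position_pattern <- "(?<=Position:\n  \n  ).*?(?=\n\n\n\n  \\d-\\d)"

positions <- str_extract(players, position_pattern)
print(positions)
# "Forward" "Guard"
```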

Here is an approach using stringr and tidyr. First, I removed all the \n and \t characters, because they really annoy me.

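test <- df %>%
  mutate(player = str_replace_all(player, "\n|\r|\t", ""),
         position = str_extract(player, "(?…

You can remove the + after \w, because the ICU regex engine does not support patterns of unbounded length inside lookbehinds, and use \s to match any whitespace: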

test <- df %>%
  mutate(position = str_extract(player, "(?<=Position:\n  \n  ).*?(?=\n\n\n\n  \\d-\\d)")) %>%
  mutate(height = str_extract(player, "(?<=\\w\n{4}\\s{2}).*?(?=,\\s+\\d{3}lb)"))
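
To make the lookbehind point concrete, here is a small sketch that compares an unbounded lookbehind with the bounded one. The strings are trimmed excerpts of the sample data. The `\\w+` variant is only my guess at what the failing pattern looked like, and the names `players`, `unbounded` and `bounded` are illustrative. What the unbounded variant does may depend on the ICU version behind stringr, so the sketch just reports the outcome.

```r
library(stringr)

# Trimmed excerpts of the sample strings (only the part around the height).
players <- c(
  "Position:\n  \n  Forward\n\n\n\n  6-5, 215lb (196cm, 97kg) ",
  "Position:\n  \n  Guard\n\n\n\n  6-9, 225lb (206cm, 102kg) "
)

# Assumed failing variant: "+" makes the lookbehind unbounded in length.
unbounded <- "(?<=\\w+\n{4}\\s{2}).*?(?=,\\s+\\d{3}lb)"
unbounded_result <- tryCatch(
  str_extract(players, unbounded),
  error = function(e) conditionMessage(e)  # keep the engine's message, if any
)
print(unbounded_result)

# Bounded variant from the answer: exactly one word character, four newlines,
# and two whitespace characters before the height.
bounded <- "(?<=\\w\n{4}\\s{2}).*?(?=,\\s+\\d{3}lb)"
heights <- str_extract(players, bounded)
print(heights)
# "6-5" "6-9"
```

A single `\w` is enough here because the lookbehind only needs to anchor on the last character of the position name, not the whole word.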
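
This is close, and it's enough for me to learn from and improve. Thank you so much!

You're still my hero, Wiktor. Thank you, as always! I know I'm a regex novice, but every time you generously provide such a detailed and thoughtful answer, you help me understand some new nuance. Hopefully the wording of my questions has at least improved! Cheers
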
structure(list(player = c("\n\tMenghe a'Nyam\n\t\n\n  \n\n  \n\n  \n\n  \n  Position:\n  \n  Forward\n\n\n\n  6-5, 215lb (196cm, 97kg) \n  \n\n  \n\n  \n  \n  \n\n  School: Canisius\n\n\n\n\n\n  More player info\n\n\n\n\n\n"  , 
"\n\tJordan Aaberg\n\t\n\n  \n\n  \n\n  \n\n  \n  Position:\n  \n  Forward\n\n\n\n  6-9, 225lb (206cm, 102kg) \n  \n\n  Hometown: Rothsay, MN\n\n\n\n  \n\n  High School: Rothsay\n\n\n\n  \n  \n  \n\n  School: North Dakota State\n\n\n\n\n\n  More player info\n\n\n\n\n\n"  , 
"\n\tKarl Aaker\n\t\n\n  \n\n  \n\n  \n\n  \n  Position:\n  \n  Forward\n\n\n\n  6-5, 210lb (196cm, 95kg) \n  \n\n  Hometown: Reno, NV\n\n\n\n  \n\n  \n  \n  \n\n  School: Portland\n\n\n\n\n\n  More player info\n\n\n\n\n\n"  
), position = c("Forward", "Forward", "Forward"), height = c(NA_character_, 
NA_character_, NA_character_)), row.names = c(NA, 3L), class = "data.frame")    