R: Separate a string into columns by extracting all groups that match a regex


I have strings like this in every row of a column:

example_df <- tibble(string = c("[{\"positieVergelekenMetSchooladvies\":\"boven niveau\",\"percentage\":9.090909090909092,\"percentageVergelijking\":19.843418733556412,\"volgorde\":10},{\"positieVergelekenMetSchooladvies\":\"op niveau\",\"percentage\":81.81818181818181,\"percentageVergelijking\":78.58821425834631,\"volgorde\":20},{\"positieVergelekenMetSchooladvies\":\"onder niveau\",\"percentage\":9.090909090909092,\"percentageVergelijking\":1.5683670080972694,\"volgorde\":30}]"))
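I would like to use the extract() function instead of the separate() function. As I understand it, the difference is that extract() takes a regex matching the content that should fill the new columns, whereas separate() takes a regex matching the separator. But where separate() matches every occurrence of the pattern given in sep=, extract() seems to match only one group: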

example_df %>% 
  extract(string, 
           into = c("boven_niveau_school",
                    "boven_niveau_verg",
                    "op_niveau_school",
                    "op_niveau_verg",
                    "onder_niveau_school",
                    "onder_niveau_verg"),
           regex = "([0-9]+\\.[0-9]+)")

What am I doing wrong?

I would extract all the numbers from the string and then use unnest_wider to create the new columns, rather than separate or extract:

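library(tidyverse)

example_df %>%
  # collect every decimal number in the string into a list-column
  mutate(temp = str_extract_all(string, "[0-9]+\\.[0-9]+")) %>%
  # spread the list-column into one column per number;
  # recent tidyr versions need names_sep because the elements are unnamed
  unnest_wider(temp, names_sep = "_")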

You can rename the columns as you like.
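As for why the extract() call in the question fails: extract() fills one new column per capture group, so a regex with a single group cannot fill six columns. A minimal sketch of one way to keep extract(), assuming every row contains at least six decimal numbers in the same order; the names new_cols, six_groups and res are only illustrative:

```r
library(tidyr)
library(tibble)

# Same string as in the question, written with single quotes to avoid escaping.
example_df <- tibble(string = '[{"positieVergelekenMetSchooladvies":"boven niveau","percentage":9.090909090909092,"percentageVergelijking":19.843418733556412,"volgorde":10},{"positieVergelekenMetSchooladvies":"op niveau","percentage":81.81818181818181,"percentageVergelijking":78.58821425834631,"volgorde":20},{"positieVergelekenMetSchooladvies":"onder niveau","percentage":9.090909090909092,"percentageVergelijking":1.5683670080972694,"volgorde":30}]')

new_cols <- c("boven_niveau_school", "boven_niveau_verg",
              "op_niveau_school", "op_niveau_verg",
              "onder_niveau_school", "onder_niveau_verg")

# One capture group per new column, joined by a lazy ".*?" that skips
# whatever text sits between two consecutive decimal numbers.
six_groups <- paste(rep("([0-9]+\\.[0-9]+)", length(new_cols)), collapse = ".*?")

# convert = TRUE turns the extracted strings into numeric columns.
res <- tidyr::extract(example_df, string, into = new_cols,
                      regex = six_groups, convert = TRUE)
res
```

The integer volgorde values are skipped because the pattern requires a decimal point. This approach is rigid: the number of groups must be known in advance, which is why str_extract_all with unnest_wider is usually more convenient.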

We can use regmatches/gregexpr from base R:

out <- regmatches(example_df$string, gregexpr("\\d+\\.\\d+", example_df$string))[[1]]
example_df[paste0("new", seq_along(out))] <- as.list(out)
example_df
# A tibble: 1 x 7
#  string                                                                     new1        new2         new3        new4       new5       new6       
#  <chr>                                                                      <chr>       <chr>        <chr>       <chr>      <chr>      <chr>      
#1 "[{\"positieVergelekenMetSchooladvies\":\"boven niveau\",\"percentage\":9… 9.09090909… 19.84341873… 81.8181818… 78.588214… 9.0909090… 1.56836700…
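The base R snippet above takes only the first list element ([[1]]), so it covers a single row. A possible sketch for several rows, assuming every row yields the same number of matches; the data frame df and its shortened strings are hypothetical stand-ins for the real JSON:

```r
# Hypothetical two-row input with shortened stand-in strings.
df <- data.frame(
  string = c('{"a":1.5,"b":2.25,"n":10}', '{"a":3.75,"b":4.125,"n":20}'),
  stringsAsFactors = FALSE
)

# One character vector of decimal matches per row.
matches <- regmatches(df$string, gregexpr("\\d+\\.\\d+", df$string))

# Assumption: all rows have the same number of matches; stop otherwise.
stopifnot(length(unique(lengths(matches))) == 1)

# Stack the per-row matches into a numeric matrix, one row per input row.
mat <- do.call(rbind, lapply(matches, as.numeric))

# Attach the matrix columns as new1, new2, ...
df[paste0("new", seq_len(ncol(mat)))] <- as.data.frame(mat)
df
```

Converting with as.numeric here is a choice made for the sketch; the original snippet keeps the values as character.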