R: How to remove everything in a row except a pattern

I have a data frame with a single column whose values are separated by a delimiter (;), like this:

AB00001;09843;AB00002;GD00001
AB84375;34
AB84375;AB84375
74859375;AB001;4455;FG3455
What I want is to remove everything except the codes that start with AB:

AB00001;AB00002
AB84375
AB84375;AB84375
AB001

I tried splitting them with separate, but I don't know how to continue. Any suggestions?

If your data frame is named df and the column is named V1, you can try:

sapply(strsplit(df$V1, ";"), function(x) paste(grep("^AB", x, value = TRUE), collapse = ";"))
#> [1] "AB00001;AB00002" "AB84375"         "AB84375;AB84375" "AB001" 
This splits each string at every semicolon, keeps the pieces that start with AB, and then joins them back together with semicolons.
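The three steps described above can be written out one at a time, which makes the intermediate results easier to inspect. This is only a sketch: the data frame df and its column V1 are rebuilt here from the four rows in the question, and the names parts, kept and result are illustrative, not part of the original answer.

```r
# Assumed example data: the four rows from the question, in a column named V1.
df <- data.frame(
  V1 = c("AB00001;09843;AB00002;GD00001", "AB84375;34",
         "AB84375;AB84375", "74859375;AB001;4455;FG3455"),
  stringsAsFactors = FALSE
)

# Step 1: split every row at the semicolons (a list of character vectors).
parts <- strsplit(df$V1, ";")

# Step 2: keep only the pieces that start with "AB".
kept <- lapply(parts, function(x) grep("^AB", x, value = TRUE))

# Step 3: glue the surviving pieces back together with semicolons.
result <- vapply(kept, paste, character(1), collapse = ";")
result
```

A row that contains no AB code at all ends up as an empty string, because pasting a zero-length vector with collapse returns "".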

I came up with this, using stringr and Daniel O's data:

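df %>% mutate(data = str_extract_all(data, "AB\\w+"))
This gives us:

             data
1 AB00001,AB00002
2         AB84375
3 AB84375,AB84375
4           AB001

1) Base R. Assuming DF is as shown reproducibly in the Note at the end, we prepend a semicolon to each line, then use gsub with the pattern shown, and finally remove the semicolon we added. No packages are used.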

transform(DF, V1 = sub("^;", "", gsub("(;AB\\d+)|;[^;]*", "\\1", paste0(";", V1))))
giving:

               V1
1 AB00001;AB00002
2         AB84375
3 AB84375;AB84375
4           AB001
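To see why the pattern works, the one-liner can be unrolled into its three stages. This is a sketch under the assumption that DF holds the four rows from the question in a character column V1; step1, step2 and step3 are illustrative names only.

```r
# Assumed input: the question's four rows in a character column V1.
DF <- data.frame(
  V1 = c("AB00001;09843;AB00002;GD00001", "AB84375;34",
         "AB84375;AB84375", "74859375;AB001;4455;FG3455"),
  stringsAsFactors = FALSE
)

# Stage 1: prepend ";" so every code, including the first, follows a semicolon.
step1 <- paste0(";", DF$V1)

# Stage 2: ";AB<digits>" is captured and written back through \\1.
# Any other ";token" matches the second alternative, where \\1 is empty,
# so that token is deleted.
step2 <- gsub("(;AB\\d+)|;[^;]*", "\\1", step1)

# Stage 3: drop the leading semicolon that stage 1 added.
step3 <- sub("^;", "", step2)
step3
```

Note that the pattern only treats AB followed by digits as a code to keep; a token such as ABX12 would be removed along with the other non-matching tokens.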
2) dplyr/tidyr. This answer is longer than the others, but it is straightforward and uses no complicated regular expressions.

library(dplyr)
library(tidyr)

DF %>%
  mutate(id = 1:n()) %>%
  separate_rows(V1, sep = ";") %>%
  filter(substr(V1, 1, 2) == "AB") %>%
  group_by(id) %>%
  summarize(V1 = paste(V1, collapse = ";")) %>%
  ungroup %>%
  select(-id)
giving:

# A tibble: 4 x 1
  V1             
  <chr>          
1 AB00001;AB00002
2 AB84375        
3 AB84375;AB84375
4 AB001          
Note
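A minimal reproducible version of the input might look like the following. It assumes the data frame is called DF with a single character column V1 (the names the answers above use) and contains the four rows from the question.

```r
# Assumed reconstruction of the input: four rows, one character column V1.
DF <- data.frame(
  V1 = c("AB00001;09843;AB00002;GD00001",
         "AB84375;34",
         "AB84375;AB84375",
         "74859375;AB001;4455;FG3455"),
  stringsAsFactors = FALSE  # keep V1 as character, not factor
)
str(DF)
```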
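You beat me to it; I was about to post this exact answer. I'll just add the example data: structure(list(data = c("AB00001;09843;AB00002;GD00001", "AB84375;34", "AB84375;AB84375", "74859375;AB001;4455;FG3455")), class = "data.frame", row.names = c(NA, -4L))

It's a bit of a "fastest gun in the West" question, @DanielO. The akrun types usually answer before the rest of us can blink!

Another one: gsub(";$", "", gsub('(AB[^;]+;?)|.', '\\1', x))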