How to extract a substring as part of a dplyr::mutate pipe (R, Regex, Dplyr, Tidyverse) - Fatal编程技术网

I have the following data frame:


library(tidyverse)
df
#> # A tibble: 10 x 3
#>    pfc_chr pfc_chr_st      peak_name
#>      <chr>      <int>          <chr>
#>  1    chr1    3046442  XXX-ad_peak_1
#>  2    chr1    3119671 XXX-ad_peak_2a
#>  3    chr1    3164756     PMN_peak_2
#>  4    chr1    3167322     Ytb_peak_3
#>  5    chr1    3210838     PMN_peak_3
#>  6    chr1    3212196  XXX-ad_peak_6
#>  7    chr1    3249068  XXX-ad_peak_8
#>  8    chr1    3268246     PMN_peak_5
#>  9    chr1    3444892 XXX-ad_peak_11
#> 10    chr1    3451544 XXX-ad_peak_12
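How `df` was originally created isn't shown, so here is a reconstruction from the printed values for anyone who wants to experiment. It uses a base-R `data.frame` so it runs without the tidyverse; `tibble::tibble()` with the same columns should work just as well.

```r
# Reconstruction of the example data from the printed tibble (assumed,
# since the original construction code is not shown).
df <- data.frame(
  pfc_chr    = rep("chr1", 10),
  pfc_chr_st = c(3046442L, 3119671L, 3164756L, 3167322L, 3210838L,
                 3212196L, 3249068L, 3268246L, 3444892L, 3451544L),
  peak_name  = c("XXX-ad_peak_1", "XXX-ad_peak_2a", "PMN_peak_2",
                 "Ytb_peak_3", "PMN_peak_3", "XXX-ad_peak_6",
                 "XXX-ad_peak_8", "PMN_peak_5", "XXX-ad_peak_11",
                 "XXX-ad_peak_12"),
  stringsAsFactors = FALSE  # keep peak_name as character in R < 4.0 too
)

str(df)
```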
What I want to do is extract the substring of `peak_name` that comes before `_peak`, as part of a dplyr pipe. The expected final result is:

   pfc_chr pfc_chr_st      peak_name new_col
1     chr1    3046442  XXX-ad_peak_1  XXX-ad
2     chr1    3119671 XXX-ad_peak_2a  XXX-ad
3     chr1    3164756     PMN_peak_2     PMN
4     chr1    3167322     Ytb_peak_3     Ytb
5     chr1    3210838     PMN_peak_3     PMN
6     chr1    3212196  XXX-ad_peak_6  XXX-ad
7     chr1    3249068  XXX-ad_peak_8  XXX-ad
8     chr1    3268246     PMN_peak_5     PMN
9     chr1    3444892 XXX-ad_peak_11  XXX-ad
10    chr1    3451544 XXX-ad_peak_12  XXX-ad
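I tried this, but it failed:

> df %>% mutate(new_col = stringr::str_match(peak_name, "^(.*?)\\_peak\\_*?"))
Error in mutate_impl(.data, dots) :
  Column `new_col` must be length 10 (the number of rows) or one, not 20
What is the correct way to do this?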
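The "not 20" comes from the shape of what `str_match()` returns: a character matrix with the whole match in column 1 and one further column per capture group. With 10 rows and one group that is a 10 x 2 matrix, i.e. 20 values, which `mutate()` presumably counts as the column length. The sketch below reproduces that shape in base R, using `regexec()`/`regmatches()` as a stand-in for `str_match()` (an assumption made so the example runs without stringr installed), and a simplified pattern `"^(.*)_peak"`.

```r
# Base-R stand-in for stringr::str_match(): one row per input string,
# column 1 = whole match, column 2 = capture group 1.
x <- c("XXX-ad_peak_1", "PMN_peak_2", "Ytb_peak_3")

# Every input matches here; a non-match would give character(0),
# which rbind() silently drops, so real code must handle that case.
mat <- do.call(rbind, regmatches(x, regexec("^(.*)_peak", x)))

dim(mat)     # 3 rows x 2 columns
length(mat)  # 6 values, two per input row
mat[, 2]     # capture group only: one value per row
```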

Select the second column (the first capture group) of the matrix returned by `str_match()`:

df %>% mutate(new_col = stringr::str_match(peak_name, "^(.*?)\\_peak\\_*?")[, 2])
Output:

    pfc_chr pfc_chr_st      peak_name new_col
1    chr1    3046442  XXX-ad_peak_1  XXX-ad
2    chr1    3119671 XXX-ad_peak_2a  XXX-ad
3    chr1    3164756     PMN_peak_2     PMN
4    chr1    3167322     Ytb_peak_3     Ytb
5    chr1    3210838     PMN_peak_3     PMN
6    chr1    3212196  XXX-ad_peak_6  XXX-ad
7    chr1    3249068  XXX-ad_peak_8  XXX-ad
8    chr1    3268246     PMN_peak_5     PMN
9    chr1    3444892 XXX-ad_peak_11  XXX-ad
10    chr1    3451544 XXX-ad_peak_12  XXX-ad
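Indexing with `[, 2]` works because `str_match()` returns one row per input, so column 2 is a vector of exactly `nrow(df)` values (with `NA` where nothing matched). A hypothetical base-R helper, `extract_group1()` (not part of stringr or dplyr), that keeps the same one-value-per-row guarantee even for non-matching strings might look like this:

```r
# Hypothetical helper: return capture group 1 for every element of x,
# or NA where the pattern does not match, so the result always has
# length(x) elements and is safe to use inside mutate().
extract_group1 <- function(x, pattern) {
  m <- regmatches(x, regexec(pattern, x))
  # Each element of m is c(whole_match, group1), or character(0) on no match.
  vapply(m, function(g) if (length(g) >= 2) g[2] else NA_character_,
         character(1))
}

x <- c("XXX-ad_peak_1", "PMN_peak_2", "_peak_8", "peak_8")
extract_group1(x, "^(.*)_peak")
```

Inside a pipe this would be used as `mutate(new_col = extract_group1(peak_name, "^(.*)_peak"))`.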

I'd suggest `stringr::str_extract()` with a lookahead:

df %>%
  mutate(new_col = stringr::str_extract(peak_name, "^.*(?=_peak)"))
The result:

> df %>%
+   mutate(new_col = stringr::str_extract(peak_name, "^.*(?=_peak)"))
# A tibble: 10 x 4
   pfc_chr pfc_chr_st      peak_name new_col
     <chr>      <int>          <chr>   <chr>
 1    chr1    3046442  XXX-ad_peak_1  XXX-ad
 2    chr1    3119671 XXX-ad_peak_2a  XXX-ad
 3    chr1    3164756     PMN_peak_2     PMN
 4    chr1    3167322     Ytb_peak_3     Ytb
 5    chr1    3210838     PMN_peak_3     PMN
 6    chr1    3212196  XXX-ad_peak_6  XXX-ad
 7    chr1    3249068  XXX-ad_peak_8  XXX-ad
 8    chr1    3268246     PMN_peak_5     PMN
 9    chr1    3444892 XXX-ad_peak_11  XXX-ad
10    chr1    3451544 XXX-ad_peak_12  XXX-ad
Note that a value such as `_peak_8` returns an empty string, while a value such as `peak_8` returns `NA`.
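The same lookahead can be tried in base R to see those edge cases directly. This is a sketch, not how `str_extract()` is implemented: `regexpr()` needs `perl = TRUE` because lookahead is a PCRE feature, and since `regmatches()` drops non-matches, the matches are written back by position so that the result keeps one value per input.

```r
# Base-R analogue of str_extract(x, "^.*(?=_peak)").
x <- c("XXX-ad_peak_1", "PMN_peak_2", "_peak_8", "peak_8")

m   <- regexpr("^.*(?=_peak)", x, perl = TRUE)
hit <- as.vector(m) != -1L   # FALSE only where there is no match at all

# A zero-length match (as for "_peak_8") still counts as a hit and
# yields "", while a real non-match ("peak_8") stays NA.
out <- rep(NA_character_, length(x))
out[hit] <- regmatches(x, m)
out
```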

Try `sub("^(.*)_peak_.*", "\\1", peak_name)` instead of `stringr::str_match(…)`, or even `sub("_peak.*$", "", peak_name)`.
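Both `sub()` calls return a character vector of the same length as `peak_name`, which is what `mutate()` needs. The sketch below runs them on a small sample; it uses base R's `transform()` as a stand-in for `mutate()` so it has no package dependency. One behavioural difference from `str_extract()`: `sub()` returns a non-matching string unchanged instead of `NA`.

```r
peaks <- data.frame(
  peak_name = c("XXX-ad_peak_1", "PMN_peak_2", "Ytb_peak_3", "peak_8"),
  stringsAsFactors = FALSE
)

# Delete "_peak" and everything after it.
res <- transform(peaks, new_col = sub("_peak.*$", "", peak_name))

# Same idea via a capture group; "\\1" keeps only the text before "_peak_".
v <- sub("^(.*)_peak_.*", "\\1", peaks$peak_name)

# "peak_8" contains no "_peak", so both calls leave it as "peak_8".
as.character(res$new_col)
v
```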