R: Using a regexp to count the frequency of strings (inside parentheses) in a data frame column

r regex string dataframe

The funders column of the data frame research lists the funders' names in parentheses, like this:

Funder 1 (FWF)
Another Funding Organization (FWF)
Funder 2 (ERC) supported this research.
Yet another one (Leverhulme Trust), and another (ERC). They helped us!
We thank this funder (FWF) for their support

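I want to extract all funder names inside the parentheses and sort them by frequency count.

I could not get this to work with the following code: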

df <- data.frame(table(research$funders))
funder <- "(?<=\\().*?(?=\\))"
sapply(df, function(x) {
  sapply(funder, function(y) {
    sum(grepl(y, x, perl=TRUE))
  })
})

How can I do this? Thank you for your help.

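You can use regmatches() and gregexpr() to extract every parenthesized substring (the pattern below keeps the parentheses in each match), then use table() to count how often each one occurs. Here string stands for the character vector holding the text, for example research$funders: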

table(unlist(regmatches(string, gregexpr('\\(.*?\\)', string))))

#             (ERC)              (FWF) (Leverhulme Trust) 
#                 2                  3                  1
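For readers who want to run the answer end to end, here is a minimal sketch. It assumes that string is simply the funders column of research as a character vector, and it rebuilds that data frame from the five sample rows in the question; the intermediate names matches and counts are illustrative only.

```r
# Sample data copied from the question; the data frame name `research` and
# the column name `funders` follow the question's description.
research <- data.frame(
  funders = c(
    "Funder 1 (FWF)",
    "Another Funding Organization (FWF)",
    "Funder 2 (ERC) supported this research.",
    "Yet another one (Leverhulme Trust), and another (ERC). They helped us!",
    "We thank this funder (FWF) for their support"
  ),
  stringsAsFactors = FALSE
)

# Assumption: `string` in the answer is the text column as a character vector.
string <- research$funders

# gregexpr() locates every non-greedy "(...)" match in each element;
# regmatches() returns the matched substrings as a list, one entry per row.
matches <- regmatches(string, gregexpr("\\(.*?\\)", string))

# Flatten the list and tabulate how often each distinct match occurs.
counts <- table(unlist(matches))
print(counts)
```

Note that the question's attempt used grepl(), which only reports whether a row contains a match, so it cannot tell the individual funders apart; gregexpr() with regmatches() returns the matched text itself.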

The same can be done with stringr::str_extract_all:

table(unlist(stringr::str_extract_all(string, '\\(.*?\\)')))

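Thanks, @WiktorStribiżew, I had already looked at that, but it did not help: all the solutions there only count how many times parentheses occur, not the frequency of each distinct value inside the parentheses.

Thank you! It worked. To sort by frequency count I used sort(table(…), decreasing=TRUE).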
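As a sketch of the sorting step, the snippet below combines sort() with the lookaround pattern from the question, so that the names come out without their surrounding parentheses. This is one possible variant rather than the answerer's exact code; the vector funders and the names pattern, names_only and sorted_counts are hypothetical, and the sample rows are taken from the question.

```r
# Sample rows from the question, held in a plain character vector.
funders <- c(
  "Funder 1 (FWF)",
  "Another Funding Organization (FWF)",
  "Funder 2 (ERC) supported this research.",
  "Yet another one (Leverhulme Trust), and another (ERC). They helped us!",
  "We thank this funder (FWF) for their support"
)

# Lookbehind "(?<=\\()" and lookahead "(?=\\))" match only the text between
# the parentheses; lookarounds need perl = TRUE in base R.
pattern <- "(?<=\\().*?(?=\\))"
names_only <- unlist(regmatches(funders, gregexpr(pattern, funders, perl = TRUE)))

# Tabulate the names and order them from most to least frequent.
sorted_counts <- sort(table(names_only), decreasing = TRUE)
print(sorted_counts)

# Optional: convert to a data frame with one row per funder.
funder_df <- as.data.frame(sorted_counts, stringsAsFactors = FALSE)
print(funder_df)
```

Stripping the parentheses during extraction avoids a separate gsub() pass afterwards, at the cost of requiring the PCRE engine.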
