Sum the frequency of words in one list based on a second list in R

r regex list

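I need to count how often the words or phrases in a list occur, based on a separate source list.
I have a data frame of authors and research areas. Each author has a list of one or more research areas (words or phrases) associated with their name.
Sometimes the same research area appears more than once, and I want it counted every time (i.e., the list is not unique).
I need to count the number of times an author's research areas match a list of research areas.
I can do this for one author at a time, but not for the whole list of authors.
(In reality there are 4 lexicons, one per research category: life sciences, social sciences, etc. I need to tally the occurrences of each author's research areas within each category, i.e., how many life-science areas are in the list, how many social-science areas, and so on. Below is a simple example with one research category, but the real case has 4 separate and unique lexicons.)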

test.small <- data.frame(AuthorID=c("Mavis", "Cleotha", "Yvonne"), 
                     RA=c("Fisheries, Fisheries, Geography, Marine Biology", "Fisheries", 
                          "Marine Biology, Marine Biology, Fisheries, Zoology"))
RA.text <- as.character(test.small$RA)
RA.list <- strsplit(RA.text, ", ", perl=TRUE)
lexicon <- c("Fisheries", "Marine Biology")

sum(RA.list[[3]] %in% lexicon)

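You can write a function and apply it to every row with one of the apply-family functions (such as lapply or sapply). If I understood your question correctly, the following works for me: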
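test.small <- data.frame(AuthorID=c("Mavis", "Cleotha", "Yvonne"), 
                         RA=c("Fisheries, Fisheries, Geography, Marine Biology", "Fisheries", 
                              "Marine Biology, Marine Biology, Fisheries, Zoology"))

# Count how many research areas in one comma-separated string appear in the lexicon
frequency_counter <- function(x, lexicon)
{
  x <- as.character(x)                       # also handles a factor column
  RA.list <- strsplit(x, ", ", perl=TRUE)    # split the string into individual areas
  count <- sum(RA.list[[1]] %in% lexicon)    # repeated areas are counted each time
  return(count)
}

# apply the function to every row; sapply returns a plain vector, not a list column
lexicon <- c("Fisheries", "Marine Biology")
test.small$count <- sapply(test.small$RA, function(x) frequency_counter(x, lexicon), USE.NAMES = FALSE)

test.small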
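
The question mentions four separate lexicons, one per research category. Below is a minimal sketch of how the same counting idea might extend to several lexicons at once. It assumes the lexicons are kept in a named list; the name `lexicons` and the `SocialSciences` entry with its contents are hypothetical, invented here only for illustration.

```r
# Example data from the question
test.small <- data.frame(
  AuthorID = c("Mavis", "Cleotha", "Yvonne"),
  RA = c("Fisheries, Fisheries, Geography, Marine Biology",
         "Fisheries",
         "Marine Biology, Marine Biology, Fisheries, Zoology"),
  stringsAsFactors = FALSE
)

# Split each author's research areas into a character vector
RA.list <- strsplit(test.small$RA, ", ", fixed = TRUE)

# Hypothetical lexicons: one character vector per research category
lexicons <- list(
  LifeSciences   = c("Fisheries", "Marine Biology"),
  SocialSciences = c("Geography")   # assumed content, for illustration only
)

# For each lexicon, count (with repeats) how many of each author's areas match.
# The result is a matrix with one row per author and one column per lexicon.
counts <- sapply(lexicons, function(lex) {
  vapply(RA.list, function(areas) sum(areas %in% lex), integer(1))
})

# Attach the per-category counts to the author IDs
result <- cbind(test.small["AuthorID"], counts)
result
```

Keeping the lexicons in a named list means a fifth category could be added without touching the counting code.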

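We can use str_count from the stringr package. In the example below, test.small2 is a data frame in which the column Count shows the word count.
Note that I added stringsAsFactors = FALSE when creating test.small, to make sure all columns are character rather than factor. or1 is a function from the rebus package that builds a regular expression using the alternation syntax |.

By using str_count, we may not need to strsplit the string.
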
# Create example data frame
test.small <- data.frame(AuthorID=c("Mavis", "Cleotha", "Yvonne"), 
                         RA=c("Fisheries, Fisheries, Geography, Marine Biology", "Fisheries", 
                              "Marine Biology, Marine Biology, Fisheries, Zoology"),
                         stringsAsFactors = FALSE)

# Load packages
library(dplyr)
library(stringr)
library(rebus)

# Define the lexicon
lexicon <- c("Fisheries", "Marine Biology")

# Create a new column showing the total number of words matching the lexicon
test.small2 <- test.small %>% mutate(Count = str_count(RA, or1(lexicon)))
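
One thing to keep in mind with a regex-based count is that it matches anywhere in the string, so a lexicon entry such as "Fisheries" would also be counted inside a longer research area. Below is a minimal base-R sketch of the difference. The input string with "Fisheries Management" is a hypothetical example that is not in the original data, and the hand-built pattern is only an assumed stand-in for the alternation that or1 produces.

```r
lexicon <- c("Fisheries", "Marine Biology")

# Hypothetical input: one research area merely contains a lexicon term
RA <- "Fisheries Management, Fisheries"

# Regex alternation built by hand: "Fisheries|Marine Biology"
pattern <- paste(lexicon, collapse = "|")
matches <- gregexpr(pattern, RA)[[1]]
regex_count <- sum(matches > 0)   # gregexpr returns -1 when nothing matches

# Exact matching on the split elements
areas <- strsplit(RA, ", ", fixed = TRUE)[[1]]
exact_count <- sum(areas %in% lexicon)

regex_count   # counts the "Fisheries" inside "Fisheries Management" as well
exact_count   # counts only the element that equals a lexicon entry
```

If the research areas never contain one another as substrings, the two approaches should give the same counts.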
