R: How to find common text from 2 columns containing strings?


r dataframe


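I have two columns: "title", which contains data such as "what is physics?", and "content", which contains data such as "Physics is the study of...". I want the text common to both, e.g. ['is', 'Physics']. This has to be done for every row of the data. How can I do this in R?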

Hi,

I think you need the following:

df <- data.frame(col1=c('what is physics?', 'set cover is NP hard', 'abstract algebra'), 
                 col2=c('Physics is the study of...', 'Example of an NP complete problem is 3-SAT', 'linear algebra'),
                 stringsAsFactors = FALSE)
#       col1                col2
# 1     what is physics?    Physics is the study of...
# 2 set cover is NP hard    Example of an NP complete problem is 3-SAT
# 3     abstract algebra    linear algebra

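# For each row: replace runs of non-letter characters with a space, split on spaces,
# lower-case the words, then intersect the word sets of the two columns
apply(df, 1, function(x) intersect(tolower(unlist(strsplit(gsub('[^a-zA-Z\\s]+', ' ', x[1]), split = ' '))),
                                   tolower(unlist(strsplit(gsub('[^a-zA-Z\\s]+', ' ', x[2]), split = ' ')))))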

#[[1]]
#[1] "is"      "physics"

#[[2]]
#[1] "is" "np"

#[[3]]
#[1] "algebra"
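One caveat with apply() here: it returns a list only because the rows happen to yield vectors of different lengths. If every row produced the same number of common words, apply() would simplify the result to a matrix (or to a plain vector if each row had exactly one). A minimal sketch of an alternative, assuming the same df as above: Map() walks the two columns in parallel and always returns a list. The helper words_of() is a hypothetical name introduced for illustration, and the pattern [^[:alpha:]]+ (any run of non-letters treated as a separator) is a swapped-in choice, not the regex used in the answer.

```r
# Example data from the answer above
df <- data.frame(
  col1 = c("what is physics?", "set cover is NP hard", "abstract algebra"),
  col2 = c("Physics is the study of...", "Example of an NP complete problem is 3-SAT", "linear algebra"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: lower-case a string and split it into alphabetic words.
# "[^[:alpha:]]+" treats any run of non-letters (spaces, punctuation, digits) as a separator.
words_of <- function(s) {
  w <- strsplit(tolower(s), "[^[:alpha:]]+")[[1]]
  w[nzchar(w)]  # drop the empty string produced when s starts with a separator
}

# Map() pairs col1[i] with col2[i] and always returns a list,
# holding one character vector of shared words per row.
common <- Map(function(a, b) intersect(words_of(a), words_of(b)), df$col1, df$col2)
names(common) <- NULL  # Map() names the result after col1; drop those names

# Optionally keep the result next to the data as a list column.
df$common <- common
common
```

Because intersect() keeps the order of its first argument, the words come out in the order they appear in col1, as in the apply() output.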

It says there is an error in "unlist(strsplit(gsub('[^a-zA-Z\\s]', x[2]), split=": unused argument "split=".

Can you check now? There was a typo.
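An "unused argument" error of this kind most likely means that split = ended up inside the gsub() call: split is an argument of strsplit(), while gsub() only takes pattern, replacement, x and a few matching flags. A sketch that breaks the nested call into separate steps, so each argument visibly goes to the right function; tokenize() is a hypothetical helper, and [^a-zA-Z]+ is a simplified stand-in for the pattern in the answer.

```r
# tokenize() is a hypothetical helper; each step is its own assignment so the
# arguments of gsub() and strsplit() cannot get mixed up.
tokenize <- function(s) {
  cleaned <- gsub("[^a-zA-Z]+", " ", s)           # gsub(pattern, replacement, x): no 'split' argument
  parts   <- strsplit(cleaned, split = " ")[[1]]  # 'split' is an argument of strsplit()
  parts   <- parts[nzchar(parts)]                 # drop empty strings, e.g. from a leading space
  tolower(parts)
}

tokenize("Example of an NP complete problem is 3-SAT")

# Passing split to gsub() by mistake triggers the same kind of error:
msg <- tryCatch(gsub("[^a-zA-Z]+", " ", "what is physics?", split = " "),
                error = conditionMessage)
msg
```

With the steps separated, the row-wise version becomes intersect(tokenize(x[1]), tokenize(x[2])), which is much harder to mis-parenthesize than the fully nested one-liner.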