R: Filter rows by checking all columns for a partial text match

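Hello, I am new to R and cannot find a way to check all columns of a row for a word and then keep only the rows that contain this word at least once in any column. I made an example data frame to show what my data looks like: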

> df
   Name currrent.Category Category.Month.1 Category.Month.2 Category.Month.3
1 Fund1      Abc Cautious     Abc Cautious     Abc Cautious     Abc Cautious
2 Fund2      Abc Cautious       Abc Global     Abc Cautious     Abc Cautious
3 Fund3        Abc Global       Abc Global       Abc Global       Abc Global
4 Fund4        Abc Global     Abc Cautious       Abc Global       Abc Global

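Now I want to extract all rows that contain the word "Cautious" in any of the categories. The returned data frame should therefore contain rows 1, 2 and 4. I added "Abc" to each category because the category names in my data are longer and differ in some respects; what matters is whether they contain the word "Cautious".

Is such an operation possible in R?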

> dput(df)
structure(list(Name = structure(1:4, .Label = c("Fund1", "Fund2", 
"Fund3", "Fund4"), class = "factor"), currrent.Category = structure(c(1L, 
1L, 2L, 2L), .Label = c("Abc Cautious", "Abc Global"), class = "factor"), 
Category.Month.1 = structure(c(1L, 2L, 2L, 1L), .Label = c("Abc Cautious", 
"Abc Global"), class = "factor"), Category.Month.2 = structure(c(1L, 
1L, 2L, 2L), .Label = c("Abc Cautious", "Abc Global"), class = "factor"), 
Category.Month.3 = structure(c(1L, 1L, 2L, 2L), .Label = c("Abc Cautious", 
"Abc Global"), class = "factor")), .Names = c("Name", "currrent.Category", 
"Category.Month.1", "Category.Month.2", "Category.Month.3"), class = "data.frame", row.names = c(NA, 
-4L))

I hope this is the right way to post a dput.

Using the sqldf package:

library(sqldf)
sqldf("select * from df where 
[Name] like '%Cautious%' or 
[currrent.Category] like '%Cautious%' 
or [Category.Month.1] like '%Cautious%' 
or [Category.Month.2] like '%Cautious%' 
or [Category.Month.3] like '%Cautious%'")
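
For a data frame with many columns, the same query does not have to be typed out by hand. The following is a minimal sketch, not taken from the answer itself, that assembles one "like" condition per column from names(df) and joins them with "or". The data frame is rebuilt from the question's dput output with character columns, and the variable names conditions and query are illustrative assumptions.

```r
# Example data mirroring the question (column names copied from the dput output)
df <- data.frame(
  Name              = c("Fund1", "Fund2", "Fund3", "Fund4"),
  currrent.Category = c("Abc Cautious", "Abc Cautious", "Abc Global", "Abc Global"),
  Category.Month.1  = c("Abc Cautious", "Abc Global", "Abc Global", "Abc Cautious"),
  Category.Month.2  = c("Abc Cautious", "Abc Cautious", "Abc Global", "Abc Global"),
  Category.Month.3  = c("Abc Cautious", "Abc Cautious", "Abc Global", "Abc Global")
)

# One "[column] like '%Cautious%'" condition per column, joined with "or"
conditions <- paste0("[", names(df), "] like '%Cautious%'", collapse = " or ")
query <- paste("select * from df where", conditions)

# Run the query only if the sqldf package is installed
if (requireNamespace("sqldf", quietly = TRUE)) {
  result <- sqldf::sqldf(query)
  print(result)
}
```

To restrict the search to a subset of columns, pass that subset (for example names(df)[-1]) to paste0 instead of names(df).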

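Your data is not tidy, which is why you are having trouble working with it. I can see in your data a season and the status for that season.

gather comes from the tidyr package; filter and the magrittr operator %>% come from the dplyr package. I use right assignment (->) to keep the data flowing from left to right.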

library(tidyr)
library(dplyr)

df %>%
  gather(season, status, -Name) %>% 
  filter(grepl("Cautious", status)) ->
  dcautious

You can add, e.g., group_by(Name) %>% summarise(caution = n()) to get a list of funds with the number of "Cautious" entries in the dataset.
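
For readers without tidyr and dplyr installed, the per-fund count described above can also be sketched in base R. This is a stand-in for the group_by/summarise step, not the answer's own code; the data frame is rebuilt from the question's dput output, and the names hits, counts and caution are illustrative assumptions.

```r
# Example data mirroring the question (column names copied from the dput output)
df <- data.frame(
  Name              = c("Fund1", "Fund2", "Fund3", "Fund4"),
  currrent.Category = c("Abc Cautious", "Abc Cautious", "Abc Global", "Abc Global"),
  Category.Month.1  = c("Abc Cautious", "Abc Global", "Abc Global", "Abc Cautious"),
  Category.Month.2  = c("Abc Cautious", "Abc Cautious", "Abc Global", "Abc Global"),
  Category.Month.3  = c("Abc Cautious", "Abc Cautious", "Abc Global", "Abc Global")
)

# Logical matrix: TRUE where a category cell contains "Cautious" (Name column excluded)
hits <- sapply(df[-1], grepl, pattern = "Cautious", fixed = TRUE)

# Number of "Cautious" entries per fund, keeping only funds with at least one
counts <- data.frame(Name = df$Name, caution = rowSums(hits))
counts <- counts[counts$caution > 0, ]
print(counts)
```

With the example data this keeps Fund1, Fund2 and Fund4, with 4, 3 and 1 matching entries respectively.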

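Something like this? Check my answer below.

Or just

df[rowSums(sapply(df[-1], grepl, pattern = "Cautious", fixed = TRUE)) > 0, ]

or

df[Reduce(`+`, lapply(df[-1], grepl, pattern = "Cautious", fixed = TRUE)) > 0, ]

With base R, also

df[Reduce(`|`, lapply(df[-1], grepl, pattern = "Cautious", fixed = TRUE)), ]

This works. Thank you very much! But my original data frame has more than 300 columns. Is there a way to include all columns without mentioning them explicitly, something like a sequence?

I just tried this, but it did not work:

sqldf("select * from df where [Name]:[Category.Month.3] like '%Cautious%'")

You did not mention that. The answer below should help.

I just tried Name:Category.Month.3, hoping it would check all columns from Name to Category.Month.3.

Suggest generalizing the answer like this: library(sqldf); ...

You are a genius @G.grothendieck, thanks again. It worked. Now I only get the funds that are "Cautious" in any category.

For a basically simple problem, this seems a somewhat cumbersome approach.

@mtoto It always depends on your point of view. Keeping the data in the original "cumbersome" representation causes problems for printing and other downstream analysis. Your solution works, but with the lambda function and length I think it is less clear. I prefer verbs that match my mental model of what I am actually doing with the data. Here, I filter the data on matches in status.

Or use the grepl function to filter, as in this variation:

df[apply(df, 1, function(x) any(grepl("Cautious", x))), ]

apply gives a logical vector, which is then used to subscript df.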

# Flag rows that contain "Cautious" at least once
sub <- apply(df, 1, function(row) length(grep("Cautious", row)) > 0)

# Subset df
df[sub,]
#   Name currrent.Category Category.Month.1 Category.Month.2 Category.Month.3
#1 Fund1      Abc Cautious     Abc Cautious     Abc Cautious     Abc Cautious
#2 Fund2      Abc Cautious       Abc Global     Abc Cautious     Abc Cautious
#4 Fund4        Abc Global     Abc Cautious       Abc Global       Abc Global
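
The base R variants discussed on this page can be wrapped in a small helper, which is convenient when the data frame has hundreds of columns. The following is a sketch under stated assumptions: rows_matching_any and its arguments are hypothetical names introduced here, the pattern is treated as a regular expression (not a fixed string), and the data frame is rebuilt from the question's dput output.

```r
# Example data mirroring the question (column names copied from the dput output)
df <- data.frame(
  Name              = c("Fund1", "Fund2", "Fund3", "Fund4"),
  currrent.Category = c("Abc Cautious", "Abc Cautious", "Abc Global", "Abc Global"),
  Category.Month.1  = c("Abc Cautious", "Abc Global", "Abc Global", "Abc Cautious"),
  Category.Month.2  = c("Abc Cautious", "Abc Cautious", "Abc Global", "Abc Global"),
  Category.Month.3  = c("Abc Cautious", "Abc Cautious", "Abc Global", "Abc Global")
)

# Hypothetical helper: keep rows where any of the chosen columns matches `pattern`
rows_matching_any <- function(data, pattern, cols = names(data), ignore.case = FALSE) {
  # One logical vector per column: does each cell match the pattern?
  hits <- lapply(data[cols], function(col) {
    grepl(pattern, as.character(col), ignore.case = ignore.case)
  })
  # Combine the columns element-wise with OR, then subset the rows
  keep <- Reduce(`|`, hits)
  data[keep, , drop = FALSE]
}

# Search every column except Name
result <- rows_matching_any(df, "Cautious", cols = names(df)[-1])
print(result)
```

Unlike apply(df, 1, ...), this works column by column and does not convert the whole data frame to a character matrix first. Passing ignore.case = TRUE makes the match case-insensitive.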