R: delete multiple rows that contain specific string values (r, dplyr, purrr)

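I have a data frame with a couple hundred columns. I want to delete the rows that contain the value "Item skipped" or "" in selected columns.

For example, see below. Ideally, I would like to delete every row that contains "Item skipped" or "" in the columns "animal" and "Insurance", but I don't want this rule to apply to the other columns.

In my actual data frame, there are about 34 columns where I want to drop rows containing these strings, and 128 columns where I don't. Any advice would be much appreciated.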

dat <- data.frame(animal=c("dog","cat","Item skipped", ""), Insurance=c("Y", "N","Item skipped",""), condition = c("",
                  "Asthma","Item skipped",""), age = rep(c(6,10), each = 2))

You can always do this with a for loop, especially since your dataset is small:
remove_cols <- c('animal', 'Insurance') # vector of names of all columns you'll use to drop rows
remove_vals <- c('', 'Item skipped') # values which indicate a row that should be dropped

# For each screened column, keep only the rows whose value is not in remove_vals
for (col in remove_cols) {
  dat <- dat[!dat[[col]] %in% remove_vals, ]
}

head(dat)
#   animal Insurance condition age
# 1    dog         Y             6
# 2    cat         N    Asthma   6
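The loop subsets dat once per screened column. As a minimal sketch (the helper name drop_rows_with() is hypothetical, not part of the answer), the same filter can be written as one logical mask combined with Reduce(), which stays a single subsetting step no matter how long the vector of column names is, e.g. the 34 columns from the question:

```r
# Hypothetical helper: drop every row where any of `cols` holds a value in `vals`.
drop_rows_with <- function(df, cols, vals) {
  # One logical vector per screened column: TRUE where the cell is a value to drop
  hits <- lapply(df[cols], function(x) x %in% vals)
  # OR the vectors together: TRUE if at least one screened column matched
  drop <- Reduce(`|`, hits)
  df[!drop, , drop = FALSE]
}

dat <- data.frame(animal = c("dog", "cat", "Item skipped", ""),
                  Insurance = c("Y", "N", "Item skipped", ""),
                  condition = c("", "Asthma", "Item skipped", ""),
                  age = rep(c(6, 10), each = 2))

drop_rows_with(dat, c("animal", "Insurance"), c("Item skipped", ""))
```

Because %in% never returns NA, cells that are NA count as "not matching" and their rows are kept.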

You can use filter_at on selected columns or on a range of columns:

library(dplyr)

dat %>%
  filter_at(vars(animal,Insurance), all_vars(!. %in% c("Item skipped", "")))

#  animal Insurance condition age
#1    dog         Y             6
#2    cat         N    Asthma   6
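filter_at() still works, but the scoped _at/_if/_all verbs are marked superseded from dplyr 1.0.0 onward. A minimal sketch of the same filter with if_all() and all_of(), assuming dplyr 1.0.4 or later is installed; all_of() takes a character vector of names, which is convenient when the screened columns are not adjacent:

```r
library(dplyr)

dat <- data.frame(animal = c("dog", "cat", "Item skipped", ""),
                  Insurance = c("Y", "N", "Item skipped", ""),
                  condition = c("", "Asthma", "Item skipped", ""),
                  age = rep(c(6, 10), each = 2))

# Columns to screen, given by name (they need not be contiguous)
cols <- c("animal", "Insurance")
bad_vals <- c("Item skipped", "")

# Keep a row only if every screened column is free of the bad values
res <- dat %>%
  filter(if_all(all_of(cols), ~ !.x %in% bad_vals))
res
```

Here ~ !.x %in% bad_vals parses as !(.x %in% bad_vals), the same test the filter_at() version applies with all_vars().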


Alternatively, in base R, you can use rowSums:

cols <- c('animal', 'Insurance')
dat[rowSums(dat[cols] == "Item skipped" | dat[cols] == "") == 0, ]
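One caveat with the == comparison: if a screened column contains NA, the comparison yields NA, rowSums() returns NA for that row, and indexing with NA returns a row of all NAs instead of the original row. A minimal sketch of the effect and of an NA-tolerant variant that builds the match matrix with %in% (the NA in dat2 is made up for illustration):

```r
dat2 <- data.frame(animal = c("dog", NA, "Item skipped", ""),
                   Insurance = c("Y", "N", "Item skipped", ""),
                   age = c(6, 6, 10, 10))
cols <- c("animal", "Insurance")
bad_vals <- c("Item skipped", "")

# `==` propagates NA: row 2 comes back as a row of NAs
res_eq <- dat2[rowSums(dat2[cols] == "Item skipped" | dat2[cols] == "") == 0, ]

# `%in%` returns FALSE for NA, so an NA cell counts as "not bad" and row 2 is kept as-is
hits <- do.call(cbind, lapply(dat2[cols], function(x) x %in% bad_vals))
res_in <- dat2[rowSums(hits) == 0, ]

res_eq
res_in
```

do.call(cbind, ...) is used instead of sapply() so that hits is always a matrix, even for a one-row data frame.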

In base R, without a for loop:
dat[!rownames(dat) %in% which(dat$animal %in% c("Item skipped", "") | dat$Insurance %in% c("Item skipped", "")), ]
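This version compares row names against the positions returned by which(), so it assumes dat still has the default row names 1..n. A minimal sketch of what happens after an earlier subset, and of indexing with the logical condition directly instead (dat_sub is a hypothetical intermediate result):

```r
dat <- data.frame(animal = c("dog", "cat", "Item skipped", ""),
                  Insurance = c("Y", "N", "Item skipped", ""),
                  condition = c("", "Asthma", "Item skipped", ""),
                  age = rep(c(6, 10), each = 2))
bad_vals <- c("Item skipped", "")

dat_sub <- dat[2:4, ]   # row names are now "2", "3", "4"

# Row-name matching: which() returns positions 2 and 3, which collide with the names "2" and "3"
by_names <- dat_sub[!rownames(dat_sub) %in% which(dat_sub$animal %in% bad_vals |
                                                  dat_sub$Insurance %in% bad_vals), ]

# Logical mask: no dependence on row names
by_mask <- dat_sub[!(dat_sub$animal %in% bad_vals | dat_sub$Insurance %in% bad_vals), ]

by_names  # wrongly keeps the "" row and drops "cat"
by_mask   # keeps only the "cat" row
```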

Using base R, without loading any extra packages:
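# Find rows where either screened column holds "Item skipped" or "".
bad_vals <- c("Item skipped", "")
rows_to_delete <- which(dat$animal %in% bad_vals | dat$Insurance %in% bad_vals)

# Delete those rows and put the result in a new data frame [dat2].
# Keep the old data frame for comparison [dat].
# Guard against an empty index: dat[-integer(0), ] would drop every row.
dat2 <- if (length(rows_to_delete) > 0) dat[-rows_to_delete, ] else dat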
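A known pitfall with negative indexing from which(): when nothing matches, which() returns integer(0), and -integer(0) is still integer(0), so dat[-rows_to_delete, ] selects zero rows rather than all of them. A minimal sketch of the failure and of the logical-index alternative (the value "no such value" is made up so that nothing matches):

```r
dat <- data.frame(animal = c("dog", "cat", "Item skipped", ""),
                  Insurance = c("Y", "N", "Item skipped", ""),
                  condition = c("", "Asthma", "Item skipped", ""),
                  age = rep(c(6, 10), each = 2))

# A condition that matches no rows
rows_to_delete <- which(dat$animal %in% "no such value")
length(rows_to_delete)            # 0

nrow(dat[-rows_to_delete, ])      # 0: every row is dropped

# Logical indexing has no such trap
keep <- !(dat$animal %in% "no such value")
nrow(dat[keep, ])                 # 4
```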

Thanks. What if I want to use dplyr but list the columns by name, since they are not contiguous, i.e. dat %>% filter_at(vars(animal, Insurance), any_vars(!. %in% c("Item skipped")))?

@monkeyshines I made a mistake initially: you probably need all_vars rather than any_vars. I've updated the answer.
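The difference raised in the comments, sketched in base R on a small made-up data frame: all_vars() keeps a row only when the condition holds in every selected column, while any_vars() keeps it when the condition holds in at least one.

```r
# Made-up data: row 2 is clean in animal but not in Insurance
toy <- data.frame(animal = c("dog", "cat", "Item skipped"),
                  Insurance = c("Y", "Item skipped", "Item skipped"))
cols <- c("animal", "Insurance")
bad_vals <- c("Item skipped", "")

# TRUE where a cell is clean (not one of the bad values)
ok <- do.call(cbind, lapply(toy[cols], function(x) !x %in% bad_vals))

kept_all <- toy[apply(ok, 1, all), ]  # like all_vars(): every screened column is clean
kept_any <- toy[apply(ok, 1, any), ]  # like any_vars(): at least one screened column is clean

kept_all  # only "dog"
kept_any  # "dog" and "cat"
```

For the question's goal (drop a row if any screened column is bad), the all_vars() form is the one that matches.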