Remove rows from a data.frame according to a rule in R
In a data.frame, I would like to automatically remove rows that have "NA" in a column whenever another row carries the same information in all the other columns. For example:
Column_A Column_B Column_C Column_D Column_E
A121 NAME1 A321 2019-01-01 NA
A121 NAME1 A321 2019-01-01 2020-02-01
A123 NAME2 A322 2019-01-01 2020-01-01
A123 NAME2 A322 2019-01-01 NA
A124 NAME3 A323 2019-01-01 2019-01-01
A124 NAME4 A324 2019-01-01 NA
The output should be:
Column_A Column_B Column_C Column_D Column_E
A121 NAME1 A321 2019-01-01 2020-02-01
A123 NAME2 A322 2019-01-01 2020-01-01
A124 NAME3 A323 2019-01-01 2019-01-01
A124 NAME4 A324 2019-01-01 NA
Any ideas?

You can keep the rows where Column_E is not NA, together with the rows that are the only one in their group:
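library(dplyr)

df %>%
  # One group per distinct combination of Column_A to Column_D
  group_by(across(Column_A:Column_D)) %>%
  # Keep non-NA rows, plus any row that is alone in its group
  filter(!is.na(Column_E) | n() == 1)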
# Column_A Column_B Column_C Column_D Column_E
# <chr> <chr> <chr> <chr> <chr>
#1 A121 NAME1 A321 2019-01-01 2020-02-01
#2 A123 NAME2 A322 2019-01-01 2020-01-01
#3 A124 NAME3 A323 2019-01-01 2019-01-01
#4 A124 NAME4 A324 2019-01-01 NA
And in base R:
subset(df, ave(!is.na(Column_E), Column_A, Column_B, Column_C, Column_D,
               FUN = function(x) x | length(x) == 1))
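The base R one-liner packs the grouping and the filter into a single ave() call. The sketch below splits the same rule into separate steps so each part can be inspected; the names key, group_size, keep and result are only illustrative, and the data is rebuilt inline so the snippet runs on its own.

```r
# Sample data, identical in content to the "Data" section.
df <- data.frame(
  Column_A = c("A121", "A121", "A123", "A123", "A124", "A124"),
  Column_B = c("NAME1", "NAME1", "NAME2", "NAME2", "NAME3", "NAME4"),
  Column_C = c("A321", "A321", "A322", "A322", "A323", "A324"),
  Column_D = rep("2019-01-01", 6),
  Column_E = c(NA, "2020-02-01", "2020-01-01", NA, "2019-01-01", NA),
  stringsAsFactors = FALSE
)

# One grouping key per row, built from the four identifying columns.
key <- interaction(df[c("Column_A", "Column_B", "Column_C", "Column_D")],
                   drop = TRUE)

# For every row, the number of rows that share its key.
group_size <- ave(seq_along(key), key, FUN = length)

# Keep a row if Column_E is present, or if the row is alone in its group.
keep <- !is.na(df$Column_E) | group_size == 1

result <- df[keep, ]
result
```

Here result keeps original rows 2, 3, 5 and 6, which matches the expected output in the question.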
Data
df <- structure(list(Column_A = c("A121", "A121", "A123", "A123", "A124",
"A124"), Column_B = c("NAME1", "NAME1", "NAME2", "NAME2", "NAME3",
"NAME4"), Column_C = c("A321", "A321", "A322", "A322", "A323",
"A324"), Column_D = c("2019-01-01", "2019-01-01", "2019-01-01",
"2019-01-01", "2019-01-01", "2019-01-01"), Column_E = c(NA, "2020-02-01",
"2020-01-01", NA, "2019-01-01", NA)), class = "data.frame",
row.names = c(NA, -6L))
This is great, thank you! Since across() did not work for me, I changed it to group_by(Column_A, Column_B, Column_C, Column_D).
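For reference, across() was introduced in dplyr 1.0.0, so an older installed version is one plausible reason it failed here (an assumption; the comment does not say which version was in use). A minimal sketch of the variant with the grouping columns named explicitly, with the data rebuilt inline and result as an illustrative name:

```r
library(dplyr)

# Sample data, identical in content to the "Data" section.
df <- data.frame(
  Column_A = c("A121", "A121", "A123", "A123", "A124", "A124"),
  Column_B = c("NAME1", "NAME1", "NAME2", "NAME2", "NAME3", "NAME4"),
  Column_C = c("A321", "A321", "A322", "A322", "A323", "A324"),
  Column_D = rep("2019-01-01", 6),
  Column_E = c(NA, "2020-02-01", "2020-01-01", NA, "2019-01-01", NA),
  stringsAsFactors = FALSE
)

result <- df %>%
  # Name the grouping columns directly instead of using across()
  group_by(Column_A, Column_B, Column_C, Column_D) %>%
  # Keep non-NA rows, plus any row that is alone in its group
  filter(!is.na(Column_E) | n() == 1) %>%
  # Drop the grouping so later steps work on a plain tibble
  ungroup()

result
```

The trailing ungroup() is optional; it only prevents the grouping from carrying over into later operations.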