R: I have duplicate IDs in a dataset and want to keep the ID with the fewest NAs in the data columns


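I have a data frame with duplicate IDs. I want to keep, for each ID, the row with the fewest NAs, so that every ID has its most complete record. In this example, I want to keep the second 123 and the second 124, which have the fewest NAs.

I can identify the duplicates, but I can't write code that says either (1) for each duplicate, keep the copy with fewer NAs, or (2) for each duplicate, delete the copy with more NAs.

Here is the sample data: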

id    Col1    col2    col3    col4
123   10      NA      NA      3
123   50      3       2       NA
124   30      5       7       NA
124   30      8       1       2

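You can sort by the number of NAs in each row and then remove the duplicates:

library(dplyr)

df %>%
  arrange(rowSums(is.na(df))) %>%  # rows with the fewest NAs first
  filter(!duplicated(id)) %>%      # keep the first row seen for each id
  arrange(id)

#    id Col1 col2 col3 col4
# 1 123   50    3    2   NA
# 2 124   30    8    1    2

Data:

df = read.table(text = 'id Col1 col2 col3 col4
123 10 NA NA 3
123 50 3 2 NA
124 30 5 7 NA
124 30 8 1 2', header = T, strip.white = T)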
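The same sort-then-deduplicate idea also works without any packages. The following is a minimal base R sketch, assuming the sample data from the question; the names `na_count`, `sorted`, and `result` are illustrative and do not come from the original answer.

```r
# Sample data from the question
df <- data.frame(
  id   = c(123L, 123L, 124L, 124L),
  Col1 = c(10L, 50L, 30L, 30L),
  col2 = c(NA, 3L, 5L, 8L),
  col3 = c(NA, 2L, 7L, 1L),
  col4 = c(3L, NA, NA, 2L)
)

# Number of NAs in each row (the id column itself has no NAs here)
na_count <- rowSums(is.na(df))

# Sort by id, then by NA count, so the most complete row of each id comes first
sorted <- df[order(df$id, na_count), ]

# Keep only the first row of each id
result <- sorted[!duplicated(sorted$id), ]
rownames(result) <- NULL
print(result)
```

If two rows of the same id have the same number of NAs, `order()` leaves them in their original order, so the row that appears first in the data is the one kept.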


We can use slice (see the dplyr code further down, together with the data).

Similar to the first approach:

df %>%
  group_by(id) %>%
  top_n(-1, rowSums(is.na(cur_data()))) %>%
  ungroup()

# data.table: order rows by NA count, then take the first row for each id
library(data.table)
setDT(df)

df[order(rowSums(is.na(df))), head(.SD, 1), by = id]

#     id Col1 col2 col3 col4
# 1: 124   30    8    1    2
# 2: 123   50    3    2   NA

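# dplyr: within each id, keep the row with the fewest NAs
# (cur_data() is deprecated as of dplyr 1.1.0 in favor of pick())
library(dplyr)
df %>%
  group_by(id) %>%
  slice(which.min(rowSums(is.na(cur_data())))) %>%
  ungroup()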
# A tibble: 2 x 5
#     id  Col1  col2  col3  col4
#  <int> <int> <int> <int> <int>
#1   123    50     3     2    NA
#2   124    30     8     1     2

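# Alternative: count the NAs in every column except id, row by row,
# then sort by id and count and keep the first row of each id
df %>%
  rowwise() %>%
  mutate(cnt = sum(is.na(c_across(-id)))) %>%
  ungroup() %>%
  arrange(id, cnt) %>%
  distinct(id, .keep_all = TRUE)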

df <- structure(list(id = c(123L, 123L, 124L, 124L), Col1 = c(10L, 
50L, 30L, 30L), col2 = c(NA, 3L, 5L, 8L), col3 = c(NA, 2L, 7L, 
1L), col4 = c(3L, NA, NA, 2L)), class = "data.frame", row.names = c(NA, 
-4L))
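The rowwise version above counts NAs only in the columns other than id. As a rough sketch of how that could be packaged for reuse in base R, the helper below does the same for an arbitrary id column. The function name `keep_most_complete` and its `id_col` argument are hypothetical and not part of any package or of the original answers.

```r
# Hypothetical helper: for each value of `id_col`, keep the row with the
# fewest NAs among the remaining columns
keep_most_complete <- function(data, id_col = "id") {
  data <- as.data.frame(data)
  value_cols <- setdiff(names(data), id_col)
  # NA count per row, ignoring the id column
  cnt <- rowSums(is.na(data[, value_cols, drop = FALSE]))
  # Order by id, then by NA count; ties keep their original order
  ord <- order(data[[id_col]], cnt)
  sorted <- data[ord, , drop = FALSE]
  out <- sorted[!duplicated(sorted[[id_col]]), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Sample data from the question
df <- data.frame(
  id   = c(123L, 123L, 124L, 124L),
  Col1 = c(10L, 50L, 30L, 30L),
  col2 = c(NA, 3L, 5L, 8L),
  col3 = c(NA, 2L, 7L, 1L),
  col4 = c(3L, NA, NA, 2L)
)

kept <- keep_most_complete(df)
print(kept)
```

Unlike the rowwise pipeline, this sketch does not add a count column to the result; the count is used only for ordering.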