R: Replace repeated values in consecutive runs with blanks


First, some data:

library(data.table)

# 1. Input table
df_input <- data.table(
  x = c("x1", "x1", "x1", "x2", "x2"),
  y = c("y1", "y1", "y2", "y1", "y1"),
  z = c(1:5))

The desired output table is:

df_output <- data.table(
  x = c("x1", "", "",  "x2", ""),
  y = c("y1", "", "y2", "y1", ""),
  z = c(1:5))

#     x  y z
# 1: x1 y1 1
# 2:       2
# 3:    y2 3
# 4: x2 y1 4
# 5:       5

How can I get the output table using the dplyr or data.table package?

Thanks.

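We can use rleid together with duplicated to replace consecutive repeated values with blanks. rleid gives every run of identical values its own id, and duplicated then flags each element of a run after the first.

The same expression can be used in dplyr.

We can also use set from data.table, which updates the columns by reference.

I don't think rleid is needed.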
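
Whether rleid can be dropped depends on the data. Here is a minimal base-R sketch of the difference, using column y from the question. run_id is a hypothetical helper that mimics what data.table::rleid does for a single vector, so the snippet runs without any package:

```r
# Column y from the question
y <- c("y1", "y1", "y2", "y1", "y1")

# Hypothetical helper: one integer id per run of identical consecutive values
# (a base-R stand-in for data.table::rleid on a single vector)
run_id <- function(v) {
  r <- rle(v)
  rep(seq_along(r$lengths), times = r$lengths)
}

ids <- run_id(y)  # 1 1 2 3 3

# duplicated() on the run ids flags only repeats inside a run
with_runs <- replace(y, duplicated(ids), "")

# duplicated() on the raw values flags every later occurrence of a value
without_runs <- replace(y, duplicated(y), "")
```

Here with_runs keeps "y1" in row 4, which matches the desired output, while without_runs blanks it. For column x both variants give the same result, because no value of x reappears after a different one.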

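library(data.table)
# For each column: rleid() numbers the runs, duplicated() flags every
# element after the first in a run, and replace() blanks those elements
df_input[, lapply(.SD, function(x) replace(x, duplicated(rleid(x)), ''))]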


#    x  y z
#1: x1 y1 1
#2:       2
#3:    y2 3
#4: x2 y1 4
#5:       5

library(dplyr)
# rleid() comes from data.table, so that package must be loaded as well
df_input %>% mutate_all(~replace(., duplicated(rleid(.)), ''))
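
mutate_all() is superseded in recent dplyr releases in favour of across(). Below is a sketch of the same idea with across(), assuming dplyr 1.0.0 or later and data.table (for rleid) are installed. blank_runs is a hypothetical helper name, and restricting the call to character columns is an assumption rather than part of the answer:

```r
library(dplyr)
library(data.table)  # provides rleid()

df_input <- data.table(
  x = c("x1", "x1", "x1", "x2", "x2"),
  y = c("y1", "y1", "y2", "y1", "y1"),
  z = 1:5)

# Hypothetical helper: blank every element after the first in each run
blank_runs <- function(v) replace(v, duplicated(rleid(v)), "")

# Apply it to the character columns only, so z keeps its integer type
res <- df_input %>% mutate(across(where(is.character), blank_runs))
```

Selecting columns with where(is.character) leaves numeric columns such as z untouched, which may or may not be what is wanted if numeric runs should be blanked as well.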

library(data.table)
# set() updates each column of df_input in place (by reference)
for (j in names(df_input))
  set(df_input, i = which(duplicated(rleid(df_input[[j]]))), j = j, value = '')

df_input
#    x  y z
#1: x1 y1 1
#2:       2
#3:    y2 3
#4: x2 y1 4
#5:       5
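
Because set() modifies df_input by reference, the original values are gone after the loop. Below is a sketch that works on a copy() instead and loops only over the character columns, assuming the original table should be kept. The names df_blank and char_cols and the choice of columns are assumptions, not part of the answer above:

```r
library(data.table)

df_input <- data.table(
  x = c("x1", "x1", "x1", "x2", "x2"),
  y = c("y1", "y1", "y2", "y1", "y1"),
  z = 1:5)

# copy() makes a deep copy; plain `<-` would only create another reference
df_blank <- copy(df_input)

# Character columns only, so the integer column z is never touched
char_cols <- names(df_blank)[sapply(df_blank, is.character)]

for (j in char_cols) {
  # Row numbers of every element after the first in each run of column j
  rows <- which(duplicated(rleid(df_blank[[j]])))
  set(df_blank, i = rows, j = j, value = "")
}
```

After the loop, df_blank holds the blanked columns while df_input still contains the original values.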