R: Replace duplicated values in consecutive runs with blanks
First, some data:
library(data.table)
# 1. Input table
df_input <- data.table(
  x = c("x1", "x1", "x1", "x2", "x2"),
  y = c("y1", "y1", "y2", "y1", "y1"),
  z = c(1:5))
The desired output table is therefore:
df_output <- data.table(
  x = c("x1", "", "", "x2", ""),
  y = c("y1", "", "y2", "y1", ""),
  z = c(1:5))
#     x  y z
# 1: x1 y1 1
# 2:       2
# 3:    y2 3
# 4: x2 y1 4
# 5:       5
How can I get the output table using the dplyr or data.table packages?
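Thanks.

We can use rleid together with duplicated to replace consecutive repeated values with blanks. rleid gives each run of identical adjacent values its own id, and duplicated then flags every element after the first within a run:

library(data.table)
# Apply the replacement to every column; df_input itself is not modified
df_input[, lapply(.SD, function(x) replace(x, duplicated(rleid(x)), ''))]
#    x  y z
#1: x1 y1 1
#2:       2
#3:    y2 3
#4: x2 y1 4
#5:       5

The same approach in dplyr:

library(dplyr)
# rleid() comes from data.table, so that package must be loaded too.
# mutate_all() is superseded in dplyr >= 1.0; mutate(across(everything(), ...)) is the current form.
df_input %>% mutate_all(~replace(., duplicated(rleid(.)), ''))

We can also use set from data.table, which updates the columns by reference:

library(data.table)
# Loop over the columns and blank out repeats within each run, modifying df_input in place
for(j in names(df_input))
  set(df_input, i = which(duplicated(rleid(df_input[[j]]))), j = j, value = '')
df_input
#    x  y z
#1: x1 y1 1
#2:       2
#3:    y2 3
#4: x2 y1 4
#5:       5

I don't think rleid is needed.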
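Whether rleid can be dropped seems to depend on the data. The sketch below is only an illustration on the y column from the question; the variable names without_rleid and with_rleid are made up for this comparison. It suggests that duplicated alone also blanks a value that starts a new run later on, which the desired output keeps. For the x column, where equal values are always adjacent, both versions would give the same result.

```r
library(data.table)

# Same y column as in the question
y <- c("y1", "y1", "y2", "y1", "y1")

# duplicated() alone flags every later occurrence of a value,
# even when that occurrence starts a new run (row 4 here)
without_rleid <- replace(y, duplicated(y), "")

# rleid() numbers the runs first (1, 1, 2, 3, 3), so only repeats
# inside the same run are flagged
with_rleid <- replace(y, duplicated(rleid(y)), "")

print(without_rleid)
print(with_rleid)
```

Only the rleid version keeps "y1" in row 4, which is what df_output asks for.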