Duplicated columns in dyad-year data in R
As the title suggests, I have dyad-year data. The problem is that (for some reason...) the same name is repeated in both dyad columns: for example, as shown below, the A-to-A and B-to-B observations are meaningless. The actual data has more than 70,000 observations.
PERSON1 PERSON2 year
A A 1990
A A 1991
A A 1992
A B 1990
A B 1991
A B 1992
A C 1990
A C 1991
A C 1992
B B 1990
B B 1991
B B 1992
...
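What I would like to do is generate a dummy variable that flags these same-dyad observations, i.e. rows where PERSON1 and PERSON2 are identical.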
The function duplicated(), along with other base R commands, did not help because the data is dyadic.
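To illustrate why duplicated() does not solve this, here is a minimal sketch on a hypothetical toy data frame (`toy` is made up for illustration and is not the asker's data). duplicated() looks for repeated rows, not for rows whose two columns match each other.

```r
# Hypothetical toy dyad-year data (an assumption, not the real data set)
toy <- data.frame(
  PERSON1 = c("A", "A", "A", "A"),
  PERSON2 = c("A", "A", "B", "B"),
  year    = c(1990L, 1991L, 1990L, 1991L),
  stringsAsFactors = FALSE
)

# Every dyad-year row is unique, so no row is flagged
dup_rows <- duplicated(toy)
# FALSE FALSE FALSE FALSE

# Ignoring year flags the later occurrence of every dyad,
# including the perfectly valid A-B dyad
dup_dyads <- duplicated(toy[, c("PERSON1", "PERSON2")])
# FALSE TRUE FALSE TRUE
```

Neither result identifies the A-A rows specifically, which is why a direct comparison of the two columns is needed.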
Here is a reproducible example:
df1 <- structure(list(PERSON1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L), .Label = c("A", "B", "G"), class = "factor"),
PERSON2 = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L,
1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L,
3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor"),
year = c(1990L, 1991L, 1992L, 1990L, 1991L, 1992L, 1990L,
1991L, 1992L, 1990L, 1991L, 1992L, 1990L, 1991L, 1992L, 1990L,
1991L, 1992L, 1990L, 1991L, 1992L, 1990L, 1991L, 1992L, 1990L,
1991L, 1992L)), .Names = c("PERSON1", "PERSON2", "year"), class = "data.frame", row.names = c(NA,
-27L))
Desired output (duplicate dummy)
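We can do this easily by comparing 'PERSON1' and 'PERSON2'. The columns are factors with different level sets, so they are converted to character before the comparison. Using data.table: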
library(data.table)
# Convert df1 to a data.table by reference and add the 0/1 dummy
setDT(df1)[, duplicate := as.integer(as.character(PERSON1) == as.character(PERSON2))]
head(df1, 15)
# PERSON1 PERSON2 year duplicate
# 1: A A 1990 1
# 2: A A 1991 1
# 3: A A 1992 1
# 4: A B 1990 0
# 5: A B 1991 0
# 6: A B 1992 0
# 7: A C 1990 0
# 8: A C 1991 0
# 9: A C 1992 0
#10: B A 1990 0
#11: B A 1991 0
#12: B A 1992 0
#13: B B 1990 1
#14: B B 1991 1
#15: B B 1992 1
Or using base R:
transform(df1, duplicate = as.integer(as.character(PERSON1) == as.character(PERSON2)))
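The same comparison can also be written with dplyr. This is a minimal sketch rather than a definitive solution: the small `df1` below is a hypothetical stand-in for the real data, and `df2` and `clean` are names chosen here for illustration. The two factors deliberately have different level sets ("G" versus "C"), as in the dput() output in the question; comparing such factors directly with == raises a "level sets of factors are different" error, which is why as.character() is applied first.

```r
library(dplyr)

# Hypothetical stand-in for df1 (assumption: PERSON1 and PERSON2 are
# factors whose level sets differ, as in the question's dput() output)
df1 <- data.frame(
  PERSON1 = factor(c("A", "A", "B", "B", "G")),
  PERSON2 = factor(c("A", "B", "A", "B", "C")),
  year    = 1990L
)

# Flag self-dyads: 1 when both columns name the same person, else 0
df2 <- df1 %>%
  mutate(duplicate = as.integer(as.character(PERSON1) == as.character(PERSON2)))

# Optionally drop the meaningless self-dyads
clean <- df2 %>%
  filter(duplicate == 0L)
```

Keeping the dummy in `df2` and filtering into a separate object leaves the flagged rows available for inspection before they are discarded.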
Maybe df$dupe