R: Duplicate columns in dyad-year data_R_Function_Dataframe_Dplyr - Fatal编程技术网



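As the title says, I have dyad-year data. The problem is that (for some reason...) some dyads pair a name with itself: in the example below, the A-to-A and B-to-B observations make no sense. The actual data has more than 70,000 observations.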

PERSON1     PERSON2      year     
   A           A          1990    
   A           A          1991    
   A           A          1992    
   A           B          1990    
   A           B          1991    
   A           B          1992   
   A           C          1990   
   A           C          1991   
   A           C          1992    
   B           B          1990    
   B           B          1991    
   B           B          1992    
   ...
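What I want to do is generate a dummy variable that flags these self-dyads, i.e. the observations where PERSON1 and PERSON2 are the same: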

PERSON1     PERSON2      year     
   A           A          1990    
   A           A          1991    
   A           A          1992    
   A           B          1990    
   A           B          1991    
   A           B          1992   
   A           C          1990   
   A           C          1991   
   A           C          1992    
   B           B          1990    
   B           B          1991    
   B           B          1992    
   ...
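The function duplicated(), together with other base R commands, doesn't help, because this is dyadic data: what I need to flag is a row whose two dyad columns hold the same name, not a row that repeats an earlier row.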
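To illustrate why duplicated() misses these rows, here is a minimal sketch on a small hypothetical data frame (`dyads` is an illustrative name, not from the question). duplicated() compares each whole row against earlier rows, so it marks the same dyad repeated across years; a self-dyad is instead a condition inside a single row.

```r
# Small hypothetical dyad-year data (not the question's full data)
dyads <- data.frame(
  PERSON1 = c("A", "A", "A", "A"),
  PERSON2 = c("A", "A", "B", "B"),
  year    = c(1990L, 1991L, 1990L, 1991L),
  stringsAsFactors = FALSE
)

# duplicated() flags a row only if the same (PERSON1, PERSON2) pair
# appeared in an earlier row, i.e. a dyad repeated in a later year
across_rows <- duplicated(dyads[, c("PERSON1", "PERSON2")])
print(across_rows)  # FALSE TRUE FALSE TRUE

# A self-dyad is a within-row property: both columns hold the same name
within_row <- dyads$PERSON1 == dyads$PERSON2
print(within_row)   # TRUE TRUE FALSE FALSE
```

The two vectors disagree on rows 2 and 3, which is why a row-wise comparison of the two columns, rather than duplicated(), is the right tool here.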

Here is a reproducible example:

df1 <- structure(list(PERSON1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L), .Label = c("A", "B", "G"), class = "factor"), 
    PERSON2 = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 
    1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 
    3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor"), 
    year = c(1990L, 1991L, 1992L, 1990L, 1991L, 1992L, 1990L, 
    1991L, 1992L, 1990L, 1991L, 1992L, 1990L, 1991L, 1992L, 1990L, 
    1991L, 1992L, 1990L, 1991L, 1992L, 1990L, 1991L, 1992L, 1990L, 
    1991L, 1992L)), .Names = c("PERSON1", "PERSON2", "year"), class = "data.frame", row.names = c(NA, 
-27L))
Desired output (a duplicate dummy):


We can do this easily by comparing 'PERSON1' and 'PERSON2':

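library(data.table)
# as.character() is needed because PERSON1 and PERSON2 are factors with different level sets
setDT(df1)[, duplicate := as.integer(as.character(PERSON1) == as.character(PERSON2))]
head(df1, 15)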
#    PERSON1 PERSON2 year duplicate
# 1:       A       A 1990         1
# 2:       A       A 1991         1
# 3:       A       A 1992         1
# 4:       A       B 1990         0
# 5:       A       B 1991         0
# 6:       A       B 1992         0
# 7:       A       C 1990         0
# 8:       A       C 1991         0
# 9:       A       C 1992         0
#10:       B       A 1990         0
#11:       B       A 1991         0
#12:       B       A 1992         0
#13:       B       B 1990         1
#14:       B       B 1991         1
#15:       B       B 1992         1
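The as.character() calls matter here. In the dput() output above, PERSON1 has levels A, B, G while PERSON2 has levels A, B, C, and R refuses to compare two factors whose level sets differ. A minimal sketch with two small hypothetical factors (`p1`, `p2` are illustrative names) that mirror those level sets:

```r
# Two factors whose level sets differ, mirroring the question's data
# (PERSON1 levels: A, B, G; PERSON2 levels: A, B, C)
p1 <- factor(c("A", "B", "G"), levels = c("A", "B", "G"))
p2 <- factor(c("A", "C", "C"), levels = c("A", "B", "C"))

# Comparing the factors directly raises an error
# ("level sets of factors are different"); capture it instead of stopping
direct <- tryCatch(p1 == p2, error = function(e) e)
print(inherits(direct, "error"))  # TRUE
print(conditionMessage(direct))

# Converting both to character compares the labels themselves
as_chr <- as.character(p1) == as.character(p2)
print(as_chr)  # TRUE FALSE FALSE
```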

Or, using base R:

transform(df1, duplicate = as.integer(as.character(PERSON1)== as.character(PERSON2)))
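transform() returns a modified copy rather than changing df1 in place, so the result has to be assigned to be kept. A minimal sketch, using a small hypothetical stand-in for df1 (in the question, df1 is the dput() data above), that stores the dummy and then uses it to count and drop the self-dyads:

```r
# Hypothetical stand-in for df1, with factor columns as in the question
df1 <- data.frame(
  PERSON1 = factor(c("A", "A", "A", "B", "B", "B")),
  PERSON2 = factor(c("A", "B", "C", "A", "B", "C")),
  year    = 1990L
)

# transform() returns a new data frame, so assign it back to df1
df1 <- transform(df1,
                 duplicate = as.integer(as.character(PERSON1) == as.character(PERSON2)))

# Number of self-dyad rows flagged by the dummy (A-A and B-B)
print(sum(df1$duplicate))  # 2

# Keep only genuine dyads, where PERSON1 differs from PERSON2
dyads_only <- subset(df1, duplicate == 0L)
print(nrow(dyads_only))    # 4
```

Because the comparison is vectorized over whole columns, the same two lines apply unchanged to a data frame with 70,000 rows.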

Possibly:

df$dupe