R: Removing mirrored combinations of variables in a data frame

Tags: r, purrr

I want to get every unique combination of two variables:

library(purrr)
cross_df(list(id1 = seq_len(3), id2 = seq_len(3)), .filter = `==`)
# A tibble: 6 x 2
    id1   id2
  <int> <int>
1     2     1
2     3     1
3     1     2
4     3     2
5     1     3
6     2     3

Base R approach:

# create a string with the sorted elements of the row
df$temp <- apply(df, 1, function(x) paste(sort(x), collapse=""))

# then you can simply keep rows with a unique sorted-string value
df[!duplicated(df$temp), 1:2]
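
The snippet above assumes `df` already exists and builds its key with `collapse=""`, which can collide once values have more than one digit: (1, 234) and (12, 34) both produce the key "1234". As a hedged sketch of the same idea, the version below keeps the sorted pair in two numeric columns instead of one string. The input built with `expand.grid()` and the names `lo`, `hi`, and `dedup` are assumptions made for illustration, not part of the original answer.

```r
# Hypothetical input: all ordered pairs of 1..3 with id1 != id2
df <- expand.grid(id1 = 1:3, id2 = 1:3)
df <- df[df$id1 != df$id2, ]

# Order-independent key: smaller value first, larger value second
lo <- pmin(df$id1, df$id2)
hi <- pmax(df$id1, df$id2)

# Keep the first row seen for each unordered pair
dedup <- df[!duplicated(data.frame(lo, hi)), c("id1", "id2")]
dedup
```

Because the key stays numeric, no separator is needed and multi-digit values cannot run together.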

A tidyverse version of Dan's answer:

library(purrr)  # cross_df(), pmap_int()
library(dplyr)  # mutate(), distinct(), select(), %>%
library(tidyr)  # unite()

cross_df(list(id1 = seq_len(3), id2 = seq_len(3)), .filter = `==`) %>% 
  mutate(min = pmap_int(., min), max = pmap_int(., max)) %>% # Find the min and max in each row
  unite(check, c(min, max), remove = FALSE) %>% # Combine them in a "check" variable
  distinct(check, .keep_all = TRUE) %>% # Remove duplicates of the "check" variable
  select(id1, id2)

# A tibble: 3 x 2
    id1   id2
  <int> <int>
1     2     1
2     3     1
3     3     2
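
`cross_df()` has been deprecated since purrr 1.0.0 in favor of `tidyr::expand_grid()`, so the pipeline above may warn on current installations. The following is a rough sketch of the same deduplication without `cross_df()`, using the vectorized `pmin()` and `pmax()` in place of the row-wise `pmap_int()` calls. It is a swapped-in variant rather than the answer's own code; the names `pairs`, `lo`, `hi`, and `result` are illustrative assumptions.

```r
library(dplyr)
library(tidyr)

# expand_grid() varies the last column fastest, so the row order
# differs from the cross_df() output shown above
pairs <- expand_grid(id1 = 1:3, id2 = 1:3) %>%
  filter(id1 != id2)

result <- pairs %>%
  mutate(lo = pmin(id1, id2), hi = pmax(id1, id2)) %>% # sorted pair as key
  distinct(lo, hi, .keep_all = TRUE) %>%               # first row per unordered pair
  select(id1, id2)

result
```

Since `distinct()` keeps the first row it sees for each key, which member of a mirrored pair survives depends on the row order of the input.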

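Just figured it out:

df %>% mutate(sum = id1 + id2) %>% distinct(sum, .keep_all = TRUE) %>% select(-sum)

should do it.

Comments:

My concern is that if two rows have the same sum but different numbers, e.g. (2, 4) and (1, 5), you will keep only one of them. Please see my proposed solution.

Hi Dan, your comment above makes a lot of sense. Thank you. I have also developed a tidyverse solution that works much like your answer. I will post it separately.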
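
The concern about equal sums can be checked on a small case. The sketch below uses made-up data (the rows and the names `by_sum`, `key`, and `by_pair` are assumptions for illustration) and compares the sum key with a sorted-pair key. For ids drawn from 1 to 3 the pair sums are 3, 4, and 5, so the sum key happens to work there; it stops working as soon as two different pairs share a sum.

```r
# Hypothetical data: (2, 4) and (4, 2) are mirrors; (1, 5) is a different pair
df <- data.frame(id1 = c(2, 4, 1), id2 = c(4, 2, 5))

# Sum as key: all three rows sum to 6, so only the first row survives
by_sum <- df[!duplicated(df$id1 + df$id2), ]

# Sorted pair as key: only the true mirror (4, 2) is dropped
key <- data.frame(lo = pmin(df$id1, df$id2), hi = pmax(df$id1, df$id2))
by_pair <- df[!duplicated(key), ]

nrow(by_sum)   # 1
nrow(by_pair)  # 2
```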