Partitioning a data.frame based on a condition

I have a data.frame shaped like this:

c <- data.frame(name=c("a", "a", "b", "b", "c", "c","d","d"), value=c(1,3,2,4,5,3,4,5), address=c("rrrr","rrrr","zzzz","aaaa","ssss","jjjj","qqqq","qqqq"))
> c
  name value address
1    a     1    rrrr
2    a     3    rrrr
3    b     2    zzzz 
4    b     4    aaaa
5    c     5    ssss
6    c     3    jjjj
7    d     4    qqqq
8    d     5    qqqq 

Using dplyr:

library(dplyr)
z<-c %>% group_by(name) %>% 
         mutate(changed = n_distinct(address))
split(z, z$changed)

Thanks to @akrun for reminding me of n_distinct.
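
If only a two-way partition is wanted (names with a single address versus names with several), the same dplyr idea can be sketched with a logical flag. This is a minimal sketch, not the answerer's code: the names df, flagged, multi_address, same_address and mixed_address are illustrative assumptions, and the data is stored as df rather than c.

```r
library(dplyr)

# Same data as in the question, stored as df (assumed name) instead of c
df <- data.frame(
  name    = c("a", "a", "b", "b", "c", "c", "d", "d"),
  value   = c(1, 3, 2, 4, 5, 3, 4, 5),
  address = c("rrrr", "rrrr", "zzzz", "aaaa", "ssss", "jjjj", "qqqq", "qqqq")
)

# Flag each row by whether its name maps to more than one distinct address
flagged <- df %>%
  group_by(name) %>%
  mutate(multi_address = n_distinct(address) > 1) %>%
  ungroup()

# split() returns a named list with elements "FALSE" and "TRUE"
parts <- split(flagged, flagged$multi_address)
same_address  <- parts[["FALSE"]]  # names whose rows all share one address
mixed_address <- parts[["TRUE"]]   # names with more than one address
```

Calling ungroup() before split() keeps the pieces free of leftover grouping metadata.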

@jeremycg's answer is a good one, and I am trying to learn dplyr myself, but here is a non-dplyr version as well:

numAddresses <- sapply(split(c, c$name), function(x)
    length(unique(x$address)))
# numAddresses is named by name, so look each row's count up by its name
split(c, numAddresses[as.character(c$name)])
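
The lookup step in the base R version is easy to get wrong, so here is a self-contained sketch of it. The names df, num_addresses and counts_per_row are assumptions made for illustration; the as.character() calls are there so the code behaves the same whether or not the columns are factors.

```r
# Same data as in the question, stored as df (assumed name) instead of c
df <- data.frame(
  name    = c("a", "a", "b", "b", "c", "c", "d", "d"),
  value   = c(1, 3, 2, 4, 5, 3, 4, 5),
  address = c("rrrr", "rrrr", "zzzz", "aaaa", "ssss", "jjjj", "qqqq", "qqqq")
)

# Named vector: one entry per name, holding its count of distinct addresses
num_addresses <- sapply(
  split(as.character(df$address), as.character(df$name)),
  function(x) length(unique(x))
)

# Expand to one count per row by looking up each row's name;
# indexing by character avoids using factor codes as positions
counts_per_row <- num_addresses[as.character(df$name)]

# One list element per distinct count
parts <- split(df, counts_per_row)
```

With this data the list has one element for names with a single address and one for names with two.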

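This counts the number of addresses per name and splits on that basis. There was one hurdle to get past: ave would not return usable counts for the factor column until it was wrapped in as.character. There is a warning message, and I am copying its beginning here so that searchers might be able to find it: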

Warning messages:
1: In `[<-.factor`(`*tmp*`, i, value = c(1L, 1L)) :
If you really want a dichotomous split, use >1 to convert the counts into a logical grouping:

 split(cc, ave(as.character(cc$address), cc$name, FUN=function(x) sum(!duplicated(x)) ) >1)

$`FALSE`
  name value address
1    a     1    rrrr
2    a     3    rrrr
7    d     4    qqqq
8    d     5    qqqq

$`TRUE`
  name value address
3    b     2    zzzz
4    b     4    aaaa
5    c     5    ssss
6    c     3    jjjj
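
The as.character hurdle described above can also be sidestepped by running ave over row indices rather than over the address column, so the result is numeric from the start. This is a sketch of an alternative, not the answerer's method; df, row_ids and n_addr are assumed names.

```r
# Same data as in the question, stored as df (assumed name)
df <- data.frame(
  name    = c("a", "a", "b", "b", "c", "c", "d", "d"),
  value   = c(1, 3, 2, 4, 5, 3, 4, 5),
  address = c("rrrr", "rrrr", "zzzz", "aaaa", "ssss", "jjjj", "qqqq", "qqqq")
)

# ave() works on an integer vector of row numbers, grouped by name;
# each group's single count is recycled across that group's rows
row_ids <- seq_len(nrow(df))
n_addr <- ave(row_ids, df$name,
              FUN = function(i) length(unique(df$address[i])))

# Numeric comparison, so no character coercion is involved
parts <- split(df, n_addr > 1)
```

Because n_addr is an integer vector, the comparison with 1 is an ordinary numeric one instead of a comparison between strings.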

I don't understand that comment. This is what I get from str(dat) (the "List of 2" output at the end):


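You should not use c as a variable name, because it is a built-in function for concatenating objects.

This is really a request to partition into non-overlapping subsets.

@Bonded Do you mean I have to change the title or rewrite the question?

@cr1msonB1ade You are right! Thanks.

@SimoneGabbriellini I added the tag that I thought was most accurate, and it does not seem to have stuck. I will try again, but changing the title to replace "subset" with "partition" might also help searchers.

You could use n_distinct instead of length(unique

Does this work? When I run it on my input I get the whole data set back.

If you want to keep the result of the split call, you obviously have to store it in another variable, but the output of the split call should correspond to what was requested.

The only downside is that what comes back is two lists rather than two data frames, and using as.data.frames sets column names that I don't understand.

First, I get a single list. Second, dat[[1]] and dat[[2]] should both be data frames with the same column names as the source.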
 split(cc,  ave(as.character(cc$address), cc$name, FUN=function(x) sum(!duplicated(x)) ) )

$`1`
  name value address
1    a     1    rrrr
2    a     3    rrrr
7    d     4    qqqq
8    d     5    qqqq

$`2`
  name value address
3    b     2    zzzz
4    b     4    aaaa
5    c     5    ssss
6    c     3    jjjj

> str(dat)
List of 2
 $ FALSE:'data.frame':  4 obs. of  3 variables:
  ..$ name   : Factor w/ 4 levels "a","b","c","d": 1 1 4 4
  ..$ value  : num [1:4] 1 3 4 5
  ..$ address: Factor w/ 6 levels "aaaa","jjjj",..: 4 4 3 3
 $ TRUE :'data.frame':  4 obs. of  3 variables:
  ..$ name   : Factor w/ 4 levels "a","b","c","d": 2 2 3 3
  ..$ value  : num [1:4] 2 4 5 3
  ..$ address: Factor w/ 6 levels "aaaa","jjjj",..: 6 1 5 2
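
The point made in the comments, that split returns a single list whose elements are ordinary data frames, can be checked with a short sketch. The names df, n_addr and dat are assumptions; dat mirrors the object inspected with str(dat) above, and as.integer is added here only to make the comparison numeric.

```r
# Same data as in the question, stored as df (assumed name)
df <- data.frame(
  name    = c("a", "a", "b", "b", "c", "c", "d", "d"),
  value   = c(1, 3, 2, 4, 5, 3, 4, 5),
  address = c("rrrr", "rrrr", "zzzz", "aaaa", "ssss", "jjjj", "qqqq", "qqqq")
)

# Distinct-address count per row, computed as in the ave-based answer
n_addr <- ave(as.character(df$address), df$name,
              FUN = function(x) sum(!duplicated(x)))

# Store the result; split() does not modify df
dat <- split(df, as.integer(n_addr) > 1)

# dat is one list of two data frames, each keeping the source columns
is_df     <- vapply(dat, is.data.frame, logical(1))
same_cols <- identical(names(dat[[1]]), names(df)) &&
             identical(names(dat[[2]]), names(df))

# Elements can be pulled out by position or by name
single_address <- dat[["FALSE"]]
multi_address  <- dat[["TRUE"]]
```

Extracting with dat[["FALSE"]] and dat[["TRUE"]] gives the two data frames directly, so no as.data.frame conversion is needed.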