R: How to get a summary for each unique id
I want to extract some summary statistics for many values across multiple columns. My data looks like this:
id pace type value abundance
51 (T) (JC) (L) 0
51 (T) (JC) (L) 0
51 (T) (JC) (H) 0
52 (T) (JC) (H) 0
52 (R) (JC) (H) 0
53 (T) (JC) (L) 1
53 (T) (JC) (H) 1
53 (R) (JC) (H) 1
53 (R) (JC) (H) 1
53 (R) (JC) (H) 1
54 (T) (BC) <blank> 0
54 (T) (BC) <blank> 0
54 (T) (BC) <blank> 0
I have started writing some code:
for (i in levels(df$id))
{
extract.event <- df[df$id==i,]# To identify each section
ppace <- table(extract.event$pace) #count table of pace
ptype <- extract.event$type[1] # extract the first line to be the type
nvalues <- table(extract.event$value) #count table of value
nabundance <- min(extract.event$abundance) #minimum of abundance
d <- cbind(ppace,ptype,forbeh,nvalues,nabundance) # note: forbeh is not defined anywhere in this snippet
}
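I had to rewrite your data.frame (for future reference, please paste the output of dput, because we hate retyping your data), but here is my attempt. I am guessing you are looking for something along the lines of the aggregate function: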
df <- data.frame(id = as.factor(c(51,51,51,52,52,53,53,53,53,53,54,54,54)),
pace = c("(T)","(T)","(T)","(T)","(R)","(T)","(T)","(R)","(R)","(R)","(T)","(T)","(T)"),
type = c("(JC)","(JC)","(JC)","(JC)","(JC)","(JC)","(JC)","(JC)","(JC)","(JC)","(BC)","(BC)","(BC)"),
value = c("(L)","(L)","(H)","(H)","(H)","(L)","(H)","(H)","(H)","(H)","<blank>","<blank>","<blank>"),
abundance = c(0,0,0,0,0,1,1,1,1,1,0,0,0))
smallnames <- colnames(do.call("cbind",as.list(aggregate(cbind(value, pace, abundance) ~ id + type, data = lapply(df, as.character), table))))
smallnames
[1] "id" "type" "(H)" "(L)" "<blank>" "(R)" "(T)" "0"
[9] "1"
df.new <- do.call("data.frame", as.list(aggregate(cbind(value, pace, abundance) ~ id + type, data = lapply(df, as.character), table)))
colnames(df.new) <- smallnames
df.new$abundance <- df.new$`1`
df.new
id type (H) (L) <blank> (R) (T) 0 1 abundance
1 54 (BC) 0 0 3 0 3 3 0 0
2 51 (JC) 1 2 0 0 3 3 0 0
3 52 (JC) 2 0 0 1 1 2 0 0
4 53 (JC) 4 1 0 3 2 0 5 5
df.final <- df.new[, -which(colnames(df.new) %in% c("<blank>","0","1"))]
df.final
id type (H) (L) (R) (T) abundance
1 54 (BC) 0 0 0 3 0
2 51 (JC) 1 2 0 3 0
3 52 (JC) 2 0 1 1 0
4 53 (JC) 4 1 3 2 5
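For comparison, here is a minimal base R sketch of the same per-id summary built from cross-tabulations instead of aggregate. It is not the answerer's method. It assumes that abundance should be 1 when any row of the id has abundance 1 (the question's own loop uses min, which gives the same result on this data because abundance is constant within each id). The names value_counts, pace_counts, type_by_id, abund_by_id and summary_df are made up for illustration.

```r
# Rebuild the example data; stringsAsFactors = FALSE keeps the text columns as character
df <- data.frame(
  id = factor(c(51,51,51,52,52,53,53,53,53,53,54,54,54)),
  pace = c("(T)","(T)","(T)","(T)","(R)","(T)","(T)","(R)","(R)","(R)","(T)","(T)","(T)"),
  type = c("(JC)","(JC)","(JC)","(JC)","(JC)","(JC)","(JC)","(JC)","(JC)","(JC)","(BC)","(BC)","(BC)"),
  value = c("(L)","(L)","(H)","(H)","(H)","(L)","(H)","(H)","(H)","(H)","<blank>","<blank>","<blank>"),
  abundance = c(0,0,0,0,0,1,1,1,1,1,0,0,0),
  stringsAsFactors = FALSE
)

# Cross-tabulate id against value and against pace: one row per id, one column per level
value_counts <- table(df$id, df$value)
pace_counts  <- table(df$id, df$pace)

# First type seen for each id, as in the question's loop
type_by_id <- tapply(df$type, df$id, function(x) x[1])

# Assumption: abundance is 1 if any row of the id is 1, so take the maximum per id
abund_by_id <- tapply(df$abundance, df$id, max)

# Assemble one row per id; table() and tapply() both return results in factor-level order
summary_df <- data.frame(
  id = levels(df$id),
  type = as.vector(type_by_id),
  abundance = as.vector(abund_by_id),
  stringsAsFactors = FALSE
)
summary_df <- cbind(summary_df,
                    as.data.frame.matrix(value_counts),
                    as.data.frame.matrix(pace_counts))
summary_df
```

Computing abundance separately means the 0 and 1 count columns never appear, so nothing has to be dropped afterwards. The <blank> column is still produced and can be removed by name if it is not wanted.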
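See part 2 for an edit that resembles the desired data frame output.
aggregate(cbind(value, pace, abundance) ~ id + type, data = lapply(df, as.character), table) seems an easier way to achieve this.
That is a tidy one-liner, but you will notice that once abundance is added you end up dealing with counts of 0 and 1. Also, aggregate actually leaves you with a data frame that contains matrices. Part 1 uses do.call + cbind on the as.list of the aggregate object to make the names cleaner (as requested) and to format the result properly (a plain data.frame). Part 2 (ugly as it looks) just forces things to look the way the OP asked. I do like the use of cbind though, +1.
Yes, you're right, it needs do.call(… in front. Judging from the expected result for abundance, it seems they want 1 if the id has any values and zero otherwise, so that is easy to get by handling abundance separately. (PS: I think you can do do.call(data.frame
See my edit: I revised it, including adopting user2957945's suggestion. I also corrected abundance, because I realized it was counting the zeros rather than only the 1s.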
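The point made in the comments, that aggregate leaves matrix columns inside the data frame and that do.call(data.frame, ...) flattens them, can be seen in a smaller sketch. The example below is illustrative only: it summarises abundance with a hypothetical min/max function rather than table, and the names toy, agg and flat are not from the thread.

```r
# Small toy data set: same ids and abundance values as the example above
toy <- data.frame(
  id = factor(c(51,51,51,52,52,53,53,53,53,53,54,54,54)),
  abundance = c(0,0,0,0,0,1,1,1,1,1,0,0,0)
)

# FUN returns a named vector of length 2, so aggregate stores the result
# as a single matrix column called "abundance"
agg <- aggregate(abundance ~ id, data = toy,
                 FUN = function(x) c(min = min(x), max = max(x)))
ncol(agg)                 # id plus one matrix column
is.matrix(agg$abundance)  # the summary lives in a matrix

# A data frame is a list, so do.call(data.frame, agg) passes each column
# as a separate argument, and data.frame() splits the matrix into plain columns
flat <- do.call(data.frame, agg)
names(flat)
```

After flattening, each statistic is an ordinary column that can be renamed or dropped like any other.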