Get the count of consecutive occurrences of a string in a dataframe column based on another column

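I need to know how many times a value occurs in a column of a data frame.

The main logic is to get the number of occurrences of a particular string based on another column.

For example: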

df<- data.frame(fruits = c("apples", "apples", "orange", "pears", "apples", "pears", "pears", "papaya", "papaya"), 
                veggies = c("beans", "carrots", "carrots", "carrots", "brinjal","carrots", "brinjal", "brinjal", "beans"),
                branches=c( "Area1", "Area1", "Area1", "Area2","Area2","Area2", "Area2", "Area3", "Area3" ))
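The output normally shows the total count of apples and the other fruits across all branches. I need the exact count for each branch.

My desired output should be grouped by the column df$branches: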

for Area1
   apples-2 orange-1,
for Area2 
   pears-3 apples-1
for Area3 
   papaya-2
Try this:

library(data.table)
setDT(df)[,list(count=.N),list(branches, fruits)]

#   branches fruits count
#1:    Area1 apples     2
#2:    Area1 orange     1
#3:    Area2  pears     3
#4:    Area2 apples     1
#5:    Area3 papaya     2

Maybe just use ftable:

> ftable(fruits ~ branches, data = df)
         fruits apples orange papaya pears
branches                                  
Area1                2      1      0     0
Area2                1      0      0     3
Area3                0      0      2     0
> ftable(veggies ~ branches, data = df)
         veggies beans brinjal carrots
branches                              
Area1                1       0       2
Area2                0       2       2
Area3                1       1       0
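The contingency table above includes a zero for every fruit a branch does not sell. If a long listing with one row per branch and fruit is preferred, a minimal base R sketch could look like the following. It rebuilds the question's data (only the two relevant columns), and the names `counts` and `count` are chosen here for illustration.

```r
# Sample data from the question (veggies column omitted for brevity)
df <- data.frame(
  fruits = c("apples", "apples", "orange", "pears", "apples",
             "pears", "pears", "papaya", "papaya"),
  branches = c("Area1", "Area1", "Area1", "Area2", "Area2",
               "Area2", "Area2", "Area3", "Area3")
)

# Cross-tabulate, then reshape to one row per (branch, fruit) pair
counts <- as.data.frame(table(branches = df$branches, fruits = df$fruits),
                        responseName = "count", stringsAsFactors = FALSE)

# Drop the combinations that never occur and sort by branch, then fruit
counts <- counts[counts$count > 0, ]
counts <- counts[order(counts$branches, counts$fruits), ]
rownames(counts) <- NULL
print(counts)
```

This leaves five rows, matching the non-zero cells of the ftable output.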

I'm not sure exactly what output you expect, but you can get the counts with the dplyr package, for example:

library(dplyr)
df %>% count(fruits, branches)
# OR
count(df, fruits, branches)
Output:

Source: local data frame [5 x 3]
Groups: fruits

  fruits branches n
1 apples    Area1 2
2 apples    Area2 1
3 orange    Area1 1
4 papaya    Area3 2
5  pears    Area2 3
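If the goal is the "fruit-count" listing per branch shown in the question, the counts can be pasted together per group. The following is only a sketch in base R, not part of the answers above; `per_branch` is a name made up for this example, and fruits appear in alphabetical order within each branch because `table()` sorts its names.

```r
# Sample data from the question (veggies column omitted for brevity)
df <- data.frame(
  fruits = c("apples", "apples", "orange", "pears", "apples",
             "pears", "pears", "papaya", "papaya"),
  branches = c("Area1", "Area1", "Area1", "Area2", "Area2",
               "Area2", "Area2", "Area3", "Area3")
)

# Split the fruits by branch, count each group, and build "name-count" labels
per_branch <- vapply(
  split(as.character(df$fruits), df$branches),
  function(x) {
    tab <- table(x)
    paste0(names(tab), "-", as.vector(tab), collapse = " ")
  },
  character(1)
)

# Print one line per branch, e.g. "for Area1: apples-2 orange-1"
for (b in names(per_branch)) {
  cat("for ", b, ": ", per_branch[[b]], "\n", sep = "")
}
```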

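Thanks for the reply, Colonel... but it throws an error for me in setDT(df): Cannot convert 'df' to data.table by reference because binding is locked. It is very likely that 'df' resides within a package (or an environment) that is locked to prevent modifying its variable bindings. Try copying the object to your current environment, for example: var

Could you do df1 = df and then apply the above to df1?

@Neha, the error message you pasted tells you exactly what to do! Yet another reason to use data.table: its error messages are very detailed!

Can we get the counts based on a timestamp column at specific time intervals? For example, can we get the count of repeated strings within 10-minute intervals? Is that possible?

@Neha Your example data has no timestamps. If you have a more specific question, please edit your original post to include this information and your desired result.

For example, I have a date-with-time column in the data.frame, running from 05-SEP-14 07.22.13 AM to 05-SEP-14 10.22.13 PM. Can I get the number of apples sold every 30 minutes?

This is called aggregation. Specifically, you want to aggregate fruits or veggies by branch.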
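For the 30-minute question, one possible approach is to floor each timestamp to the start of its window and then aggregate on that window. The sketch below uses base R only. The `sales` data frame, its column names, and its values are hypothetical, since the original post contains no timestamps, and it assumes the timestamps have already been parsed to POSIXct (the "05-SEP-14 07.22.13 AM" strings would need to be converted first).

```r
# Hypothetical sales log; columns and values are made up for illustration
sales <- data.frame(
  sold_at = as.POSIXct(c("2014-09-05 07:22:13", "2014-09-05 07:28:40",
                         "2014-09-05 07:41:05", "2014-09-05 08:05:00",
                         "2014-09-05 08:10:30"), tz = "UTC"),
  fruits = c("apples", "apples", "pears", "apples", "apples"),
  stringsAsFactors = FALSE
)

# Floor each timestamp to the start of its 30-minute window (1800 seconds)
window_secs <- 30 * 60
window_start <- as.POSIXct(
  floor(as.numeric(sales$sold_at) / window_secs) * window_secs,
  origin = "1970-01-01", tz = "UTC"
)

# Use a text label for the window so it can serve as a grouping key
sales$window <- format(window_start, "%Y-%m-%d %H:%M")

# Add a helper column of ones and sum it per (window, fruit) group
counts <- aggregate(n ~ window + fruits,
                    data = transform(sales, n = 1L),
                    FUN = sum)
print(counts)
```

Changing `window_secs` to `10 * 60` would give the 10-minute intervals asked about earlier in the comments.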