Counting occurrences in a long vector in R


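I have a data frame that is 6249 rows long, filled with character data, and it may grow larger.

I want to count the number of times each string occurs. Normally I would use table(df) or count(), but both seem to stop after 250 rows.

Is there another function, or a way to force count() or table() to continue through all 6000+ results?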

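As @Gregor noted, you seem to be misinterpreting the table output; it is in fact counting correctly. In any case, here is a solution using Reduce. Replace the data frame name df and the column name string with the names of the data frame and column you are actually counting over.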

# let's create some dataframe with three strings randomly distributed of length 1000
df <- data.frame(string = unlist(lapply(round(runif(1000, 1, 3)), function(i) c('hi', 'ok', 'my cat')[i])))
my.count <- function(word, df) {
  # count how many rows hold `word` in the 'string' column
  Reduce(function(acc, r) {
    # replace 'string' by the name of the column of your dataframe over which you want to count
    if(r$string == word)
      acc + 1
    else
      acc
  }, apply(df, 1, as.list), init = 0)
}

# count how many 'my cat' strings are in the df dataframe at column 'string', replace with yours
my.count('my cat', df)
# now let's try to find the frequency of all of them
uniq <- unique(df$string)
freq <- unlist(lapply(uniq, my.count, df))
names(freq) <- uniq
freq
# output 
# ok my cat     hi 
# 490    261    249
# we can check indeed that the sum is 1000
sum(freq)
# [1] 1000
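
For comparison, the same counts can be obtained without Reduce. The sketch below is not part of the answer above: it rebuilds a similar example data frame and checks that a vectorised comparison and table() agree. The set.seed() call and the count_word helper are assumptions added for illustration.

```r
# Illustrative data: 1000 strings drawn from three values.
# set.seed() is only here to make the run repeatable.
set.seed(42)
df <- data.frame(
  string = sample(c("hi", "ok", "my cat"), 1000, replace = TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count the rows whose `string` equals `word`.
count_word <- function(word, df) sum(df$string == word)

# One count per distinct string, named by the string it belongs to.
uniq <- unique(df$string)
freq <- vapply(uniq, count_word, integer(1), df = df)

# table() produces the same counts in a single call.
tab <- table(df$string)

freq
tab
```

The comparison df$string == word is evaluated over the whole column at once, so no row-by-row conversion with apply() is needed.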

A simple way to do this for a data frame of any size is to add a count field to the data frame and then use the doBy package to summarise over the string field, as follows:
require(doBy)
df$count <- 1
result <- summaryBy(count ~ string, data = df, FUN = sum, keep.names = TRUE)
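
If adding the doBy dependency is not desirable, base R's aggregate() can compute the same grouped sum. This is a minimal sketch with made-up example data, assuming the same layout (a character column named string):

```r
# Made-up example data with a single character column named `string`.
df <- data.frame(
  string = c("hi", "ok", "hi", "my cat", "ok", "hi"),
  stringsAsFactors = FALSE
)

# Add a column of ones, then sum it within each distinct string.
df$count <- 1
result <- aggregate(count ~ string, data = df, FUN = sum)
result
```

The result is a data frame with one row per distinct string and its total in the count column.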

OK, this won't be popular, but in the end I got the result I wanted with a for loop that takes the number of rows in each subset:
# y collects one count per token; x holds the count for the current token
y <- numeric(0)
x <- numeric(0)
for (i in test$token) {
  # number of rows of df whose token equals i
  x <- nrow(df[df$token == i, ])
  y <- c(y, x)
}
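
The loop above grows y one element at a time with c(). A sketch of the same lookup with vapply() follows; the contents of df and test are made up here, since the post does not show them.

```r
# Made-up data: `df` holds the tokens to be counted, and `test` holds
# the tokens to look up (its real contents are not shown in the post).
df <- data.frame(token = c("a", "b", "a", "c", "a", "b"),
                 stringsAsFactors = FALSE)
test <- data.frame(token = c("a", "b", "c", "d"),
                   stringsAsFactors = FALSE)

# For each token in `test`, count the matching rows of `df`.
# vapply() preallocates the result instead of growing it with c().
y <- vapply(test$token, function(tok) sum(df$token == tok), integer(1))
y
```

A token that never appears in df gets a count of 0 rather than an error.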

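y

Are you just trying to count the number of rows in the data frame? If so, use nrow(df).

No, I'm trying to count the number of times each result occurs. For example, if I have a vector x

table does not stop counting; the default print behaviour just truncates the output. Try tt = table(runif(6000)), length(tt), head(tt), tail(tt) …

Do you have a count column set up in df? If so, try aggregate(. ~ string, df, function(x) length(unique(x))). Otherwise, as @Gregor says, table should work.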
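
The point about printing can be checked directly. The sketch below uses made-up data with 6000 distinct strings; the names x, tt and counts are illustrative. length() confirms that every value was counted, and converting the table to a data frame gives an object that can be sorted or filtered instead of being read off the console.

```r
# Made-up data: 6249 strings, 6000 distinct values plus some repeats.
x <- c(paste0("token", 1:6000), rep("token1", 249))

tt <- table(x)

# Every distinct value has its own entry, whatever the console shows.
length(tt)

# A data frame of counts is easier to inspect than printed table output.
counts <- as.data.frame(tt, stringsAsFactors = FALSE)
names(counts) <- c("string", "n")
counts <- counts[order(-counts$n), ]
head(counts)
```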