R: create a new column for each unique item (word) with frequency counts

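I am fairly new to R and to programming in general, and I have been struggling with the following problem.

I have a data frame that looks like this: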

id     animals
 1     cat dog
 2     cat pig dog fish fish
 3     horse horse

I would like to create a new column for each animal, containing the frequency count for each id:

id    cat  dog  fish  horse  pig
 1     1    1     0     0     0
 2     1    1     2     0     1
 3     0    0     0     2     0

How can I do this?

Sample dput:

structure(list(id = 1:3, animals = structure(1:3, .Label = c("cat dog", 
    "cat pig dog fish fish", "horse horse"), class = "factor")), .Names = c("id", 
    "animals"), class = "data.frame", row.names = c(NA, -3L))

We can do the following:

library(dplyr)
library(tidyr)

df %>%
    separate_rows(animals) %>%
    count(id, animals) %>%
    spread(animals, n, fill = 0)
## A tibble: 3 x 6
#     id   cat   dog  fish horse   pig
#  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1    1.    1.    1.    0.    0.    0.
#2    2.    1.    1.    2.    0.    1.
#3    3.    0.    0.    0.    2.    0.
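
For what it is worth, `spread()` has since been superseded in tidyr by `pivot_wider()`. A minimal sketch of the same pipeline in that style follows, assuming tidyr 1.1.0 or later and a plain character `animals` column; the sample data is rebuilt from the question's dput, and the name `wide` is only illustrative. Columns should come out in order of first appearance rather than alphabetically.

```r
library(dplyr)
library(tidyr)

# Sample data from the question (animals as character rather than factor)
df <- data.frame(
  id = 1:3,
  animals = c("cat dog", "cat pig dog fish fish", "horse horse"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  separate_rows(animals, sep = " ") %>%  # one row per (id, word)
  count(id, animals) %>%                 # frequency of each word within each id
  pivot_wider(names_from = animals,      # one column per distinct word
              values_from = n,
              values_fill = 0)           # absent combinations become 0

wide
```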

A one-liner with data.table could be:
library(data.table)
dcast(setDT(df)[, unlist(strsplit(as.character(animals), " ")), by = id], id ~  V1)

#  id cat dog fish horse pig
#1  1   1   1    0     0   0
#2  2   1   1    2     0   1
#3  3   0   0    0     2   0


Or, as another option, you can use dcast from reshape2:
library(reshape2)
spl <- strsplit(as.character(df$animals), " ")
df_m <- data.frame(id = rep(df$id, times = lengths(spl)), animals = unlist(spl))
dcast(df_m, id ~ animals)
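
The same split-then-tabulate idea can also be sketched without any packages. This is only an illustration in base R, assuming the words are separated by single spaces; the names `counts` and `wide` are made up for the example, and the sample data is rebuilt from the question's dput.

```r
# Sample data from the question (animals as character rather than factor)
df <- data.frame(
  id = 1:3,
  animals = c("cat dog", "cat pig dog fish fish", "horse horse"),
  stringsAsFactors = FALSE
)

# Split each string on spaces: one character vector of words per row
spl <- strsplit(as.character(df$animals), " ")

# Repeat each id once per word, then cross-tabulate id against word
counts <- table(id = rep(df$id, times = lengths(spl)), animal = unlist(spl))

# Turn the contingency table into a wide data frame, one column per animal
wide <- data.frame(
  id = as.integer(rownames(counts)),
  as.data.frame.matrix(counts),
  row.names = NULL
)

wide
```

Because `table()` sorts its levels, the animal columns come out alphabetically here.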

You can use unnest_tokens from tidytext:
library(tidyverse)
library(tidytext)

x %>%  unnest_tokens(word,animals) %>%  table()

Data:
x <- structure(list(id = 1:3, animals = c("cat dog", "cat pig dog fish fish", 
"horse horse")), .Names = c("id", "animals"), row.names = c(NA, 
-3L), class = "data.frame")

By the way: I love this book; if you are interested in text analysis with tidytext, it is a must-read:
You can use separate_rows instead of mutate and unnest.
Indeed @kath, and thanks!
spread has a fill option, where you can specify fill = 0.
Right again @kath; it is clearly too late now. I should sign off.