R: Convert a data frame to a tibble with word counts


I am trying to run a sentiment analysis on this data. Before doing so, I need to convert my dataset into a tidy format.

My dataset has the following form:

x <- c( "test1" , "test2")
y <- c( "this is test text1" , "this is test text2")
res <- data.frame( "url" = x, "text" = y)
res
    url               text
1 test1 this is test text1
2 test2 this is test text2

I am trying to convert it to a tibble, since that seems to be the format tidytextmining requires for sentiment analysis:

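Are you looking for something like this? To do sentiment analysis with the tidytext package, you need unnest_tokens() to split each string into words. This function can do more than split text into words, so it is worth a closer look later. Once you have one word per row, you can use count() to count how many times each word appears in each text. Then you want to remove stop words; the tidytext package ships with a stop word dataset, so you can load it directly. Finally, you need sentiment information. Here I chose AFINN, but you can pick another lexicon if you prefer. I hope this helps.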

x <- c("text1", "text2")
y <- c("I am very happy and feeling great.", "I am very sad and feeling low")
# Keep text as character (not factor) so it can be tokenized
res <- data.frame(url = x, text = y, stringsAsFactors = FALSE)

#    url                               text
#1 text1 I am very happy and feeling great.
#2 text2      I am very sad and feeling low

library(tidytext)
library(dplyr)

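# Stop word list that ships with tidytext
data(stop_words)
# AFINN sentiment lexicon (recent tidytext versions may ask to download it via the textdata package)
afinn <- get_sentiments("afinn")

res %>%
  unnest_tokens(output = word, input = text) %>%  # one lowercase word per row
  count(url, word) %>%                             # occurrences of each word per url
  filter(!word %in% stop_words$word) %>%           # drop stop words
  inner_join(afinn, by = "word")                   # keep only words with an AFINN score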

#    url    word     n score
#  <chr>   <chr> <int> <int>
#1 text1 feeling     1     1
#2 text1   happy     1     3
#3 text2 feeling     1     1
#4 text2     sad     1    -2
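
If one number per URL is wanted, the scored rows can be collapsed into a net sentiment. The following is only a sketch in base R: the `scored` data frame is typed in by hand from the output above instead of being joined against AFINN, and the `weighted` and `sentiment` column names are my own. As far as I know, more recent tidytext releases name the AFINN column `value` rather than `score`, so adjust the column name to whatever your join returns.

```r
# Scored rows copied by hand from the output above (not downloaded from AFINN)
scored <- data.frame(
  url   = c("text1", "text1", "text2", "text2"),
  word  = c("feeling", "happy", "feeling", "sad"),
  n     = c(1L, 1L, 1L, 1L),
  score = c(1L, 3L, 1L, -2L),
  stringsAsFactors = FALSE
)

# Weight each word's score by how often it occurs in the text
scored$weighted <- scored$n * scored$score

# Total the weighted scores per url
net <- aggregate(weighted ~ url, data = scored, FUN = sum)
names(net)[2] <- "sentiment"
net
```

With these four rows, text1 sums to 4 and text2 to -1.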


Why do you need to convert it to a tibble? In other words, your title does not seem to reflect the actual question. It looks like you just need a word count per URL. I think one possible tibbliverse approach could be res %>% group_by(url) %>% transform(text = strsplit(text, " ", fixed = TRUE)) %>% unnest() %>% count(url, text) (assuming text is a character string and not a factor)

@davidernburg please see the update
x <- c( "test1" , "test2")
y <- c( "this is test text1" , "this is test text2")
res <- data.frame( "url" = x, "text" = y)
res

res %>%
group_by(url) %>%
transform(text = strsplit(text, " ", fixed = TRUE)) %>%
unnest() %>%
count(url, text) 

Error in strsplit(text, " ", fixed = TRUE) : non-character argument
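
This error is consistent with `text` being stored as a factor: before R 4.0.0, `data.frame()` converted strings to factors by default, and `strsplit()` accepts only character vectors. Below is a minimal base R sketch that reproduces the failure and works around it. The explicit `stringsAsFactors = TRUE` is there only to force the old default on any R version, and the names `res_factor`, `res_chr`, `long` and `counts` are illustrative, not from the original post.

```r
x <- c("test1", "test2")
y <- c("this is test text1", "this is test text2")

# Force the pre-4.0 default so the problem reproduces on any R version
res_factor <- data.frame(url = x, text = y, stringsAsFactors = TRUE)
is.factor(res_factor$text)

# strsplit() rejects factors; capture the error message instead of stopping
err <- tryCatch(
  strsplit(res_factor$text, " ", fixed = TRUE),
  error = function(e) conditionMessage(e)
)
err

# Fix 1: convert the existing column to character
res_factor$text <- as.character(res_factor$text)

# Fix 2: build the data frame with strings kept as character
res_chr <- data.frame(url = x, text = y, stringsAsFactors = FALSE)

# Split on spaces, then reshape to one row per (url, word)
words <- strsplit(res_chr$text, " ", fixed = TRUE)
long <- data.frame(
  url  = rep(res_chr$url, lengths(words)),
  text = unlist(words),
  stringsAsFactors = FALSE
)

# Count each word per url and drop combinations that never occur
counts <- as.data.frame(table(url = long$url, text = long$text),
                        stringsAsFactors = FALSE)
counts <- counts[counts$Freq > 0, ]
counts
```

Either fix makes the `strsplit()` step run; `unnest_tokens()` from tidytext additionally lowercases the text and strips punctuation, which a plain split on spaces does not do.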
