Is there a way to reorder the levels of a variable after grouping with group_by? (R, dplyr)


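I am trying to reproduce Figure 4.3 from Section 4.1.3 of the book "Text Mining with R".

That section groups all bigrams by four key negation words ("not", "no", "never" and "without") and plots, for each group, the contribution to the book's sentiment made by the words that follow the negation word (which is therefore a contribution in the wrong direction).

So I put the words on the y-axis and the contribution on the x-axis, and to make the plot look nicer I also want the bars within each group sorted in descending order. As in the earlier chapters, I reordered the levels of the words by their contribution values.

The problem is that the same word has a different contribution in each group. For example, in the first group "happy" is more prominent than "hope" and therefore contributes more, but in the second group it is the other way round. Worse, once the data frame is grouped by word1, I cannot do

mutate(word2 = reorder(word2, contribution))

The book manages to draw the plot the way it should look, so I assume there is some way to reorder the levels separately for each group.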
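To make the difficulty concrete, here is a minimal sketch using made-up numbers (the `toy` data frame is hypothetical and not taken from the book). It shows that `reorder()` produces a single level order for the whole column, based on a per-level summary (the mean by default), so one factor cannot be sorted differently in each group:

```r
# Hypothetical toy data: "happy" outranks "hope" after "not",
# but "hope" outranks "happy" after "no".
toy <- data.frame(
  word1 = c("not", "not", "no", "no"),
  word2 = c("happy", "hope", "happy", "hope"),
  contribution = c(10, 2, 1, 8)
)

# reorder() summarises contribution per level (mean by default) and sorts
# the levels by that summary, ignoring word1 entirely.
# Means: happy = 5.5, hope = 5, so the single shared order is hope, happy.
global_levels <- levels(reorder(toy$word2, toy$contribution))
global_levels
# → "hope"  "happy"
```

Whatever order comes out is shared by every facet, which is why the bars can only be sorted correctly in at most one panel.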

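Here is the code. Everything before "#preparing the data for plotting" is taken from the book, so it should not be the problem; from that point on the code is mine.
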
library(dplyr)
library(tidytext)
library(janeaustenr)
library(tidyr)
library(ggplot2)  # needed for the ggplot() call at the end

#getting bigrams

austen_bigrams <- austen_books() %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2)  
bigrams_separated <- austen_bigrams %>%
  separate(bigram, c("word1", "word2"), sep = " ")  

#four negation words to look at

negation_words <- c("not", "no", "never", "without")
AFINN <- get_sentiments("afinn")  # note: newer tidytext releases name the score column `value` instead of `score`

#get the sentiment score of words preceded by the four negation words

negated_words <- bigrams_separated %>%
  filter(word1 %in% negation_words) %>%  #word1 as negation words
  inner_join(AFINN, by = c(word2 = "word")) %>%  #word2 as the word following negation words
  count(word1, word2, score, sort = TRUE) %>%
  ungroup()

#preparing the data for plotting

bigrams_plot <- bigrams_separated %>%
  filter(word1 %in% negation_words) %>% 
  inner_join(AFINN, by = c(word2 = "word")) %>%  #getting sentiment score
  count(word1, word2, score, sort = TRUE) %>%
  mutate(contribution = n * score) %>%  #defining contribution as n*score
  group_by(word1) %>%  #group by negation words
  top_n(12,abs(contribution)) %>%
  arrange(desc(abs(contribution))) %>%
  ungroup() %>%
  mutate(word2 = reorder(word2, contribution)) 

#plotting sentiment score contribution grouped by the four negation words

ggplot(bigrams_plot, aes(word2, n * score, fill = n * score > 0)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~word1, ncol = 2, scales = "free") +
  coord_flip()


I have created a simpler version below:

v1_grp <- c(rep('A', 10), rep('B', 10))
v2_Aterm <- sample(letters[1:10], 10, replace = FALSE)
v2_Bterm <- sample(letters[1:10], 10, replace = FALSE)
v3_score <- sample(-10:10, 20, replace = TRUE)

# tibble() replaces the deprecated data_frame()
data1 <- tibble(grp = v1_grp, term = c(v2_Aterm, v2_Bterm), score = v3_score)

dataplot <- data1 %>%
  arrange(desc(score)) %>%
  mutate(term=reorder(term,score)) 

ggplot(dataplot, aes(term,score,fill=score>0)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~grp, ncol = 2, scales = "free") +
  coord_flip()


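The way to do it is to create a new column that combines the y-axis term and the facet term with paste. You can then reorder that column and put it on the y-axis (using scales = "free_y"), while using the corresponding values from the original column as the axis labels. If you create a simpler example (say, fewer than 4 facets, fewer than 6 levels, and nothing that requires me to install a bunch of packages I don't have), I'll happily write an answer.

@Gregor Thanks for the suggestion, I have added a simpler version below.

Hey, just a small question: if the different groups do not share many terms, there will be a lot of empty bars in the plot. Is there a way to avoid that?

I don't quite understand. The empty bars in the answer above are where the score is zero. It still is if we switch the sample data to v2_Aterm

Ah, I see. I managed to solve it; I probably had part of my code wrong before. Thanks.
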
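The suggestion in the comments (paste the term and the facet together, reorder the combined column, and relabel the axis with the original term) could be sketched roughly as follows. This is only an illustration under a few assumptions: the `toy` data, the `key` column and the "__" separator are made up for this sketch, the terms themselves must not contain "__", and mapping the factor directly to `y` without `coord_flip()` assumes ggplot2 3.3.0 or later.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: two facets sharing the same terms with different scores.
toy <- tibble(
  grp   = rep(c("A", "B"), each = 3),
  term  = rep(c("x", "y", "z"), times = 2),
  score = c(3, -1, 2, -2, 5, 1)
)

# Combine term and facet into one key so every bar has its own factor level,
# then order the levels of that key by score.
toy_plot <- toy %>%
  mutate(key = reorder(paste(term, grp, sep = "__"), score))

# Plot the combined key, free the y scale per facet, and strip the
# "__<grp>" suffix so the axis shows the original term.
p <- ggplot(toy_plot, aes(score, key, fill = score > 0)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~grp, ncol = 2, scales = "free_y") +
  scale_y_discrete(labels = function(x) sub("__.+$", "", x))
```

Because each facet only draws the keys that belong to it and the y scale is free, the bars inside each panel end up sorted by their own scores. Printing `p` draws the plot.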
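# Sort the rows by group and then by score, so that the row number
# increases with score within each group
dataplot <- data1 %>%
  arrange(grp, score) %>%
  mutate(order = row_number())

# Plot against the numeric row position, then label each position with its term
ggplot(dataplot, aes(order, score, fill = score > 0)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~grp, ncol = 2, scales = "free") +
  coord_flip() +
  scale_x_continuous(
    breaks = dataplot$order,
    labels = dataplot$term,
    expand = c(0, 0)
  )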
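
As an alternative to the row-number approach, recent versions of tidytext ship the helpers `reorder_within()` and `scale_y_reordered()`, which appear to package the same paste-and-relabel idea. The sketch below is not part of the original answer; it assumes a tidytext version that includes these helpers and ggplot2 3.3.0 or later, and the `toy` data and `term_ord` column are made up for illustration.

```r
library(dplyr)
library(ggplot2)
library(tidytext)

# Hypothetical toy data: two facets sharing the same terms with different scores.
toy <- tibble(
  grp   = rep(c("A", "B"), each = 3),
  term  = rep(c("x", "y", "z"), times = 2),
  score = c(3, -1, 2, -2, 5, 1)
)

# reorder_within(x, by, within) orders `term` by `score` separately
# within each value of `grp`.
toy_rw <- toy %>%
  mutate(term_ord = reorder_within(term, score, grp))

# scale_y_reordered() removes the suffix that reorder_within() appends,
# so the axis shows the plain terms.
p <- ggplot(toy_rw, aes(score, term_ord, fill = score > 0)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~grp, ncol = 2, scales = "free_y") +
  scale_y_reordered()
```

The `scales = "free_y"` argument matters here as well: without it every facet would show the combined levels of all groups.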