R: multiple co-occurrence clusters on a single term

I have a corpus in which a key term appears at least once. From it I built an fcm that looks like this:

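library(quanteda)
txts <- c("a a a b b c", "a a c e", "a c b e f g", "e d j b", "b g k l", "b a a g l", "e c b j k l", "b g w m")
# newer quanteda versions expect tokens() input; passing raw character text to fcm() is deprecated
total <- fcm(tokens(txts), context = "document", count = "frequency")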

Feature co-occurrence matrix of: 12 by 12 features.
12 x 12 sparse Matrix of class "fcm"
    features
features a b c e f g d j k l w m
   a 5 9 6 3 1 3 0 0 0 2 0 0
   b 0 1 4 3 1 4 1 2 2 3 1 1
   c 0 0 0 3 1 1 0 1 1 1 0 0
   e 0 0 0 0 1 1 1 2 1 1 0 0
   f 0 0 0 0 0 1 0 0 0 0 0 0
   g 0 0 0 0 0 0 0 0 1 2 1 1
   d 0 0 0 0 0 0 0 1 0 0 0 0
   j 0 0 0 0 0 0 0 0 1 1 0 0
   k 0 0 0 0 0 0 0 0 0 2 0 0
   l 0 0 0 0 0 0 0 0 0 0 0 0
   w 0 0 0 0 0 0 0 0 0 0 0 1
   m 0 0 0 0 0 0 0 0 0 0 0 0
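
To make the "frequency" counts concrete, here is a base-R sketch (no quanteda needed) that rebuilds the same upper-triangular matrix from the eight texts. The counting rule is inferred from the printed numbers rather than taken from quanteda's documentation: for two different terms, each document adds the product of their counts; for a term paired with itself, each document adds n(n - 1)/2. The names `toks`, `vocab`, `counts` and `co` are illustrative only.

```r
# Base-R sketch reproducing the printed fcm(count = "frequency") values.
# Assumption: the counting rule below is inferred from the printed output.
txts <- c("a a a b b c", "a a c e", "a c b e f g", "e d j b",
          "b g k l", "b a a g l", "e c b j k l", "b g w m")
toks <- strsplit(txts, " ", fixed = TRUE)
vocab <- unique(unlist(toks))   # first-appearance order, same as the fcm

# Document-by-term count matrix
counts <- t(vapply(toks, function(tk) as.numeric(table(factor(tk, levels = vocab))),
                   numeric(length(vocab))))
colnames(counts) <- vocab

# Different terms i != j: each document contributes n_i * n_j
co <- crossprod(counts)
# Same term: each document contributes n_i * (n_i - 1) / 2 pairs
diag(co) <- colSums(counts * (counts - 1) / 2)
# Keep only the upper triangle, as in the printed matrix
co[lower.tri(co)] <- 0
print(co)
```

Comparing `co` with the printed output is a quick way to check which counting mode a given fcm uses before clustering on it.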

My goal is to visualize the clusters around the key term () and to create term lists from them.
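
A simple starting point for the term list is to rank every other term by the number of documents it shares with the key term. The sketch below does this in base R. Because the key term is missing from the question, `"b"` is used purely as a hypothetical placeholder, and counting documents (rather than token frequencies) is an assumption about what "co-occurring" should mean here.

```r
# Base-R sketch: rank terms by how many documents they share with a key term.
txts <- c("a a a b b c", "a a c e", "a c b e f g", "e d j b",
          "b g k l", "b a a g l", "e c b j k l", "b g w m")
key <- "b"  # hypothetical placeholder for the key term

# Set of distinct terms in each document
doc_terms <- lapply(strsplit(txts, " ", fixed = TRUE), unique)

# Keep only the documents that contain the key term
with_key <- Filter(function(terms) key %in% terms, doc_terms)

# For every other term, count the documents it shares with the key term
tab <- table(unlist(lapply(with_key, setdiff, key)))
co_counts <- sort(setNames(as.integer(tab), names(tab)), decreasing = TRUE)
print(co_counts)
```

Cutting this ranked vector at a threshold gives a flat term list; the clustering approach in the answer below groups terms by their whole co-occurrence profile instead.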

While searching, I also came across the cooccurNet package, but I don't know how to use it effectively.

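quanteda has textstat_simil(), whose result can be used for hierarchical clustering. In older versions of quanteda it returned a dist object directly; since quanteda 3.0 it lives in the quanteda.textstats package and returns a similarity object that can be converted with as.matrix(). The function only accepts a dfm, but an fcm can be converted to one with as.dfm(). Because hclust() treats its input as distances, the similarities should be turned into dissimilarities (for example 1 - similarity) before clustering.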

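require(quanteda)
require(quanteda.textstats)  # textstat_simil() lives here in quanteda >= 3.0
txt <- c("a a a b b c", "a a c e", "a c b e f g", "e d j b", "b g k l", "b a a g l", "e c b j k l", "b g w m")
dmt <- dfm(tokens(txt))
# dmt <- dfm_trim(dmt, min_termfreq = 10) # you might need this to reduce the size of the fcm
fmt <- fcm(dmt, context = "document")

simil <- textstat_simil(as.dfm(fmt), margin = "features")
# hclust() expects dissimilarities, so convert the similarities first
tree <- hclust(as.dist(1 - as.matrix(simil)))
cutree(tree, 2)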
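
To turn the cluster assignments into term lists and pick out the cluster that contains the key term, something like the following base-R sketch could work. It is not the answer's exact pipeline: it swaps in plain cosine similarity between co-occurrence profiles (instead of textstat_simil()'s default measure), builds the co-occurrence matrix with crossprod() rather than fcm(), and assumes average linkage. The key term `"b"` is a hypothetical placeholder.

```r
# Base-R sketch: cluster terms by co-occurrence profile, then list the
# terms that share a cluster with a (hypothetical) key term.
txts <- c("a a a b b c", "a a c e", "a c b e f g", "e d j b",
          "b g k l", "b a a g l", "e c b j k l", "b g w m")
key <- "b"  # hypothetical placeholder for the key term

toks <- strsplit(txts, " ", fixed = TRUE)
vocab <- unique(unlist(toks))

# Document-by-term count matrix
counts <- t(vapply(toks, function(tk) as.numeric(table(factor(tk, levels = vocab))),
                   numeric(length(vocab))))
colnames(counts) <- vocab

# Term-by-term co-occurrence: sum over documents of n_i * n_j
co <- crossprod(counts)

# Cosine similarity between the rows (co-occurrence profiles) of co
norms <- sqrt(rowSums(co^2))
sim <- tcrossprod(co) / outer(norms, norms)

# hclust() needs dissimilarities; clamp tiny negative rounding errors at 0
d <- as.dist(pmax(1 - sim, 0))
tree <- hclust(d, method = "average")

groups <- cutree(tree, k = 2)             # cluster id for every term
clusters <- split(names(groups), groups)  # one term list per cluster
key_cluster <- clusters[[as.character(groups[[key]])]]
print(clusters)
print(key_cluster)
```

For a quick visual check, plot(tree) draws the dendrogram, and changing k in cutree() controls how coarse the term lists are.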
