R: Multiple co-occurrence clusters around a single term


I have a corpus in which one key term occurs at least once. From it, I built an fcm that looks much like this:

txts <- c("a a a b b c", "a a c e", "a c b e f g", "e d j b", "b g k l", "b a a g l", "e c b j k l", "b g w m")
total <- fcm(txts, context = "document", count = "frequency")

Feature co-occurrence matrix of: 12 by 12 features.
12 x 12 sparse Matrix of class "fcm"
    features
features a b c e f g d j k l w m
   a 5 9 6 3 1 3 0 0 0 2 0 0
   b 0 1 4 3 1 4 1 2 2 3 1 1
   c 0 0 0 3 1 1 0 1 1 1 0 0
   e 0 0 0 0 1 1 1 2 1 1 0 0
   f 0 0 0 0 0 1 0 0 0 0 0 0
   g 0 0 0 0 0 0 0 0 1 2 1 1
   d 0 0 0 0 0 0 0 1 0 0 0 0
   j 0 0 0 0 0 0 0 0 1 1 0 0
   k 0 0 0 0 0 0 0 0 0 2 0 0
   l 0 0 0 0 0 0 0 0 0 0 0 0
   w 0 0 0 0 0 0 0 0 0 0 0 1
   m 0 0 0 0 0 0 0 0 0 0 0 0
My goal is to visualize the clusters around the key term and to create term lists from them.


While searching, I also came across the cooccurNet package, but I don't know how to use it properly.

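quanteda has textstat_simil(), which returns a dist object that can be used for hierarchical clustering. This function only accepts a DFM, but you can convert an FCM into one with as.dfm().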

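# Note: in quanteda >= 3 the textstat_*() functions live in the quanteda.textstats package
require(quanteda)
txt <- c("a a a b b c", "a a c e", "a c b e f g", "e d j b", "b g k l", "b a a g l", "e c b j k l", "b g w m")
dmt <- dfm(txt)  # document-feature matrix
# dmt <- dfm_trim(dmt, min_termfreq = 10) # you might need this to reduce the size of fcm
fmt <- fcm(dmt, context = "document")  # feature co-occurrence matrix

# compare features by their co-occurrence profiles, using the FCM coerced to a DFM
dist <- textstat_simil(as.dfm(fmt), margin = "features")
tree <- hclust(dist)  # hierarchical clustering on the dist object
cutree(tree, 2)       # cluster membership for each feature when the tree is cut into 2 groups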
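To get from the cutree() result to the term lists asked for in the question, the named membership vector can be split by cluster. The sketch below is only an illustration in base R, so it runs without quanteda: the co-occurrence matrix is a simple product of term counts standing in for the fcm, cosine distance and average linkage are assumptions rather than what textstat_simil() computes by default, and "b" is used as a hypothetical key term.

```r
# Toy documents from the question
txt <- c("a a a b b c", "a a c e", "a c b e f g", "e d j b",
         "b g k l", "b a a g l", "e c b j k l", "b g w m")

# Term-by-document counts in base R (a stand-in for quanteda's dfm())
toks  <- strsplit(txt, " ", fixed = TRUE)
terms <- unique(unlist(toks))
tdm <- vapply(toks,
              function(x) tabulate(match(x, terms), nbins = length(terms)),
              integer(length(terms)))
rownames(tdm) <- terms

# Symmetric term-by-term co-occurrence counts (a stand-in for the fcm;
# the counting rule differs from quanteda's fcm())
cooc <- tcrossprod(tdm)

# Cosine similarity between co-occurrence profiles, converted to a distance
norms <- sqrt(rowSums(cooc^2))
sim   <- tcrossprod(cooc) / outer(norms, norms)
d     <- as.dist(pmax(1 - sim, 0))  # clamp tiny negative rounding errors

# Hierarchical clustering, cut into 2 groups (k is an arbitrary choice)
tree     <- hclust(d, method = "average")
clusters <- cutree(tree, k = 2)

# One character vector of terms per cluster
term_lists <- split(names(clusters), clusters)
term_lists

# Terms that share a cluster with the (hypothetical) key term
key <- "b"
key_cluster <- term_lists[[as.character(clusters[[key]])]]
key_cluster

# plot(tree)  # uncomment to draw the dendrogram
```

The same split(names(x), x) step should apply unchanged to the output of cutree(tree, 2) from the quanteda code, since cutree() returns a membership vector named by feature.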