Java Mallet outputs topic weights of 0.0 or 1.0, with nothing in between
I created a small program with the Mallet API. However, I do not understand the final weight output. While the program is running, it prints reasonable weights for each topic (see below):
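In the final output, every weight is shown as 0 except for one, which is shown as 1. Can someone explain what is happening here?

The code you are referring to prints the topic distribution of the first document, which is assigned almost 100% to topic 19.

The collection looks very small (about 30k words), and the documents are fairly large (up to 5k words). If there are more topics than documents, the model can maximize its objective by putting each document in its own topic.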
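You will get better results from more documents, and you may want to consider breaking the documents into smaller chunks. LDA works best when each segment is short enough that you can reasonably assume it has a homogeneous mix of topics. In other words, you would not expect the beginning of a segment to be about something different from its end. 200-500 words is a typical range. 300,000 total tokens is also probably the minimum at which you can expect good results.

Yes, that was it. At first I just put everything into a single document for testing. I will now split it by paragraph.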
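The chunking suggested above could be sketched roughly as follows. This is only an illustration under assumptions: `DocumentChunker`, `chunk`, and the 300-word limit are hypothetical names and values that do not come from the original program, and the split is on whitespace rather than on paragraph boundaries. Each resulting chunk would then be fed to Mallet as its own document instead of the single large text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Mallet: splits one long text into
// chunks of at most maxWords whitespace-separated words.
class DocumentChunker {

    static List<String> chunk(String text, int maxWords) {
        if (maxWords <= 0) {
            throw new IllegalArgumentException("maxWords must be positive");
        }
        List<String> chunks = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return chunks; // nothing to split
        }
        String[] words = trimmed.split("\\s+");
        // Walk the word array in steps of maxWords; the last chunk may be shorter.
        for (int start = 0; start < words.length; start += maxWords) {
            int end = Math.min(start + maxWords, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Build a dummy 1000-word text purely for demonstration.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("word").append(i).append(' ');
        }
        // 300 is an assumed value inside the 200-500 word range mentioned above.
        List<String> chunks = chunk(sb.toString(), 300);
        System.out.println("Number of chunks: " + chunks.size());
    }
}
```

Splitting on paragraph boundaries, as mentioned in the follow-up comment, would work just as well as long as the paragraphs are not much longer than a few hundred words.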
Mallet LDA: 20 topics, 5 topic bits, 11111 topic mask
max tokens: 5179
total tokens: 31712
<10> LL/token: -7,88809
<20> LL/token: -7,54327
<30> LL/token: -7,44727
<40> LL/token: -7,3755
0 0,5 parses files browser creates selects docking entity
1 0,5 boolean listener handles enabled directory mouse lines
2 0,5 text area selected inserts creates deletes user
3 0,5 int line offset caret screen moves end
4 0,5 creates node container widget namespace block grid
5 0,5 selection key event processes shows word start
6 0,5 boolean search index indent hyper bundle dialog
7 0,5 string element adds starts ends reader map
8 0,5 handles changed message properties mode content loads
9 0,5 creates fold plugin list marker model handler
10 0,5 action set invokes edit creates char token
11 0,5 pane option saves inits error save creates
12 0,5 component adds size layout removes dockable window
13 0,5 converts type view tostring rule parser closes
14 0,5 buffer update updates handles status invalidates byte
15 0,5 evals creates menu callstack eval inits document
16 0,5 class manager path url bsh impl chunk
17 0,5 handles variable expression color property primitive icon
18 0,5 file creates vfs request literal parent runs
19 0,5 string parse editor.getexpansion preferredlayoutsize(parent preprocesskeyevent startlinecomment getstringproperty
[...]
0 0,07447 parses string files entity selected decl lists
1 0,09965 handles boolean listener adds mouse enabled drag
2 0,09124 text area selected selects user input int
3 0,14501 int line offset screen start count end
4 0,07821 node creates container widget closes namespace grid
5 0,05882 key event selection processes viewer extends handles
6 0,16431 boolean indent list index equals updates modifiers
7 0,08873 element string starts ends adds document map
8 0,14141 handles changed message properties mode content loads
9 0,12078 fold creates plugin marker model handler list
10 0,11112 action creates invokes edit set token stream
11 0,11896 option pane inits saves view creates color
12 0,11379 component layout size adds dockable window removes
13 0,11022 string converts tostring type char marks segment
14 0,10636 buffer update handles updates byte status edit
15 0,11183 evals creates menu callstack error eval reader
16 0,09098 class path url manager impl classes loader
17 0,09077 handles variable expression property creates bsh primitive
18 0,12605 file string search vfs dialog creates literal
19 0,02491 string parse setvalueat disposedockablewindow getpreviousbuffer buffered rewinds
[beta: 0,02113]
<500> LL/token: -6,90397
Total time: 16 seconds
0 0.000 parses (115) string (90) files (53) entity (33) selected (29)
1 0.000 handles (110) boolean (82) listener (71) mouse (48) adds (44)
2 0.000 text (230) area (126) selected (61) user (28) selects (27)
3 0.000 int (588) line (295) offset (67) screen (54) start (49)
4 0.000 node (71) creates (48) widget (34) closes (33) container (32)
5 0.000 key (130) event (110) selection (81) processes (67) viewer (17)
6 0.000 boolean (586) indent (55) index (51) list (51) updates (23)
7 0.000 element (99) string (76) starts (48) ends (46) adds (43)
8 0.000 handles (464) changed (153) message (150) properties (96) mode (96)
9 0.000 fold (108) creates (107) plugin (97) marker (56) model (55)
10 0.000 action (132) creates (89) invokes (64) set (61) edit (58)
11 0.000 option (119) pane (118) inits (114) saves (77) view (68)
12 0.000 component (128) adds (89) layout (87) size (76) dockable (63)
13 0.000 string (488) converts (114) tostring (65) type (41) char (30)
14 0.000 buffer (289) update (89) handles (71) updates (49) byte (30)
15 0.000 evals (157) creates (121) menu (102) callstack (92) error (66)
16 0.000 class (243) path (76) url (47) manager (42) impl (28)
17 0.000 handles (134) variable (79) expression (73) creates (47) property (46)
18 0.000 file (126) string (111) search (89) vfs (64) int (52)
19 1.000 string (2705) parse (2605) parser.reinittokeninput(in (1) image (1) candidates[i (1)
0 0.930564405720232