Python LDA Gensim model with more than 20 topics does not print properly

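Using the Gensim package (both LDA and Mallet), I noticed that when I create a model with more than 20 topics and call the print_topics function, it prints at most 20 topics (note: not the first 20 topics, but an arbitrary 20), and they are out of order.

So my question is: how do I get all of the topics to print? I am not sure whether this is a bug or a problem on my end. I looked back through my library of LDA models (over 5000, from different data sources) and noticed that this happens in every model with more than 20 topics.

Below is sample code with its output. In the output you will see that the topics are not sorted (they should be) and that some topics are missing, for example topic 3.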

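import gensim
from pprint import pprint

# jr_dict_corpus and jr_dict are the asker's corpus and dictionary (not shown here)
lda_model = gensim.models.ldamodel.LdaModel(corpus=jr_dict_corpus,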
                                           id2word=jr_dict,
                                           num_topics=25, 
                                           random_state=100,
                                           update_every=1,
                                           chunksize=100,
                                           passes=10,
                                           alpha='auto',
                                           per_word_topics=True)

pprint(lda_model.print_topics())
# Note: if the model contained 20 topics, they would be listed in order, 0-19
[(21,
  '0.001*"commitment" + 0.001*"study" + 0.001*"evolve" + 0.001*"outlook" + '
  '0.001*"value" + 0.001*"people" + 0.001*"individual" + 0.001*"client" + '
  '0.001*"structure" + 0.001*"proposal"'),
 (18,
  '0.001*"self" + 0.001*"insurance" + 0.001*"need" + 0.001*"trend" + '
  '0.001*"statistic" + 0.001*"propose" + 0.001*"analysis" + 0.001*"perform" + '
  '0.001*"impact" + 0.001*"awareness"'),
 (2,
  '0.001*"link" + 0.001*"task" + 0.001*"collegiate" + 0.001*"universitie" + '
  '0.001*"banking" + 0.001*"origination" + 0.001*"security" + 0.001*"standard" '
  '+ 0.001*"qualifications_bachelor" + 0.001*"greenfield"'),
 (11,
  '0.024*"collegiate" + 0.016*"interpersonal" + 0.016*"prepare" + '
  '0.016*"invite" + 0.016*"aspect" + 0.016*"college" + 0.016*"statistic" + '
  '0.016*"continent" + 0.016*"structure" + 0.016*"project"'),
 (10,
  '0.049*"enjoy" + 0.049*"ambiguity" + 0.017*"accordance" + 0.017*"liberalize" '
  '+ 0.017*"developing" + 0.017*"application" + 0.017*"vacancie" + '
  '0.017*"service" + 0.017*"initiative" + 0.017*"discontinuing"'),
 (20,
  '0.028*"negotiation" + 0.028*"desk" + 0.018*"enhance" + 0.018*"engage" + '
  '0.018*"discussion" + 0.018*"ability" + 0.018*"depth" + 0.018*"derive" + '
  '0.018*"enjoy" + 0.018*"balance"'),
 (12,
  '0.036*"individual" + 0.024*"validate" + 0.018*"greenfield" + '
  '0.018*"capability" + 0.018*"coordinate" + 0.018*"create" + '
  '0.018*"programming" + 0.018*"safety" + 0.010*"evaluation" + '
  '0.002*"reliability"'),
 (1,
  '0.028*"negotiation" + 0.021*"responsibility" + 0.014*"master" + '
  '0.014*"mind" + 0.014*"experience" + 0.014*"worker" + 0.014*"ability" + '
  '0.007*"summary" + 0.007*"proposal" + 0.007*"alert"'),
 (23,
  '0.043*"banking" + 0.026*"origination" + 0.026*"round" + 0.026*"credibility" '
  '+ 0.026*"entity" + 0.018*"standard" + 0.017*"range" + 0.017*"pension" + '
  '0.017*"adapt" + 0.017*"information"'),
 (13,
  '0.034*"priority" + 0.034*"reconciliation" + 0.034*"purchaser" + '
  '0.023*"reporting" + 0.023*"offer" + 0.023*"investor" + 0.023*"share" + '
  '0.023*"region" + 0.023*"service" + 0.023*"manipulate"'),
 (22,
  '0.017*"analyst" + 0.017*"modelling" + 0.016*"producer" + 0.016*"return" + '
  '0.016*"self" + 0.009*"scope" + 0.008*"mind" + 0.008*"need" + 0.008*"detail" '
  '+ 0.008*"statistic"'),
 (9,
  '0.021*"decision" + 0.014*"invite" + 0.014*"balance" + 0.014*"commercialize" '
  '+ 0.014*"transform" + 0.014*"manage" + 0.014*"optionality" + '
  '0.014*"problem_solving" + 0.014*"fuel" + 0.014*"stay"'),
 (7,
  '0.032*"commitment" + 0.032*"study" + 0.016*"impact" + 0.016*"outlook" + '
  '0.011*"operation" + 0.011*"expand" + 0.011*"exchange" + 0.011*"management" '
  '+ 0.011*"conde" + 0.011*"evolve"'),
 (15,
  '0.032*"agility" + 0.019*"feasibility" + 0.019*"self" + 0.014*"deploy" + '
  '0.014*"define" + 0.013*"investment" + 0.013*"option" + 0.013*"control" + '
  '0.013*"action" + 0.013*"incubation"'),
 (5,
  '0.020*"desk" + 0.018*"agility" + 0.016*"vender" + 0.016*"coordinate" + '
  '0.016*"committee" + 0.012*"acquisition" + 0.012*"target" + '
  '0.012*"counterparty" + 0.012*"approval" + 0.012*"trend"'),
 (17,
  '0.022*"option" + 0.017*"working" + 0.017*"niche" + 0.011*"business" + '
  '0.011*"constrain" + 0.011*"meeting" + 0.011*"correspond" + 0.011*"exposure" '
  '+ 0.011*"element" + 0.011*"face"'),
 (0,
  '0.025*"expertise" + 0.025*"banking" + 0.021*"universitie" + '
  '0.017*"spreadsheet" + 0.013*"negotiation" + 0.013*"shipment" + '
  '0.013*"arise" + 0.013*"billing" + 0.013*"assistance" + 0.013*"sector"'),
 (4,
  '0.024*"provide" + 0.017*"consider" + 0.017*"allow" + 0.015*"outlook" + '
  '0.015*"value" + 0.015*"contract" + 0.012*"study" + 0.012*"technology" + '
  '0.012*"scenario" + 0.012*"indicator"'),
 (6,
  '0.058*"impulse" + 0.027*"shall" + 0.027*"shape" + 0.024*"marketer" + '
  '0.017*"availability" + 0.014*"determine" + 0.014*"load" + '
  '0.014*"constantly_change" + 0.014*"instrument" + 0.014*"interface"'),
 (19,
  '0.042*"task" + 0.038*"tariff" + 0.038*"recommend" + 0.024*"example" + '
  '0.023*"future" + 0.021*"people" + 0.021*"math" + 0.021*"capacity" + '
  '0.021*"spirit" + 0.020*"price"')]
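To check exactly which topics are absent from the output above, the printed topic IDs can be compared against the full range of IDs. This is a minimal sketch in plain Python: the list printed_ids is copied by hand from the output above, and the helper name missing_topic_ids is hypothetical.

```python
def missing_topic_ids(printed_ids, num_topics):
    """Return the topic IDs in range(num_topics) that were not printed, in ascending order."""
    seen = set(printed_ids)
    return [topic_id for topic_id in range(num_topics) if topic_id not in seen]


# Topic IDs in the order they appear in the 25-topic output above.
printed_ids = [21, 18, 2, 11, 10, 20, 12, 1, 23, 13,
               22, 9, 7, 15, 5, 17, 0, 4, 6, 19]

print(len(printed_ids))                    # 20
print(missing_topic_ids(printed_ids, 25))  # [3, 8, 14, 16, 24]
```

So exactly 20 of the 25 topics were printed, and topic 3 is one of the five that were left out.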
This is the same model as above, but with 20 topics. As you can see, the output is sorted by topic and contains all of the topics.

lda_model = gensim.models.ldamodel.LdaModel(corpus=jr_dict_corpus,
                                           id2word=jr_dict,
                                           num_topics=20, 
                                           random_state=100,
                                           update_every=1,
                                           chunksize=100,
                                           passes=10,
                                           alpha='auto',
                                           per_word_topics=True)

pprint(lda_model.print_topics())

[(0,
  '0.031*"enjoy" + 0.031*"ambiguity" + 0.028*"accordance" + 0.016*"statistic" '
  '+ 0.016*"initiative" + 0.016*"service" + 0.016*"liberalize" + '
  '0.016*"application" + 0.011*"community" + 0.011*"identifie"'),
 (1,
  '0.016*"transformation" + 0.016*"negotiation" + 0.016*"community" + '
  '0.016*"clock" + 0.011*"marketer" + 0.011*"desk" + 0.011*"mandate" + '
  '0.011*"closing" + 0.011*"initiative" + 0.011*"experience"'),
 (2,
  '0.026*"priority" + 0.026*"reconciliation" + 0.026*"purchaser" + '
  '0.020*"safety" + 0.020*"region" + 0.020*"query" + 0.020*"share" + '
  '0.020*"manipulate" + 0.020*"ibex" + 0.020*"investor"'),
 (3,
  '0.022*"improve" + 0.021*"committee" + 0.021*"affect" + 0.012*"target" + '
  '0.012*"acquisition" + 0.011*"basis" + 0.011*"profitability" + '
  '0.011*"economic" + 0.011*"natural" + 0.011*"profit"'),
 (4,
  '0.024*"provide" + 0.019*"value" + 0.017*"consider" + 0.017*"allow" + '
  '0.015*"scenario" + 0.015*"outlook" + 0.015*"contract" + 0.014*"forecast" + '
  '0.014*"decision" + 0.012*"indicator"'),
 (5,
  '0.037*"desk" + 0.030*"coordinate" + 0.030*"agility" + 0.030*"vender" + '
  '0.023*"counterparty" + 0.023*"immature_emerge" + 0.023*"metric" + '
  '0.022*"approval" + 0.015*"maximization" + 0.015*"undergraduate"'),
 (6,
  '0.053*"impulse" + 0.025*"shall" + 0.025*"shape" + 0.018*"availability" + '
  '0.018*"marketer" + 0.012*"determine" + 0.012*"language" + '
  '0.012*"monitoring" + 0.012*"integration" + 0.012*"month"'),
 (7,
  '0.026*"commitment" + 0.026*"study" + 0.013*"impact" + 0.013*"outlook" + '
  '0.009*"operation" + 0.009*"management" + 0.009*"expand" + 0.009*"exchange" '
  '+ 0.009*"conde" + 0.009*"balance"'),
 (8,
  '0.057*"insurance" + 0.029*"propose" + 0.028*"rule" + 0.026*"self" + '
  '0.023*"product" + 0.023*"asset" + 0.023*"pricing" + 0.023*"amount" + '
  '0.023*"result" + 0.020*"liquidity"'),
 (9,
  '0.012*"universitie" + 0.012*"need" + 0.012*"statistic" + 0.012*"trend" + '
  '0.008*"invite" + 0.008*"commercialize" + 0.008*"transform" + 0.008*"manage" '
  '+ 0.008*"problem_solving" + 0.008*"optionality"'),
 (10,
  '0.024*"background" + 0.024*"curve" + 0.020*"allow" + 0.019*"collect" + '
  '0.019*"basis" + 0.017*"accordance" + 0.013*"improve" + 0.013*"datum" + '
  '0.013*"component" + 0.013*"reliability"'),
 (11,
  '0.054*"task" + 0.049*"tariff" + 0.049*"recommend" + 0.031*"future" + '
  '0.027*"spirit" + 0.027*"capacity" + 0.027*"math" + 0.022*"ensure" + '
  '0.022*"profit" + 0.022*"variable_margin"'),
 (12,
  '0.001*"impulse" + 0.001*"availability" + 0.001*"reliability" + '
  '0.001*"shall" + 0.001*"component" + 0.001*"agent" + 0.001*"marketer" + '
  '0.001*"shape" + 0.001*"assisting" + 0.001*"supply"'),
 (13,
  '0.021*"region" + 0.016*"greenfield" + 0.016*"collegiate" + 0.011*"transfer" '
  '+ 0.011*"remuneration" + 0.011*"organization" + 0.011*"structure" + '
  '0.011*"continent" + 0.011*"project" + 0.011*"prepare"'),
 (14,
  '0.033*"originator" + 0.025*"vender" + 0.025*"expertise" + 0.025*"banking" + '
  '0.019*"evolve" + 0.017*"management" + 0.017*"market" + 0.017*"site" + '
  '0.012*"component" + 0.012*"discontinuing"'),
 (15,
  '0.027*"agility" + 0.022*"mind" + 0.022*"negotiation" + 0.011*"deploy" + '
  '0.011*"define" + 0.011*"ecosystem" + 0.011*"control" + 0.011*"lead" + '
  '0.011*"industry" + 0.011*"option"'),
 (16,
  '0.001*"region" + 0.001*"master" + 0.001*"orginiation" + 0.001*"greenfield" '
  '+ 0.001*"agent" + 0.001*"identifie" + 0.001*"remuneration" + 0.001*"mark" + '
  '0.001*"reviewing" + 0.001*"closing"'),
 (17,
  '0.030*"banking" + 0.018*"option" + 0.018*"round" + 0.018*"credibility" + '
  '0.018*"origination" + 0.018*"entity" + 0.016*"working" + 0.015*"niche" + '
  '0.015*"standard" + 0.012*"coordinate"'),
 (18,
  '0.027*"negotiation" + 0.018*"reporting" + 0.018*"perform" + 0.018*"world" + '
  '0.015*"offer" + 0.015*"manipulate" + 0.011*"query" + 0.010*"control" + '
  '0.010*"working" + 0.009*"self"'),
 (19,
  '0.047*"example" + 0.039*"people" + 0.039*"price" + 0.039*"excel" + '
  '0.039*"excellent" + 0.038*"base" + 0.031*"office" + 0.031*"optimizing" + '
  '0.031*"participate" + 0.031*"package"')]

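print_topics prints 20 topics by default. To include the topics beyond the first 20, you must pass the num_topics parameter, as described in the Gensim documentation:

print(lda_model.print_topics(num_topics=25, num_words=10))

The documentation also states that you can pass num_topics=-1 to list all topics.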
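Because the returned list is not guaranteed to be in ID order, it can be convenient to request every topic and then sort the result. The following is a minimal sketch and not part of the original answer: the helper name all_topics_sorted is hypothetical, and it assumes the model exposes Gensim's show_topics(num_topics, num_words) method returning (topic_id, description) pairs, with num_topics=-1 meaning all topics. The StubModel class is only a stand-in for a trained model so that the snippet runs without a corpus.

```python
def all_topics_sorted(model, num_words=10):
    """Return every topic of the model as (topic_id, description) pairs, sorted by topic ID."""
    # num_topics=-1 asks for all topics instead of the default subset of 20.
    topics = model.show_topics(num_topics=-1, num_words=num_words)
    return sorted(topics, key=lambda pair: pair[0])


class StubModel:
    """Hypothetical stand-in for a trained model; returns topics out of order."""

    def __init__(self, num_topics):
        self.num_topics = num_topics

    def show_topics(self, num_topics=10, num_words=10):
        # A negative num_topics means "all topics" in this stub.
        count = self.num_topics if num_topics < 0 else min(num_topics, self.num_topics)
        ids = list(range(self.num_topics))[::-1][:count]  # deliberately reversed
        return [(i, f"topic {i} ({num_words} words)") for i in ids]


topics = all_topics_sorted(StubModel(25))
print(len(topics))                 # 25
print([t[0] for t in topics[:5]])  # [0, 1, 2, 3, 4]
```

With a real model from the question, the equivalent call would be pprint(all_topics_sorted(lda_model)).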