Python: how to solve "ValueError: setting an array element with a sequence"

Here is a sample of my dataset:

import pandas as pd

d = {'TEXT': ['History: A 59  year  old female, was sent to R/O lung nodule. Findings:  Lungs and airway:  The study reveals a speculated nodule with pleural tagging at anterior basal segment of LLL, measured 1.9x1.4x2.0 cm in size. Pleural tagging is seen. Partial encasement of subsegmental bronchi is seen.  CA lung is considered.','History: A 59  year  old woman with history of lung cancer S/P left lower lobectomy with close to pleural margin and left adrenal nodule , was sent for evaluation before post  operative RT. Findings: Comparison is made to the prior study on 03/02/2009. Chest:   The study reveals evidence of left lower lobectomy with compensatory hyperinflation of the LUL.']}
df2 = pd.DataFrame(data=d)

I want to apply Latent Dirichlet Allocation (LDA) to the context of each sentence. I have already trained my model separately and want to test it on this data.
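Testing a model that was trained separately usually means reusing the already fitted objects and calling only `transform` on the new data, not `fit_transform`, which would learn a new vocabulary and new topics. A minimal sketch, assuming scikit-learn's `TfidfVectorizer` and `LatentDirichletAllocation`; the training and test sentences below are hypothetical stand-ins (the training ones are taken from the sample reports above).

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical training sentences; in practice, the corpus the model was trained on.
train_sentences = [
    "The study reveals a speculated nodule with pleural tagging at anterior basal segment of LLL.",
    "Pleural tagging is seen.",
    "Partial encasement of subsegmental bronchi is seen.",
    "CA lung is considered.",
    "Comparison is made to the prior study on 03/02/2009.",
    "The study reveals evidence of left lower lobectomy with compensatory hyperinflation of the LUL.",
]
# Hypothetical new sentences to classify.
test_sentences = [
    "A speculated lung nodule with pleural tagging is seen.",
    "Evidence of left lower lobectomy.",
]

# Training time: fit the vectoriser and the LDA model once.
vectoriser = TfidfVectorizer(stop_words="english")
X_train = vectoriser.fit_transform(train_sentences)
lda = LatentDirichletAllocation(n_components=5, random_state=0)
lda.fit(X_train)

# Test time: transform only, reusing the fitted objects, so the test matrix has
# the same columns (vocabulary) that the LDA model was trained on.
X_test = vectoriser.transform(test_sentences)
doc_topic = lda.transform(X_test)          # shape (2, 5): one topic distribution per sentence
predicted_topic = doc_topic.argmax(axis=1)  # index of the most likely topic per sentence
print(np.round(doc_topic, 3), predicted_topic)
```

If the model was trained in another session, the fitted vectoriser has to be saved and loaded together with the LDA model (for example with `joblib`), because a freshly fitted vectoriser would produce a different vocabulary. Note also that scikit-learn's own topic-extraction example feeds LDA raw term counts from `CountVectorizer`; TF-IDF input runs as well, as here.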

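To get to LDA, I tokenized the text into sentences, because I want to classify each sentence by topic. After sentence tokenization I applied TF-IDF, and then LDA. When the code reaches the LDA step, I get this error. Here is my code: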
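If the goal is one topic per sentence, each sentence has to be its own row (document) when it reaches TF-IDF and LDA; a list of sentences per report still gives one document per report. A minimal pandas sketch using `DataFrame.explode`, keeping the report index as `doc_id`. The `naive_sent_tokenize` helper is hypothetical: a crude regex split that stands in for `nltk.sent_tokenize` only so the sketch runs without downloading NLTK data.

```python
import re
import pandas as pd

# Hypothetical stand-in for nltk.sent_tokenize: split after '.', '!' or '?'
# followed by whitespace. Much cruder than NLTK (e.g. it splits after abbreviations).
def naive_sent_tokenize(text):
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

df = pd.DataFrame({"TEXT": [
    "Pleural tagging is seen. Partial encasement of subsegmental bronchi is seen.  CA lung is considered.",
    "Findings: Comparison is made to the prior study on 03/02/2009. Chest:   The study reveals evidence "
    "of left lower lobectomy with compensatory hyperinflation of the LUL.",
]})

# One list of sentences per report ...
df["sent_token"] = df["TEXT"].apply(naive_sent_tokenize)

# ... then one row per sentence; the repeated index records which report it came from.
sentences = (
    df.explode("sent_token")
      .rename_axis("doc_id")
      .reset_index()[["doc_id", "sent_token"]]
      .rename(columns={"sent_token": "sentence"})
)
print(sentences)
```

The `sentence` column now holds plain strings, so it can go to `TfidfVectorizer` without a custom tokenizer, and results can be grouped back per report through `doc_id`.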

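import nltk
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# identity_tokenizer is my own helper function (not shown here)
df2["sent_token"] = df2["TEXT"].apply(nltk.sent_tokenize)
vectoriser = TfidfVectorizer(tokenizer=identity_tokenizer, stop_words='english', lowercase=False)
df2['tfidf1'] = vectoriser.fit_transform(df2['sent_token'])  # one sparse matrix for all rows
lda = LatentDirichletAllocation(n_components=5)
df2['tfidf_lda'] = lda.fit_transform(df2['tfidf1'])  # this line raises the ValueError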

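The last line (the LDA call) is where I get "ValueError: setting an array element with a sequence". Looking at similar reports of this error, I found it may be caused by rows containing different numbers of sentences, which gives sequences of different lengths. But that variation is inherent to my data, so I am not sure where the problem is. Please help.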
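A likelier cause, assuming the error comes from scikit-learn converting its input to a float array: `fit_transform` returns one scipy sparse matrix for the whole corpus, and once it is stored in the `tfidf1` column, LDA receives an object column of non-numeric cells instead of a numeric matrix. The TF-IDF matrix has a fixed number of columns regardless of how many sentences a row has, so the varying sentence counts are probably not the root cause. The first part of this sketch reproduces the same ValueError with plain NumPy/pandas on a hypothetical column of vectors; the second part keeps the matrix out of the DataFrame, passes it straight to LDA and writes only the topic distributions back. The sentences and the `topic` column are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import TfidfVectorizer

# 1) Reproduce the error: an object column whose cells each hold a whole vector.
col = pd.Series([np.array([0.2, 0.8]), np.array([0.5, 0.5])])
try:
    np.asarray(col, dtype=float)  # float conversion of an object column raises ValueError
    error_message = None
except ValueError as err:
    error_message = str(err)
print("ValueError:", error_message)

# If vectors already sit in cells, stacking them recovers a proper 2-D float array.
X = np.vstack(col.to_list())      # shape (2, 2)

# 2) Fix: keep the TF-IDF matrix in a variable and give it to LDA directly.
df2 = pd.DataFrame({"sentence": [
    "The study reveals a speculated nodule with pleural tagging at anterior basal segment of LLL.",
    "Pleural tagging is seen.",
    "Partial encasement of subsegmental bronchi is seen.",
    "CA lung is considered.",
    "Comparison is made to the prior study on 03/02/2009.",
    "The study reveals evidence of left lower lobectomy with compensatory hyperinflation of the LUL.",
]})
vectoriser = TfidfVectorizer(stop_words="english")
tfidf = vectoriser.fit_transform(df2["sentence"])  # scipy.sparse, one row per sentence
lda = LatentDirichletAllocation(n_components=5, random_state=0)
doc_topic = lda.fit_transform(tfidf)               # ndarray, shape (n_sentences, 5)

# Write results back row by row: one distribution vector per cell, plus the top topic.
df2["tfidf_lda"] = list(doc_topic)
df2["topic"] = doc_topic.argmax(axis=1)
print(df2[["sentence", "topic"]])
```

To test an already trained model instead, replace the two `fit_transform` calls with `transform` on the previously fitted vectoriser and LDA objects.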

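I can't debug this without some clue about the underlying data. Could you provide some dummy data to make it easier? Please also show the identity_tokenizer function you wrote for the tokenizer.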
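The asker's `identity_tokenizer` is not shown. If, as its name suggests, it returns its input unchanged, then feeding it lists of sentences makes every whole sentence a single vocabulary "term", and `stop_words='english'` would effectively never match, since stop words are compared against whole sentences. A minimal sketch of that behaviour; the `identity_tokenizer` below is a hypothetical guess, not the asker's code. `lowercase=False` is needed because the default preprocessing calls `.lower()` on each document, which fails for a list, and `token_pattern=None` silences the warning that the pattern is ignored when a tokenizer is given.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical guess at the asker's helper: return the pre-tokenized input unchanged.
def identity_tokenizer(tokens):
    return tokens

# One document per report, each given as a list of sentences.
docs = [
    ["Pleural tagging is seen.", "CA lung is considered."],
    ["Chest: The study reveals evidence of left lower lobectomy."],
]

vec = TfidfVectorizer(tokenizer=identity_tokenizer, lowercase=False, token_pattern=None)
X = vec.fit_transform(docs)

# Each full sentence has become one feature (column) of the TF-IDF matrix.
print(sorted(vec.vocabulary_))
```

With real reports, almost every sentence is unique, so each column would be non-zero in only one document; passing plain sentence strings with the default tokenizer usually gives LDA far more shared words to work with.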