Python 3.x: error related to 'list' object has no attribute 'lower'
I am working on several clustering projects. In my current clustering project, I am getting an error related to my tfidf_vectorizer code. Here are the documents I imported:
description_1 = open('description1.txt', encoding="utf8").read().lower().split('\n')
description_2 = open('description2.txt', encoding="utf8").read().lower().split('\n')
description_3 = open('description3.txt', encoding="utf8").read().lower().split('\n')
description_4 = open('description4.txt', encoding="utf8").read().lower().split('\n')
description_5 = open('description5.txt', encoding="utf8").read().lower().split('\n')
description_6 = open('description6.txt', encoding="utf8").read().lower().split('\n')
description_7 = open('description7.txt', encoding="utf8").read().lower().split('\n')
Then I combine these documents:
descriptions_on = (description_1, description_2, description_3,
                   description_4, description_5, description_6, description_7)

descriptions = []
for i in range(len(descriptions_on)):
    item = descriptions_on[i]
    descriptions.append(item)
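Note that each description_N is already a list of lines, so the loop above appends whole lists: descriptions ends up as a list of lists, not a list of strings. A minimal sketch with made-up two-line "files" (the contents are placeholders) showing the resulting structure:

```python
# Stand-ins for the file contents: each "document" is already a list of
# lines, exactly what open(...).read().lower().split('\n') produces.
description_1 = ["line one of doc 1", "line two of doc 1"]
description_2 = ["line one of doc 2", "line two of doc 2"]

descriptions_on = (description_1, description_2)

# Same loop as above: it appends each *list*, not each line or string.
descriptions = []
for i in range(len(descriptions_on)):
    item = descriptions_on[i]
    descriptions.append(item)

print(type(descriptions[0]))   # <class 'list'> -- each element is a list of lines
```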
The problem occurs somewhere in these lines of code:
from sklearn.feature_extraction.text import TfidfVectorizer
from spacy.lang.fr.stop_words import STOP_WORDS as fr_stop
from spacy.lang.en.stop_words import STOP_WORDS as en_stop

#from warnings import filterwarnings
#filterwarnings('ignore')

final_stopwords_list = list(fr_stop) + list(en_stop)

tfidf_vectorizer = TfidfVectorizer(max_df=0.90, max_features=200000,
                                   min_df=0.10, stop_words=final_stopwords_list,
                                   use_idf=True, tokenizer=tokenize_and_stem,
                                   ngram_range=(1,3))

%time tfidf_matrix = tfidf_vectorizer.fit_transform(descriptions)
tokenizer=tokenize_and_stem refers to a function, tokenize_and_stem, which was defined earlier but is not included in this question's code listing because it is not relevant to the error.
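The traceback below bottoms out in scikit-learn's lowercasing step, which with the default lowercase=True is essentially lambda x: x.lower() applied to each document. A minimal sketch (no sklearn needed) of why a list of lists fails there:

```python
# scikit-learn's default preprocessing boils down to calling .lower() on
# each document it is handed (see the lambda at line 257 in the traceback).
preprocess = lambda doc: doc.lower()

docs_ok  = ["first document", "second document"]   # list of strings
docs_bad = [["line 1", "line 2"], ["line 3"]]      # list of lists

print([preprocess(d) for d in docs_ok])   # works: strings have .lower()

try:
    [preprocess(d) for d in docs_bad]     # each "doc" is a list -> no .lower()
except AttributeError as e:
    print(e)                              # 'list' object has no attribute 'lower'
```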
This is the error message I get from the code above:
AttributeError                            Traceback (most recent call last)
<timed exec> in <module>

D:\Anaconda\lib\site-packages\sklearn\feature_extraction\text.py in fit_transform(self, raw_documents, y)
   1611         """
   1612         self._check_params()
-> 1613         X = super(TfidfVectorizer, self).fit_transform(raw_documents)
   1614         self._tfidf.fit(X)
   1615         # X is already a transformed view of raw_documents so

D:\Anaconda\lib\site-packages\sklearn\feature_extraction\text.py in fit_transform(self, raw_documents, y)
   1029
   1030         vocabulary, X = self._count_vocab(raw_documents,
-> 1031                                           self.fixed_vocabulary_)
   1032
   1033         if self.binary:

D:\Anaconda\lib\site-packages\sklearn\feature_extraction\text.py in _count_vocab(self, raw_documents, fixed_vocab)
    941         for doc in raw_documents:
    942             feature_counter = {}
--> 943             for feature in analyze(doc):
    944                 try:
    945                     feature_idx = vocabulary[feature]

D:\Anaconda\lib\site-packages\sklearn\feature_extraction\text.py in <lambda>(doc)
    327                                                tokenize)
    328             return lambda doc: self._word_ngrams(
--> 329                 tokenize(preprocess(self.decode(doc))), stop_words)
    330
    331         else:

D:\Anaconda\lib\site-packages\sklearn\feature_extraction\text.py in <lambda>(x)
    255
    256         if self.lowercase:
--> 257             return lambda x: strip_accents(x.lower())
    258         else:
    259             return strip_accents

AttributeError: 'list' object has no attribute 'lower'
print(descriptions)
Result:
[['\ufeff成立于1991年蒙特利尔。dex是平价奢侈品领域的真正先驱,”,加勒比媒体与传播学院(carimac)新闻专业二年级学生萨萨娜·桑德森的客座帖子坐落在西印度群岛大学……“DaMeNETS提供了一种简单而可靠的个性化医学解决方案,更精确和DaMeNETS正在开发一种心理健康护理的第一诊断工具,这将使临床医生获得更多的……”DIDACTE授权培训供应商。在几分钟内,“,”di-o-matic:发现你最喜欢的cg角色背后的技术“,”proship facilite le Recrument des talents and la prét du preparation du come dués qui blouira vos invités,tout en assurant la gestion la total de la problitémes administratifis and contracuels.”,“,”dk spec spec participe au congress de montés de montéal sur le bois,du 20 au 22 mars,au fairmont le reineélisabeth.venez Neus rencontrer!”,“do networks limited致力于为具有丰富光通信线路专业知识的企业提供一站式服务,do networks limited整个团队致力于……”,“道格拉斯顾问公司(douglas consultants inc.)于1999年在《建筑设计》杂志上发表了一篇关于道格拉斯顾问公司的文章。“cooper目前担任dream、dream office reit、dream global reit、dream industrial reit和e-l financial corporation limited的董事会成员。p.jane gavan女士是dream的资产管理总裁,在房地产行业拥有30多年的经验。”,“dromadaire géo-innovations是一家公司,环境服务、企业利用géo-Localization pour faire l…”“电子产品专家委员会,魁北克省自动化和控制委员会,电子产品专家委员会”,邓迪可持续技术(dst)从事环保技术的开发和商业化,用于处理中的材料。

Your documents are lists, not strings, so they are not in the right format for the tokenizer. Can you try printing descriptions and showing us a sample?

@UpasanaMittal Absolutely, I have edited the post. The result above is part of print(descriptions); the full output is too long to post.

It looks like this is a list of lists rather than a list of texts, and tfidf needs a list of texts. I basically have seven documents, posted at the beginning of the question, each with 100 lines of text. Would it work to convert each document's lines into a single paragraph?
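Following the suggestion in the comments, one way to get a list of strings is to join each file's lines back into one text per document before vectorizing. A minimal sketch, using placeholder contents in place of the real description_N lists:

```python
# Each description_N is a list of lines; join the lines into one string per
# document so the vectorizer receives an iterable of texts, not of lists.
description_1 = ["founded in 1991 in montreal.", "a pioneer in affordable luxury."]
description_2 = ["didacte, an authorized training provider.", "up and running in minutes."]

descriptions_on = (description_1, description_2)

# One paragraph per document: this is the shape fit_transform() expects.
descriptions = [" ".join(lines) for lines in descriptions_on]

print(descriptions[0])
# founded in 1991 in montreal. a pioneer in affordable luxury.
```

With this shape, tfidf_vectorizer.fit_transform(descriptions) receives strings, so the lowercasing step in the traceback no longer fails.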